Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]

2022-11-21 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Add a reassocation width for FMAs in per-CPU tuning structures. Keep the
> existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
> FMA pipes.  This improves SPECFP2017 on Neoverse V1 by ~1.5%.
>
> Passes regress/bootstrap, OK for commit?
>
> gcc/
> PR 107413
> * config/aarch64/aarch64.cc (struct tune_params): Add
> fma_reassoc_width to all CPU tuning structures.
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
> fma_reassoc_width.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> a73bfa20acb9b92ae0475794c3f11c67d22feb97..71365a446007c26b906b61ca8b2a68ee06c83037
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -540,6 +540,7 @@ struct tune_params
>const char *loop_align;
>int int_reassoc_width;
>int fp_reassoc_width;
> +  int fma_reassoc_width;
>int vec_reassoc_width;
>int min_div_recip_mul_sf;
>int min_div_recip_mul_df;
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 798363bcc449c414de5bbb4f26b8e1c64a0cf71a..643162cdecd6a8fe5587164cb2d0d62b709a491d
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -1346,6 +1346,7 @@ static const struct tune_params generic_tunings =
>"8", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1382,6 +1383,7 @@ static const struct tune_params cortexa35_tunings =
>"8", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1415,6 +1417,7 @@ static const struct tune_params cortexa53_tunings =
>"8", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1448,6 +1451,7 @@ static const struct tune_params cortexa57_tunings =
>"8", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1481,6 +1485,7 @@ static const struct tune_params cortexa72_tunings =
>"8", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1514,6 +1519,7 @@ static const struct tune_params cortexa73_tunings =
>"8", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1548,6 +1554,7 @@ static const struct tune_params exynosm1_tunings =
>"4", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1580,6 +1587,7 @@ static const struct tune_params thunderxt88_tunings =
>"8", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1612,6 +1620,7 @@ static const struct tune_params thunderx_tunings =
>"8", /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1646,6 +1655,7 @@ static const struct tune_params tsv110_tunings =
>"8",  /* loop_align.  */
>2,/* int_reassoc_width.  */
>4,/* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,/* vec_reassoc_width.  */
>2,/* min_div_recip_mul_sf.  */
>2,/* min_div_recip_mul_df.  */
> @@ -1678,6 +1688,7 @@ static const struct tune_params xgene1_tunings =
>"16",/* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
> +  1,   /* fma_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
> @@ -1710,6 +1721,7 @@ static const struct tune_params emag_tunings =
>

Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-21 Thread HAO CHEN GUI via Gcc-patches
Hi Segher,

Thanks for your comments.

在 2022/11/22 7:49, Segher Boessenkool 写道:
> *cbranch_2insn is not a machine insn.  It generates a cror and a branch
> insn.  This makes no sense to have in a cbranchcc: those do a branch
> based on an existing cr field, so based on the *output* of that cror.
> 
> If ifcvt requires differently, ifcvt needs fixing.
> 
I have a question here.
For rs6000, "*cbranch_2insn" should not be generated by cbranch_optab?
I mean it gets icode from cbranch_optab and generates insn from this
icode. If so, the predicate of cbranchcc4 should be checked every time
before insn generation other than just doing an assertion.

> We want to use the output of the cror multiple times, not generate more
> cror insns.
> 
> I don't think the behaviour of ifcvt is correct here at all, no.  It
> also does not consider the cost of the code as far as I can see?  That
> could reduce the impact of this problem at least.
ifcvt tries to generate the converted sequence. Then it compares the cost
of new sequence to the cost of orginial. If it benefits, the conversion
will be done.

Thanks
Gui Haochen


Re: [PATCH] range-op: Implement floating point division fold_range [PR107569]

2022-11-21 Thread Jan-Benedict Glaw
Hi Jakub,

On Fri, 2022-11-11 10:09:42 +0100, Jakub Jelinek via Gcc-patches 
 wrote:
> Here is the floating point division fold_range implementation,
> as I wrote in the last mail, we could outline some of the common parts
> into static methods with descriptive names and share them between
> foperator_div and foperator_mult.
[...]

I'm running a slightly hacked [glibc]/scripts/build-many-glibcs.py to
to CI builds for glibc as well by now (hacked to allow for GCC master
being used) and this GCC commit
(2d5c4a16dd833aa083f13dd3e78e3ef38afe6ebb) triggers glibc's
elf/check-localplt testcase to fail, though just for
sparc64-linux-gnu. (As I just started with glibc checks, it took me a
while to realize this was a real regression and not a flaw in my
setup.)

MfG, JBG

-- 


signature.asc
Description: PGP signature


[PATCH] tree-optimization/107766 - ICE with recent -ffp-contract=off fix

2022-11-21 Thread Richard Biener via Gcc-patches
The following uses *node to check for FP types rather than the
child nodes which could be constant leafs and thus without a
vector type.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107766
* tree-vect-slp-patterns.cc (complex_mul_pattern::matches):
Use *node to check for FP vector types.

* g++.dg/vect/pr107766.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr107766.cc | 23 +++
 gcc/tree-vect-slp-patterns.cc |  2 +-
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr107766.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr107766.cc 
b/gcc/testsuite/g++.dg/vect/pr107766.cc
new file mode 100644
index 000..744bcc900b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr107766.cc
@@ -0,0 +1,23 @@
+// { dg-do compile }
+// { dg-additional-options "-ffp-contract=off" }
+
+typedef double btScalar;
+struct btVector3 {
+  operator btScalar *() const;
+};
+double m_vec[2];
+struct btShapeMatrix {
+  double [](int i) { return m_vec[i]; }
+};
+btScalar shape_function___1pxt1pz, shape_function__fac;
+struct btMiniSDF {
+  void shape_function_(btVector3 const &) const;
+};
+void btMiniSDF::shape_function_(btVector3 const ) const {
+  btShapeMatrix res;
+  btScalar _1m3y = 1.0 - 3.0 * xi[1], _1p3y = 1.0 + 3.0 * xi[1],
+   fact1m3y = shape_function__fac * _1m3y,
+   fact1p3y = shape_function__fac * _1p3y;
+  res[22] = fact1m3y * shape_function___1pxt1pz;
+  res[23] = fact1p3y * shape_function___1pxt1pz;
+}
diff --git a/gcc/tree-vect-slp-patterns.cc b/gcc/tree-vect-slp-patterns.cc
index 122d697a809..93735008bd1 100644
--- a/gcc/tree-vect-slp-patterns.cc
+++ b/gcc/tree-vect-slp-patterns.cc
@@ -1039,7 +1039,7 @@ complex_mul_pattern::matches (complex_operation_t op,
  with -ffp-contract=fast.  */
   if (!mul0
   && (flag_fp_contract_mode == FP_CONTRACT_FAST
- || !FLOAT_TYPE_P (SLP_TREE_VECTYPE (l0node[0])))
+ || !FLOAT_TYPE_P (SLP_TREE_VECTYPE (*node)))
   && vect_match_expression_p (l0node[0], PLUS_EXPR))
 {
   auto vals = SLP_TREE_CHILDREN (l0node[0]);
-- 
2.35.3


Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-21 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/11/22 13:12, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> 在 2022/11/22 11:11, Kewen.Lin 写道:
>> Maybe we can adjust prepare_cmp_insn to fail if the constructed cbranchcc4
>> pattern doesn't satisfy the predicate of operand 0 rather than to assert.
>> It's something like:
>>
>> if (!insn_operand_matches (icode, 0, test))
>>   goto fail;
>>
>> or only assign and return if insn_operand_matches (icode, 0, test).
>>
>> The code makes the assumption that all this kind of cbranchcc4 patterns
>> should match what target defines for cbranchcc4 optab, but unfortunately
>> it's not sure for our port and I don't see how it should be.
> 
> Thanks for your comments.
> 
> I just drafted a patch to let it go to "fail" when predicate of operand 0 is
> not satisfied. It works and passed bootstrap on ppc64le. But my concern is
> prepare_cmp_insn is a generic function and is used to create a cmp rtx. It
> is not only called by emit_conditional* (finally called by ifcvt) but other
> functions (even new functions). If we change the logical in prepare_cmp_insn,
> we may lost some potential optimization. After all, the branch_2insn is a 
> valid
> insn.

I have one assumption that without your proposed have_cbranchcc4 change for
rs6000, for this generic prepare_cmp_insn, it would never be called with CCmode
on rs6000, since we would get ICE with icode CODE_FOR_nothing otherwise.
It means we don't lose anything than before.  Besides, excepting for those
conditional call sites, I doubt CCmode would be used for calling it.  Could
you have a check?

> 
> I think the essential of the problem is we want to exclude those comparisons
> (from cbranchcc4 used in ifcvt) which need two CC bits. So, we can change the
> logical of ifcvt - add an additional check with predicate of operand 0 when
> checking the have_cbranchcc4 flag in ifcvt.

I think that would work.  The only concern is that some use (future) of
prepare_cmp_insn like how it's used in ifcvt would need the same pre checking,
otherwise the ICE happens again.

BR,
Kewen


Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-21 Thread HAO CHEN GUI via Gcc-patches
Hi Kewen,

在 2022/11/22 11:11, Kewen.Lin 写道:
> Maybe we can adjust prepare_cmp_insn to fail if the constructed cbranchcc4
> pattern doesn't satisfy the predicate of operand 0 rather than to assert.
> It's something like:
> 
> if (!insn_operand_matches (icode, 0, test))
>   goto fail;
> 
> or only assign and return if insn_operand_matches (icode, 0, test).
> 
> The code makes the assumption that all this kind of cbranchcc4 patterns
> should match what target defines for cbranchcc4 optab, but unfortunately
> it's not sure for our port and I don't see how it should be.

Thanks for your comments.

I just drafted a patch to let it go to "fail" when predicate of operand 0 is
not satisfied. It works and passed bootstrap on ppc64le. But my concern is
prepare_cmp_insn is a generic function and is used to create a cmp rtx. It
is not only called by emit_conditional* (finally called by ifcvt) but other
functions (even new functions). If we change the logical in prepare_cmp_insn,
we may lost some potential optimization. After all, the branch_2insn is a valid
insn.

I think the essential of the problem is we want to exclude those comparisons
(from cbranchcc4 used in ifcvt) which need two CC bits. So, we can change the
logical of ifcvt - add an additional check with predicate of operand 0 when
checking the have_cbranchcc4 flag in ifcvt.

What's your opinion?

Thanks
Gui Haochen





Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-21 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Thanks for the explanation.

on 2022/11/21 14:18, HAO CHEN GUI wrote:
> Hi Segher,
> 
> 在 2022/11/18 20:18, Segher Boessenkool 写道:
>> I don't think we should pretend we have any conditional jumps the
>> machine does not actually have, in cbranchcc4.  When would this ever be
>> useful?  cror;beq can be quite expensive, compared to the code it would
>> replace anyway.
>>
>> If something generates those here (which then ICEs later), that is
>> wrong, fix *that*?  Is it ifcvt doing it?
> 
> "*cbranch_2insn" is a valid insn for rs6000. So it generates such insn
> at expand pass. The "prepare_cmp_insn" called by ifcvt just wants to verify
> that the following comparison rtx is valid.

Maybe we can adjust prepare_cmp_insn to fail if the constructed cbranchcc4
pattern doesn't satisfy the predicate of operand 0 rather than to assert.
It's something like:

if (!insn_operand_matches (icode, 0, test))
  goto fail;

or only assign and return if insn_operand_matches (icode, 0, test).

The code makes the assumption that all this kind of cbranchcc4 patterns
should match what target defines for cbranchcc4 optab, but unfortunately
it's not sure for our port and I don't see how it should be.

BR,
Kewen


Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-21 Thread Kewen.Lin via Gcc-patches
Hi Richard,

Many thanks for your review comments!

>>> on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
 Hi,

 As discussed in PR98125, -fpatchable-function-entry with
 SECTION_LINK_ORDER support doesn't work well on powerpc64
 ELFv1 because the filled "Symbol" in

   .section name,"flags"o,@type,Symbol

 sits in .opd section instead of in the function_section
 like .text or named .text*.

 Since we already generates one label LPFE* which sits in
 function_section of current_function_decl, this patch is
 to reuse it as the symbol for the linked_to section.  It
 avoids the above ABI specific issue when using the symbol
 concluded from current_function_decl.

 Besides, with this support some previous workarounds for
 powerpc64 ELFv1 can be reverted.

 btw, rs6000_print_patchable_function_entry can be dropped
 but there is another rs6000 patch which needs this rs6000
 specific hook rs6000_print_patchable_function_entry, not
 sure which one gets landed first, so just leave it here.

 Bootstrapped and regtested on below:

   1) powerpc64-linux-gnu P8 with default binutils 2.27
  and latest binutils 2.39.
   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
   4) x86_64-redhat-linux with default binutils 2.30
  and latest binutils 2.39.
   5) aarch64-linux-gnu  with default binutils 2.30
  and latest binutils 2.39.


[snip...]

 diff --git a/gcc/varasm.cc b/gcc/varasm.cc
 index 4db8506b106..d4de6e164ee 100644
 --- a/gcc/varasm.cc
 +++ b/gcc/varasm.cc
 @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char *name, 
 unsigned int flags,
fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
if (flags & SECTION_LINK_ORDER)
{
 -tree id = DECL_ASSEMBLER_NAME (decl);
 -ultimate_transparent_alias_target ();
 -const char *name = IDENTIFIER_POINTER (id);
 -name = targetm.strip_name_encoding (name);
 -fprintf (asm_out_file, ",%s", name);
 +/* For now, only section "__patchable_function_entries"
 +   adopts flag SECTION_LINK_ORDER, internal label LPFE*
 +   was emitted in default_print_patchable_function_entry,
 +   just place it here for linked_to section.  */
 +gcc_assert (!strcmp (name, "__patchable_function_entries"));
> 
> I like the idea of removing the rs600 workaround in favour of making the
> target-independent more robust.  But this seems a bit hackish.  What
> would we do if SECTION_LINK_ORDER was used for something else in future?
> 

Good question!  I think it depends on how we can get the symbol for the
linked_to section, if adopting the name of the decl will suffer the
similar issue which this patch wants to fix, we have to reuse the label
LPFE* or some kind of new artificial label in the related section; or
we can just go with the name of the given decl, or something related to
that decl.  Since we can't predict any future uses, I just placed an
assertion here to ensure that we would revisit and adjust this part at
that time.  Does it sound reasonable to you?

BR,
Kewen


Re: Re: [PATCH] RISC-V: Add RVV registers register spilling

2022-11-21 Thread jiawei



 -原始邮件-
 发件人: "Jeff Law" 
 发送时间: 2022-11-21 23:26:37 (星期一)
 收件人: "juzhe.zh...@rivai.ai" , schwab 

 抄送: gcc-patches , "monk.chiang" 
, "kito.cheng" , jiawei 

 主题: Re: [PATCH] RISC-V: Add RVV registers register spilling
 
 
 On 11/21/22 02:25, juzhe.zh...@rivai.ai wrote:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606523.html
  This patch obviously didn't include scalable size frame.
  So it ICE in offset = 
cfun-machine-frame.gp_sp_offset.to_constant ();
  We can't directly use to_constant if the frame is a scalable.
  Please fix it or revert it. Thanks
 
 We probably just need to reject everything in 
 riscv_get_setparate_components if the offset isn't constant. Something 
 like the attached patch (untested) might be enough to resolve the problem.
 
 
 Jeff
 

I tested this patch and it fix that problem and works well, 
thanks for you works!

Jiawei


[pushed] c++: contracts fixes

2022-11-21 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Fixing -Wunused-parm warnings and link errors depending on where -fcontracts
appears on the command line.

gcc/cp/ChangeLog:

* contracts.cc (build_contract_condition_function):
Set DECL_ARTIFICIAL on return value parm.
* g++spec.cc (lang_specific_driver): Add -lstdc++exp
just before -lstdc++.
---
 gcc/cp/contracts.cc |  1 +
 gcc/cp/g++spec.cc   | 20 
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 26396439361..f3afcc62ba0 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -1434,6 +1434,7 @@ build_contract_condition_function (tree fndecl, bool pre)
   tree name = get_identifier ("__r");
   tree parm = build_lang_decl (PARM_DECL, name, value_type);
   DECL_CONTEXT (parm) = fn;
+  DECL_ARTIFICIAL (parm) = true;
   DECL_ARGUMENTS (fn) = chainon (DECL_ARGUMENTS (fn), parm);
 
   *last = build_tree_list (NULL_TREE, value_type);
diff --git a/gcc/cp/g++spec.cc b/gcc/cp/g++spec.cc
index 257e49b7f3f..e599ac906f6 100644
--- a/gcc/cp/g++spec.cc
+++ b/gcc/cp/g++spec.cc
@@ -128,6 +128,9 @@ lang_specific_driver (struct cl_decoded_option 
**in_decoded_options,
   /* By default, we throw on the math library if we have one.  */
   int need_math = (MATH_LIBRARY[0] != '\0');
 
+  /* By default, we don't add -lstdc++exp.  */
+  bool need_experimental = false;
+
   /* True if we saw -static.  */
   int static_link = 0;
 
@@ -161,8 +164,7 @@ lang_specific_driver (struct cl_decoded_option 
**in_decoded_options,
   switch (decoded_options[i].opt_index)
{
case OPT_fcontracts:
- args[i] |= EXPERIMENTAL;
- ++added;
+ need_experimental = true;
  break;
 
case OPT_nostdlib:
@@ -292,7 +294,8 @@ lang_specific_driver (struct cl_decoded_option 
**in_decoded_options,
 #endif
 
   /* Add one for shared_libgcc or extra static library.  */
-  num_args = argc + added + need_math + (library > 0) * 4 + 1;
+  num_args = (argc + added + need_math + need_experimental
+ + (library > 0) * 4 + 1);
   /* For libc++, on most platforms, the ABI library (usually called libc++abi)
  is provided as a separate DSO, which we must also append.
  However, a platform might have the ability to forward the ABI library
@@ -355,11 +358,6 @@ lang_specific_driver (struct cl_decoded_option 
**in_decoded_options,
   _decoded_options[j]);
}
 
-  if ((args[i] & EXPERIMENTAL)
- && which_library == USE_LIBSTDCXX)
-   generate_option (OPT_l, "stdc++exp", 1, CL_DRIVER,
-_decoded_options[++j]);
-
   if ((args[i] & SKIPOPT) != 0)
--j;
 
@@ -370,6 +368,12 @@ lang_specific_driver (struct cl_decoded_option 
**in_decoded_options,
   /* Add `-lstdc++' if we haven't already done so.  */
   if (library > 0)
 {
+  if (need_experimental && which_library == USE_LIBSTDCXX)
+   {
+ generate_option (OPT_l, "stdc++exp", 1, CL_DRIVER,
+  _decoded_options[j++]);
+ ++added_libraries;
+   }
 #ifdef HAVE_LD_STATIC_DYNAMIC
   if (library > 1 && !static_link)
{

base-commit: 8b7fee1de9a723ccc24d2de1c89d233f27b16a0a
-- 
2.31.1



Re: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-21 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 22, 2022 at 1:41 AM Jeff Law via Gcc-patches
 wrote:
>
>
> On 11/18/22 23:25, Hongyu Wang via Gcc-patches wrote:
> > Hi,
> >
> > Followed by the discussion in pr107602, -munroll-only-small-loops
> > Does not turns on/off -funroll-loops, and current check in
> > pass_rtl_unroll_loops::gate would cause -funroll-loops do not take
> > effect. Revert the change about targetm.loop_unroll_adjust and apply
> > the backend option change to strictly follow the rule that
> > -funroll-loops takes full control of loop unrolling, and
> > munroll-only-small-loops just change its behavior to unroll small size
> > loops.
> >
> > Bootstrapped and regtested on x86-64-pc-linux-gnu.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >   PR target/107602
> >   * common/config/i386/i386-common.cc (ix86_optimization_table):
> >   Enable loop unroll O2, disable -fweb and -frename-registers
> >   by default.
> >   * config/i386/i386-options.cc
> >   (ix86_override_options_after_change):
> >   Disable small loop unroll when funroll-loops enabled, reset
> >   cunroll_grow_size when it is not explicitly enabled.
> >   (ix86_option_override_internal): Call
> >   ix86_override_options_after_change instead of calling
> >   ix86_recompute_optlev_based_flags and ix86_default_align
> >   separately.
> >   * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
> >   factor if -munroll-only-small-loops enabled.
> >   * loop-init.cc (pass_rtl_unroll_loops::gate): Do not enable
> >   loop unrolling for -O2-speed.
> >   (pass_rtl_unroll_loops::execute): Rmove
> >   targetm.loop_unroll_adjust check.
> The reversion of the loop-init.cc changes is fine.  The x86 maintainers
> will need to chime in on the rest.  Consider installing the loop-init.cc
> reversion immediately as the current state has regressed s390 and
> potentially other targets.
x86 part is ok.
>
>
> jeff
>


-- 
BR,
Hongtao


Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches

Hi,


When dynamically linking a fast enough machine hides the latency, but when
Statically linking or on slower devices this change caused a 5x increase in
Instruction count and 2x increase in cycle count before getting to main.


I have looked at ways to fix that. The problem is that with static 
linking unwinding tables are registered dynamically, and with my patch 
that registration triggers an eager sort of fde lists. While previously 
the lists were sorted when the first exception was thrown. If an 
application throws at least one exception there is no downside in eager 
sorting, but if the application never throws there is overhead.


The obvious way to improve the situation is to make sorting faster. When 
replacing the split+sort+merge logic with a radix sort (which can be 
done without additional memory) we get the following timings for your

#include 
int main() {}
example (with git stat -r 200):

# pre-patch version, fast
  0,06 msec task-clock
   272.286  cycles
   464.754  instructions
# post-patch version, slow
  0,21 msec task-clock
   972.876  cycles
 3.079.515  instructions
# +radix sort, in the middle
  0,13 msec task-clock
   604.697  cycles
 1.702.930  instructions

The radix sort clearly improves things, but it does not fully eliminate 
the overhead.


The question is, how much do we care about that situation (i.e., static 
linking, exceptions registered but never thrown). I could change the 
code to recognize three states instead of two: no exceptions registered, 
exceptions register but never thrown, and full exception mode. But that 
would increase code complexity and it would pessimize applications that 
do throw, as we now need more checks to guard against concurrent changes.


It makes the code more complex and a bit slower, which is why I am 
hesistant. But I can implement that if you think that we need that. Or 
we just replace the sort, which is probably a good idea anyway.


Best

Thomas


[committed] analyzer: fix ICE on 'bind' with non-pointer arg [P107783]

2022-11-21 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4220-g12a4785c9120be.

gcc/analyzer/ChangeLog:
PR analyzer/107783
* region-model-impl-calls.cc (kf_accept::matches_call_types_p):
Require that args 1 and 2 be pointers.
(kf_bind::matches_call_types_p): Require that arg 1 be a pointer.
* region-model.h (call_details::arg_is_pointer_p): New

gcc/testsuite/ChangeLog:
PR analyzer/107783
* gcc.dg/analyzer/fd-bind-pr107783.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model-impl-calls.cc  | 6 --
 gcc/analyzer/region-model.h  | 4 
 gcc/testsuite/gcc.dg/analyzer/fd-bind-pr107783.c | 5 +
 3 files changed, 13 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-bind-pr107783.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index a71eb3de98f..8a44c97eec9 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -595,7 +595,9 @@ class kf_accept : public known_function
 
   bool matches_call_types_p (const call_details ) const final override
   {
-return cd.num_args () == 3;
+return (cd.num_args () == 3
+   && cd.arg_is_pointer_p (1)
+   && cd.arg_is_pointer_p (2));
   }
 
   void impl_call_post (const call_details ) const final override
@@ -633,7 +635,7 @@ public:
 
   bool matches_call_types_p (const call_details ) const final override
   {
-return cd.num_args () == 3;
+return (cd.num_args () == 3 && cd.arg_is_pointer_p (1));
   }
 
   void impl_call_post (const call_details ) const final override
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index c828d739482..244780eb4f4 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -256,6 +256,10 @@ public:
   bool maybe_set_lhs (const svalue *result) const;
 
   unsigned num_args () const;
+  bool arg_is_pointer_p (unsigned idx) const
+  {
+return POINTER_TYPE_P (get_arg_type (idx));
+  }
 
   const gcall *get_call_stmt () const { return m_call; }
   location_t get_location () const;
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-bind-pr107783.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-bind-pr107783.c
new file mode 100644
index 000..36304179b43
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-bind-pr107783.c
@@ -0,0 +1,5 @@
+int
+foo (void)
+{
+  return bind (0, 0, 0); /* { dg-warning "implicit declaration of function 
'bind'" } */
+}
-- 
2.26.3



[committed] analyzer: fix ICE on 'bind' that returns a struct [PR107788]

2022-11-21 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4221-g4e4e45a4fd3411.

gcc/analyzer/ChangeLog:
PR analyzer/107788
* region-model.cc (region_model::update_for_int_cst_return):
Require that the return type be an integer type.
(region_model::update_for_nonzero_return): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/107788
* g++.dg/analyzer/fd-bind-pr107783.C: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc |  4 
 gcc/testsuite/g++.dg/analyzer/fd-bind-pr107783.C | 11 +++
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/fd-bind-pr107783.C

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 1d5b09a6805..e71fd41f62d 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1930,6 +1930,8 @@ region_model::update_for_int_cst_return (const 
call_details ,
 {
   if (!cd.get_lhs_type ())
 return;
+  if (TREE_CODE (cd.get_lhs_type ()) != INTEGER_TYPE)
+return;
   const svalue *result
 = m_mgr->get_or_create_int_cst (cd.get_lhs_type (), retval);
   if (unmergeable)
@@ -1955,6 +1957,8 @@ region_model::update_for_nonzero_return (const 
call_details )
 {
   if (!cd.get_lhs_type ())
 return;
+  if (TREE_CODE (cd.get_lhs_type ()) != INTEGER_TYPE)
+return;
   const svalue *zero
 = m_mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
   const svalue *result
diff --git a/gcc/testsuite/g++.dg/analyzer/fd-bind-pr107783.C 
b/gcc/testsuite/g++.dg/analyzer/fd-bind-pr107783.C
new file mode 100644
index 000..eb5e23c82cf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/fd-bind-pr107783.C
@@ -0,0 +1,11 @@
+// { dg-do compile { target c++11 } }
+
+struct minus {
+} _1;
+int _2;
+struct _Bind {
+  _Bind(_Bind &);
+};
+template 
+_Bind bind(_Func, _BoundArgs &&...);
+void test01() { bind(minus(), _2, _1); }
-- 
2.26.3



[committed] analyzer: fix ICE on writes to errno [PR107777]

2022-11-21 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4219-g358dab90186b30.

gcc/analyzer/ChangeLog:
PR analyzer/10
* call-summary.cc
(call_summary_replay::convert_region_from_summary_1): Handle
RK_THREAD_LOCAL and RK_ERRNO in switch.
* region-model.cc (region_model::get_representative_path_var_1):
Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/10
* gcc.dg/analyzer/call-summaries-errno.c: New test.
* gcc.dg/analyzer/errno-pr10.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/call-summary.cc  |  2 ++
 gcc/analyzer/region-model.cc  |  2 ++
 .../gcc.dg/analyzer/call-summaries-errno.c| 17 
 .../gcc.dg/analyzer/errno-pr10.c  | 20 +++
 4 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/call-summaries-errno.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/errno-pr10.c

diff --git a/gcc/analyzer/call-summary.cc b/gcc/analyzer/call-summary.cc
index ebc7b5028ec..4c4694b5381 100644
--- a/gcc/analyzer/call-summary.cc
+++ b/gcc/analyzer/call-summary.cc
@@ -575,6 +575,7 @@ call_summary_replay::convert_region_from_summary_1 (const 
region *summary_reg)
 case RK_CODE:
 case RK_STACK:
 case RK_HEAP:
+case RK_THREAD_LOCAL:
 case RK_ROOT:
   /* These should never be pointed to by a region_svalue.  */
   gcc_unreachable ();
@@ -582,6 +583,7 @@ call_summary_replay::convert_region_from_summary_1 (const 
region *summary_reg)
 case RK_FUNCTION:
 case RK_LABEL:
 case RK_STRING:
+case RK_ERRNO:
 case RK_UNKNOWN:
   /* We can reuse these regions directly.  */
   return summary_reg;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 81f58a59f4f..1d5b09a6805 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4754,6 +4754,7 @@ region_model::get_representative_path_var_1 (const region 
*reg,
 case RK_CODE:
 case RK_HEAP:
 case RK_STACK:
+case RK_THREAD_LOCAL:
 case RK_ROOT:
/* Regions that represent memory spaces are not expressible as trees.  
*/
   return path_var (NULL_TREE, 0);
@@ -4873,6 +4874,7 @@ region_model::get_representative_path_var_1 (const region 
*reg,
   }
 
 case RK_VAR_ARG:
+case RK_ERRNO:
 case RK_UNKNOWN:
   return path_var (NULL_TREE, 0);
 }
diff --git a/gcc/testsuite/gcc.dg/analyzer/call-summaries-errno.c 
b/gcc/testsuite/gcc.dg/analyzer/call-summaries-errno.c
new file mode 100644
index 000..e4333b30bb7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/call-summaries-errno.c
@@ -0,0 +1,17 @@
+/* { dg-additional-options "-fanalyzer-call-summaries" } */
+
+#include 
+#include "analyzer-decls.h"
+
+void sets_errno (int x)
+{
+  errno = x;
+}
+
+void test_sets_errno (int y)
+{
+  sets_errno (y);
+  sets_errno (y);
+
+  __analyzer_eval (errno == y); /* { dg-warning "TRUE" } */  
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/errno-pr10.c 
b/gcc/testsuite/gcc.dg/analyzer/errno-pr10.c
new file mode 100644
index 000..65687393657
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/errno-pr10.c
@@ -0,0 +1,20 @@
+int *
+__errno_location (void);
+
+long int
+read (int, void *, unsigned long int);
+
+struct IOBUF {
+  int fd;
+};
+
+void
+do_getline_end_data (struct IOBUF *iop, int tree)
+{
+  char end_data;
+
+  if (tree)
+*__errno_location () = 0;
+
+  read (iop->fd, _data, sizeof end_data);
+}
-- 
2.26.3



[committed] analyzer, testsuite: add more examples taken from CWE

2022-11-21 Thread David Malcolm via Gcc-patches
Successfully tested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4218-g9ada45967b4cf5.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/CWE-131-examples.c: New test.
* gcc.dg/analyzer/file-CWE-1341-example.c: New test.
* gcc.dg/analyzer/malloc-CWE-401-example.c: New test.
* gcc.dg/analyzer/malloc-CWE-415-examples.c: New test.
* gcc.dg/analyzer/malloc-CWE-416-examples.c: New test.
* gcc.dg/analyzer/malloc-CWE-590-examples.c: New test.

Signed-off-by: David Malcolm 
---
 .../gcc.dg/analyzer/CWE-131-examples.c| 146 ++
 .../gcc.dg/analyzer/file-CWE-1341-example.c   |  41 +
 .../gcc.dg/analyzer/malloc-CWE-401-example.c  |  37 +
 .../gcc.dg/analyzer/malloc-CWE-415-examples.c |  53 +++
 .../gcc.dg/analyzer/malloc-CWE-416-examples.c |  60 +++
 .../gcc.dg/analyzer/malloc-CWE-590-examples.c |  44 ++
 6 files changed, 381 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/CWE-131-examples.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-CWE-401-example.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-CWE-415-examples.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-CWE-416-examples.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-CWE-590-examples.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/CWE-131-examples.c 
b/gcc/testsuite/gcc.dg/analyzer/CWE-131-examples.c
new file mode 100644
index 000..3bc898cd0cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/CWE-131-examples.c
@@ -0,0 +1,146 @@
+/* Examples adapted from https://cwe.mitre.org/data/definitions/131.html
+   which states "Copyright © 2006–2022, The MITRE Corporation. CWE, CWSS, 
CWRAF, and the CWE logo are trademarks of The MITRE Corporation."
+   and which has this on:
+ https://cwe.mitre.org/about/termsofuse.html
+
+   Terms of Use
+
+   CWE™ is free to use by any organization or individual for any research, 
development, and/or commercial purposes, per these CWE Terms of Use. The MITRE 
Corporation ("MITRE") has copyrighted the CWE List, Top 25, CWSS, and CWRAF for 
the benefit of the community in order to ensure each remains a free and open 
standard, as well as to legally protect the ongoing use of it and any resulting 
content by government, vendors, and/or users. CWE is a trademark of MITRE. 
Please contact c...@mitre.org if you require further clarification on this 
issue.
+
+   LICENSE
+
+   CWE Submissions: By submitting materials to The MITRE Corporation’s 
("MITRE") Common Weakness Enumeration Program (CWE™), you hereby grant to MITRE 
a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable 
copyright license to use, reproduce, prepare derivative works of, publicly 
display, publicly perform, sublicense, and distribute your submitted materials 
and derivative works. Unless otherwise required by applicable law or agreed to 
in writing, it is understood that you are providing such materials on an "AS 
IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied, including, without limitation, any warranties or conditions of TITLE, 
NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.
+
+   CWE Usage: MITRE hereby grants you a non-exclusive, royalty-free license to 
use CWE for research, development, and commercial purposes. Any copy you make 
for such purposes is authorized on the condition that you reproduce MITRE’s 
copyright designation and this license in any such copy.
+
+   DISCLAIMERS
+
+   ALL DOCUMENTS AND THE INFORMATION CONTAINED IN THE CWE ARE PROVIDED ON AN 
"AS IS" BASIS AND THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS 
SPONSORED BY (IF ANY), THE MITRE CORPORATION, ITS BOARD OF TRUSTEES, OFFICERS, 
AGENTS, AND EMPLOYEES, DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL 
NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+
+   IN NO EVENT SHALL THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS 
SPONSORED BY (IF ANY), THE MITRE CORPORATION, ITS BOARD OF TRUSTEES, OFFICERS, 
AGENTS, AND EMPLOYEES BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
CONNECTION WITH THE INFORMATION OR THE USE OR OTHER DEALINGS IN THE CWE.  */
+
+#include 
+#include 
+#include 
+
+/* Support decls for example 1.  */
+
+extern unsigned int GetUntrustedSizeValue();
+extern void ExitError(const char *) __attribute__((noreturn));
+
+typedef struct Widget
+{
+} Widget;
+
+#define MAX_NUM_WIDGETS 100
+
+extern Widget *InitializeWidget();
+extern void showWidgets(Widget **);
+
+void example_1 (void)
+{
+  int i;
+  unsigned int numWidgets;
+  Widget **WidgetList;
+
+  numWidgets = GetUntrustedSizeValue();
+  if 

Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-21 Thread Segher Boessenkool
Hi!

On Mon, Nov 21, 2022 at 02:18:39PM +0800, HAO CHEN GUI wrote:
> 在 2022/11/18 20:18, Segher Boessenkool 写道:
> > I don't think we should pretend we have any conditional jumps the
> > machine does not actually have, in cbranchcc4.  When would this ever be
> > useful?  cror;beq can be quite expensive, compared to the code it would
> > replace anyway.
> > 
> > If something generates those here (which then ICEs later), that is
> > wrong, fix *that*?  Is it ifcvt doing it?
> 
> "*cbranch_2insn" is a valid insn for rs6000. So it generates such insn
> at expand pass. The "prepare_cmp_insn" called by ifcvt just wants to verify
> that the following comparison rtx is valid.

*cbranch_2insn is not a machine insn.  It generates a cror and a branch
insn.  This makes no sense to have in a cbranchcc: those do a branch
based on an existing cr field, so based on the *output* of that cror.

If ifcvt requires differently, ifcvt needs fixing.

We want to use the output of the cror multiple times, not generate more
cror insns.

> (unlt (reg:CCFP 156)
> (const_int 0 [0]))
> 
> It should be valid as it's extracted from an existing insn.

Why is that an argument?  The code is valid, sure, but that doesn't
mean we want to generate it all over the place.

> It hits ICE only
> when the comparison rtx can't pass the predicate check of "cbranchcc4". So
> "cbranchcc4" should include "extra_insn_branch_comparison_operator".
> 
> Then, ifcvt tries to call emit_conditional_move_1 to generates a condition
> move for FP mode. It definitely fails as there is no conditional move insn for
> FP mode in rs6000. The behavior of ifcvt is correct. It tries to do conversion
> but fails. It won't hit ICEs after cbranchcc4 is correctly defined.

I don't think the behaviour of ifcvt is correct here at all, no.  It
also does not consider the cost of the code as far as I can see?  That
could reduce the impact of this problem at least.

> Actually, "*cbranch_2insn" has the same logical as float "*cbranch" in ifcvt.
> Both of them get a final false return from "rs6000_emit_int_cmove" as rs6000
> doesn't have conditional move for FP mode.

I am about to commit patches for that.  But only for p10 and later.
It should eventually work for everything with isel (setbc can often be
optimised to isel after all), but the compiler has to work without isel
as well of course.

> So I think "cbranchcc4" should include "extra_insn_branch_comparison_operator"
> as "*cbranch_2insn" is a valid insn. Just let ifcvt decide a conditional
> move is valid or not.

It makes a bad decision though.  This is not okay.


Segher


[PATCH RFA(configure)] c++: provide strchrnul on targets without it [PR107781]

2022-11-21 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, and also manually changing the HAVE_DECL_STRCHRNUL
flag.  OK for trunk?

-- 8< --

The Contracts implementation uses strchrnul, which is a glibc extension, so
bootstrap broke on non-glibc targets.  I considered unconditionally using a
local definition, but I guess we might as well use the libc version if it
exists.

PR c++/107781

gcc/cp/ChangeLog:

* contracts.cc (strchrnul): Define if needed.

gcc/ChangeLog:

* configure.ac: Check for strchrnul.
* config.in, configure: Regenerate.
---
 gcc/cp/contracts.cc | 12 
 gcc/config.in   |  7 +++
 gcc/configure.ac|  2 +-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 26396439361..8b11f26ca27 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -204,6 +204,18 @@ lookup_concrete_semantic (const char *name)
   return CCS_INVALID;
 }
 
+#if !HAVE_DECL_STRCHRNUL
+/* strchrnul is a glibc extension.  */
+
+static const char *
+strchrnul (const char *s, char c)
+{
+  if (auto p = strchr (s, c))
+return p;
+  return strchr (s, '\0');
+}
+#endif
+
 /* Compare role and name up to either the NUL terminator or the first
occurrence of colon.  */
 
diff --git a/gcc/config.in b/gcc/config.in
index 38ef792bd67..4a5dfb4151c 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1126,6 +1126,13 @@
 #endif
 
 
+/* Define to 1 if we found a declaration for 'strchrnul', otherwise define to 
0.
+   */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DECL_STRCHRNUL
+#endif
+
+
 /* Define to 1 if we found a declaration for 'strnlen', otherwise define to 0.
*/
 #ifndef USED_FOR_TARGET
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 7c55bff6cb0..1124ecfa218 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1581,7 +1581,7 @@ CXXFLAGS="$CXXFLAGS -I${srcdir} -I${srcdir}/../include 
$GMPINC"
 # normal autoconf function for these.  But force definition of
 # HAVE_DECL_BASENAME like gcc_AC_CHECK_DECLS does, to suppress the bizarre
 # basename handling in libiberty.h.
-AC_CHECK_DECLS([basename(const char*), strstr(const char*,const char*)], , ,[
+AC_CHECK_DECLS([basename(const char*), strchrnul(const char*, int), 
strstr(const char*,const char*)], , ,[
 #undef HAVE_DECL_BASENAME
 #define HAVE_DECL_BASENAME 1
 #include "ansidecl.h"

base-commit: 5c0d171f67d082c353ddc319859111d3b9126c17
prerequisite-patch-id: 275d90e1bd8b940c1cca2840bc38dc4fafa0797b
-- 
2.31.1



Activate gcc builder problem emails (Was: [PATCH v2] genmultilib: Add sanity check)

2022-11-21 Thread Mark Wielaard
Hi Christophe,

On Mon, Nov 21, 2022 at 01:35:34PM +0100, Christophe Lyon wrote:
> On 11/21/22 13:32, Mark Wielaard wrote:
> > > I've just sent a fix:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606887.html
> > > 
> > > Hopefully that one is right
> > 
> > The buildbot gcc builds are turning green again!
> > https://builder.sourceware.org/buildbot/#/changes/13736
> 
> Good! I didn't imagine the feedback would be so fast :-)
> 
> BTW, I don't remember receiving an email from the buildbot after the
> breakage, does it send such emails for binutils/gdb only?

You are right, we didn't seem to have any problem report emails for
the gcc builds. The attached (pushed) patch activates email reports
for the quick gcc builders. Whenever they fail the patch author should
get an email now. Also it will now sent email to gcc-testresults if a
build starts failing or becomes successful again.

Note that we don't have good reporting for the full gcc builders
yet. We don't have regression detection yet, so we can only see that
the full testsuite passes or fails. All results are put into the
bunsen database though.

Cheers,

Mark>From adbdb81530f97880facb942afe14781c1006e283 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Mon, 21 Nov 2022 23:47:53 +0100
Subject: [PATCH] Add mail notifiers for gcc builds

Change gcc_fedrawhide_x86_64_builder tag from "gcc" to "gcc-full"
like other "full" builders (debian-amd64, ubuntu-armhf and ubuntu-arm64).

Add mail notifiers for builders tagged "gcc". One to sent email to
patch author if a new problem (failed build) appears. Another to
sent email to gcc-testresults whenever a build changes from success
to failed or the other way around.
---
 builder/master.cfg | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/builder/master.cfg b/builder/master.cfg
index 61bf4f1..b25731d 100644
--- a/builder/master.cfg
+++ b/builder/master.cfg
@@ -2888,7 +2888,7 @@ gcc_fedrawhide_x86_64_builder = util.BuilderConfig(
 name="gcc-fedrawhide-x86_64",
 collapseRequests=True,
 workernames=["fedrawhide-x86_64"],
-tags=["gcc", "fedora", "x86_64"],
+tags=["gcc-full", "fedora", "x86_64"],
 factory=gcc_factory)
 c['builders'].append(gcc_fedrawhide_x86_64_builder)
 
@@ -3857,6 +3857,27 @@ mn_elfutils_try = reporters.MailNotifier(
 generators=[generator_elfutils_try])
 c['services'].append(mn_elfutils_try)
 
+# Problem report for the whole gcc tagged builder set
+# Goes to patch author if a new problem appears
+generator_gcc = reporters.BuildSetStatusGenerator(
+mode=('problem',), tags=['gcc'])
+mn_gcc = reporters.MailNotifier(
+fromaddr="buil...@sourceware.org",
+sendToInterestedUsers=True,
+generators=[generator_gcc])
+c['services'].append(mn_gcc)
+
+# Change report for the whole gcc tagged builder set
+# Goes to the mailinglist if a builder result changes
+generator_gcc_change = reporters.BuildSetStatusGenerator(
+mode=('change',), tags=['gcc'])
+mn_gcc_change = reporters.MailNotifier(
+fromaddr="buil...@sourceware.org",
+sendToInterestedUsers=False,
+extraRecipients=['gcc-testresu...@gcc.gnu.org'],
+generators=[generator_gcc_change])
+c['services'].append(mn_gcc_change)
+
 # Problem report for the whole gccrust tagged builder set
 generator_gccrust = reporters.BuildSetStatusGenerator(
 mode=('problem',), tags=['gccrust'])
-- 
2.30.2



[PATCH] c++: Fix up -fcontract* options

2022-11-21 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed
+FAIL: compiler driver --help=c++ option(s): "^ +-.*[^:.]\$" absent from 
output: "  -fcontract-build-level=[off|default|audit] Specify max contract 
level to generate runtime checks for"
error, this is due to missing dot at the end of the description.

The second part of the first hunk should fix that, but while at it,
I find it weird that some options don't have RejectNegative, yet
for options that accept an argument a negative option looks weird
and isn't really handled.

Though, shall we have those [on|off] options at all?
Those are inconsistent with all other boolean options gcc has.
Every other boolean option is -fwhatever for it being on
and -fno-whatever for it being off, shouldn't the options be
without arguments and accept negatives (-fcontract-assumption-mode
vs. -fno-contract-assumption-mode etc.)?

2022-11-21  Jakub Jelinek  

* c.opt (fcontract-assumption-mode=, fcontract-continuation-mode=,
fcontract-role=, fcontract-semantic=): Add RejectNegative.
(fcontract-build-level=): Terminate description with dot.

--- gcc/c-family/c.opt.jj   2022-11-19 09:21:14.31706 +0100
+++ gcc/c-family/c.opt  2022-11-21 23:51:55.605736499 +0100
@@ -1692,12 +1692,12 @@ EnumValue
 Enum(on_off) String(on) Value(1)
 
 fcontract-assumption-mode=
-C++ Joined
+C++ Joined RejectNegative
 -fcontract-assumption-mode=[on|off]Enable or disable treating axiom level 
contracts as assumptions (default on).
 
 fcontract-build-level=
 C++ Joined RejectNegative
--fcontract-build-level=[off|default|audit] Specify max contract level to 
generate runtime checks for
+-fcontract-build-level=[off|default|audit] Specify max contract level to 
generate runtime checks for.
 
 fcontract-strict-declarations=
 C++ Var(flag_contract_strict_declarations) Enum(on_off) Joined Init(0) 
RejectNegative
@@ -1708,15 +1708,15 @@ C++ Var(flag_contract_mode) Enum(on_off)
 -fcontract-mode=[on|off]   Enable or disable all contract facilities 
(default on).
 
 fcontract-continuation-mode=
-C++ Joined
+C++ Joined RejectNegative
 -fcontract-continuation-mode=[on|off]  Enable or disable contract continuation 
mode (default off).
 
 fcontract-role=
-C++ Joined
+C++ Joined RejectNegative
 -fcontract-role=: Specify the semantics for all levels in 
a role (default, review), or a custom contract role with given semantics (ex: 
opt:assume,assume,assume)
 
 fcontract-semantic=
-C++ Joined
+C++ Joined RejectNegative
 -fcontract-semantic=: Specify the concrete semantics for level
 
 fcoroutines

Jakub



Re: [PATCH 2/2] Fortran: add attribute target_clones

2022-11-21 Thread Bernhard Reutner-Fischer via Gcc-patches
On Mon, 21 Nov 2022 20:13:40 +0100
Mikael Morin  wrote:

> Hello,
> 
> Le 09/11/2022 à 20:02, Bernhard Reutner-Fischer via Fortran a écrit :
> > Hi!
> > 
> > Add support for attribute target_clones:
> > !GCC$ ATTRIBUTES target_clones("arch1", "arch3","default") :: mysubroutine

> > +/* Internal helper to parse attribute argument list.
> > +   If REQUIRE_STRING is true, then require a string.
> > +   If ALLOW_MULTIPLE is true, allow more than one arg.
> > +   If multiple arguments are passed, require braces around them.
> > +   Returns a tree_list of arguments or NULL_TREE.  */
> > +static tree
> > +gfc_match_gcc_attribute_args (bool require_string, bool allow_multiple)

> > +   do {

> > +   } while (num_quotes % 2 && gfc_match_eos () != MATCH_YES);  
> The do-while loops are wrongly indented.
> It should be:
>do
>  {
>...
>  }
>while (...)

oops, right.

> > +   tree str = build_string (pos, name);
> > +   /* Compare with c-family/c-common.cc: fix_string_type.  */
> > +   tree i_type = build_index_type (size_int (pos));
> > +   tree a_type = build_array_type (char_type_node, i_type);
> > +   TREE_TYPE (str) = a_type;
> > +   TREE_READONLY (str) = 1;
> > +   TREE_STATIC (str) = 1;
> > +   attr_arg = build_tree_list (NULL_TREE, str);
> > +   attr_args = chainon (attr_args, attr_arg);  
> Same comment as for the flatten attribute:
> please no tree stuff out of the trans-*.cc files.

yes ok, noted. It's a pity in this context, where we purely pass a blob
on to the ME but ok.

> This includes gfortran.h, so the attribute arguments need to be carried 
> around using the front-end structures (gfc_actual_arglist for example).

That's a sane rule of thumb, yes.
Usually, the parser deals with language grammar and not with pure
passthrough remarks, so that's fair. Not so much in the case of such
attribs but i see your point :)
 
> > +  if (allow_multiple && gfc_match_char (')') != MATCH_YES)
> > +{
> > +  gfc_error ("expected ')' at %C");
> > +  return NULL_TREE;
> > +}
> > +
> > +  return attr_args;
> > +}  
> I'm not sure this function need to do all the parsing manually.
> I would rather use gfc_match_actual_arglist, or maybe implement the 
> function as a wrapper around it.
> What is allowed here?  Are non-literal constants allowed, for example 
> parameter variables?  Is line continuation supported ?

Line continuation is supported i think.
Parameter variables supposedly are or should not be supported. Why would
you do that in the context of an attribute target decl?
Either way, if the ME does not find such an fndecl, it will complain
and ignore the attribute.
I don't understand non-literal constants in this context.
This very attribute applies to decls, so the existing code supposedly
matches a comma separated list of identifiers. The usual dollar-ok
caveats apply.

As to gfc_match_actual_arglist, probably.
target_clones has
+  { "target_clones",  1, -1, true, false, false, false,
+ dummy, NULL },
with tree-core.h struct attribute_spec, so
name, min=1, max=unbounded, decl_required=yes, ...ignore...

hence applies to functions and subroutines and the like. It does take an
unbounded list of strings, isa1, isa2, isa4, default. We could add
"default" unless seen, but i'd rather want it spelled out by the user
for the user is supposed to know what she's doing, as in c or c++.
The ME has code to sanity-check the attributes, including conflicting
(ME) attributes.

The reason why i contemplated with a separate parser was that for stuff
like regparm or sseregparm, you would want to require a single number
for the equivalent of

__attribute__((regparm(3),stdcall)

which you would provide in 2 separate !GCC$ attributes i assume.

> 
> Nothing (bad) to say about the rest, but there is enough to change with 
> the above comments.

Yes, many thanks for your comments.
I think there is no other non-intrusive way to pass the data through the
frontend. So for an acceptable way this means touching quite some spots
for every single ME attribute anybody would like to add in the future.
But that's how it is.


[PATCH] Fix autoprofiledbootstrap build

2022-11-21 Thread Eugene Rozenfeld via Gcc-patches
1. Fix gcov version
2. Don't attempt to create an autoprofile file for cc1 since cc1plus
(not cc1) is not invoked when building cc1
3. Fix documentation typo

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* c/Make-lang.in: Don't attempt to create an autoprofile file for cc1
* cp/Make-lang.in: Fix gcov version
* lto/Make-lang.in: Fix gcov version
* doc/install.texi: Fix documentation typo
---
 gcc/c/Make-lang.in   | 15 +--
 gcc/cp/Make-lang.in  |  2 +-
 gcc/doc/install.texi |  2 +-
 gcc/lto/Make-lang.in |  2 +-
 4 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in
index 9bd9c0ea123..ba33ec03bf0 100644
--- a/gcc/c/Make-lang.in
+++ b/gcc/c/Make-lang.in
@@ -62,12 +62,6 @@ c_OBJS = $(C_OBJS) cc1-checksum.o c/gccspec.o
 # Use strict warnings for this front end.
 c-warn = $(STRICT_WARN)
 
-ifeq ($(if $(wildcard ../stage_current),$(shell cat \
-  ../stage_current)),stageautofeedback)
-$(C_OBJS): ALL_COMPILERFLAGS += -fauto-profile=cc1.fda
-$(C_OBJS): cc1.fda
-endif
-
 # compute checksum over all object files and the options
 # re-use the checksum from the prev-final stage so it passes
 # the bootstrap comparison and allows comparing of the cc1 binary
@@ -88,9 +82,6 @@ cc1$(exeext): $(C_OBJS) cc1-checksum.o $(BACKEND) $(LIBDEPS)
  cc1-checksum.o $(BACKEND) $(LIBS) $(BACKENDLIBS)
@$(call LINK_PROGRESS,$(INDEX.c),end)
 
-cc1.fda: ../stage1-gcc/cc1$(exeext) ../prev-gcc/$(PERF_DATA)
-   $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov cc1.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 1
-
 #

 # Build hooks:
 
@@ -180,7 +171,6 @@ c.mostlyclean:
-rm -f cc1$(exeext)
-rm -f c/*$(objext)
-rm -f c/*$(coverageexts)
-   -rm -f cc1.fda
 c.clean:
 c.distclean:
-rm -f c/config.status c/Makefile
@@ -201,7 +191,4 @@ c.stageprofile: stageprofile-start
-mv c/*$(objext) stageprofile/c
 c.stagefeedback: stagefeedback-start
-mv c/*$(objext) stagefeedback/c
-c.autostageprofile: autostageprofile-start
-   -mv c/*$(objext) autostageprofile/c
-c.autostagefeedback: autostagefeedback-start
-   -mv c/*$(objext) autostagefeedback/c
+
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index 291835d326e..49e5cd66912 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -178,7 +178,7 @@ endif
 cp/name-lookup.o: $(srcdir)/cp/std-name-hint.h
 
 cc1plus.fda: ../stage1-gcc/cc1plus$(exeext) ../prev-gcc/$(PERF_DATA)
-   $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov cc1plus.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 1
+   $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov cc1plus.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 2
 
 #

 # Build hooks:
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index c1876f24a84..61a483bc410 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3059,7 +3059,7 @@ It is recommended to only use GCC for this.
 
 On Linux/x86_64 hosts with some restrictions (no virtualization) it is
 also possible to do autofdo build with @samp{make
-autoprofiledback}. This uses Linux perf to sample branches in the
+autoprofiledbootstrap}. This uses Linux perf to sample branches in the
 binary and then rebuild it with feedback derived from the profile.
 Linux perf and the @code{autofdo} toolkit needs to be installed for
 this.
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index a2dcf0dfc12..3ee748489ac 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -106,7 +106,7 @@ $(LTO_DUMP_EXE): $(LTO_DUMP_OBJS) $(BACKEND) $(LIBDEPS) 
$(lto2.prev)
 lto/lto-dump.o: $(LTO_OBJS)
 
 lto1.fda: ../prev-gcc/lto1$(exeext) ../prev-gcc/$(PERF_DATA)
-   $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov lto1.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 1
+   $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov lto1.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 2
 
 # LTO testing is done as part of C/C++/Fortran etc. testing.
 check-lto:
-- 
2.25.1



[PATCH] Fix count comparison in ipa-cp

2022-11-21 Thread Eugene Rozenfeld via Gcc-patches
The existing comparison was incorrect for non-PRECISE counts
(e.g., AFDO): we could end up with a 0 base_count, which could
lead to asserts, e.g., in good_cloning_opportunity_p.

gcc/ChangeLog:

* ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison.
---
 gcc/ipa-cp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index d2bcd5e5e69..9df8b456759 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -4225,7 +4225,7 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
  {
profile_count count = cs->count.ipa ();
-   if (!(count > profile_count::zero ()))
+   if (!count.nonzero_p ())
  continue;
 
enum availability avail;
-- 
2.25.1



[PATCH] libstdc++: Make chrono::hh_mm_ss more compact

2022-11-21 Thread Jonathan Wakely via Gcc-patches
While finishing the time zone support for C++20, I noticed that the
chrono::hh_mm_ss type is surprisingly large: 40 bytes. That's because
we use duration for each of the four members, _M_h, _M_m,
_M_s and _M_ss. This is very wasteful. The patch below reduces it to 16
bytes (or less for some targets) for most specializations using an
integral type for the hh_mm_ss::precision::rep type.

Patrick, you added hh_mm_ss, but I assume you just implemented what is
shown for the exposition-only members in the standard, rather than
intentionally choosing this 40-byte representation? I don't recall
discussing it at the time, but I might be forgetting something.

Does the patch below make sense? I could remove the partial
specialization that use a single byte for the subseconds, or only do
that on 32-bit targets where alignof(int64_t) is 4. For targets where
alignof(int64_t) is 8 we're always going to be 16 bytes minimum, so
maybe there's no point squeezing everything into 12 bytes if we have a
mandatory 4 bytes of padding.

We could make those members use even less space, something like:

hours _M_h;
unsigned _M_m : 6;
unsigned _M_s : 6;
unsigned _M_is_neg : 1;
unsigned _M_ss : 51;  // good enough for femtoseconds

This would only require 16 bytes even for hh_mm_ss, but
requires extra instructions to read the values. That seems like the
wrong trade off. A compact layout for hh_mm_ss
and hh_mm_ss seems good enough. Paying
extra instructions to access the members just to support incredibly high
precision doesn't make sense to me.

We could also elide the bool _M_is_neg member if _Duration::rep is
unsigned, but it doesn't seem worth the metaprogramming/maintenance
cost.

We could consider using duration> for the
_M_h member instead of chrono::hours (which uses int64_t). It seems
unlikely (but not impossible) anybody will want to use hh_mm_ss for
billions of hours past midnight. If we do that, then using a single byte
for subseconds (when possible) does make sense, because the whole
hh_mm_ss would fit in just 8 bytes (4 for hours, 1 for minutes, 1 for
seconds, 1 for is_neg, and 1 for the subseconds). And then bit-fields
start to look more appealing. Maybe just for the subseconds and is_neg
flag, which would be 8 bytes for precision::period >= ratio<1,10'000>:

uint_least32_t _M_h;
uint_least8_t  _M_m;
uint_least8_t  _M_s;
uint_least16_t _M_ss : 15;
uint_least16_t _M_is_neg : 1;

Has anybody got any other suggestions for optimizing this or other
chrono types? e.g. I think we should also consider using int32_t as the
rep for chrono::{days,weeks,years} because a 64-bit type for years is
unnecessary.

It looks like other implementations don't bother trying to save space
for chrono::hh_mm_ss, which makes me wonder if there's some reason not
to do this. For MSVC it's only 32 bytes, which I think is because they
use a 32-bit type for chrono::hours and chrono::minutes, not because
they're doing anything special to make chrono::hh_mm_ss smaller.


-- >8 --

This uses a single byte for the minutes and seconds members, and places
the bool member next to those single bytes. This means we do not need 40
bytes to store a time that can fit in a single 8-byte integer.

When there is no subsecond precision we can do away with the _M_ss
member altogether. If the subsecond precision is coarse enough, we can
use a smaller representation for _M_ss, e.g. hh_mm_ss only
needs uint_least32_t for _M_ss, and hh_mm_ss>
and hh_mm_ss> each only needs a single byte. In
the latter case the type can only ever represent up to 255ns anyway, so
we don't need a larger representation type (in such cases, we could even
remove the _M_h, _M_m and _M_s members, but it's a very unlikely
scenario that isn't worth optimizing for).

Except for specializations with a floating-point rep or using higher
precision than nanoseconds, hh_mm_ss should now fit in 16 bytes, or even
12 bytes for x86-32 where alignof(long long) == 4.

libstdc++-v3/ChangeLog:

* include/std/chrono (chrono::hh_mm_ss): Do not use 64-bit
representations for all four duration members. Reorder members.
(hh_mm_ss::_S_fractional_width()): Optimize using countr_zero.
(hh_mm_ss::hh_mm_ss()): Define as defaulted.
(hh_mm_ss::hh_mm_ss(Duration)): Delegate to a new private
constructor, instead of calling chrono::abs repeatedly.
* testsuite/std/time/hh_mm_ss/1.cc: Check floating-point
representations. Check default constructor. Check sizes.
---
 libstdc++-v3/include/std/chrono   | 139 +-
 libstdc++-v3/testsuite/std/time/hh_mm_ss/1.cc |  54 ++-
 2 files changed, 158 insertions(+), 35 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 2468023f6c5..4ee163ba762 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -41,6 +41,7 @@
 #include 
 
 #if __cplusplus >= 202002L
+# include 
 # include 
 # include 
 # include 
@@ -2262,6 +2263,7 @@ 

Re: [PATCH 1/2] Fortran: Cleanup struct ext_attr_t

2022-11-21 Thread Bernhard Reutner-Fischer via Gcc-patches
On Mon, 21 Nov 2022 12:08:20 +0100
Mikael Morin  wrote:

> > * gfortran.h (struct ext_attr_t): Remove middle_end_name.
> > * trans-decl.cc (add_attributes_to_decl): Move building
> > tree_list to ...
> > * decl.cc (gfc_match_gcc_attributes): ... here. Add the attribute to
> > the tree_list for the middle end.
> >   
> I prefer to not do any middle-end stuff at parsing time, so I would 
> rather not do this change.
> Not OK.

Ok, that means we should filter-out those bits that we don't want to
write to the module (right?). We've plenty of bits left, more than Dave
Love would want to have added, i hope, so that should not be much of a
concern.

What that table really wants to say is whether or not this attribute
should be passed to the ME. Would it be acceptable to remove these
duplicate strings and just have a bool/char/int that is true if it
should be lowered (in trans-decl, as before)? But now i admit it's just
bikeshedding and we can as well leave it alone for now.. It was just a
though.

thanks,


Re: [PATCH 2/2] Fortran: Add attribute flatten

2022-11-21 Thread Bernhard Reutner-Fischer via Gcc-patches
On Mon, 21 Nov 2022 12:24:11 +0100
Mikael Morin  wrote:

> > --- a/gcc/fortran/decl.cc
> > +++ b/gcc/fortran/decl.cc  
> (...)
> > @@ -11849,7 +11850,9 @@ gfc_match_gcc_attributes (void)
> > if (strcmp (name, ext_attr_list[id].name) == 0)
> >   break;
> >   
> > -  if (id == EXT_ATTR_LAST)
> > +  if (strcmp (name, "flatten") == 0)
> > +   known_attr0args = true; /* Handled below.  We do not need a bit.  */  
> 
> I don't see the point to have all the attributes needing a bit except 
> one that doesn't but needs a specific handling.
> What does it look like without the 1/2 patch and if one bit is also used 
> for flatten, like the other attributes?

I've changed target_clones not to use a bit locally because it's not
needed. From my understanding, we only need the bits for attributes
that change the calling convention or the caller(at least so far, but
that does make sense to me). Remember that we store these bits in the
module. Presumably because we have to make sure that a program/module
uses the correct calling convention for a module function annotated
with such an attribute (think cdecl, stdcall, fastcall, dllimport,
dllexport, or the non-implemented regparm, sseregparm) or for
attributes that otherwise influence the callers or callees (like
deprecated or no_arg_check).

Attributes like target_clones or flatten or (probably) optimize etc, do
not influence the callees, so we really do not need to store them in
the module.

Can you think of a reason to store them nevertheless?

> > +  else if (id == EXT_ATTR_LAST)
> > {
> >   gfc_error ("Unknown attribute in !GCC$ ATTRIBUTES statement at %C");
> >   return MATCH_ERROR;  
> 
> > diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
> > index 06e4c8c00a1..be650f28b62 100644
> > --- a/gcc/fortran/gfortran.texi
> > +++ b/gcc/fortran/gfortran.texi
> > @@ -3280,6 +3280,14 @@ contains
> >   end module mymod
> >   @end smallexample
> >   
> > +@node flatten
> > +
> > +Procedures annotated with the @code{flatten} attribute have their
> > +callees inlined, if possible.  
> I'm not a native speaker, but I find this sentence confusing.
> The words of the gcc manual you are refering to seem more clear: "every 
> call inside the function is inlined, if possible".

Me neither and it was a bit too brief. I've changed this to:
Every call inside a procedure annotated with the @code{flatten} attribute
is inlined, if possible.  Please refer to
@ref{Top,,Common Function Attributes,gcc,Using the GNU Compiler Collection (GCC>
for details about the respective attribute.

Is that better?

That said, i think this whole attribute section in the manual is not
structured too well. I'd prefer to have a list of attributes like in the
"Common Function Attributes" section in the extend.texi.
Maybe it would be better to just start a new list of attributes at the
end of the current @subsection ATTRIBUTES directive, a subsubsection
with "Other attributes" and just itemize the new ones? We'd point
people to the Top docs once for further details and then just briefly
list the attributes we support. Would that be acceptable?

Many thanks for your comments!

> 
> > +Please refer to
> > +@ref{Top,,Common Function Attributes,gcc,Using the GNU Compiler Collection 
> > (GCC)}
> > +for details about the respective attribute.
> > +
> >   The attributes are specified using the syntax
> >   
> >   @code{!GCC$ ATTRIBUTES} @var{attribute-list} @code{::} 
> > @var{variable-list}  
> 



Re: [PATCH 1/2] symtab: also change RTL decl name

2022-11-21 Thread Bernhard Reutner-Fischer via Gcc-patches
On Mon, 21 Nov 2022 20:02:49 +0100
Jan Hubicka  wrote:

> > Hi Honza, Ping.
> > Regtests cleanly for c,fortran,c++,ada,d,go,lto,objc,obj-c++
> > Ok?
> > I'd need this for attribute target_clones for the Fortran FE.  
> Sorry for delay here.
> > >  void
> > > @@ -303,6 +301,10 @@ symbol_table::change_decl_assembler_name (tree decl, 
> > > tree name)
> > >   warning (0, "%qD renamed after being referenced in assembly", decl);
> > >  
> > >SET_DECL_ASSEMBLER_NAME (decl, name);
> > > +  /* Set the new name in rtl.  */
> > > +  if (DECL_RTL_SET_P (decl))
> > > + XSTR (XEXP (DECL_RTL (decl), 0), 0) = IDENTIFIER_POINTER (name);  
> 
> I am not quite sure how safe this is.  We generally produce DECL_RTL
> when we produce assembly file.  So if DECL_RTL is set then we probably
> already output the original function name and it is too late to change
> it.

AFAICS we make_decl_rtl in the fortran FE in trans_function_start.

> 
> Also RTL is shared so changing it in-place is going to rewrite all the
> existing RTL expressions using it.
> 
> Why the DECL_RTL is produced for function you want to rename?

I think the fortran FE sets it quite early when lowering a function.
Later, when the ME creates the target_clones, it wants to rename the
initial function to initial_fun.default for the default target.
That's where the change_decl_assembler_name is called (only on the
decl).
But nobody changes the RTL name, so the ifunc (which should be the
initial, unchanged name) is properly emitted but
assemble_start_function uses the same, unchanged, initial fnname it
later obtains by get_fnname_from_decl which fetches the (wrong) initial
name where it should use the .default target name.
See
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605081.html

I'm open to other suggestions to make this work in a different way, of
course. Maybe we're missing some magic somewhere that might share the
name between the fndecl and the RTL XSTR so the RTL is magically
updated by that single SET_ECL_ASSEMBLER_NAME in
change_decl_assembler_name? But i didn't quite see where that'd be?

thanks,

> Honza
> > > +
> > >if (alias)
> > >   {
> > > IDENTIFIER_TRANSPARENT_ALIAS (name) = 1;  
> >   



Re: [PATCH 2/2] Fortran: add attribute target_clones

2022-11-21 Thread Mikael Morin

Hello,

Le 09/11/2022 à 20:02, Bernhard Reutner-Fischer via Fortran a écrit :

Hi!

Add support for attribute target_clones:
!GCC$ ATTRIBUTES target_clones("arch1", "arch3","default") :: mysubroutine

Bootstrapped and regtested on x86_64-unknown-linux with
--target_board=unix'{-m32,-m64}'.
OK for trunk?

gcc/fortran/ChangeLog:

* decl.cc: Include fold-const.h for size_int.
(gfc_match_gcc_attribute_args): New internal helper function.
(gfc_match_gcc_attributes): Handle target_clones.
* f95-lang.cc (struct attribute_spec): Add target and
target_clones entries.
* gfortran.h (ext_attr_id_t): Add EXT_ATTR_TARGET_CLONES.
(struct symbol_attribute): Add field ext_attr_args.
* trans-decl.cc (add_attributes_to_decl): Also add ext_attr_args
to the decl's attributes.
* gfortran.texi: Document attribute target_clones.

gcc/testsuite/ChangeLog:

* gfortran.dg/attr_target_clones-1.F90: New test.

Cc: gfortran ML 
---
  gcc/fortran/decl.cc   | 104 ++
  gcc/fortran/f95-lang.cc   |   4 +
  gcc/fortran/gfortran.h|   2 +
  gcc/fortran/gfortran.texi |  31 ++
  gcc/fortran/trans-decl.cc |   3 +
  .../gfortran.dg/attr_target_clones-1.F90  |  30 +
  6 files changed, 174 insertions(+)
  create mode 100644 gcc/testsuite/gfortran.dg/attr_target_clones-1.F90

diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index 0f9b2ced4c2..3a619dbdd34 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc

(...)

@@ -11709,6 +11710,96 @@ gfc_match_final_decl (void)
return MATCH_YES;
  }
  
+/* Internal helper to parse attribute argument list.

+   If REQUIRE_STRING is true, then require a string.
+   If ALLOW_MULTIPLE is true, allow more than one arg.
+   If multiple arguments are passed, require braces around them.
+   Returns a tree_list of arguments or NULL_TREE.  */
+static tree
+gfc_match_gcc_attribute_args (bool require_string, bool allow_multiple)
+{
+  tree attr_args = NULL_TREE, attr_arg;
+  char name[GFC_MAX_SYMBOL_LEN + 1];
+  unsigned pos = 0;
+  gfc_char_t c;
+
+  /* When we get here, we already parsed
+ !GCC$ ATTRIBUTES ATTRIBUTE_NAME
+ Now parse the arguments. These could be one of
+   "single_string_literal"
+   ( "str_literal_1" , "str_literal_2" )
+   */
+
+  gfc_gobble_whitespace ();
+
+  if (allow_multiple && gfc_match_char ('(') != MATCH_YES)
+{
+  gfc_error ("expected '(' at %C");
+  return NULL_TREE;
+}
+
+  if (require_string)
+{
+  do {
+   if (pos)
+ {
+   if (!allow_multiple)
+ {
+   gfc_error ("surplus argument at %C");
+   return NULL_TREE;
+ }
+   gfc_next_ascii_char (); /* Consume the comma.  */
+ }
+   pos = 0;
+   gfc_gobble_whitespace ();
+   unsigned char num_quotes = 0;
+   do {
+ c = gfc_next_char_literal (NONSTRING);
+ if (c == '"')
+   {
+ num_quotes++;
+ continue; /* Skip the quote */
+   }
+ name[pos++] = c;
+ if (pos >= GFC_MAX_SYMBOL_LEN)
+   {
+ gfc_error ("attribute argument truncated at %C");
+ return NULL_TREE;
+   }
+   } while (num_quotes % 2 && gfc_match_eos () != MATCH_YES);

The do-while loops are wrongly indented.
It should be:
  do
{
  ...
}
  while (...)


+   if (pos < 1)
+ {
+   gfc_error ("expected argument at %C");
+   return NULL_TREE;
+ }
+   if (num_quotes != 2)
+ {
+   gfc_error ("invalid string literal at %C");
+   return NULL_TREE;
+ }
+   name[pos] = '\0'; /* Redundant wrt build_string.  */
+   tree str = build_string (pos, name);
+   /* Compare with c-family/c-common.cc: fix_string_type.  */
+   tree i_type = build_index_type (size_int (pos));
+   tree a_type = build_array_type (char_type_node, i_type);
+   TREE_TYPE (str) = a_type;
+   TREE_READONLY (str) = 1;
+   TREE_STATIC (str) = 1;
+   attr_arg = build_tree_list (NULL_TREE, str);
+   attr_args = chainon (attr_args, attr_arg);

Same comment as for the flatten attribute:
please no tree stuff out of the trans-*.cc files.
This includes gfortran.h, so the attribute arguments need to be carried 
around using the front-end structures (gfc_actual_arglist for example).



+
+   gfc_gobble_whitespace ();
+  } while (gfc_peek_ascii_char () == ',');
+}
+
+  if (allow_multiple && gfc_match_char (')') != MATCH_YES)
+{
+  gfc_error ("expected ')' at %C");
+  return NULL_TREE;
+}
+
+  return attr_args;
+}

I'm not sure this function need to do all the parsing manually.
I would rather use gfc_match_actual_arglist, or maybe implement the 
function as a wrapper around it.
What is allowed here?  Are 

Re: [PATCH 1/2] symtab: also change RTL decl name

2022-11-21 Thread Jan Hubicka via Gcc-patches
> Hi Honza, Ping.
> Regtests cleanly for c,fortran,c++,ada,d,go,lto,objc,obj-c++
> Ok?
> I'd need this for attribute target_clones for the Fortran FE.
Sorry for delay here.
> >  void
> > @@ -303,6 +301,10 @@ symbol_table::change_decl_assembler_name (tree decl, 
> > tree name)
> > warning (0, "%qD renamed after being referenced in assembly", decl);
> >  
> >SET_DECL_ASSEMBLER_NAME (decl, name);
> > +  /* Set the new name in rtl.  */
> > +  if (DECL_RTL_SET_P (decl))
> > +   XSTR (XEXP (DECL_RTL (decl), 0), 0) = IDENTIFIER_POINTER (name);

I am not quite sure how safe this is.  We generally produce DECL_RTL
when we produce assembly file.  So if DECL_RTL is set then we probably
already output the original function name and it is too late to change
it.

Also RTL is shared so changing it in-place is going to rewrite all the
existing RTL expressions using it.

Why the DECL_RTL is produced for function you want to rename?
Honza
> > +
> >if (alias)
> > {
> >   IDENTIFIER_TRANSPARENT_ALIAS (name) = 1;
> 


[committed] libstdc++: Check static assertions earlier in chrono::duration

2022-11-21 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8--

This ensures that we fail a static assertion before giving any other
errors. Instantiating chrono::duration will now
print this before the other errors caused by it:

error: static assertion failed: period must be a specialization of ratio

libstdc++-v3/ChangeLog:

* include/bits/chrono.h (duration): Check preconditions on
template arguments before using them.
---
 libstdc++-v3/include/bits/chrono.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono.h 
b/libstdc++-v3/include/bits/chrono.h
index 05987ca09df..cabf61264d8 100644
--- a/libstdc++-v3/include/bits/chrono.h
+++ b/libstdc++-v3/include/bits/chrono.h
@@ -433,6 +433,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template
   struct duration
   {
+   static_assert(!__is_duration<_Rep>::value, "rep cannot be a duration");
+   static_assert(__is_ratio<_Period>::value,
+ "period must be a specialization of ratio");
+   static_assert(_Period::num > 0, "period must be positive");
+
   private:
template
  using __is_float = treat_as_floating_point<_Rep2>;
@@ -478,11 +483,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
using rep = _Rep;
using period = typename _Period::type;
 
-   static_assert(!__is_duration<_Rep>::value, "rep cannot be a duration");
-   static_assert(__is_ratio<_Period>::value,
- "period must be a specialization of ratio");
-   static_assert(_Period::num > 0, "period must be positive");
-
// 20.11.5.1 construction / copy / destroy
constexpr duration() = default;
 
-- 
2.38.1



[committed] libstdc++: Reduce size of std::bind_front(F) result

2022-11-21 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8--

When there are no bound arguments to a std::bind_front call we don't
need the overhead of compiling, initializing, and accessing an empty
tuple.

libstdc++-v3/ChangeLog:

* include/std/functional (_Bind_front0): New class template.
(_Bind_front_t): Use _Bind_front0 when there are no bound
arguments.
* testsuite/20_util/function_objects/bind_front/107784.cc:
New test.
---
 libstdc++-v3/include/std/functional   | 62 ++-
 .../function_objects/bind_front/107784.cc | 15 +
 2 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 
libstdc++-v3/testsuite/20_util/function_objects/bind_front/107784.cc

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index b396e8dbbdc..22fcd04 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -995,9 +995,69 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   std::tuple<_BoundArgs...> _M_bound_args;
 };
 
+  // Avoid the overhead of an empty tuple<> if there are no bound args.
+  template
+struct _Bind_front0
+{
+  static_assert(is_move_constructible_v<_Fd>);
+
+  // First parameter is to ensure this constructor is never used
+  // instead of the copy/move constructor.
+  template
+   explicit constexpr
+   _Bind_front0(int, _Fn&& __fn)
+   noexcept(is_nothrow_constructible_v<_Fd, _Fn>)
+   : _M_fd(std::forward<_Fn>(__fn))
+   { }
+
+  _Bind_front0(const _Bind_front0&) = default;
+  _Bind_front0(_Bind_front0&&) = default;
+  _Bind_front0& operator=(const _Bind_front0&) = default;
+  _Bind_front0& operator=(_Bind_front0&&) = default;
+  ~_Bind_front0() = default;
+
+  template
+   constexpr
+   invoke_result_t<_Fd&, _CallArgs...>
+   operator()(_CallArgs&&... __call_args) &
+   noexcept(is_nothrow_invocable_v<_Fd&, _CallArgs...>)
+   { return std::invoke(_M_fd, std::forward<_CallArgs>(__call_args)...); }
+
+  template
+   constexpr
+   invoke_result_t
+   operator()(_CallArgs&&... __call_args) const &
+   noexcept(is_nothrow_invocable_v)
+   { return std::invoke(_M_fd, std::forward<_CallArgs>(__call_args)...); }
+
+  template
+   constexpr
+   invoke_result_t<_Fd, _CallArgs...>
+   operator()(_CallArgs&&... __call_args) &&
+   noexcept(is_nothrow_invocable_v<_Fd, _CallArgs...>)
+   {
+ return std::invoke(std::move(_M_fd),
+std::forward<_CallArgs>(__call_args)...);
+   }
+
+  template
+   constexpr
+   invoke_result_t
+   operator()(_CallArgs&&... __call_args) const &&
+   noexcept(is_nothrow_invocable_v)
+   {
+ return std::invoke(std::move(_M_fd),
+std::forward<_CallArgs>(__call_args)...);
+   }
+
+private:
+  _Fd _M_fd;
+};
+
   template
 using _Bind_front_t
-  = _Bind_front, decay_t<_Args>...>;
+  = __conditional_t>,
+   _Bind_front, decay_t<_Args>...>>;
 
   /** Create call wrapper by partial application of arguments to function.
*
diff --git 
a/libstdc++-v3/testsuite/20_util/function_objects/bind_front/107784.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/bind_front/107784.cc
new file mode 100644
index 000..ec255f3ee36
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function_objects/bind_front/107784.cc
@@ -0,0 +1,15 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+#include 
+
+struct Foo
+{
+  void func() {}
+};
+
+void bar() { }
+
+// PR libstdc++/107784
+static_assert( sizeof(std::bind_front(::func)) == sizeof(::func) );
+static_assert( sizeof(std::bind_front()) == sizeof() );
-- 
2.38.1



[committed] libstdc++: Improve Doxygen comments in

2022-11-21 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8--

libstdc++-v3/ChangeLog:

* include/std/tuple: Add better Doxygen comments.
---
 libstdc++-v3/include/std/tuple | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 26e248431ec..0ac592d8d94 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -1980,6 +1980,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // three_way_comparison
 
   // NB: DR 705.
+  /// Create a tuple containing copies of the arguments
   template
 constexpr tuple::__type...>
 make_tuple(_Elements&&... __args)
@@ -1991,7 +1992,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2275. Why is forward_as_tuple not constexpr?
-  /// std::forward_as_tuple
+  /// Create a tuple of lvalue or rvalue references to the arguments
   template
 constexpr tuple<_Elements&&...>
 forward_as_tuple(_Elements&&... __args) noexcept
@@ -2018,7 +2019,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr const _Tp&&
 get(const array<_Tp, _Nm>&&) noexcept;
 
-
+  /// @cond undocumented
   template
 struct __make_tuple_impl;
 
@@ -2130,8 +2131,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct __is_tuple_like_impl> : true_type
 { };
+  /// @endcond
 
-  /// tuple_cat
+  /// Create a `tuple` containing all elements from multiple tuple-like objects
   template...>::value>::type>
 constexpr auto
@@ -2146,13 +2148,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2301. Why is tie not constexpr?
-  /// tie
+  /// Return a tuple of lvalue references bound to the arguments
   template
 constexpr tuple<_Elements&...>
 tie(_Elements&... __args) noexcept
 { return tuple<_Elements&...>(__args...); }
 
-  /// swap
+  /// Exchange the values of two tuples
   template
 _GLIBCXX20_CONSTEXPR
 inline
@@ -2177,6 +2179,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // C++23
 
 #if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  /// Exchange the values of two const tuples (if const elements can be 
swapped)
   template
 _GLIBCXX20_CONSTEXPR
 typename enable_if...>::value>::type
@@ -2197,6 +2200,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2773. Making std::ignore constexpr
+  /** Used with `std::tie` to ignore an element of a tuple
+   *
+   * When using `std::tie` to assign the elements of a tuple to variables,
+   * unwanted elements can be ignored by using `std::ignore`. For example:
+   *
+   * ```
+   * int x, y;
+   * std::tie(x, std::ignore, y) = std::make_tuple(1, 2, 3);
+   * ```
+   *
+   * This assignment will perform `x=1; std::ignore=2; y=3;` which results
+   * in the second element being ignored.
+   *
+   * @since C++11
+   */
   _GLIBCXX17_INLINE constexpr _Swallow_assign ignore{};
 
   /// Partial specialization for tuples
-- 
2.38.1



Re: [PATCH 1/2] symtab: also change RTL decl name

2022-11-21 Thread Mikael Morin

Hello,

Le 17/11/2022 à 09:02, Bernhard Reutner-Fischer via Fortran a écrit :

Hi Honza, Ping.
Regtests cleanly for c,fortran,c++,ada,d,go,lto,objc,obj-c++
Ok?
I'd need this for attribute target_clones for the Fortran FE.
thanks,

On Wed,  9 Nov 2022 20:02:24 +0100
Bernhard Reutner-Fischer  wrote:


We were changing the ASSEMBLER_NAME of the function decl
but not the name in DECL_RTL which is used as the function name
fnname in rest_of_handle_final(). This led to using the old, wrong name
for the attribute target default function when using target_clones.

Bootstrapped and regtested cleanly on x86_64-unknown-linux
for c,c++,fortran,lto.
Ok for trunk?

gcc/ChangeLog:

* symtab.cc: Remove stray comment.
(symbol_table::change_decl_assembler_name): Also update the
name in DECL_RTL.


(patch stripped)

Is there a PR about this?  Or a testcase exhibiting the problem at least?


Re: [PATCH 3/6] libcpp: Fix paste error with unknown pragma after macro expansion

2022-11-21 Thread Jeff Law via Gcc-patches



On 11/4/22 07:44, Lewis Hyatt via Gcc-patches wrote:

In directives.cc, do_pragma() contains logic to handle a case such as the new
testcase pragma-omp-unknown.c, where an unknown pragma was the result of macro
expansion (for pragma namespaces that permit expansion). This no longer works
correctly as shown by the testcase, fixed by adding PREV_WHITE to the flags on
the second token to prevent an unwanted paste.  Also fixed the memory leak,
since the temporary tokens are pushed on their own context, nothing prevents
freeing of the buffer that holds them when the context is eventually popped.

libcpp/ChangeLog:

* directives.cc (do_pragma): Fix memory leak in token buffer.  Fix
unwanted paste between two tokens.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/pragma-omp-unknown.c: New test.


OK

jeff




Re: [PATCH] testsuite: Fix missing EFFECTIVE_TARGETS variable errors

2022-11-21 Thread Jeff Law via Gcc-patches



On 11/15/22 10:02, Maciej W. Rozycki wrote:

Permit running vector tests outside `check_vect_support_and_set_flags'
environment, removing errors such as:

ERROR: gcc.dg/analyzer/torture/pr93350.c   -O0 : can't read "EFFECTIVE_TARGETS": no such 
variable for " dg-require-effective-target 1 vect_int "

or:

ERROR: gcc.dg/bic-bitmask-13.c: error executing dg-final: can't read 
"EFFECTIVE_TARGETS": no such variable

with `mips-linux-gnu' target testing.

The EFFECTIVE_TARGETS variable has originated from commit 9b7937cf8a06
("Add support to run auto-vectorization tests for multiple effective
targets."), where arrangements have been made to run vector tests run
within `check_vect_support_and_set_flags' environment iteratively over
all the vector unit variants available in the architecture using extra
compilation flags regardless of whether the target environment arranged
for a particular testsuite run has vector support enabled by default.
So far this has been used for the MIPS target only.

Vector tests have since been added though that run outside environment
set up by `check_vect_support_and_set_flags' just using the current
compilation environment with no extra flags added.  This works for most
targets, however causes problems with the MIPS target, because outside
`check_vect_support_and_set_flags' environment the EFFECTIVE_TARGETS
variable will not have been correctly set up even if it was added to
the particular script invoking the test in question.

Fix this by using just the current compilation environment whenever a
vector feature is requested by `et-is-effective-target' in the absence
of the EFFECTIVE_TARGETS variable.  This required some modification to
individual vector feature tests, which always added the compilation
flags required for the determination of whether the given vector unit
variant can be verified with the current testsuite run (except for the
Loongson MMI variant).  Now explicit flags are only passed in setting up
EFFECTIVE_TARGETS and otherwise the current compilation environment will
determine whether such a vector test is applicable.

This changes how Loongson MMI is handled in that the `-mloongson-mmi'
flag is explicitly passed for the determination of whether this vector
unit variant can be verified, which I gather is how it was supposed to
be arranged anyway because the flag is then added for testing the
Loongson MMI variant.

gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_mpaired_single): Add `args' argument and
pass it to `check_no_compiler_messages' replacing
`-mpaired-single'.
(add_options_for_mips_loongson_mmi): Add `args' argument and
pass it to `check_no_compiler_messages'.
(check_effective_target_mips_msa): Add `args' argument and pass
it to `check_no_compiler_messages' replacing `-mmsa'.
(check_effective_target_mpaired_single_runtime)
(add_options_for_mpaired_single): Pass `-mpaired-single' to
`check_effective_target_mpaired_single'.
(check_effective_target_mips_loongson_mmi_runtime)
(add_options_for_mips_loongson_mmi): Pass `-mloongson-mmi' to
`check_effective_target_mips_loongson_mmi'.
(check_effective_target_mips_msa_runtime)
(add_options_for_mips_msa): Pass `-mmsa' to
`check_effective_target_mips_msa'.
(et-is-effective-target): Verify that EFFECTIVE_TARGETS exists
and if not, just check if the current compilation environment
supports the target feature requested.
(check_vect_support_and_set_flags): Pass `-mpaired-single',
`-mloongson-mmi', and `-mmsa' to the respective target feature
checks.


OK.

jeff




Re: [PATCH] Makefile.tpl: pass CXXFLAGS_FOR_BUILD where appropriate

2022-11-21 Thread Jeff Law via Gcc-patches



On 9/9/22 04:10, Ross Burton via Gcc-patches wrote:

If CXXFLAGS contains something unsupported by the build CXX, we see
build failures (e.g. using -fmacro-prefix-map for the target). Ensure
that CXXFLAGS_FOR_BUILD is passed where appropriate so that the correct
flags are used.

ChangeLog:

 * Makefile.in: Regenerate.
 * Makefile.tpl: Add missing CXXFLAGS_FOR_BUILD overrides


OK.  Sorry for the long wait.

jeff




Re: [PATCH] 3/19 modula2 front end: gm2 driver files.

2022-11-21 Thread Gaius Mulley via Gcc-patches
Richard Biener  writes:

[snip]

> It feels like most of the above would usually be handled via lang specific
> specs rather than open-coded in the driver?  Is there a specific reason you
> opted for explicit handling here?

The last time submitting the patches I perhaps went overboard using
lang-specs - and so tried to adopt the style from C++/fortran this time
around.  I'm happy to utilize lang-specs to reduce the driver code though.

> Otherwise the patch looks generally OK, the string builds using strcat & 
> friends
> looks like it neither takes advantage of how obstacks can be used there nor
> that we're now C++ and could use std::string - but that's a matter of taste 
> and
> no objection.

ok thanks!

regards,
Gaius


Re: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-21 Thread Jeff Law via Gcc-patches



On 11/18/22 23:25, Hongyu Wang via Gcc-patches wrote:

Hi,

Followed by the discussion in pr107602, -munroll-only-small-loops
Does not turns on/off -funroll-loops, and current check in
pass_rtl_unroll_loops::gate would cause -funroll-loops do not take
effect. Revert the change about targetm.loop_unroll_adjust and apply
the backend option change to strictly follow the rule that
-funroll-loops takes full control of loop unrolling, and
munroll-only-small-loops just change its behavior to unroll small size
loops.

Bootstrapped and regtested on x86-64-pc-linux-gnu.

Ok for trunk?

gcc/ChangeLog:

PR target/107602
* common/config/i386/i386-common.cc (ix86_optimization_table):
Enable loop unroll O2, disable -fweb and -frename-registers
by default.
* config/i386/i386-options.cc
(ix86_override_options_after_change):
Disable small loop unroll when funroll-loops enabled, reset
cunroll_grow_size when it is not explicitly enabled.
(ix86_option_override_internal): Call
ix86_override_options_after_change instead of calling
ix86_recompute_optlev_based_flags and ix86_default_align
separately.
* config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
factor if -munroll-only-small-loops enabled.
* loop-init.cc (pass_rtl_unroll_loops::gate): Do not enable
loop unrolling for -O2-speed.
(pass_rtl_unroll_loops::execute): Rmove
targetm.loop_unroll_adjust check.
The reversion of the loop-init.cc changes is fine.  The x86 maintainers 
will need to chime in on the rest.  Consider installing the loop-init.cc 
reversion immediately as the current state has regressed s390 and 
potentially other targets.



jeff



Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread H.J. Lu via Gcc-patches
On Mon, Nov 21, 2022 at 3:49 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Mon, Nov 21, 2022 at 12:22:32PM +0100, Thomas Neumann via Gcc-patches 
> wrote:
> > > When dynamically linking a fast enough machine hides the latency, but when
> > > Statically linking or on slower devices this change caused a 5x increase 
> > > in
> > > Instruction count and 2x increase in cycle count before getting to main.
> > >
> > > This has been quite noticeable on smaller devices.  Is there a reason the 
> > > btree
> > > can't be initialized lazily? It seems a bit harsh to pay the cost of 
> > > unwinding at
> > > startup even when you don't throw exceptions..
> >
> > we cannot easily do that lazily because otherwise we need a mutex for lazy
> > initialization, which is exactly what we wanted to get rid of.
> >
> > Having said that, I am surprised that you saw a noticeable difference. On
> > most platforms there should not be dynamic frame registration at all, as the
> > regular frames are directly read from the ELF data.
> >
> > Can you please send me an precise description on how to reproduce the issue?
> > (Platform, tools, a VM if you have one would be great). I will then debug
> > this to improve the startup time.
>
> I can see it being called as well for -static linked binaries.
> -static links in crtbeginT.o which is libgcc/crtstuff.c built with
> CRTSTUFFT_O macro being defined among other things, and that disables
> USE_PT_GNU_EH_FRAME:
> #if defined(OBJECT_FORMAT_ELF) \
> && !defined(OBJECT_FORMAT_FLAT) \
> && defined(HAVE_LD_EH_FRAME_HDR) \
> && !defined(inhibit_libc) && !defined(CRTSTUFFT_O) \
> && defined(__GLIBC__) && __GLIBC__ >= 2
> #include 
> /* uClibc pretends to be glibc 2.2 and DT_CONFIG is defined in its link.h.
>But it doesn't use PT_GNU_EH_FRAME ELF segment currently.  */
> # if !defined(__UCLIBC__) \
>  && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
>  || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
> #  define USE_PT_GNU_EH_FRAME
> # endif
> #endif
>
> I think .eh_frame_hdr was never used for statically linked programs,
> see already https://gcc.gnu.org/legacy-ml/gcc-patches/2001-12/msg01383.html
> We don't pass --eh-frame-hdr when linking statically and dl_iterate_phdr
> doesn't handle those.
> Now, if -static -Wl,--eh-frame-hdr is passed when linking to the driver,
> .eh_frame_hdr section is created and __GNU_EH_FRAME_HDR symbol points to
> the start of that section, so at least that section could be found
> if something in the crt files and libgcc is adjusted.  But e.g.
> i?86, nios2, frv and bfin we also need to find the got.  Also, would it
> work even for static PIEs?
>
> Jakub
>

There is

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54568

-- 
H.J.


Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2022-11-21 Thread Philipp Tomsich
On Sun, 20 Nov 2022 at 17:38, Jeff Law  wrote:
>
>
> On 11/9/22 16:10, Philipp Tomsich wrote:
> > The current method of treating shifts of extended values on RISC-V
> > frequently causes sequences of 3 shifts, despite the presence of the
> > 'zero_extendsidi2_shifted' pattern.
> >
> > Consider:
> >  unsigned long f(unsigned int a, unsigned long b)
> >  {
> >  a = a << 1;
> >  unsigned long c = (unsigned long) a;
> >  c = b + (c<<4);
> >  return c;
> >  }
> > which will present at combine-time as:
> >  Trying 7, 8 -> 9:
> >  7: r78:SI=r81:DI#0<<0x1
> >REG_DEAD r81:DI
> >  8: r79:DI=zero_extend(r78:SI)
> >REG_DEAD r78:SI
> >  9: r72:DI=r79:DI<<0x4
> >REG_DEAD r79:DI
> >  Failed to match this instruction:
> >  (set (reg:DI 72 [ _1 ])
> >  (and:DI (ashift:DI (reg:DI 81)
> >  (const_int 5 [0x5]))
> >   (const_int 68719476704 [0xfffe0])))
> > and produce the following (optimized) assembly:
> >  f:
> >   slliw   a5,a0,1
> >   sllia5,a5,32
> >   srlia5,a5,28
> >   add a0,a5,a1
> >   ret
> >
> > The current way of handling this (in 'zero_extendsidi2_shifted')
> > doesn't apply for two reasons:
> > - this is seen before reload, and
> > - (more importantly) the constant mask is not 0xul.
> >
> > To address this, we introduce a generalized version of shifting
> > zero-extended values that supports any mask of consecutive ones as
> > long as the number of training zeros is the inner shift-amount.
> >
> > With this new split, we generate the following assembly for the
> > aforementioned function:
> >  f:
> >   sllia0,a0,33
> >   srlia0,a0,28
> >   add a0,a0,a1
> >   ret
> >
> > Unfortunately, all of this causes some fallout (especially in how it
> > interacts with Zb* extensions and zero_extract expressions formed
> > during combine): this is addressed through additional instruction
> > splitting and handling of zero_extract.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed
> >  as an and:DI.
> >   (*andi_add.uw): New pattern.
> >   (*slli_slli_uw): New pattern.
> >   (*shift_then_shNadd.uw): New pattern.
> >   (*slliuw): Rename to riscv_slli_uw.
> >   (riscv_slli_uw): Renamed from *slliuw.
> >   (*zeroextract2_highbits): New pattern.
> >   (*zero_extract): New pattern, which will be split to
> >   shift-left + shift-right.
> >   * config/riscv/predicates.md (dimode_shift_operand):
> >   * config/riscv/riscv.md (*zero_extract_lowbits):
> >   (zero_extendsidi2_shifted): Rename.
> >   (*zero_extendsidi2_shifted): Generalize.
> >   (*shift_truthvalue): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/shift-shift-6.c: New test.
> >   * gcc.target/riscv/shift-shift-7.c: New test.
> >   * gcc.target/riscv/shift-shift-8.c: New test.
> >   * gcc.target/riscv/shift-shift-9.c: New test.
> >   * gcc.target/riscv/snez.c: New test.
> >
> > Commit notes:
> > - Depends on a predicate posted in "RISC-V: Optimize branches testing
> >a bit-range or a shifted immediate".  Depending on the order of
> >applying these, I'll take care to pull that part out of the other
> >patch if needed.
> >
> > Version-changes: 2
> > - refactor
> > - optimise for additional corner cases and deal with fallout
> >
> > Version-changes: 3
> > - removed the [WIP] from the commit message (no other changes)
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> > (no changes since v1)
> >
> >   gcc/config/riscv/bitmanip.md  | 142 ++
> >   gcc/config/riscv/predicates.md|   5 +
> >   gcc/config/riscv/riscv.md |  75 +++--
> >   .../gcc.target/riscv/shift-shift-6.c  |  14 ++
> >   .../gcc.target/riscv/shift-shift-7.c  |  16 ++
> >   .../gcc.target/riscv/shift-shift-8.c  |  20 +++
> >   .../gcc.target/riscv/shift-shift-9.c  |  15 ++
> >   gcc/testsuite/gcc.target/riscv/snez.c |  14 ++
> >   8 files changed, 261 insertions(+), 40 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c
> >
> > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > index 78fdf02c2ec..06126ac4819 100644
> > --- a/gcc/config/riscv/bitmanip.md
> > +++ b/gcc/config/riscv/bitmanip.md
> > @@ -29,7 +29,20 @@
> > [(set_attr "type" "bitmanip,load")
> >  (set_attr "mode" "DI")])
> >
> > -(define_insn "riscv_shNadd"
> > +;; We may end up forming a slli.uw 

Re: [PATCH] Remove legacy VRP (maybe?)

2022-11-21 Thread Jeff Law via Gcc-patches



On 11/21/22 09:35, Aldy Hernandez via Gcc-patches wrote:

I've been playing around with removing the legacy VRP code for the
next release.  It's a layered onion to get this right, but the first
bit is pretty straightforward and may be useful for this release.
Basically, it entails removing the old VRP pass itself, along with
value_range_equiv which have no producers left.  The current users of
value_range_equiv don't put anything in the equivalence bitmaps, so
they're basically behaving like plain value_range.

I removed as much as possible without having to change any behavior,
and this is what I came up with.  Is this something that would be
useful for this release?  Would it help release managers have less
unused cruft in the tree?

Neither Andrew nor I have any strong feelings here.  We don't foresee
the legacy code changing at all in the offseason, so we can just
accumulate these patches in local trees.


I'd lean towards removal after gcc-13 releases.


jeff



[Ping x4] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-11-21 Thread Chung-Lin Tang via Gcc-patches
Ping x4

On 2022/11/8 12:34 AM, Chung-Lin Tang wrote:
> Ping x3.
> 
> On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
>> Ping x2.
>>
>> On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
>>> Ping.
>>>
>>> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
 Hi Tom,
 I had a patch submitted earlier, where I reported that the current way of 
 implementing
 barriers in libgomp on nvptx created a quite significant performance drop 
 on some SPEChpc2021
 benchmarks:
 https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html>
 That previous patch wasn't accepted well (admittedly, it was kind of a 
 hack).
 So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.

 Basically, instead of trying to have the GPU do CPU-with-OS-like things 
 that it isn't suited for,
 barriers are implemented simplistically with bar.* synchronization 
 instructions.
 Tasks are processed after threads have joined, and only if 
 team->task_count != 0

 (arguably, there might be a little bit of performance forfeited where 
 earlier arriving threads
 could've been used to process tasks ahead of other threads. But that again 
 falls into requiring
 implementing complex futex-wait/wake like behavior. Really, that kind of 
 tasking is not what target
 offloading is usually used for)

 Implementation highlight notes:
 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" 
 in the usual manner)
 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
 3. gomp_barrier_wait_last() now is implemented using "bar.arrive"

 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
 The main synchronization is done using a 'bar.red' instruction. This 
 reduces across all threads
 the condition (team->task_count != 0), to enable the task processing 
 down below if any thread
 created a task. (this bar.red usage required the need of the second 
 GCC patch in this series)

 This patch has been tested on x86_64/powerpc64le with nvptx offloading, 
 using libgomp, ovo, omptests,
 and sollve_vv testsuites, all without regressions. Also verified that the 
 SPEChpc 2021 521.miniswp_t
 and 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle 
 has been restored to
 devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?

 (also suggest backporting to GCC12 branch, if performance regression can 
 be considered a defect)

 Thanks,
 Chung-Lin

 libgomp/ChangeLog:

 2022-09-21  Chung-Lin Tang  

* config/nvptx/bar.c (generation_to_barrier): Remove.
(futex_wait,futex_wake,do_spin,do_wait): Remove.
(GOMP_WAIT_H): Remove.
(#include "../linux/bar.c"): Remove.
(gomp_barrier_wait_end): New function.
(gomp_barrier_wait): Likewise.
(gomp_barrier_wait_last): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
(gomp_team_barrier_wait_cancel): Likewise.
(gomp_team_barrier_cancel): Likewise.
* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
prototype, add new static inline function.
> 



Re: [PATCH 5/8] middle-end: Add cltz_complement idiom recognition

2022-11-21 Thread Andrew Carlotti via Gcc-patches
On Mon, Nov 14, 2022 at 04:10:22PM +0100, Richard Biener wrote:
> On Fri, Nov 11, 2022 at 7:53 PM Andrew Carlotti via Gcc-patches
>  wrote:
> >
> > This recognises patterns of the form:
> > while (n) { n >>= 1 }
> >
> > This patch results in improved (but still suboptimal) codegen:
> >
> > foo (unsigned int b) {
> > int c = 0;
> >
> > while (b) {
> > b >>= 1;
> > c++;
> > }
> >
> > return c;
> > }
> >
> > foo:
> > .LFB11:
> > .cfi_startproc
> > cbz w0, .L3
> > clz w1, w0
> > tst x0, 1
> > mov w0, 32
> > sub w0, w0, w1
> > cselw0, w0, wzr, ne
> > ret
> >
> > The conditional is unnecessary. phiopt could recognise a redundant csel
> > (using cond_removal_in_builtin_zero_pattern) when one of the inputs is a
> > clz call, but it cannot recognise the redunancy when the input is (e.g.)
> > (32 - clz).
> >
> > I could perhaps extend this function to recognise this pattern in a later
> > patch, if this is a good place to recognise more patterns.
> >
> > gcc/ChangeLog:
> >
+   PR tree-optimization/94793
> > * tree-scalar-evolution.cc (expression_expensive_p): Add checks
> > for c[lt]z optabs.
> > * tree-ssa-loop-niter.cc (build_cltz_expr): New.
> > (number_of_iterations_cltz_complement): New.
> > (number_of_iterations_bitcount): Add call to the above.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * lib/target-supports.exp (check_effective_target_clz)
> > (check_effective_target_clzl, check_effective_target_clzll)
> > (check_effective_target_ctz, check_effective_target_clzl)
> > (check_effective_target_ctzll): New.
> > * gcc.dg/tree-ssa/cltz-complement-max.c: New test.
> > * gcc.dg/tree-ssa/clz-complement-char.c: New test.
> > * gcc.dg/tree-ssa/clz-complement-int.c: New test.
> > * gcc.dg/tree-ssa/clz-complement-long-long.c: New test.
> > * gcc.dg/tree-ssa/clz-complement-long.c: New test.
> > * gcc.dg/tree-ssa/ctz-complement-char.c: New test.
> > * gcc.dg/tree-ssa/ctz-complement-int.c: New test.
> > * gcc.dg/tree-ssa/ctz-complement-long-long.c: New test.
> > * gcc.dg/tree-ssa/ctz-complement-long.c: New test.
> >
> >
> > --
> >
> >

[snip test diffs]

> > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > index 
> > 7e2a3e986619de87e4ae9daf16198be1f13b917c..1ac9526c69b5fe80c26022f2fa1176d222e2dfb9
> >  100644
> > --- a/gcc/tree-scalar-evolution.cc
> > +++ b/gcc/tree-scalar-evolution.cc
> > @@ -3406,12 +3406,21 @@ expression_expensive_p (tree expr, hash_map > uint64_t> ,
> >  library call for popcount when backend does not have an instruction
> >  to do so.  We consider this to be expensive and generate
> >  __builtin_popcount only when backend defines it.  */
> > +  optab optab;
> >combined_fn cfn = get_call_combined_fn (expr);
> >switch (cfn)
> > {
> > CASE_CFN_POPCOUNT:
> > + optab = popcount_optab;
> > + goto bitcount_call;
> > +   CASE_CFN_CLZ:
> > + optab = clz_optab;
> > + goto bitcount_call;
> > +   CASE_CFN_CTZ:
> > + optab = ctz_optab;
> > +bitcount_call:
> >   /* Check if opcode for popcount is available in the mode 
> > required.  */
> > - if (optab_handler (popcount_optab,
> > + if (optab_handler (optab,
> >  TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 
> > 0
> >   == CODE_FOR_nothing)
> > {
> > @@ -3424,7 +3433,7 @@ expression_expensive_p (tree expr, hash_map > uint64_t> ,
> >  instructions.  */
> >   if (is_a  (mode, _mode)
> >   && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD
> > - && (optab_handler (popcount_optab, word_mode)
> > + && (optab_handler (optab, word_mode)
> >   != CODE_FOR_nothing))
> >   break;
> >   return true;
> > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > index 
> > fece876099c1687569d6351e7d2416ea6acae5b5..16e8e25919d808cea27adbd72f0be01ae2f0e547
> >  100644
> > --- a/gcc/tree-ssa-loop-niter.cc
> > +++ b/gcc/tree-ssa-loop-niter.cc
> > @@ -2198,6 +2198,195 @@ number_of_iterations_popcount (loop_p loop, edge 
> > exit,
> >return true;
> >  }
> >
> > +/* Return an expression that counts the leading/trailing zeroes of src.  */
> 
> Can you expand the comment on how you handle defined_at_zero and how
> that relates to the C[LT]Z_DEFINED_VALUE_AT_ZERO target macros?
> The loop examples you gave above all have a defined value for zero, I'm
> not sure how you'd write a C loop which has that undefined.
> 

How about:

/* Return an expression that counts the leading/trailing zeroes of src.

   If defined_at_zero is true, then the built expression uses a conditonal
 

Re: PING^2 [PATCH] testsuite: Windows paths use \ and not /

2022-11-21 Thread Torbjorn SVENSSON via Gcc-patches

Pushed after off-list approval from Jeff Law.

On 2022-11-17 17:44, Torbjorn SVENSSON via Gcc-patches wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604896.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-11-02 19:16, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604312.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-10-25 17:15, Torbjörn SVENSSON wrote:

Without this patch, the following error is reported on Windows:

In file included from 
t:\build\arm-none-eabi\include\c++\11.3.1\string:54,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\bits\locale_classes.h:40,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\bits\ios_base.h:41,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\ios:42,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\ostream:38,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\iostream:39:
t:\build\arm-none-eabi\include\c++\11.3.1\bits\range_access.h:36:10: 
note: include 
't:\build\arm-none-eabi\include\c++\11.3.1\initializer_list' 
translated to import
arm-none-eabi-g++.exe: warning: 
.../gcc/testsuite/g++.dg/modules/pr99023_b.X: linker input file 
unused because linking not done
FAIL: g++.dg/modules/pr99023_b.X -std=c++2a  dg-regexp 6 not found: 
"[^\n]*: note: include '[^\n]*/initializer_list' translated to import\n"


gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99023_b.X: Match Windows paths too.

Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.dg/modules/pr99023_b.X | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/modules/pr99023_b.X 
b/gcc/testsuite/g++.dg/modules/pr99023_b.X

index 3d82f34868b..ca5f32e5bcc 100644
--- a/gcc/testsuite/g++.dg/modules/pr99023_b.X
+++ b/gcc/testsuite/g++.dg/modules/pr99023_b.X
@@ -3,5 +3,5 @@
  // { dg-prune-output {linker input file unused} }
-// { dg-regexp {[^\n]*: note: include '[^\n]*/initializer_list' 
translated to import\n} }
+// { dg-regexp {[^\n]*: note: include '[^\n]*[/\\]initializer_list' 
translated to import\n} }

  NO DO NOT COMPILE


Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-11-21 Thread Alexander Monakov via Gcc-patches


On Mon, 21 Nov 2022, Jeff Law wrote:

> They're writing assembly code -- in my book that means they'd better have a
> pretty good understanding of the architecture, its limitations and quirks.

That GCC ties together optimization and inline asm interface via its internal
TARGET_MODE_REP_EXTENDED macro is not a quirk of the RISC-V architecture.
It's a quirk of GCC.

Alexander


RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

2022-11-21 Thread Alexander Monakov via Gcc-patches


On Mon, 21 Nov 2022, Joshi, Tejas Sanjay wrote:

> I have addressed all your comments in the patch attached here. I have also
> used znver4-direct for avx512 insns.

Thanks.

> * This patch increased the insn-automata.cc size from 201502 to 214902.

Assuming it's the number of lines of code, I have 102847, perhaps you're
measuring without my patches? You can use 'size -A gcc/insn-automata.o'
to measure binary size growth.

> * Compile time and binary size on my machine remains same.
> * Make check and bootstrap build have no issues.
> * Spec cpu2017 also don't have any issues with this patch.
> 
> Is this ok for trunk?

I cannot approve or reject your patch, this is up to Honza who I believe
was investigating if combining this with older Zen models makes sense.
In the meantime, I see a few more issues that can be easily corrected,
please see below.

> --- /dev/null
> +++ b/gcc/config/i386/znver4.md
> +;; FSQRT
> +(define_insn_reservation "znver4_fsqrt" 22
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "fpspc")
> +(and (eq_attr "mode" "XF")
> + (eq_attr "memory" "none"
> +  "znver4-direct,znver4-fdiv*20")

This should be znver4-fdiv*10 (not *20) according to Agner Fog's measurements.

> +;; FDIV
> +(define_insn_reservation "znver4_fp_div" 15
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "fdiv")
> +(eq_attr "memory" "none")))
> +  "znver4-direct,znver4-fdiv*15")

znver4-fdiv*6 instead of *15 here and in two patterns following this one.

> +(define_insn_reservation "znver4_sse_div_pd" 13
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssediv")
> +(and (eq_attr "mode" "V4DF,V2DF,V1DF")
> + (eq_attr "memory" "none"
> +  "znver4-direct,znver4-fdiv*7")

Agner Fog's measurements indicate fdiv*5 here.

> +
> +(define_insn_reservation "znver4_sse_div_ps" 10
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssediv")
> +(and (eq_attr "mode" "V8SF,V4SF,V2SF,SF")
> + (eq_attr "memory" "none"
> +  "znver4-direct,znver4-fdiv*5")

Agner Fog's measurements indicate fdiv*3 here.

> +
> +(define_insn_reservation "znver4_sse_div_pd_load" 20
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssediv")
> +(and (eq_attr "mode" "V4DF,V2DF,V1DF")
> + (eq_attr "memory" "load"
> +  "znver4-direct,znver4-load,znver4-fdiv*7")

fdiv*5?

> +
> +(define_insn_reservation "znver4_sse_div_ps_load" 17
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssediv")
> +(and (eq_attr "mode" "V8SF,V4SF,V2SF,SF")
> + (eq_attr "memory" "load"
> +  "znver4-direct,znver4-load,znver4-fdiv*5")

fdiv*3?

> +(define_insn_reservation "znver4_sse_div_pd_evex" 13
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssediv")
> +(and (eq_attr "mode" "V8DF")
> + (eq_attr "memory" "none"
> +  "znver4-direct,znver4-fdiv*7")

This should be twice as much as the corresponding SSE/AVX instruction
(fdiv*14 or fdiv*10; Agner Fog measured 9 cycles as reciprocal throughput).

> +
> +(define_insn_reservation "znver4_sse_div_ps_evex" 10
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssediv")
> +(and (eq_attr "mode" "V16SF")
> + (eq_attr "memory" "none"
> +  "znver4-direct,znver4-fdiv*5")

Likewise (fdiv*6).

> +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssediv")
> +(and (eq_attr "mode" "V8DF")
> + (eq_attr "memory" "load"
> +  "znver4-direct,znver4-load,znver4-fdiv*7")

Likewise.

> +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssediv")
> +(and (eq_attr "mode" "V16SF")
> + (eq_attr "memory" "load"
> +  "znver4-direct,znver4-load,znver4-fdiv*5")


Re: [PATCH] RISC-V: Add RVV registers register spilling

2022-11-21 Thread Jeff Law via Gcc-patches


On 11/21/22 02:25, juzhe.zh...@rivai.ai wrote:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606523.html
This patch obviously didn't include scalable size frame.
So it ICE in offset = cfun->machine->frame.gp_sp_offset.to_constant ();
We can't directly use to_constant if the frame is a scalable.
Please fix it or revert it. Thanks


We probably just need to reject everything in 
riscv_get_setparate_components if the offset isn't constant. Something 
like the attached patch (untested) might be enough to resolve the problem.



Jeff


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7ec4ce97e6c..7bfc0e9f595 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5340,7 +5340,8 @@ riscv_get_separate_components (void)
   bitmap_clear (components);
 
   if (riscv_use_save_libcall (>machine->frame)
-  || cfun->machine->interrupt_handler_p)
+  || cfun->machine->interrupt_handler_p
+  || !cfun->machine->frame.gp_sp_offset.is_constant ())
 return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();


Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-21 Thread Qing Zhao via Gcc-patches


> On Nov 18, 2022, at 11:31 AM, Kees Cook  wrote:
> 
> On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
>> Hi, Richard,
>> 
>> Honestly, it’s very hard for me to decide what’s the best way to handle the 
>> interaction 
>> between -fstrict-flex-array=M and -Warray-bounds=N. 
>> 
>> Ideally,  -fstrict-flex-array=M should completely control the behavior of 
>> -Warray-bounds.
>> If possible, I prefer this solution.
>> 
>> However, -Warray-bounds is included in -Wall, and has been used extensively 
>> for a long time.
>> It’s not safe to change its default behavior. 
> 
> I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
> it is in -Wall is _good_ for this reason. :) No one is going to add
> -fstrict-flex-arrays (at any level) without understanding what it does
> and wanting those effects on -Warray-bounds.


The major difficulties to let -fstrict-flex-arrays controlling -Warray-bounds 
was discussed in the following threads:

https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604133.html

Please take a look at the discussion and let me know your opinion.

Thanks,

Qing

> 
> -- 
> Kees Cook



Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-21 Thread Stubbs, Andrew via Gcc-patches

On 21/11/2022 13:41, Tobias Burnus wrote:

On 19.11.22 11:46, Tobias Burnus wrote:

+   stacklimit = stackbase + seg_size*64;

(this should be '*seg_size' not 'seg_size' and the name should be
s/seg_size/seg_size_ptr/.)

I have updated the comment and ...

(Reading it, I think it should be '..._MEM(SImode,' and
'..._MULT(SImode' instead of DImode.)

Additionally, there was a problem of bytes vs. bits in:

My understanding is that
dispatch_ptr->private_segment_size == *((char*)dispatch_ptr + 192)


which is wrong - its 192 bits but only 24 bytes!

Finally, in the first_call_this_thread_p() call, I mixed up EQ vs. NE at 
one place.


BTW: It seems as if there is no problem with zero extension, if I look 
at the assembler result.


Updated version. Consists of: GCC patch adding the builtins,
the newlib patch using those (unchanged; used for testing + to be 
submitted), and

a 'test.c' using the builtins and its dump produced with amdgcn's
'cc1 -O2' to show the resulting assembly.

Tested with libgomp on gfx908 offloading and getting only the known fails:
(libgomp.c-c++-common/teams-2.c, libgomp.fortran/async_io_*.f90,
libgomp.oacc-c-c++-common/{deep-copy-10.c,static-variable-1.c,vprop.c})

OK for mainline?


OK, provided it has been tested in both stand-alone and offload modes, 
and the newlib tests too.


Andrew


Re: [PATCH 1/2] Allow subtarget customization of CC1_SPEC

2022-11-21 Thread Jeff Law via Gcc-patches



On 11/21/22 00:26, Sebastian Huber wrote:

On 20/11/2022 17:57, Jeff Law wrote:


On 10/26/22 03:34, Sebastian Huber wrote:

On 04/10/2022 11:47, Sebastian Huber wrote:

On 08/09/2022 07:33, Sebastian Huber wrote:

On 04/08/2022 15:02, Sebastian Huber wrote:

On 22/07/2022 15:02, Sebastian Huber wrote:

gcc/ChangeLog:

* gcc.cc (SUBTARGET_CC1_SPEC): Define if not defined.
(CC1_SPEC): Define to SUBTARGET_CC1_SPEC.
* config/arm/arm.h (CC1_SPEC): Remove.
* config/arc/arc.h (CC1_SPEC): Append SUBTARGET_CC1_SPEC.
* config/cris/cris.h (CC1_SPEC): Likewise.
* config/frv/frv.h (CC1_SPEC): Likewise.
* config/i386/i386.h (CC1_SPEC): Likewise.
* config/ia64/ia64.h (CC1_SPEC): Likewise.
* config/lm32/lm32.h (CC1_SPEC): Likewise.
* config/m32r/m32r.h (CC1_SPEC): Likewise.
* config/mcore/mcore.h (CC1_SPEC): Likewise.
* config/microblaze/microblaze.h: Likewise.
* config/nds32/nds32.h (CC1_SPEC): Likewise.
* config/nios2/nios2.h (CC1_SPEC): Likewise.
* config/pa/pa.h (CC1_SPEC): Likewise.
* config/rs6000/sysv4.h (CC1_SPEC): Likewise.
* config/rx/rx.h (CC1_SPEC): Likewise.
* config/sparc/sparc.h (CC1_SPEC): Likewise.


Could someone please have a look at this patch set?


Ping


Would someone mind having a look at this patch set? If there is a 
better approach to customize the default TLS model, then please let 
me know.


It would be nice if someone could review the patch before the Stage 
1 ends at November 13th.


Just a reminder.  The guidelines are a patch needs to be posted 
before the end of stage1 to make the deadline.  Review & integration 
can happen after the deadline.


I realize the idea here is to allow RTEMS to change the default TLS 
model.  But does it also happen to make it possible to solve Keith 
Packard's issues with picolibc?  See the Aug/Sep gcc-patches archives.


It looks sensible.  I assume you did a "find" to identify all the 
CC1_SPECs to change.



OK for the trunk,


Thanks for having a look at the patch. After looking at the patch 
again, I think it can be simplified to:


https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606841.html


That's fine as well.
Jeff




Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-11-21 Thread Jeff Law via Gcc-patches



On 11/21/22 06:49, Alexander Monakov wrote:

On Sun, 20 Nov 2022, Jeff Law wrote:


The concern, as far as I understand would be the case where the
assembly-sequence leaves an incompatible extension in the register.

Right.  The question in my mind is whether or not the responsibility should be
on the compiler or on the developer to ensure the ASM output is properly
extended.  If someone's writing ASMs, then to a large degree, I consider it
their responsibility to make sure things are properly extended

Right. They should also find out the hard way, with zero documentation telling
them they had to (let alone *why* they had to), and without a hypothetical
-fsanitize=abi that would catch exactly the point of the missing extension
instead of letting the program crash mysteriously at a much later point.
Uphill both ways, in a world where LLVM does not place such burden on the
programmer, and even among GCC targets only mips64 made a precedent (also
without documenting the new requirement).


They're writing assembly code -- in my book that means they'd better 
have a pretty good understanding of the architecture, its limitations 
and quirks.


 I'm happy to give them some diagnostics, guardrails, etc etc, but 
slowing down standard C code to placate someone who doesn't really know 
the architecture, but is trying to write assembly code is a step too far 
for me.






-- even more so
if having the compiler do it results in slower code independent of ASMs.

I think LLVM demonstrates well enough that a compiler can do a better job
than GCC at eliminating redundant extensions without upgrading requirements
for inline asm (in the usual C code, not for sub-word outputs of an asm).


Perhaps.  But it's also the case the LLVM and GCC have some significant 
differences in implementation.  Sometimes those differences in 
impementations have notable impacts on how code is generated.



jeff



Alexander


Re: [PATCH 10/35] arm: improve tests for vabavq*

2022-11-21 Thread Andrea Corallo via Gcc-patches
Kyrylo Tkachov  writes:

>> -Original Message-
>> From: Andrea Corallo 
>> Sent: Thursday, November 17, 2022 4:38 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov ; Richard Earnshaw
>> ; Andrea Corallo 
>> Subject: [PATCH 10/35] arm: improve tests for vabavq*
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/arm/mve/intrinsics/vabavq_p_s16.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_p_s32.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_p_s8.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_p_u16.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_p_u32.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_p_u8.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_s16.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_s32.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_s8.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_u16.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_u32.c:
>>  * gcc.target/arm/mve/intrinsics/vabavq_u8.c:
>
> Missing ChangeLog text?
> Ok with ChangeLog fixed.

Ops! sorry

Thanks

  Andrea
  


Re: [PATCH v2] tree-object-size: Support strndup and strdup

2022-11-21 Thread Siddhesh Poyarekar

On 2022-11-20 10:42, Jeff Law wrote:


On 11/4/22 06:48, Siddhesh Poyarekar wrote:

Use string length of input to strdup to determine the usable size of the
resulting object.  Avoid doing the same for strndup since there's a
chance that the input may be too large, resulting in an unnecessary
overhead or worse, the input may not be NULL terminated, resulting in a
crash where there would otherwise have been none.

gcc/ChangeLog:

* tree-object-size.cc (todo): New variable.
(object_sizes_execute): Use it.
(strdup_object_size): New function.
(call_object_size): Use it.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): New tests.
(main): Call them.
* gcc.dg/builtin-dynamic-object-size-1.c: Silence overread
warnings.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.


I'm struggling to see how the SSA updating is correct.  Yes we need to 
update the virtuals due to the introduction of the call to strlen, 
particularly when SRC is not a string constant.  But do we need to do more?


Don't we end up gimplifying the 1 + strlenfn (src) expression? Can that 
possibly create new SSA_NAMEs?  Do those need to be put into SSA form? I 
feel like I'm missing something here...


We do all of that manually in gimplify_size_expressions, the only thing 
left to do is updating virtuals AFAICT.


Thanks,
Sid


Re: [PATCH] genmultilib: Fix sanity check

2022-11-21 Thread Jeff Law via Gcc-patches



On 11/21/22 05:20, Christophe Lyon via Gcc-patches wrote:



On 11/21/22 13:17, Jakub Jelinek wrote:

On Mon, Nov 21, 2022 at 12:59:15PM +0100, Christophe Lyon wrote:

My previous patch to add a sanity check to genmultilib actually
checked the number of dirnames with the number of "sets of options"
rather than the number of options, thus breaking the build on some
targets.

To avoid duplicating once more the loop that constructs the sed
patterns, this patch checks that the current dirname/osdirname is not
empty in the existing loops.

Are there targets where:
if [ "$1" != "${opt}" ]; then
is "legally" executed with an empty $1? (and thus where this patch
would incorrectly trigger an error?)


Dunno, let's try your patch.  And if that triggers on something
valid then the next step would be just to revert the sanity checks
completely.


Agreed.


The first version (as we know) tripped on a few targets in my tester.  
I've got those restarted and we should know in a few hours if there's 
any major issues.



jeff




Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-21 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi,
>
> Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600190.html
>
> Any comments are highly appreciated.
>
> BR,
> Kewen
>
> on 2022/9/28 13:41, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>> 
>> Gentle ping: 
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600190.html
>> 
>> BR,
>> Kewen
>> 
>> on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> As discussed in PR98125, -fpatchable-function-entry with
>>> SECTION_LINK_ORDER support doesn't work well on powerpc64
>>> ELFv1 because the filled "Symbol" in
>>>
>>>   .section name,"flags"o,@type,Symbol
>>>
>>> sits in .opd section instead of in the function_section
>>> like .text or named .text*.
>>>
>>> Since we already generates one label LPFE* which sits in
>>> function_section of current_function_decl, this patch is
>>> to reuse it as the symbol for the linked_to section.  It
>>> avoids the above ABI specific issue when using the symbol
>>> concluded from current_function_decl.
>>>
>>> Besides, with this support some previous workarounds for
>>> powerpc64 ELFv1 can be reverted.
>>>
>>> btw, rs6000_print_patchable_function_entry can be dropped
>>> but there is another rs6000 patch which needs this rs6000
>>> specific hook rs6000_print_patchable_function_entry, not
>>> sure which one gets landed first, so just leave it here.
>>>
>>> Bootstrapped and regtested on below:
>>>
>>>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>>>  and latest binutils 2.39.
>>>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>>>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>>>   4) x86_64-redhat-linux with default binutils 2.30
>>>  and latest binutils 2.39.
>>>   5) aarch64-linux-gnu  with default binutils 2.30
>>>  and latest binutils 2.39.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -
>>>
>>> PR target/99889
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>>> Adjust to call function default_print_patchable_function_entry.
>>> * targhooks.cc (default_print_patchable_function_entry_1): Remove and
>>> move the flags preparation ...
>>> (default_print_patchable_function_entry): ... here, adjust to use
>>> current_function_funcdef_no for label no.
>>> * targhooks.h (default_print_patchable_function_entry_1): Remove.
>>> * varasm.cc (default_elf_asm_named_section): Adjust code for
>>> __patchable_function_entries section support with LPFE label.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.dg/pr93195a.C: Remove the skip on powerpc*-*-* 64-bit.
>>> * gcc.target/aarch64/pr92424-2.c: Adjust LPFE1 with LPFE0.
>>> * gcc.target/aarch64/pr92424-3.c: Likewise.
>>> * gcc.target/i386/pr93492-2.c: Likewise.
>>> * gcc.target/i386/pr93492-3.c: Likewise.
>>> * gcc.target/i386/pr93492-4.c: Likewise.
>>> * gcc.target/i386/pr93492-5.c: Likewise.
>>> ---
>>>  gcc/config/rs6000/rs6000.cc  | 13 +-
>>>  gcc/varasm.cc| 15 ---
>>>  gcc/targhooks.cc | 45 +++-
>>>  gcc/targhooks.h  |  3 --
>>>  gcc/testsuite/g++.dg/pr93195a.C  |  1 -
>>>  gcc/testsuite/gcc.target/aarch64/pr92424-2.c |  4 +-
>>>  gcc/testsuite/gcc.target/aarch64/pr92424-3.c |  4 +-
>>>  gcc/testsuite/gcc.target/i386/pr93492-2.c|  4 +-
>>>  gcc/testsuite/gcc.target/i386/pr93492-3.c|  4 +-
>>>  gcc/testsuite/gcc.target/i386/pr93492-4.c|  4 +-
>>>  gcc/testsuite/gcc.target/i386/pr93492-5.c|  4 +-
>>>  11 files changed, 40 insertions(+), 61 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index df491bee2ea..dba28b8e647 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -14771,18 +14771,9 @@ rs6000_print_patchable_function_entry (FILE *file,
>>>unsigned HOST_WIDE_INT patch_area_size,
>>>bool record_p)
>>>  {
>>> -  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
>>> -  /* When .opd section is emitted, the function symbol
>>> - default_print_patchable_function_entry_1 is emitted into the .opd 
>>> section
>>> - while the patchable area is emitted into the function section.
>>> - Don't use SECTION_LINK_ORDER in that case.  */
>>> -  if (!(TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
>>> -  && HAVE_GAS_SECTION_LINK_ORDER)
>>> -flags |= SECTION_LINK_ORDER;
>>> -  default_print_patchable_function_entry_1 (file, patch_area_size, 
>>> record_p,
>>> -   flags);
>>> +  default_print_patchable_function_entry (file, patch_area_size, record_p);
>>>  }
>>> -
>>>
>>> +
>>>  enum rtx_code
>>>  rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>>>  {
>>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>>> index 4db8506b106..d4de6e164ee 100644
>>> --- 

Re: [Patch] libgomp/gcn: fix/improve struct output (was: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling)

2022-11-21 Thread Andrew Stubbs

On 21/11/2022 13:40, Tobias Burnus wrote:

Working on the builtins, I realized that I mixed up (again) bits and byes.
While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]' has a 
size of 128 bytes.
Thus, there is sufficient space for 16 pointer-size/uin64_t values but I 
only need 6.


This patch now makes use of the available space, avoiding one 
device-to-host memory copy;
additionally, it avoids a 32bit vs 64bit alignment issue which I somehow 
missed :-(


Tested with libgomp on gfx908 offloading and getting only the known fails:
(libgomp.c-c++-common/teams-2.c, libgomp.fortran/async_io_*.f90,
libgomp.oacc-c-c++-common/{deep-copy-10.c,static-variable-1.c,vprop.c})

OK for mainline?


OK, although why not set value64 to 16 entries, even though reverse 
offload only uses 6?


Andrew


[PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-21 Thread Stam Markianos-Wright via Gcc-patches



On 11/20/22 22:49, Ramana Radhakrishnan wrote:

On Fri, Nov 18, 2022 at 4:59 PM Kyrylo Tkachov via Gcc-patches
 wrote:




-Original Message-
From: Andrea Corallo 
Sent: Thursday, November 17, 2022 4:38 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Stam Markianos-Wright 
Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
overloading [PR107515]

From: Stam Markianos-Wright 

This patch adds explicit references to other float types
to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

gcc/ChangeLog:
 PR 107515
 * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.

Argh, I'm looking forward to when we move away from this _Generic business, but 
for now ok.
The ChangeLog should say "PR target/107515" for the git hook to recognize it 
IIRC.

and the PR is against 11.x - is there a plan to back port this and
dependent patches to relevant branches ?


Hi Ramana!


Assuming maintainer approval, we do hope to backport.

And yes, it would have to be the whole patch series, so that we carry

over all the improved testing, as well (and we'll have to run it ofc).


Does that sound Ok?

Thank you,

Stam




Ramana


Thanks,
Kyrill


---
  gcc/config/arm/arm_mve.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index fd1876b57a0..f6b42dc3fab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35582,6 +35582,9 @@ enum {
   short: __ARM_mve_type_int_n, \
   int: __ARM_mve_type_int_n, \
   long: __ARM_mve_type_int_n, \
+ _Float16: __ARM_mve_type_fp_n, \
+ __fp16: __ARM_mve_type_fp_n, \
+ float: __ARM_mve_type_fp_n, \
   double: __ARM_mve_type_fp_n, \
   long long: __ARM_mve_type_int_n, \
   unsigned char: __ARM_mve_type_int_n, \
--
2.25.1


Re: [PATCH RFC] c++: implement P1492 contracts

2022-11-21 Thread David Edelsohn via Gcc-patches
This patch broke bootstrap on AIX due to its use of strchrnul().  It is an
extension available in Linux, but not all targets.

/src/src/gcc/cp/contracts.cc:213:21: error: 'strchrnul' was not declared in
this scope; did you mean 'strchr'?
  213 |   size_t role_len = strchrnul (role, ':') - role;
  | ^
  | strchr

Would you please avoid the extension or provide an implementation?

Thanks David


Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-11-21 Thread Alexander Monakov via Gcc-patches


On Sun, 20 Nov 2022, Jeff Law wrote:

> > The concern, as far as I understand would be the case where the
> > assembly-sequence leaves an incompatible extension in the register.
> 
> Right.  The question in my mind is whether or not the responsibility should be
> on the compiler or on the developer to ensure the ASM output is properly
> extended.  If someone's writing ASMs, then to a large degree, I consider it
> their responsibility to make sure things are properly extended

Right. They should also find out the hard way, with zero documentation telling
them they had to (let alone *why* they had to), and without a hypothetical
-fsanitize=abi that would catch exactly the point of the missing extension
instead of letting the program crash mysteriously at a much later point.
Uphill both ways, in a world where LLVM does not place such burden on the
programmer, and even among GCC targets only mips64 made a precedent (also
without documenting the new requirement).

> -- even more so
> if having the compiler do it results in slower code independent of ASMs.

I think LLVM demonstrates well enough that a compiler can do a better job
than GCC at eliminating redundant extensions without upgrading requirements
for inline asm (in the usual C code, not for sub-word outputs of an asm).

Alexander


Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-21 Thread Tobias Burnus
  v5, s17
v_add_u32   v4, vcc, v3, v4
v_addc_u32  v5, vcc, 0, v5, vcc
s_mov_b64   exec, 3
flat_load_dword v6, v[4:5]
s_waitcnt   0
v_readlane_b32 s14, v6, 0
v_readlane_b32 s15, v6, 1
s_setpc_b64 s[18:19]
.L5:
; 23 "test.c" 1
;false
; 0 "" 2
s_branch.L3
.LFE1:
.size   bar, .-bar
.globl  b
.bss
.align  16
.type   b, @object
.size   b, 4
b:
.zero   4
.globl  ptr
.align  16
.type   ptr, @object
.size   ptr, 8
ptr:
.zero   8
.section.debug_frame,"",@progbits
.Lframe0:
.4byte  .LECIE0-.LSCIE0
.LSCIE0:
.4byte  0x
.byte   0x3
.string ""
.byte   0x1
.byte   0x4
.byte   0x10
.byte   0xf
.byte   0xa
.byte   0x92
.byte   0x31
.byte   0
.byte   0x8
.byte   0x20
.byte   0x24
.byte   0x92
.byte   0x30
.byte   0
.byte   0x22
.byte   0x10
.byte   0x10
.byte   0xa
.byte   0x92
.byte   0x33
.byte   0
.byte   0x8
.byte   0x20
.byte   0x24
.byte   0x92
.byte   0x32
.byte   0
.byte   0x22
.align  8
.LECIE0:
.LSFDE0:
.4byte  .LEFDE0-.LASFDE0
.LASFDE0:
.4byte  .Lframe0
.8byte  .LFB0
.8byte  .LFE0-.LFB0
.byte   0x4
.4byte  .LCFI0-.LFB0
.byte   0xae
.byte   0
.byte   0xaf
.byte   0x1
.byte   0x4
.4byte  .LCFI1-.LCFI0
.byte   0xf
.byte   0xc
.byte   0x92
.byte   0x2f
.byte   0
.byte   0x8
.byte   0x20
.byte   0x24
.byte   0x92
.byte   0x2e
.byte   0
.byte   0x22
.byte   0x38
.byte   0x1c
.align  8
.LEFDE0:
.LSFDE2:
.4byte  .LEFDE2-.LASFDE2
.LASFDE2:
.4byte  .Lframe0
.8byte  .LFB1
.8byte  .LFE1-.LFB1
.byte   0x4
.4byte  .LCFI2-.LFB1
.byte   0xae
.byte   0
.byte   0xaf
.byte   0x1
.byte   0x4
.4byte  .LCFI3-.LCFI2
.byte   0xf
.byte   0xc
.byte   0x92
.byte   0x2f
.byte   0
.byte   0x8
.byte   0x20
.byte   0x24
.byte   0x92
.byte   0x2e
.byte   0
.byte   0x22
.byte   0x38
.byte   0x1c
.align  8
.LEFDE2:
.ident  "GCC: (GNU) 13.0.0 20221121 (experimental)"


[Patch] libgomp/gcn: fix/improve struct output (was: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling)

2022-11-21 Thread Tobias Burnus

Working on the builtins, I realized that I mixed up (again) bits and byes.
While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]' has a size of 
128 bytes.
Thus, there is sufficient space for 16 pointer-size/uin64_t values but I only 
need 6.

This patch now makes use of the available space, avoiding one device-to-host 
memory copy;
additionally, it avoids a 32bit vs 64bit alignment issue which I somehow missed 
:-(

Tested with libgomp on gfx908 offloading and getting only the known fails:
(libgomp.c-c++-common/teams-2.c, libgomp.fortran/async_io_*.f90,
libgomp.oacc-c-c++-common/{deep-copy-10.c,static-variable-1.c,vprop.c})

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp/gcn: fix/improve struct output

output.printf_data.(value union) contains text[128], which has the size
of 128 bytes, sufficient for 16 uint64_t variables; hence value_u64[2]
could be extended to value_u64[6] - sufficient for all required arguments
to gomp_target_rev.  Additionally, next_output.printf_data.(msg union)
contained msg_u64 which then is no longer needed and also caused 32bit
vs 64bit alignment issues.

libgomp/
	* config/gcn/libgomp-gcn.h (struct output):
	Remove 'msg_u64' from the union, change
	value_u64[2] to value_u64[6].
	* config/gcn/target.c (GOMP_target_ext): Update accordingly.
	* plugin/plugin-gcn.c (process_reverse_offload, console_output):
	Likewise.

 libgomp/config/gcn/libgomp-gcn.h |  7 ++-
 libgomp/config/gcn/target.c  | 12 ++--
 libgomp/plugin/plugin-gcn.c  | 17 +++--
 3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
index 91560be787f..3933e846a86 100644
--- a/libgomp/config/gcn/libgomp-gcn.h
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -37,16 +37,13 @@ struct output
   unsigned int next_output;
   struct printf_data {
 int written;
-union {
-  char msg[128];
-  uint64_t msg_u64[2];
-};
+char msg[128];
 int type;
 union {
   int64_t ivalue;
   double dvalue;
   char text[128];
-  uint64_t value_u64[2];
+  uint64_t value_u64[6];
 };
   } queue[1024];
   unsigned int consumed;
diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index 27854565d40..11ae6ec9833 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -102,12 +102,12 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   asm ("s_sleep 64");
 
   unsigned int slot = index % 1024;
-  uint64_t addrs_sizes_kind[3] = {(uint64_t) hostaddrs, (uint64_t) sizes,
-  (uint64_t) kinds};
-  data->queue[slot].msg_u64[0] = (uint64_t) fn;
-  data->queue[slot].msg_u64[1] = (uint64_t) mapnum;
-  data->queue[slot].value_u64[0] = (uint64_t) _sizes_kind[0];
-  data->queue[slot].value_u64[1] = (uint64_t) GOMP_ADDITIONAL_ICVS.device_num;
+  data->queue[slot].value_u64[0] = (uint64_t) fn;
+  data->queue[slot].value_u64[1] = (uint64_t) mapnum;
+  data->queue[slot].value_u64[2] = (uint64_t) hostaddrs;
+  data->queue[slot].value_u64[3] = (uint64_t) sizes;
+  data->queue[slot].value_u64[4] = (uint64_t) kinds;
+  data->queue[slot].value_u64[5] = (uint64_t) GOMP_ADDITIONAL_ICVS.device_num;
 
   data->queue[slot].type = 4; /* Reverse offload.  */
   __atomic_store_n (>queue[slot].written, 1, __ATOMIC_RELEASE);
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ffe5cf5af2c..388e87b7765 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1919,16 +1919,12 @@ create_kernel_dispatch (struct kernel_info *kernel, int num_teams)
 }
 
 static void
-process_reverse_offload (uint64_t fn, uint64_t mapnum, uint64_t rev_data,
-			 uint64_t dev_num64)
+process_reverse_offload (uint64_t fn, uint64_t mapnum, uint64_t hostaddrs,
+			 uint64_t sizes, uint64_t kinds, uint64_t dev_num64)
 {
   int dev_num = dev_num64;
-  uint64_t addrs_sizes_kinds[3];
-  GOMP_OFFLOAD_host2dev (dev_num, _sizes_kinds, (void *) rev_data,
-			 sizeof (addrs_sizes_kinds));
-  GOMP_PLUGIN_target_rev (fn, mapnum, addrs_sizes_kinds[0],
-			  addrs_sizes_kinds[1], addrs_sizes_kinds[2],
-			  dev_num, NULL, NULL, NULL);
+  GOMP_PLUGIN_target_rev (fn, mapnum, hostaddrs, sizes, kinds, dev_num,
+			  NULL, NULL, NULL);
 }
 
 /* Output any data written to console output from the kernel.  It is expected
@@ -1976,8 +1972,9 @@ console_output (struct kernel_info *kernel, struct kernargs *kernargs,
 	case 2: printf ("%.128s%.128s\n", data->msg, data->text); break;
 	case 3: printf ("%.128s%.128s", data->msg, data->text); break;
 	case 4:
-	  process_reverse_offload (data->msg_u64[0], data->msg_u64[1],
-   data->value_u64[0],data->value_u64[1]);
+	  process_reverse_offload (data->value_u64[0], 

Re: PING^1 [PATCH] cpp/remap: Only override if string matched

2022-11-21 Thread Torbjorn SVENSSON via Gcc-patches

Hi,

On 2022-11-20 21:31, Jeff Law wrote:


On 11/2/22 12:21, Torbjorn SVENSSON via Gcc-patches wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604062.html

Ok for trunk?


OK.  Sorry for the delay.


Thanks for the review.
Pushed.

Kind regards,
Torbjörn



jeff




Re: [PATCH v2] genmultilib: Add sanity check

2022-11-21 Thread Christophe Lyon via Gcc-patches




On 11/21/22 13:32, Mark Wielaard wrote:

Hi Christophe,

On Mon, 2022-11-21 at 13:12 +0100, Christophe Lyon wrote:

On 11/21/22 11:16, Mark Wielaard wrote:

On Fri, Nov 18, 2022 at 12:42:10PM -0700, Jeff Law wrote:

* genmultilib: Add sanity check.


OK. It should be interesting to see if it trips.


It trips up various buildbot setups:
https://builder.sourceware.org/buildbot/#/changes/13720

Error calling ../../gcc/gcc/genmultilib: Number of dirnames (3)
does not match number of options (2)
options: m64/m31
dirnames: 64 32
make[2]: *** [Makefile:2240: s-mlib] Error 1


Indeed, sorry for that.

I've just sent a fix:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606887.html

Hopefully that one is right


The buildbot gcc builds are turning green again!
https://builder.sourceware.org/buildbot/#/changes/13736



Good! I didn't imagine the feedback would be so fast :-)

BTW, I don't remember receiving an email from the buildbot after the 
breakage, does it send such emails for binutils/gdb only?



Thanks,

Mark


Re: [PATCH v2] genmultilib: Add sanity check

2022-11-21 Thread Mark Wielaard
Hi Christophe,

On Mon, 2022-11-21 at 13:12 +0100, Christophe Lyon wrote:
> On 11/21/22 11:16, Mark Wielaard wrote:
> > On Fri, Nov 18, 2022 at 12:42:10PM -0700, Jeff Law wrote:
> > > > * genmultilib: Add sanity check.
> > > 
> > > OK. It should be interesting to see if it trips.
> > 
> > It trips up various buildbot setups:
> > https://builder.sourceware.org/buildbot/#/changes/13720
> > 
> > Error calling ../../gcc/gcc/genmultilib: Number of dirnames (3)
> > does not match number of options (2)
> > options: m64/m31
> > dirnames: 64 32
> > make[2]: *** [Makefile:2240: s-mlib] Error 1
> 
> Indeed, sorry for that.
> 
> I've just sent a fix:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606887.html
> 
> Hopefully that one is right

The buildbot gcc builds are turning green again!
https://builder.sourceware.org/buildbot/#/changes/13736

Thanks,

Mark


Re: [PATCH] genmultilib: Fix sanity check

2022-11-21 Thread Christophe Lyon via Gcc-patches




On 11/21/22 13:17, Jakub Jelinek wrote:

On Mon, Nov 21, 2022 at 12:59:15PM +0100, Christophe Lyon wrote:

My previous patch to add a sanity check to genmultilib actually
checked the number of dirnames with the number of "sets of options"
rather than the number of options, thus breaking the build on some
targets.

To avoid duplicating once more the loop that constructs the sed
patterns, this patch checks that the current dirname/osdirname is not
empty in the existing loops.

Are there targets where:
if [ "$1" != "${opt}" ]; then
is "legally" executed with an empty $1? (and thus where this patch
would incorrectly trigger an error?)


Dunno, let's try your patch.  And if that triggers on something
valid then the next step would be just to revert the sanity checks
completely.


Agreed.





* genmultilib: Fix options and dirnames/osdirnames sanity
   check.


This won't get through the pre-commit hook, the second line
should be indented just by tab and nothing further.


Thanks for catching this.

Pushed.

Christophe


Jakub



Re: [PATCH 2/2] aarch64: Add support for widening LDAPR instructions

2022-11-21 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Sorry for the late reply on this. I was wondering though why the check 
> made sense. The way I see it, SI -> SI mode is either wrong or useless. 
> So why not:
> if it is wrong, error (gcc_assert?) so we know it was generated wrongly 
> somehow and fix it;
> if it is useless, still use this pattern as we avoid an extra 
> instruction (doing useless work).

The comparison doesn't lead to extra instructions.  It's a compile-time
constant, so the invalid patterns will be automatically dropped and not
result in any code.  That is, having the condition reduces the amount
of static data (fewer patterns means fewer data structures for them)
and reduces the code size (less for recog to do).

If we want an assert for invalid extensions (which might be a good thing),
it should be in target-independent code, so that it triggers regardless
of whether the target defines a (bogus) pattern for it.

Thanks,
Richard

>
> Unless, you expect the backend to be 'probing' for this and the way we 
> tell it not to is to not implement any pattern that allows for this? But 
> somehow that doesn't feel like the right approach...
>
> On 17/11/2022 11:30, Kyrylo Tkachov wrote:
>>
>>> -Original Message-
>>> From: Richard Sandiford 
>>> Sent: Tuesday, November 15, 2022 6:05 PM
>>> To: Andre Simoes Dias Vieira 
>>> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
>>> Richard Earnshaw 
>>> Subject: Re: [PATCH 2/2] aarch64: Add support for widening LDAPR
>>> instructions
>>>
>>> "Andre Vieira (lists)"  writes:
 Updated version of the patch to account for the testsuite changes in the
 first patch.

 On 10/11/2022 11:20, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
>
> This patch adds support for the widening LDAPR instructions.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu.
>
> OK for trunk?
>
> 2022-11-09  Andre Vieira  
>  Kyrylo Tkachov  
>
> gcc/ChangeLog:
>
>  * config/aarch64/atomics.md
> (*aarch64_atomic_load_rcpc_zext): New pattern.
>  (*aarch64_atomic_load_rcpc_zext): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.target/aarch64/ldapr-ext.c: New test.
 diff --git a/gcc/config/aarch64/atomics.md
>>> b/gcc/config/aarch64/atomics.md
 index
>>> dc5f52ee8a4b349c0d8466a16196f83604893cbb..9670bef7d8cb2b32c5146536
>>> d806a7e8bdffb2e3 100644
 --- a/gcc/config/aarch64/atomics.md
 +++ b/gcc/config/aarch64/atomics.md
 @@ -704,6 +704,28 @@
 }
   )

 +(define_insn "*aarch64_atomic_load_rcpc_zext"
 +  [(set (match_operand:GPI 0 "register_operand" "=r")
 +(zero_extend:GPI
 +  (unspec_volatile:ALLX
 +[(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
 + (match_operand:SI 2 "const_int_operand")] ;;
>>> model
 +   UNSPECV_LDAP)))]
 +  "TARGET_RCPC"
 +  "ldapr\t%0, %1"
>>> It would be good to add:
>>>
>>> > 
>>>
>>> to the condition, so that we don't provide bogus SI->SI and DI->DI
>>> extensions.  (They shouldn't be generated, but it's better not to provide
>>> them anyway.)
>>>
>> I agree. I'm pushing the attached patch to trunk.
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/atomics.md 
>> (*aarch64_atomic_load_rcpc_zext):
>>  Add mode size check to condition.
>>  (*aarch64_atomic_load_rcpc_sext): Likewise.
>>
>>> Thanks,
>>> Richard
>>>
 +)
 +
 +(define_insn "*aarch64_atomic_load_rcpc_sext"
 +  [(set (match_operand:GPI  0 "register_operand" "=r")
 +(sign_extend:GPI
 +  (unspec_volatile:ALLX
 +[(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
 + (match_operand:SI 2 "const_int_operand")] ;;
>>> model
 +   UNSPECV_LDAP)))]
 +  "TARGET_RCPC"
 +  "ldaprs\t%0, %1"
 +)
 +
   (define_insn "atomic_store"
 [(set (match_operand:ALLI 0 "aarch64_rcpc_memory_operand" "=Q,Ust")
   (unspec_volatile:ALLI
 diff --git a/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
>>> b/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
 new file mode 100644
 index
>>> ..aed27e06235b1d266decf11
>>> 745dacf94cc59e76d
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
 @@ -0,0 +1,94 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -std=c99" } */
 +/* { dg-final { check-function-bodies "**" "" "" } } */
 +#include 
 +
 +#pragma GCC target "+rcpc"
 +
 +atomic_ullong u64;
 +atomic_llong s64;
 +atomic_uint u32;
 +atomic_int s32;
 +atomic_ushort u16;
 +atomic_short s16;
 +atomic_uchar u8;
 +atomic_schar s8;
 +
 +#define TEST(name, ldsize, rettype)\
 +rettype\
 +test_##name (void) 

Re: [PATCH] genmultilib: Fix sanity check

2022-11-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 21, 2022 at 12:59:15PM +0100, Christophe Lyon wrote:
> My previous patch to add a sanity check to genmultilib actually
> checked the number of dirnames with the number of "sets of options"
> rather than the number of options, thus breaking the build on some
> targets.
> 
> To avoid duplicating once more the loop that constructs the sed
> patterns, this patch checks that the current dirname/osdirname is not
> empty in the existing loops.
> 
> Are there targets where:
> if [ "$1" != "${opt}" ]; then
> is "legally" executed with an empty $1? (and thus where this patch
> would incorrectly trigger an error?)

Dunno, let's try your patch.  And if that triggers on something
valid then the next step would be just to revert the sanity checks
completely.

>   * genmultilib: Fix options and dirnames/osdirnames sanity
>   check.

This won't get through the pre-commit hook, the second line
should be indented just by tab and nothing further.

Jakub



Re: [PATCH v2] genmultilib: Add sanity check

2022-11-21 Thread Christophe Lyon via Gcc-patches




On 11/21/22 11:16, Mark Wielaard wrote:

Hi,

On Fri, Nov 18, 2022 at 12:42:10PM -0700, Jeff Law wrote:

* genmultilib: Add sanity check.


OK.� It should be interesting to see if it trips.


It trips up various buildbot setups:
https://builder.sourceware.org/buildbot/#/changes/13720

Error calling ../../gcc/gcc/genmultilib: Number of dirnames (3) does not match 
number of options (2)
options: m64/m31
dirnames: 64 32
make[2]: *** [Makefile:2240: s-mlib] Error 1

Mark


Indeed, sorry for that.

I've just sent a fix:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606887.html

Hopefully that one is right

Christophe


[PATCH] genmultilib: Fix sanity check

2022-11-21 Thread Christophe Lyon via Gcc-patches
My previous patch to add a sanity check to genmultilib actually
checked the number of dirnames with the number of "sets of options"
rather than the number of options, thus breaking the build on some
targets.

To avoid duplicating once more the loop that constructs the sed
patterns, this patch checks that the current dirname/osdirname is not
empty in the existing loops.

Are there targets where:
if [ "$1" != "${opt}" ]; then
is "legally" executed with an empty $1? (and thus where this patch
would incorrectly trigger an error?)

Sorry for the breakage. Tested on aarch64 by adding an option to
t-aarch64 and no corresponding dirname, and on x86_64.

OK for trunk?

gcc/ChangeLog:

* genmultilib: Fix options and dirnames/osdirnames sanity
  check.
---
 gcc/genmultilib | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/gcc/genmultilib b/gcc/genmultilib
index b5f372c8358..c0c271eddd6 100644
--- a/gcc/genmultilib
+++ b/gcc/genmultilib
@@ -141,20 +141,6 @@ multiarch=$9
 multilib_reuse=${10}
 enable_multilib=${11}
 
-# Sanity check: make sure we have as many dirnames as options
-if [ -n "${dirnames}" ]; then
-set x $options
-nboptions=$#
-set x $dirnames
-nbdirnames=$#
-if [ $nbdirnames -ne $nboptions ]; then
-   echo 1>&2 "Error calling $0: Number of dirnames ($nbdirnames) does not 
match number of options ($nboptions)"
-   echo 1>&2 "options: ${options}"
-   echo 1>&2 "dirnames: ${dirnames}"
-   exit 1
-fi
-fi
-
 echo "static const char *const multilib_raw[] = {"
 
 mkdir tmpmultilib.$$ || exit 1
@@ -264,6 +250,10 @@ if [ -n "${dirnames}" ]; then
 for opts in `echo ${set} | sed -e 's|/| |'g`; do
   patt="/"
   for opt in `echo ${opts} | sed -e 's_|_ _'g`; do
+   if [ -z "$1" ]; then
+ echo 1>&2 "Error calling $0: No dirname for option: $opt"
+ exit 1
+   fi
 if [ "$1" != "${opt}" ]; then
   todirnames="${todirnames} -e s|/${opt}/|/${1}/|g"
  patt="${patt}${1}/"
@@ -320,6 +310,10 @@ if [ -n "${osdirnames}" ]; then
   for opts in `echo ${set} | sed -e 's|/| |'g`; do
 patt="/"
 for opt in `echo ${opts} | sed -e 's_|_ _'g`; do
+   if [ -z "$1" ]; then
+ echo 1>&2 "Error calling $0: No osdirname for option: $opt"
+ exit 1
+   fi
   if [ "$1" != "${opt}" ]; then
 toosdirnames="${toosdirnames} -e s|/${opt}/|/${1}/|g"
patt="${patt}${1}/"
-- 
2.34.1



[PR107786 PATCH COMMITTED] RISC-V: Fix ICE in branch_shiftedarith_equals_zero

2022-11-21 Thread Philipp Tomsich
With the recent improvements to the splitting of special cases of
branch patterns on RISC-V, a dependency on an unmerged/in-discussion
change for branch-equals-zero slipped in: this allowed a non-X mode to
be presented to branch-equals-zero (where only X mode is permissible).

This addresses the issue by wrapping the ANYI operand in a paradoxical
SUBREG:X (the high bits can be safely ignored, as we we perform an
and-immediate before the branch in the pattern).

Tested against the GCC testsuite and committed as obvious.

gcc/ChangeLog:

PR target/107786
* config/riscv/riscv.md
(*branch_shiftedarith_equals_zero): Wrap ANYI
in a subreg, as our branch instructions only supports X.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr107786-2.c: New test.
* gcc.target/riscv/pr107786.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.md   |  8 
 gcc/testsuite/gcc.target/riscv/pr107786-2.c | 17 +
 gcc/testsuite/gcc.target/riscv/pr107786.c   | 17 +
 3 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr107786-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr107786.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b7bb338ac04..df57e2b0b4a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2224,12 +2224,12 @@ (define_insn_and_split 
"*branch_shiftedarith_equals_zero"
(const_int 0)])
 (label_ref (match_operand 0 "" ""))
 (pc)))
-   (clobber (match_scratch:ANYI 4 "="))]
-  "INTVAL (operands[3]) >= 0 || !partial_subreg_p (operands[2])"
+   (clobber (match_scratch:X 4 "="))]
+  "!SMALL_OPERAND (INTVAL (operands[3]))"
   "#"
   "&& reload_completed"
-  [(set (match_dup 4) (lshiftrt:ANYI (match_dup 2) (match_dup 6)))
-   (set (match_dup 4) (and:ANYI (match_dup 4) (match_dup 7)))
+  [(set (match_dup 4) (lshiftrt:X (subreg:X (match_dup 2) 0) (match_dup 6)))
+   (set (match_dup 4) (and:X (match_dup 4) (match_dup 7)))
(set (pc) (if_then_else (match_op_dup 1 [(match_dup 4) (const_int 0)])
   (label_ref (match_dup 0)) (pc)))]
 {
diff --git a/gcc/testsuite/gcc.target/riscv/pr107786-2.c 
b/gcc/testsuite/gcc.target/riscv/pr107786-2.c
new file mode 100644
index 000..ee316a67f87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr107786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+
+int c;
+
+int main() {
+  for (;;) {
+short h = c * 100;
+if (h & 0x7ff0)
+  break;
+  }
+}
+
+/* { dg-final { scan-assembler-times "andi\t" 1 } } */
+/* { dg-final { scan-assembler-times "srli\t" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/pr107786.c 
b/gcc/testsuite/gcc.target/riscv/pr107786.c
new file mode 100644
index 000..5246ec7d338
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr107786.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+
+int c;
+
+int main() {
+  for (;;) {
+char h = c * 100;
+if (h)
+  break;
+  }
+}
+
+/* { dg-final { scan-assembler-times "andi\t" 1 } } */
+/* { dg-final { scan-assembler-not "srli\t" } } */
+
-- 
2.34.1



Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches


It's easy to reproduce on x86 as well.

As a testcase:

#include 

int main(int argc, char** argv) {
 return 0;
}

And just compile with: g++ -O1 hello.cpp -static -o hello.exe.


thanks, I will take a look.

Best

Thomas



RE: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Thomas Neumann 
> Sent: Monday, November 21, 2022 11:23 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org;
> Jason Merrill 
> Cc: Florian Weimer ; Jakub Jelinek
> ; Jonathan Wakely 
> Subject: Re: [PATCH v4] eliminate mutex in fast path of __register_frame
> 
> Hi,
> 
> > When dynamically linking a fast enough machine hides the latency, but
> > when Statically linking or on slower devices this change caused a 5x
> > increase in Instruction count and 2x increase in cycle count before getting
> to main.
> >
> > This has been quite noticeable on smaller devices.  Is there a reason
> > the btree can't be initialized lazily? It seems a bit harsh to pay the
> > cost of unwinding at startup even when you don't throw exceptions..
> 
> we cannot easily do that lazily because otherwise we need a mutex for lazy
> initialization, which is exactly what we wanted to get rid of.
> 
> Having said that, I am surprised that you saw a noticeable difference.
> On most platforms there should not be dynamic frame registration at all, as
> the regular frames are directly read from the ELF data.
> 
> Can you please send me an precise description on how to reproduce the
> issue? (Platform, tools, a VM if you have one would be great). I will then
> debug this to improve the startup time.

It's easy to reproduce on x86 as well.

As a testcase:

#include 

int main(int argc, char** argv) {
return 0;
}

And just compile with: g++ -O1 hello.cpp -static -o hello.exe.

Before this change on x86 I got:

> perf stat -r 200 ./hello.exe

 Performance counter stats for './hello.exe' (200 runs):

  0.32 msec task-clock#0.326 CPUs utilized  
  ( +-  0.34% )
 0  context-switches  #0.000 K/sec
 0  cpu-migrations#0.000 K/sec
22  page-faults   #0.070 M/sec  
  ( +-  0.13% )
   310,194  cycles#0.984 GHz
  ( +-  0.33% )
   317,310  instructions  #1.02  insn per cycle 
  ( +-  0.18% )
58,885  branches  #  186.710 M/sec  
  ( +-  0.12% )
   931  branch-misses #1.58% of all branches
  ( +-  2.57% )

0.00096799 +- 0.0374 seconds time elapsed  ( +-  0.39% )

And after this change:

> perf stat -r 200 ./hello.exe

 Performance counter stats for './hello.exe' (200 runs):

  1.03 msec task-clock#0.580 CPUs utilized  
  ( +-  0.23% )
 0  context-switches  #0.000 K/sec
 0  cpu-migrations#0.000 K/sec
27  page-faults   #0.026 M/sec  
  ( +-  0.10% )
 1,034,038  cycles#1.002 GHz
  ( +-  0.11% )
 2,485,983  instructions  #2.40  insn per cycle 
  ( +-  0.02% )
   557,567  branches  #  540.215 M/sec  
  ( +-  0.01% )
 4,843  branch-misses #0.87% of all branches
  ( +-  0.53% )

0.00178093 +- 0.0456 seconds time elapsed  ( +-  0.26% )

Regards,
Tamar
> 
> Best
> 
> Thomas



Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 21, 2022 at 12:22:32PM +0100, Thomas Neumann via Gcc-patches wrote:
> > When dynamically linking a fast enough machine hides the latency, but when
> > Statically linking or on slower devices this change caused a 5x increase in
> > Instruction count and 2x increase in cycle count before getting to main.
> > 
> > This has been quite noticeable on smaller devices.  Is there a reason the 
> > btree
> > can't be initialized lazily? It seems a bit harsh to pay the cost of 
> > unwinding at
> > startup even when you don't throw exceptions..
> 
> we cannot easily do that lazily because otherwise we need a mutex for lazy
> initialization, which is exactly what we wanted to get rid of.
> 
> Having said that, I am surprised that you saw a noticeable difference. On
> most platforms there should not be dynamic frame registration at all, as the
> regular frames are directly read from the ELF data.
> 
> Can you please send me an precise description on how to reproduce the issue?
> (Platform, tools, a VM if you have one would be great). I will then debug
> this to improve the startup time.

I can see it being called as well for -static linked binaries.
-static links in crtbeginT.o which is libgcc/crtstuff.c built with
CRTSTUFFT_O macro being defined among other things, and that disables
USE_PT_GNU_EH_FRAME:
#if defined(OBJECT_FORMAT_ELF) \
&& !defined(OBJECT_FORMAT_FLAT) \
&& defined(HAVE_LD_EH_FRAME_HDR) \
&& !defined(inhibit_libc) && !defined(CRTSTUFFT_O) \
&& defined(__GLIBC__) && __GLIBC__ >= 2
#include 
/* uClibc pretends to be glibc 2.2 and DT_CONFIG is defined in its link.h.
   But it doesn't use PT_GNU_EH_FRAME ELF segment currently.  */
# if !defined(__UCLIBC__) \
 && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
 || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
#  define USE_PT_GNU_EH_FRAME
# endif
#endif

I think .eh_frame_hdr was never used for statically linked programs,
see already https://gcc.gnu.org/legacy-ml/gcc-patches/2001-12/msg01383.html
We don't pass --eh-frame-hdr when linking statically and dl_iterate_phdr
doesn't handle those.
Now, if -static -Wl,--eh-frame-hdr is passed when linking to the driver,
.eh_frame_hdr section is created and __GNU_EH_FRAME_HDR symbol points to
the start of that section, so at least that section could be found
if something in the crt files and libgcc is adjusted.  But e.g.
i?86, nios2, frv and bfin we also need to find the got.  Also, would it
work even for static PIEs?

Jakub



RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

2022-11-21 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public]

Hi,

> I think instead of (znver4-fpu)*2 there should be
> 
>   znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2
> 
> assuming the instruction occupies the same pipe on both cycles (your variant
> models as if it can move from one pipe to another).

> Also I think it's better to use znver4-direct rather than znver4-double for
> AVX512 instructions, because they are decoded as one uop, not two (it won't
> make a practical difference due to a "Fix me", but it's a simple improvement).

I have addressed all your comments in the patch attached here. I have also used 
znver4-direct for avx512 insns.
* This patch increased the insn-automata.cc size from 201502 to 214902.
* Compile time and binary size on my machine remains same.
* Make check and bootstrap build have no issues.
* Spec cpu2017 also don't have any issues with this patch.

Is this ok for trunk?

Thanks and Regards,
Tejas

gcc/ChangeLog:

* gcc/common/config/i386/i386-common.cc (processor_alias_table):
Use CPU_ZNVER4 for znver4.
* config/i386/i386.md: Add znver4.md.
* config/i386/znver4.md: New.

---
 gcc/common/config/i386/i386-common.cc |2 +-
 gcc/config/i386/i386.md   |1 +
 gcc/config/i386/znver4.md | 1027 +
 3 files changed, 1029 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/i386/znver4.md

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index f66bdd5a2af..4b01c3540e5 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2113,7 +2113,7 @@ const pta processor_alias_table[] =
   {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3,
 PTA_ZNVER3,
 M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2},
-  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3,
+  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4,
 PTA_ZNVER4,
 M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F},
   {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8081df76741..c18dfe2af9e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1312,6 +1312,7 @@
 (include "bdver3.md")
 (include "btver2.md")
 (include "znver.md")
+(include "znver4.md")
 (include "geode.md")
 (include "atom.md")
 (include "slm.md")
diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md
new file mode 100644
index 000..74b66caa38b
--- /dev/null
+++ b/gcc/config/i386/znver4.md
@@ -0,0 +1,1027 @@
+;; Copyright (C) 2012-2022 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+;;
+
+
+(define_attr "znver4_decode" "direct,vector,double"
+  (const_string "direct"))
+
+;; AMD znver4 Scheduling
+;; Modeling automatons for zen decoders, integer execution pipes,
+;; AGU pipes, branch, floating point execution and fp store units.
+(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_agu, 
znver4_bru, znver4_fpu, znver4_fp_store")
+
+;; Decoders unit has 4 decoders and all of them can decode fast path
+;; and vector type instructions.
+(define_cpu_unit "znver4-decode0" "znver4")
+(define_cpu_unit "znver4-decode1" "znver4")
+(define_cpu_unit "znver4-decode2" "znver4")
+(define_cpu_unit "znver4-decode3" "znver4")
+
+;; Currently blocking all decoders for vector path instructions as
+;; they are dispatched separetely as microcode sequence.
+(define_reservation "znver4-vector" 
"znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3")
+
+;; Direct instructions can be issued to any of the four decoders.
+(define_reservation "znver4-direct" 
"znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3")
+
+;; Fix me: Need to revisit this later to simulate fast path double behavior.
+(define_reservation "znver4-double" "znver4-direct")
+
+
+;; Integer unit 4 ALU pipes.
+(define_cpu_unit "znver4-ieu0" "znver4_ieu")
+(define_cpu_unit "znver4-ieu1" "znver4_ieu")
+(define_cpu_unit "znver4-ieu2" "znver4_ieu")
+(define_cpu_unit "znver4-ieu3" "znver4_ieu")
+(define_reservation "znver4-ieu" 
"znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3")
+
+;; 3 AGU pipes in znver4
+(define_cpu_unit "znver4-agu0" "znver4_agu")
+(define_cpu_unit "znver4-agu1" "znver4_agu")
+(define_cpu_unit "znver4-agu2" "znver4_agu")
+(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2")
+
+;; Load 

Re: [PATCH 2/2] Fortran: Add attribute flatten

2022-11-21 Thread Mikael Morin

Le 10/11/2022 à 11:20, Bernhard Reutner-Fischer via Fortran a écrit :

Bootstrapped and regtested cleanly on x86_unknown-linux.
The document bits will be rewritten for rst.
Ok for trunk if the prerequisite target_clones patch is approved?

gcc/fortran/ChangeLog:

* decl.cc (gfc_match_gcc_attributes): Handle flatten.
* f95-lang.cc (gfc_attribute_table): Add flatten.
* gfortran.texi: Document attribute flatten.

gcc/testsuite/ChangeLog:

* gfortran.dg/attr_flatten-1.f90: New test.
---
  gcc/fortran/decl.cc  |  8 +++-
  gcc/fortran/f95-lang.cc  |  2 +
  gcc/fortran/gfortran.texi|  8 
  gcc/testsuite/gfortran.dg/attr_flatten-1.f90 | 41 
  4 files changed, 57 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/attr_flatten-1.f90

diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index d312d4812b6..3d210c26eb5 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc

(...)

@@ -11849,7 +11850,9 @@ gfc_match_gcc_attributes (void)
if (strcmp (name, ext_attr_list[id].name) == 0)
  break;
  
-  if (id == EXT_ATTR_LAST)

+  if (strcmp (name, "flatten") == 0)
+   known_attr0args = true; /* Handled below.  We do not need a bit.  */


I don't see the point to have all the attributes needing a bit except 
one that doesn't but needs a specific handling.
What does it look like without the 1/2 patch and if one bit is also used 
for flatten, like the other attributes?



+  else if (id == EXT_ATTR_LAST)
{
  gfc_error ("Unknown attribute in !GCC$ ATTRIBUTES statement at %C");
  return MATCH_ERROR;



diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 06e4c8c00a1..be650f28b62 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -3280,6 +3280,14 @@ contains
  end module mymod
  @end smallexample
  
+@node flatten

+
+Procedures annotated with the @code{flatten} attribute have their
+callees inlined, if possible.

I'm not a native speaker, but I find this sentence confusing.
The words of the gcc manual you are refering to seem more clear: "every 
call inside the function is inlined, if possible".



+Please refer to
+@ref{Top,,Common Function Attributes,gcc,Using the GNU Compiler Collection 
(GCC)}
+for details about the respective attribute.
+
  The attributes are specified using the syntax
  
  @code{!GCC$ ATTRIBUTES} @var{attribute-list} @code{::} @var{variable-list}




Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches

Hi,


When dynamically linking a fast enough machine hides the latency, but when
Statically linking or on slower devices this change caused a 5x increase in
Instruction count and 2x increase in cycle count before getting to main.

This has been quite noticeable on smaller devices.  Is there a reason the btree
can't be initialized lazily? It seems a bit harsh to pay the cost of unwinding 
at
startup even when you don't throw exceptions..


we cannot easily do that lazily because otherwise we need a mutex for 
lazy initialization, which is exactly what we wanted to get rid of.


Having said that, I am surprised that you saw a noticeable difference. 
On most platforms there should not be dynamic frame registration at all, 
as the regular frames are directly read from the ELF data.


Can you please send me an precise description on how to reproduce the 
issue? (Platform, tools, a VM if you have one would be great). I will 
then debug this to improve the startup time.


Best

Thomas



RE: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Tamar Christina via Gcc-patches
Hi,

I couldn't find a Bugzilla account for you Thomas,

I've bisected a slowdown in startup speed for C++ to this change,
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107675 

When dynamically linking a fast enough machine hides the latency, but when
Statically linking or on slower devices this change caused a 5x increase in
Instruction count and 2x increase in cycle count before getting to main.

This has been quite noticeable on smaller devices.  Is there a reason the btree
can't be initialized lazily? It seems a bit harsh to pay the cost of unwinding 
at
startup even when you don't throw exceptions..

Cheers,
Tamar

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Thomas
> Neumann via Gcc-patches
> Sent: Friday, September 16, 2022 11:20 AM
> To: gcc-patches@gcc.gnu.org; Jason Merrill 
> Cc: Florian Weimer ; Jakub Jelinek
> ; Jonathan Wakely 
> Subject: [PATCH v4] eliminate mutex in fast path of __register_frame
> 
> The __register_frame/__deregister_frame functions are used to register
> unwinding frames from JITed code in a sorted list. That list itself is 
> protected
> by object_mutex, which leads to terrible performance in multi-threaded
> code and is somewhat expensive even if single-threaded.
> There was already a fast-path that avoided taking the mutex if no frame was
> registered at all.
> 
> This commit eliminates both the mutex and the sorted list from the atomic
> fast path, and replaces it with a btree that uses optimistic lock coupling 
> during
> lookup. This allows for fully parallel unwinding and is essential to scale
> exception handling to large core counts.
> 
> Changes since v3:
> - Avoid code duplication by adding query mode to classify_object_over_fdes
> - Adjust all comments as requested
> 
> libgcc/ChangeLog:
> 
>  * unwind-dw2-fde.c (release_registered_frames): Cleanup at
> shutdown.
>  (__register_frame_info_table_bases): Use btree in atomic fast path.
>  (__deregister_frame_info_bases): Likewise.
>  (_Unwind_Find_FDE): Likewise.
>  (base_from_object): Make parameter const.
>  (classify_object_over_fdes): Add query-only mode.
>  (get_pc_range): Compute PC range for lookup.
>  * unwind-dw2-fde.h (last_fde): Make parameter const.
>  * unwind-dw2-btree.h: New file.
> ---
>   libgcc/unwind-dw2-btree.h | 953
> ++
>   libgcc/unwind-dw2-fde.c   | 195 ++--
>   libgcc/unwind-dw2-fde.h   |   2 +-
>   3 files changed, 1098 insertions(+), 52 deletions(-)
>   create mode 100644 libgcc/unwind-dw2-btree.h
> 
> diff --git a/libgcc/unwind-dw2-btree.h b/libgcc/unwind-dw2-btree.h new file
> mode 100644 index 000..8853f0eab48
> --- /dev/null
> +++ b/libgcc/unwind-dw2-btree.h
> @@ -0,0 +1,953 @@
> +/* Lock-free btree for manually registered unwind frames.  */
> +/* Copyright (C) 2022 Free Software Foundation, Inc.
> +   Contributed by Thomas Neumann
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and a
> +copy of the GCC Runtime Library Exception along with this program; see
> +the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +.  */
> +
> +#ifndef GCC_UNWIND_DW2_BTREE_H
> +#define GCC_UNWIND_DW2_BTREE_H
> +
> +#include 
> +
> +// Common logic for version locks.
> +struct version_lock
> +{
> +  // The lock itself. The lowest bit indicates an exclusive lock,
> +  // the second bit indicates waiting threads. All other bits are
> +  // used as counter to recognize changes.
> +  // Overflows are okay here, we must only prevent overflow to the
> +  // same value within one lock_optimistic/validate
> +  // range. Even on 32 bit platforms that would require 1 billion
> +  // frame registrations within the time span of a few assembler
> +  // instructions.
> +  uintptr_t version_lock;
> +};
> +
> +#ifdef __GTHREAD_HAS_COND
> +// We should never get contention within the tree as it rarely changes.
> +// But if we ever do get contention we use these for waiting.
> +static __gthread_mutex_t version_lock_mutex =
> __GTHREAD_MUTEX_INIT;
> +static __gthread_cond_t version_lock_cond = __GTHREAD_COND_INIT;
> #endif
> +
> +// Initialize in locked state.
> +static inline void
> 

Re: [PATCH 1/2] Fortran: Cleanup struct ext_attr_t

2022-11-21 Thread Mikael Morin

Hello,

Le 10/11/2022 à 11:20, Bernhard Reutner-Fischer via Fortran a écrit :

Tiny cleanup opportunity since we now have ext_attr_args in
struct symbol_attribute.
Bootstrapped and regtested on x86_64-unknown-linux with no new
regressions.
Ok for trunk if the prerequisite was approved ([PATCH 2/2] Fortran: add
attribute target_clones) ?

gcc/fortran/ChangeLog:

* gfortran.h (struct ext_attr_t): Remove middle_end_name.
* trans-decl.cc (add_attributes_to_decl): Move building
tree_list to ...
* decl.cc (gfc_match_gcc_attributes): ... here. Add the attribute to
the tree_list for the middle end.

I prefer to not do any middle-end stuff at parsing time, so I would 
rather not do this change.

Not OK.


Re: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-21 Thread Stam Markianos-Wright via Gcc-patches



On 11/18/22 16:58, Kyrylo Tkachov wrote:



-Original Message-
From: Andrea Corallo 
Sent: Thursday, November 17, 2022 4:38 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Stam Markianos-Wright 
Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
overloading [PR107515]

From: Stam Markianos-Wright 

This patch adds explicit references to other float types
to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

gcc/ChangeLog:
 PR 107515
 * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.

Argh, I'm looking forward to when we move away from this _Generic business, but 
for now ok.

Oh we all are ;)

The ChangeLog should say "PR target/107515" for the git hook to recognize it 
IIRC.


Agh, thanks for spotting this! Will change and push it with the rest of 
the patch series when ready/


Thank you,

Stam



Thanks,
Kyrill


---
  gcc/config/arm/arm_mve.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index fd1876b57a0..f6b42dc3fab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35582,6 +35582,9 @@ enum {
   short: __ARM_mve_type_int_n, \
   int: __ARM_mve_type_int_n, \
   long: __ARM_mve_type_int_n, \
+ _Float16: __ARM_mve_type_fp_n, \
+ __fp16: __ARM_mve_type_fp_n, \
+ float: __ARM_mve_type_fp_n, \
   double: __ARM_mve_type_fp_n, \
   long long: __ARM_mve_type_int_n, \
   unsigned char: __ARM_mve_type_int_n, \
--
2.25.1


Re: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic

2022-11-21 Thread Stam Markianos-Wright via Gcc-patches



On 11/18/22 16:49, Kyrylo Tkachov wrote:



-Original Message-
From: Andrea Corallo 
Sent: Thursday, November 17, 2022 4:38 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Stam Markianos-Wright 
Subject: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n
intrinsic

From: Stam Markianos-Wright 

It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic
resolution would fall back to the `__ARM_undef` failure state.

This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79`
and
`6bd4ce64eb48a72eca300cb52773e6101d646004`, but it previously wasn't
identified, because the tests were not checking for this kind of failure.

The above commits changed the definitions of the intrinsics from using
`[u]int[8/16/32]_t` types for the scalar argument to using `int`. This
allowed `int` to be supported in user code through the overloaded
`#defines`, but seems to have broken the `[u]int[8/16/32]_t` types

The solution implemented by this patch is to explicitly use a new
_Generic mapping from all the `[u]int[8/16/32]_t` types for int. With this
change, both `int` and `[u]int[8/16/32]_t` parameters are supported from
user code and are handled by the overloading mechanism correctly.

gcc/ChangeLog:

 * config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types.
 (__arm_vaddq_m_n_s32): Likewise.
 (__arm_vaddq_m_n_s16): Likewise.
 (__arm_vaddq_m_n_u8): Likewise.
 (__arm_vaddq_m_n_u32): Likewise.
 (__arm_vaddq_m_n_u16): Likewise.
 (__arm_vaddq_m): Fix Overloading.
 (__ARM_mve_coerce3): New.

Ok. Wasn't there a PR in Bugzilla about this that we can cite in the commit 
message?
Thanks,
Kyrill


Thanks for the review! Ah yes, there was this one:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96795

which was closed last time around.
It does make sense to add it, though, so we'll do that.

Thanks!




---
  gcc/config/arm/arm_mve.h | 78 
  1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 684f997520f..951dc25374b 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive,
uint16x8_t __a, uint16x8_t __b, mve_pr

  __extension__ extern __inline int8x16_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b,
mve_pred16_t __p)
+__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b,
mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline int32x4_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b,
mve_pred16_t __p)
+__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b,
mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline int16x8_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b,
mve_pred16_t __p)
+__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b,
mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline uint8x16_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b,
mve_pred16_t __p)
+__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b,
mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline uint32x4_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b,
mve_pred16_t __p)
+__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t
__b, mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline uint16x8_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b,
mve_pred16_t __p)
+__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t
__b, mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
  }
@@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive,
uint16x8_t __a, uint16x8_t __b, mve_pred16

  __extension__ extern __inline int8x16_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b,
mve_pred16_t __p)
+__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b,
mve_pred16_t __p)
  {
   return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p);
  

[PATCH] maintainer-scripts: Add gdc to update_web_docs_git

2022-11-21 Thread Iain Buclaw via Gcc-patches
Hi,

This patch adds gdc to maintainer-scripts/update_web_docs_git so that
it's built and uploaded to gcc.gnu.org/onlinedocs.

One half of re-adding the gdc docs that were taken down after the revert
to the Sphinx conversion.

OK?

Regards,
Iain.

---
PR web/107749

maintainer-scripts/ChangeLog:

* update_web_docs_git: Add gdc to MANUALS.
---
 maintainer-scripts/update_web_docs_git | 1 +
 1 file changed, 1 insertion(+)

diff --git a/maintainer-scripts/update_web_docs_git 
b/maintainer-scripts/update_web_docs_git
index 6c38e213562..dee9b1d3b5e 100755
--- a/maintainer-scripts/update_web_docs_git
+++ b/maintainer-scripts/update_web_docs_git
@@ -21,6 +21,7 @@ MANUALS="cpp
   gccgo
   gccint
   gcj
+  gdc
   gfortran
   gfc-internals
   gnat_ugn
-- 
2.37.2



Re: [PATCH v2] genmultilib: Add sanity check

2022-11-21 Thread Mark Wielaard
Hi,

On Fri, Nov 18, 2022 at 12:42:10PM -0700, Jeff Law wrote:
> > * genmultilib: Add sanity check.
> 
> OK.  It should be interesting to see if it trips.

It trips up various buildbot setups:
https://builder.sourceware.org/buildbot/#/changes/13720

Error calling ../../gcc/gcc/genmultilib: Number of dirnames (3) does not match 
number of options (2)
options: m64/m31
dirnames: 64 32
make[2]: *** [Makefile:2240: s-mlib] Error 1

Mark


[COMMITTED] ada: Order pragmas alphabetically in reference manual

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

gcc/ada/

* doc/gnat_rm/implementation_defined_pragmas.rst: Restore
alphabetical ordering.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../implementation_defined_pragmas.rst| 106 ++---
 gcc/ada/gnat_rm.texi  | 148 +-
 gcc/ada/gnat_ugn.texi |   2 +-
 3 files changed, 128 insertions(+), 128 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst 
b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
index 7e5fb700691..ed42d087a25 100644
--- a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
@@ -1438,20 +1438,6 @@ This pragma applies only to protected types and 
specifies the floor
 deadline inherited by a task when the task enters a protected object.
 It is effective only when the EDF scheduling policy is used.
 
-.. _Pragma-Default_Initial_Condition:
-
-Pragma Default_Initial_Condition
-
-
-Syntax:
-
-.. code-block:: ada
-
-  pragma Default_Initial_Condition [ (null | boolean_EXPRESSION) ];
-
-For the semantics of this pragma, see the entry for aspect
-``Default_Initial_Condition`` in the SPARK 2014 Reference Manual, section 
7.3.3.
-
 Pragma Debug
 
 
@@ -1495,6 +1481,20 @@ This pragma is equivalent to a corresponding 
``Check_Policy`` pragma
 with a first argument of ``Debug``. It is retained for historical
 compatibility reasons.
 
+.. _Pragma-Default_Initial_Condition:
+
+Pragma Default_Initial_Condition
+
+
+Syntax:
+
+.. code-block:: ada
+
+  pragma Default_Initial_Condition [ (null | boolean_EXPRESSION) ];
+
+For the semantics of this pragma, see the entry for aspect
+``Default_Initial_Condition`` in the SPARK 2014 Reference Manual, section 
7.3.3.
+
 Pragma Default_Scalar_Storage_Order
 ===
 .. index:: Default_Scalar_Storage_Order
@@ -4640,6 +4640,22 @@ even though
 RM 8.3 (15) stipulates that an overridden operation is not visible within the
 declaration of the overriding operation.
 
+.. _Pragma-Part_Of:
+
+Pragma Part_Of
+==
+
+Syntax:
+
+.. code-block:: ada
+
+  pragma Part_Of (ABSTRACT_STATE);
+
+  ABSTRACT_STATE ::= NAME
+
+For the semantics of this pragma, see the entry for aspect ``Part_Of`` in the
+SPARK 2014 Reference Manual, section 7.2.6.
+
 Pragma Partition_Elaboration_Policy
 ===
 
@@ -4657,22 +4673,6 @@ This pragma is standard in Ada 2005, but is available in 
all earlier
 versions of Ada as an implementation-defined pragma.
 See Ada 2012 Reference Manual for details.
 
-.. _Pragma-Part_Of:
-
-Pragma Part_Of
-==
-
-Syntax:
-
-.. code-block:: ada
-
-  pragma Part_Of (ABSTRACT_STATE);
-
-  ABSTRACT_STATE ::= NAME
-
-For the semantics of this pragma, see the entry for aspect ``Part_Of`` in the
-SPARK 2014 Reference Manual, section 7.2.6.
-
 Pragma Passive
 ==
 
@@ -7297,29 +7297,6 @@ Note that in addition to the checks defined in the Ada 
RM, GNAT recognizes a
 number of implementation-defined check names. See the description of pragma
 ``Suppress`` for full details.
 
-Pragma Use_VADS_Size
-
-.. index:: Size, VADS compatibility
-
-.. index:: Rational profile
-
-
-Syntax:
-
-
-.. code-block:: ada
-
-  pragma Use_VADS_Size;
-
-
-This is a configuration pragma.  In a unit to which it applies, any use
-of the 'Size attribute is automatically interpreted as a use of the
-'VADS_Size attribute.  Note that this may result in incorrect semantic
-processing of valid Ada 95 or Ada 2005 programs.  This is intended to aid in
-the handling of existing code which depends on the interpretation of Size
-as implemented in the VADS compiler.  See description of the VADS_Size
-attribute for further details.
-
 .. _Pragma-Unused:
 
 Pragma Unused
@@ -7354,6 +7331,29 @@ are typically to be used in cases where such warnings 
are expected.
 Thus it is never necessary to use ``pragma Unused`` for such
 variables, though it is harmless to do so.
 
+Pragma Use_VADS_Size
+
+.. index:: Size, VADS compatibility
+
+.. index:: Rational profile
+
+
+Syntax:
+
+
+.. code-block:: ada
+
+  pragma Use_VADS_Size;
+
+
+This is a configuration pragma.  In a unit to which it applies, any use
+of the 'Size attribute is automatically interpreted as a use of the
+'VADS_Size attribute.  Note that this may result in incorrect semantic
+processing of valid Ada 95 or Ada 2005 programs.  This is intended to aid in
+the handling of existing code which depends on the interpretation of Size
+as implemented in the VADS compiler.  See description of the VADS_Size
+attribute for further details.
+
 Pragma Validity_Checks
 ==
 
diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
index 

[COMMITTED] ada: Internal compiler error for Sequential Partition_Elaboration_Policy

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Steve Baird 

In some cases, compilation of a function with a limited class-wide result
type could fail with an internal error if a Sequential
Partition_Elaboration_Policy is specified. To prevent this, we want specifying
a Sequential Partition_Elaboration_Policy to have the side effect of
imposing a No_Task_Hierarchy restriction. But doing that in a straightforward
way leads to problems with incorrectly accepting violations of H.6(6). So
a new restriction, No_Task_Hierarchy_Implicit, is introduced.

gcc/ada/

* libgnat/s-rident.ads: Define a new restriction,
No_Task_Hierarchy_Implicit. This is like the No_Task_Hierarchy
restriction, but with the difference that setting this restriction
does not mean the H.6(6) post-compilation check is satisified.
* exp_ch6.adb (Add_Task_Actuals_To_Build_In_Place_Call): If it is
known that the function result cannot have tasks, then pass in a
null literal for the activation chain actual parameter. This
avoids generating a reference to an entity that
Build_Activation_Chain_Entity may have chosen not to generate a
declaration for.
* gnatbind.adb (List_Applicable_Restrictions): Do not list the
No_Task_Hierarchy_Implicit restriction.
* restrict.adb: Special treatment for the
No_Task_Hierarchy_Implicit restriction in functions
Get_Restriction_Id and Restriction_Active. The former is needed to
disallow the (unlikely) case that a user tries to explicitly
reference the No_Task_Hierarchy_Implicit restriction.
* sem_prag.adb (Analyze_Pragma): If a Sequential
Partition_Elaboration_Policy is specified (and the
No_Task_Hierarchy restriction is not already enabled), then enable
the No_Task_Hierarchy_Implicit restriction.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch6.adb  |  5 -
 gcc/ada/gnatbind.adb |  3 +++
 gcc/ada/libgnat/s-rident.ads |  5 +++--
 gcc/ada/restrict.adb | 12 ++--
 gcc/ada/sem_prag.adb | 19 +++
 5 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index 4cdd98649c8..a5dee38c55f 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -662,7 +662,10 @@ package body Exp_Ch6 is
 
   --  Create the actual which is a pointer to the current activation chain
 
-  if No (Chain) then
+  if Restriction_Active (No_Task_Hierarchy) then
+ Chain_Actual := Make_Null (Loc);
+
+  elsif No (Chain) then
  Chain_Actual :=
Make_Attribute_Reference (Loc,
  Prefix => Make_Identifier (Loc, Name_uChain),
diff --git a/gcc/ada/gnatbind.adb b/gcc/ada/gnatbind.adb
index 475702a755e..509b4d368a8 100644
--- a/gcc/ada/gnatbind.adb
+++ b/gcc/ada/gnatbind.adb
@@ -215,6 +215,9 @@ procedure Gnatbind is
  No_Specification_Of_Aspect  => False,
  --  Requires a parameter value, not a count
 
+ No_Task_Hierarchy_Implicit  => False,
+ --  A compiler implementation artifact, not a documented restriction
+
  No_Use_Of_Attribute => False,
  --  Requires a parameter value, not a count
 
diff --git a/gcc/ada/libgnat/s-rident.ads b/gcc/ada/libgnat/s-rident.ads
index 9d652a4cc3e..1c6f2e7156e 100644
--- a/gcc/ada/libgnat/s-rident.ads
+++ b/gcc/ada/libgnat/s-rident.ads
@@ -107,7 +107,7 @@ package System.Rident is
   No_Dispatching_Calls,  -- GNAT
   No_Dynamic_Accessibility_Checks,   -- GNAT
   No_Dynamic_Attachment, -- Ada 2012 (RM E.7(10/3))
-  No_Dynamic_CPU_Assignment, -- Ada 202x (RM D.7(10/3))
+  No_Dynamic_CPU_Assignment, -- Ada 2022 (RM D.7(10/3))
   No_Dynamic_Priorities, -- (RM D.9(9))
   No_Enumeration_Maps,   -- GNAT
   No_Entry_Calls_In_Elaboration_Code,-- GNAT
@@ -152,8 +152,9 @@ package System.Rident is
   No_Task_Attributes_Package,-- GNAT
   No_Task_At_Interrupt_Priority, -- GNAT
   No_Task_Hierarchy, -- (RM D.7(3), H.4(3))
+  No_Task_Hierarchy_Implicit,-- GNAT
   No_Task_Termination,   -- Ada 2005 (D.7(15.1/2))
-  No_Tasks_Unassigned_To_CPU,-- Ada 202x (D.7(10.10/4))
+  No_Tasks_Unassigned_To_CPU,-- Ada 2022 (D.7(10.10/4))
   No_Tasking,-- GNAT
   No_Terminate_Alternatives, -- (RM D.7(6))
   No_Unchecked_Access,   -- (RM H.4(18))
diff --git a/gcc/ada/restrict.adb b/gcc/ada/restrict.adb
index 9ef923b186c..9965321c75e 100644
--- a/gcc/ada/restrict.adb
+++ b/gcc/ada/restrict.adb
@@ -897,7 +897,10 @@ package body Restrict is
  declare
 S : constant String := 

[COMMITTED] ada: Small cleanup in Expand_N_Object_Declaration

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This reuses a local constant more consistently, removes a duplicate of this
local constant, renames local variables, alphabetizes declarations, makes a
few consistency tweaks and adjusts a couple of comments.

No functional changes.

gcc/ada/

* exp_ch3.adb (Expand_N_Object_Declaration): Use Typ local
constant throughout, remove Ret_Obj_Typ local constant, rename
Ref_Type into Acc_Typ in a couple of places, remove a useless call
to Set_Etype, use a consistent checks suppression scheme, adjust
comments for the sake of consistencty and alphabetize some local
declarations.
* exp_ch6.adb (Expand_Simple_Function_Return): Remove a couple of
redundant local constants.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 94 ++---
 gcc/ada/exp_ch6.adb |  8 ++--
 2 files changed, 49 insertions(+), 53 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 90f01ca2747..7b194bb9816 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -7758,7 +7758,7 @@ package body Exp_Ch3 is
 if Validity_Checks_On
   and then Comes_From_Source (N)
   and then Validity_Check_Copies
-  and then not Is_Generic_Type (Etype (Def_Id))
+  and then not Is_Generic_Type (Typ)
 then
Ensure_Valid (Expr);
if Safe_To_Capture_Value (N, Def_Id) then
@@ -7876,7 +7876,7 @@ package body Exp_Ch3 is
   end if;
 
   if Nkind (Obj_Def) = N_Access_Definition
-and then not Is_Local_Anonymous_Access (Etype (Def_Id))
+and then not Is_Local_Anonymous_Access (Typ)
   then
  --  An Ada 2012 stand-alone object of an anonymous access type
 
@@ -7988,16 +7988,17 @@ package body Exp_Ch3 is
 
   --if BIPalloc = 1 then
   --   Rxx := BIPaccess;
+  --   Rxx.all := ;
   --elsif BIPalloc = 2 then
-  --   Rxx := new [storage_pool =
+  --   Rxx := new '()[storage_pool =
   -- system__secondary_stack__ss_pool][procedure_to_call =
   -- system__secondary_stack__ss_allocate];
   --elsif BIPalloc = 3 then
-  --   Rxx := new 
+  --   Rxx := new '()
   --elsif BIPalloc = 4 then
   --   Pxx : system__storage_pools__root_storage_pool renames
   -- BIPstoragepool.all;
-  --   Rxx := new [storage_pool =
+  --   Rxx := new '()[storage_pool =
   -- Pxx][procedure_to_call =
   -- system__storage_pools__allocate_any];
   --else
@@ -8005,15 +8006,12 @@ package body Exp_Ch3 is
   --end if;
 
   --Result : T renames Rxx.all;
-  --Result := ;
 
   --  in the unconstrained case.
 
   if Is_Build_In_Place_Return_Object (Def_Id) then
  declare
-Func_Id : constant Entity_Id :=
-  Return_Applies_To (Scope (Def_Id));
-Ret_Obj_Typ : constant Entity_Id := Etype (Def_Id);
+Func_Id : constant Entity_Id := Return_Applies_To (Scope (Def_Id));
 
 Init_Stmt   : Node_Id;
 Obj_Acc_Formal  : Entity_Id;
@@ -8043,9 +8041,9 @@ package body Exp_Ch3 is
 if Present (Expr_Q)
   and then not Is_Delayed_Aggregate (Expr_Q)
   and then not No_Initialization (N)
-  and then not Is_Interface (Etype (Def_Id))
+  and then not Is_Interface (Typ)
 then
-   if Is_Class_Wide_Type (Etype (Def_Id))
+   if Is_Class_Wide_Type (Typ)
  and then not Is_Class_Wide_Type (Etype (Expr_Q))
then
   Init_Stmt :=
@@ -8054,7 +8052,7 @@ package body Exp_Ch3 is
   Expression =>
 Make_Type_Conversion (Loc,
   Subtype_Mark =>
-New_Occurrence_Of (Etype (Def_Id), Loc),
+New_Occurrence_Of (Typ, Loc),
   Expression   => New_Copy_Tree (Expr_Q)));
 
else
@@ -8087,12 +8085,12 @@ package body Exp_Ch3 is
 if Needs_BIP_Alloc_Form (Func_Id) then
declare
   Desig_Typ : constant Entity_Id :=
-(if Ekind (Ret_Obj_Typ) = E_Array_Subtype
- then Etype (Func_Id) else Ret_Obj_Typ);
+(if Ekind (Typ) = E_Array_Subtype
+ then Etype (Func_Id) else Typ);
   --  Ensure that the we use a fat pointer when allocating
   --  an unconstrained array on the heap. In this case the
-  --  result object type is a constrained array type even
-  --  though the function type is unconstrained.
+  --  result object's type is a constrained array type even
+  --  

[COMMITTED] ada: Adjust recent change for returns involving function calls

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) : Revert
latest change.
* gcc-interface/trans.cc (gnat_to_gnu) :
Tweak latest change.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc  | 11 ---
 gcc/ada/gcc-interface/trans.cc | 11 ---
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index e25ce498f2c..c383f9b218a 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -637,17 +637,6 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, 
bool definition)
   break;
 
 case E_Constant:
-  /* If this is a constant related to a return in a function returning by
-invisible reference without expression, get the return object.  */
-  if (Is_Related_To_Func_Return (gnat_entity)
- && current_function_decl
- && TREE_ADDRESSABLE (TREE_TYPE (current_function_decl))
- && !gnu_expr)
-   {
- gnu_decl = DECL_RESULT (current_function_decl);
- break;
-   }
-
   /* Ignore constant definitions already marked with the error node.  See
 the N_Object_Declaration case of gnat_to_gnu for the rationale.  */
   if (definition
diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index d0ff741585e..1cd621a9377 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -6482,9 +6482,10 @@ gnat_to_gnu (Node_Id gnat_node)
 
 then elide the temporary by forwarding the return object to Func:
 
+  result_type *Rnn = (result_type *) ;
   * = Func (); [return slot optimization]
   [...]
-  return ;
+  return Rnn;
 
 That's necessary if the result type needs finalization because the
 temporary would never be adjusted as Expand_Simple_Function_Return
@@ -6496,8 +6497,12 @@ gnat_to_gnu (Node_Id gnat_node)
  && current_function_decl
  && TREE_ADDRESSABLE (TREE_TYPE (current_function_decl)))
{
- gnu_result = gnat_to_gnu_entity (gnat_temp, NULL_TREE, true);
- gnu_result = build_unary_op (INDIRECT_REF, NULL_TREE, gnu_result);
+ gnat_to_gnu_entity (gnat_temp,
+ DECL_RESULT (current_function_decl),
+ true);
+ gnu_result
+   = build_unary_op (INDIRECT_REF, NULL_TREE,
+ DECL_RESULT (current_function_decl));
  gnu_result
= Call_to_gnu (Prefix (Expression (gnat_node)),
   _result_type, gnu_result,
-- 
2.34.1



[COMMITTED] ada: Move warnings switches

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Bob Duff 

This patch moves warning switches from Opt into Warnsw, fixes some minor
discrepancies, and cleans up the code.

No change in behavior.

gcc/ada/

* warnsw.ads, warnsw.adb: Move warning flags here from package
Opt. Rename Warning_Record to be Warnings_State. Use an array
instead of a record; this simplifies the code. Add renamings of
all the array components for easy reference outside this package.
Pass the "Family" to Set_Warning_Switch. Use more table-driven
code. Misc cleanup and comment fixes.
* opt.ads: Move warning switches to Warnsw.
* gnat1drv.adb
(Adjust_Global_Switches): Expanded names needed.
* inline.ads: Rename Warning_Record to be Warnings_State.
* sem_ch12.adb: Likewise.
* sem_prag.adb: Use new Set_Warning_Switch.
* contracts.adb, errout.adb, exp_aggr.adb, exp_ch11.adb: Adjust
imports for move to Warnsw.
* exp_ch5.adb, exp_prag.adb, exp_util.adb, frontend.adb: Likewise.
* layout.adb, lib-xref.adb, restrict.adb, scn.adb, sem_aggr.adb:
Likewise.
* sem_attr.adb, sem_case.adb, sem_ch10.adb, sem_ch11.adb:
Likewise.
* sem_ch13.adb, sem_ch3.adb, sem_ch4.adb, sem_ch5.adb: Likewise.
* sem_ch6.adb, sem_ch7.adb, sem_ch8.adb, sem_elab.adb: Likewise.
* sem_eval.adb, sem_res.adb, sem_util.adb, sem_warn.adb: Likewise.
* switch-c.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/contracts.adb |   1 +
 gcc/ada/errout.adb|   9 +-
 gcc/ada/exp_aggr.adb  |   1 +
 gcc/ada/exp_ch11.adb  |   1 +
 gcc/ada/exp_ch5.adb   |   1 +
 gcc/ada/exp_prag.adb  |   1 +
 gcc/ada/exp_util.adb  |   3 +-
 gcc/ada/frontend.adb  |   1 +
 gcc/ada/gnat1drv.adb  |   7 +-
 gcc/ada/inline.ads|   2 +-
 gcc/ada/layout.adb|   1 +
 gcc/ada/lib-xref.adb  |   1 +
 gcc/ada/opt.ads   | 271 
 gcc/ada/restrict.adb  |   1 +
 gcc/ada/scn.adb   |   2 +-
 gcc/ada/sem_aggr.adb  |   1 +
 gcc/ada/sem_attr.adb  |   1 +
 gcc/ada/sem_case.adb  |   1 +
 gcc/ada/sem_ch10.adb  |   1 +
 gcc/ada/sem_ch11.adb  |   1 +
 gcc/ada/sem_ch12.adb  |   8 +-
 gcc/ada/sem_ch13.adb  |   2 +-
 gcc/ada/sem_ch3.adb   |   1 +
 gcc/ada/sem_ch4.adb   |   1 +
 gcc/ada/sem_ch5.adb   |   1 +
 gcc/ada/sem_ch6.adb   |   2 +-
 gcc/ada/sem_ch7.adb   |   1 +
 gcc/ada/sem_ch8.adb   |   1 +
 gcc/ada/sem_elab.adb  |   1 +
 gcc/ada/sem_eval.adb  |   1 +
 gcc/ada/sem_prag.adb  |   4 +-
 gcc/ada/sem_res.adb   |   1 +
 gcc/ada/sem_util.adb  |   1 +
 gcc/ada/sem_warn.adb  |   3 +-
 gcc/ada/switch-c.adb  |   6 +-
 gcc/ada/warnsw.adb| 944 +++---
 gcc/ada/warnsw.ads| 527 ++-
 37 files changed, 700 insertions(+), 1113 deletions(-)

diff --git a/gcc/ada/contracts.adb b/gcc/ada/contracts.adb
index 218fd66852f..fef3d24870f 100644
--- a/gcc/ada/contracts.adb
+++ b/gcc/ada/contracts.adb
@@ -59,6 +59,7 @@ with Snames; use Snames;
 with Stand;  use Stand;
 with Stringt;use Stringt;
 with Tbuild; use Tbuild;
+with Warnsw; use Warnsw;
 
 package body Contracts is
 
diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
index dcd21778db3..afa30674fa3 100644
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -53,8 +53,7 @@ with Stand;  use Stand;
 with Stylesw;use Stylesw;
 with System.OS_Lib;
 with Uname;  use Uname;
-with Warnsw; pragma Unreferenced (Warnsw);
---  Will be referenced when various flags are moved to Warnsw.
+with Warnsw; pragma Unreferenced (Warnsw); -- disable spurious warning
 
 package body Errout is
 
@@ -989,14 +988,14 @@ package body Errout is
  --  after fixing the error, the use clause no longer looks like it was
  --  unused.
 
- Check_Unreferenced := False;
- Check_Unreferenced_Formals := False;
+ Warnsw.Check_Unreferenced := False;
+ Warnsw.Check_Unreferenced_Formals := False;
   end Handle_Serious_Error;
 
--  Start of processing for Error_Msg_Internal
 
begin
-  --  Detect common mistake of prefixing or suffing the message with a
+  --  Detect common mistake of prefixing or suffixing the message with a
   --  space character.
 
   pragma Assert (Msg (Msg'First) /= ' ' and then Msg (Msg'Last) /= ' ');
diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index dde49d1e289..3f51ed6b457 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -71,6 +71,7 @@ with Stringt;use Stringt;
 with Tbuild; use Tbuild;
 with Uintp;  use Uintp;
 with Urealp; use Urealp;
+with Warnsw; use Warnsw;
 
 package body Exp_Aggr is
 
diff --git a/gcc/ada/exp_ch11.adb b/gcc/ada/exp_ch11.adb
index 98ce886c71c..5b83035ebd8 100644
--- a/gcc/ada/exp_ch11.adb
+++ b/gcc/ada/exp_ch11.adb
@@ -53,6 +53,7 @@ with Stringt;use Stringt;
 with Targparm;   use Targparm;
 with Tbuild; use 

Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant

2022-11-21 Thread Philipp Tomsich
On Mon, 21 Nov 2022 at 04:11, Kito Cheng  wrote:
>
> > @@ -464,6 +464,60 @@
> >[(set_attr "type" "arith")
> > (set_attr "mode" "DI")])
> >
> > +(define_expand "add3"
> > +  [(set (match_operand:GPR   0 "register_operand"  "=r,r")
> > +   (plus:GPR (match_operand:GPR 1 "register_operand"  " r,r")
> > + (match_operand:GPR 2 "addi_operand"  " r,I")))]
>
> Is it possible to just define a predicate that accepts
> register_operand and CONST_INT_P,
> and then handle all cases in add3 pattern?


Great suggestion.

>
> My point is put all check in one place:
>
> e.g.
> check TARGET_ZBA && const_arith_shifted123_operand (operands[2],
> mode) in add3
> rather than check TARGET_ZBA in addi_operand and use sh[123]add in
> add3 without check.
>
> and that also means we need to sync addi_opearnad and add3
> once we have extension XX could improve addi codegen.
>
>
> > +  ""
> > +{
> > +  if (arith_operand (operands[2], mode))
> > +emit_insn (gen_riscv_add3 (operands[0], operands[1], 
> > operands[2]));
> > +  else if (const_arith_2simm12_operand (operands[2], mode))
>
> const_arith_2simm12_operand only used once, could you inline the condition 
> here?


If we handle all cases in a single pattern, we'll punt this to riscv.cc anyway.
So let's see how the code looks once we have a single predicate and do
the inlining there...

>
> > +{
> > +  /* Split into two immediates that add up to the desired value:
> > +   * e.g., break up "a + 2445" into:
> > +   * addi  a0,a0,2047
> > +   *addi   a0,a0,398
> > +   */
> > +
> > +  HOST_WIDE_INT val = INTVAL (operands[2]);
> > +  HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1);
> > +
> > +  if (val >= 0)
> > +saturated = ~saturated;
> > +
> > +  val -= saturated;
> > +
> > +  rtx tmp = gen_reg_rtx (mode);
> > +  emit_insn (gen_riscv_add3 (tmp, operands[1], GEN_INT 
> > (saturated)));
> > +  emit_insn (gen_riscv_add3 (operands[0], tmp, GEN_INT (val)));
> > +}
> > +  else if (mode == word_mode
> > +  && const_arith_shifted123_operand (operands[2], mode))
>
> Same for const_arith_shifted123_operand.
>
> > +{
> > +  /* Use a sh[123]add and an immediate shifted down by 1, 2, or 3. */
> > +
> > +  HOST_WIDE_INT val = INTVAL (operands[2]);
> > +  int shamt = ctz_hwi (val);
> > +
> > +  if (shamt > 3)
> > +   shamt = 3;
> > +
> > +  rtx tmp = gen_reg_rtx (mode);
> > +  emit_insn (gen_rtx_SET (tmp, GEN_INT (val >> shamt)));
> > +
> > +  /* We don't use gen_riscv_shNadd here, as it will only exist for
> > +.  Instead we build up its canonical form directly.  */
> > +  rtx shifted_imm = gen_rtx_ASHIFT (mode, tmp, GEN_INT (shamt));
> > +  rtx shNadd = gen_rtx_PLUS (mode, shifted_imm, operands[1]);
> > +  emit_insn (gen_rtx_SET (operands[0], shNadd));
> > +}
> > +  else
> > +FAIL;
>
> Seems add3 FAIL will cause problems, we need either add something like:
>
>   operands[2] = force_reg (mode, operands[2]);
>   emit_insn (gen_rtx_SET (operands[0],
>  gen_rtx_PLUS (mode,
>operands[1], operands[2])));
>
> Or just gcc_unreachable () if we keep using addi_operand to guard this 
> pattern.


This is a case for "gcc_unreachable ();".
The change will be in v2.

Thanks,
Philipp.


[COMMITTED] ada: Disable subprogram call validation in CodePeer mode

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Ghjuvan Lacambre 

CodePeer builds with assertions enabled started failing when this
validation was introduced. We temporarily disable this validation for
CodePeer in order to buy time before fixing the underlying issue.

gcc/ada/

* frontend.adb (Frontend): Disable subprogram call validation.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/frontend.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/frontend.adb b/gcc/ada/frontend.adb
index 033ecf3b7be..bc3da30b0cf 100644
--- a/gcc/ada/frontend.adb
+++ b/gcc/ada/frontend.adb
@@ -531,7 +531,7 @@ begin
--  formals). It is invoked using pragma Debug to avoid adding any cost
--  when the compiler is built with assertions disabled.
 
-   if not Debug_Flag_Underscore_XX then
+   if not Debug_Flag_Underscore_XX and then not CodePeer_Mode then
   pragma Debug (Exp_Ch6.Validate_Subprogram_Calls (Cunit (Main_Unit)));
end if;
 
-- 
2.34.1



[COMMITTED] ada: Minor tweak in assertion

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

For an array subtype, being definite is the same as being constrained.

gcc/ada/

* sem_util.adb (Needs_Secondary_Stack): Test Is_Constrained
directly instead of Is_Definite_Subtype for an array subtype.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 67baf7abfad..f331b4b78ba 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -23634,7 +23634,7 @@ package body Sem_Util is
   --  Unconstrained array type
 
   else
- pragma Assert (Is_Array_Type (Typ) and not Is_Definite_Subtype (Typ));
+ pragma Assert (Is_Array_Type (Typ) and then not Is_Constrained (Typ));
  return True;
   end if;
end Needs_Secondary_Stack;
-- 
2.34.1



[COMMITTED] ada: Do not share Packed Array Type if sizes of types differ

2022-11-21 Thread Marc Poulhiès via Gcc-patches
If a subtype has a Size attribute value different than the size of its
ancestor, then the Packed Array Type can't be shared and a new one must
be created.

gcc/ada/

* exp_pakd.adb (Create_Packed_Array_Impl_Type): Do not share PAT
if sizes of types differ.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_pakd.adb | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/exp_pakd.adb b/gcc/ada/exp_pakd.adb
index d4a62ace9c2..29735c07a88 100644
--- a/gcc/ada/exp_pakd.adb
+++ b/gcc/ada/exp_pakd.adb
@@ -671,11 +671,11 @@ package body Exp_Pakd is
  return;
   end if;
 
-  --  If our immediate ancestor subtype is constrained, and it already
-  --  has a packed array type, then just share the same type, since the
-  --  bounds must be the same. If the ancestor is not an array type but
-  --  a private type, as can happen with multiple instantiations, create
-  --  a new packed type, to avoid privacy issues.
+  --  If our immediate ancestor subtype is constrained, and it already has
+  --  a packed array type, and it has the same size, then just share the
+  --  same type, since the bounds must be the same. If the ancestor is not
+  --  an array type but a private type, as can happen with multiple
+  --  instantiations, create a new packed type, to avoid privacy issues.
 
   if Ekind (Typ) = E_Array_Subtype then
  Ancest := Ancestor_Subtype (Typ);
@@ -684,6 +684,9 @@ package body Exp_Pakd is
and then Is_Array_Type (Ancest)
and then Is_Constrained (Ancest)
and then Present (Packed_Array_Impl_Type (Ancest))
+   and then Known_Esize (Typ)
+   and then Known_Esize (Ancest)
+   and then Esize (Typ) = Esize (Ancest)
  then
 Set_Packed_Array_Impl_Type (Typ, Packed_Array_Impl_Type (Ancest));
 return;
-- 
2.34.1



[COMMITTED] ada: Reject nonconfirming Size attribute value for aliased object

2022-11-21 Thread Marc Poulhiès via Gcc-patches
Only confirming Size must be supported for aliased object of elementary
type (see RM 13.1 in the "Implementation Advice").

   -- size is 1-byte
   type Y is range 0 .. 20;
   type Ay is access all Y;

   -- Var size is 8-bytes
   Var : aliased Y := 5 with Size => 64;

   --  JP.all is a 1-byte reference to an 8-bytes objects.
   JP : Ay := Var'Access;

The above JP.all references the first byte of the 8-byte Var object,
which is, for example, not correct on little-endian systems.

This change rejects nonconfirming Size attribute on such objects
instead of miscompiling it.

gcc/ada/

* sem_ch13.adb (Check_One_Attr): produce error when Size attribute
used on aliased object of elementary types with nonconfirming
value.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 5507353136b..bf84a10ded6 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -7310,6 +7310,21 @@ package body Sem_Ch13 is
  Set_Esize (U_Ent, Size);
   end if;
 
+  --  As of RM 13.1, only confirming size
+  --  (i.e. (Size = Esize (Etyp))) for aliased object of
+  --  elementary type must be supported.
+  --  GNAT rejects nonconfirming size for such object.
+
+  if Is_Aliased (U_Ent)
+and then Is_Elementary_Type (Etyp)
+and then Known_Esize (U_Ent)
+and then Size /= Esize (Etyp)
+  then
+ Error_Msg_N
+   ("nonconfirming Size for aliased object is not "
+& "supported", N);
+  end if;
+
   Set_Has_Size_Clause (U_Ent);
end;
 end if;
-- 
2.34.1



[COMMITTED] ada: Improve documentation for -gnatw.h warnings

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Steve Baird 

The -gnatw.h option enables warnings about "gaps" in record layout
specifications. In the case of a "partial" layout specification, where the
locations of some components are left unspecified, the resulting warnings
may be incomplete or incorrect. Document this implementation limitation.

gcc/ada/

* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Improve
the description of how the -gnatw.h switch interacts with
"partial" record layout specifications (i.e., specifications where
the locations of some components are left unspecified).
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../building_executable_programs_with_gnat.rst|  9 +++--
 gcc/ada/gnat_ugn.texi | 11 ---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst 
b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
index 87fb1087e42..fe0b567c2b9 100644
--- a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
+++ b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
@@ -3221,8 +3221,13 @@ of the pragma in the :title:`GNAT_Reference_manual`).
 
   This switch activates warnings on component clauses in record
   representation clauses that leave holes (gaps) in the record layout.
-  If this warning option is active, then record representation clauses
-  should specify a contiguous layout, adding unused fill fields if needed.
+  If a record representation clause does not specify a location for
+  every component of the record type, then the warnings generated (or not
+  generated) are unspecified. For example, there may be gaps for which
+  either no warning is generated or a warning is generated that
+  incorrectly describes the location of the gap. This undesirable situation
+  can sometimes be avoided by adding (and specifying the location for) unused
+  fill fields.
 
 
 .. index:: -gnatw.H  (gcc)
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 2f43b4f71c8..12eb494d40d 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-GNAT User's Guide for Native Platforms , Nov 14, 2022
+GNAT User's Guide for Native Platforms , Nov 18, 2022
 
 AdaCore
 
@@ -11305,8 +11305,13 @@ This switch suppresses warnings on hiding declarations.
 
 This switch activates warnings on component clauses in record
 representation clauses that leave holes (gaps) in the record layout.
-If this warning option is active, then record representation clauses
-should specify a contiguous layout, adding unused fill fields if needed.
+If a record representation clause does not specify a location for
+every component of the record type, then the warnings generated (or not
+generated) are unspecified. For example, there may be gaps for which
+either no warning is generated or a warning is generated that
+incorrectly describes the location of the gap. This undesirable situation
+can sometimes be avoided by adding (and specifying the location for) unused
+fill fields.
 @end table
 
 @geindex -gnatw.H (gcc)
-- 
2.34.1



[COMMITTED] ada: Ada 2022 Image attribute bugs

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Steve Baird 

Two issues. First, the two procedures
Ada.Strings.Text_Buffers.Output_Mapping.[Wide_]Wide_Put each correctly
call Encode, but that call was missing from the corresponding Put procedure.
Second, if a record type contains an array-valued Data component as well as
both a Max_Length and Current_Length component, then the slice
Data (Current_Length + 1 .. Max_Length) should usually be treated like
uninitialized data. It should not participate in things like equality
comparisons. In particular, it should not participate in 'Image results.
To accomplish this, such a type usually ought to have a Put_Image aspect
specification. This Put_Image aspect specification was missing for the
three Super_String types declared in the
Ada.Strings.[Wide_[Wide_]]Superbounded packages.

gcc/ada/
* libgnat/a-sttebu.adb (Put): Add missing call to Encode.
* libgnat/a-strsup.ads: Declare new Put_Image procedure and add
Put_Image aspect specification for type Super_String.
* libgnat/a-strsup.adb (Put_Image): New procedure.
* libgnat/a-stwisu.ads: Declare new Put_Image procedure and add
Put_Image aspect specification for type Super_String.
* libgnat/a-stwisu.adb (Put_Image): New procedure.
* libgnat/a-stzsup.ads: Declare new Put_Image procedure and add
Put_Image aspect specification for type Super_String.
* libgnat/a-stzsup.adb (Put_Image): New procedure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-strsup.adb | 11 +++
 gcc/ada/libgnat/a-strsup.ads |  8 +++-
 gcc/ada/libgnat/a-sttebu.adb |  3 ++-
 gcc/ada/libgnat/a-stwisu.adb | 11 +++
 gcc/ada/libgnat/a-stwisu.ads |  8 +++-
 gcc/ada/libgnat/a-stzsup.adb | 11 +++
 gcc/ada/libgnat/a-stzsup.ads |  8 +++-
 7 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/libgnat/a-strsup.adb b/gcc/ada/libgnat/a-strsup.adb
index 831a18e1e19..0210b45c4c7 100644
--- a/gcc/ada/libgnat/a-strsup.adb
+++ b/gcc/ada/libgnat/a-strsup.adb
@@ -303,6 +303,17 @@ package body Ada.Strings.Superbounded with SPARK_Mode is
   return Left <= Super_To_String (Right);
end Less_Or_Equal;
 
+   ---
+   -- Put_Image --
+   ---
+
+   procedure Put_Image
+ (S  : in out Ada.Strings.Text_Buffers.Root_Buffer_Type'Class;
+  Source : Super_String) is
+   begin
+  String'Put_Image (S, Super_To_String (Source));
+   end Put_Image;
+
--
-- Set_Super_String --
--
diff --git a/gcc/ada/libgnat/a-strsup.ads b/gcc/ada/libgnat/a-strsup.ads
index 416fa7bb06a..600f097c2bf 100644
--- a/gcc/ada/libgnat/a-strsup.ads
+++ b/gcc/ada/libgnat/a-strsup.ads
@@ -49,6 +49,7 @@ pragma Assertion_Policy (Pre=> Ignore,
 
 with Ada.Strings.Maps; use type Ada.Strings.Maps.Character_Mapping_Function;
 with Ada.Strings.Search;
+with Ada.Strings.Text_Buffers;
 
 package Ada.Strings.Superbounded with SPARK_Mode is
pragma Preelaborate;
@@ -69,7 +70,8 @@ package Ada.Strings.Superbounded with SPARK_Mode is
with
  Predicate =>
Current_Length <= Max_Length
- and then Data (1 .. Current_Length)'Initialized;
+ and then Data (1 .. Current_Length)'Initialized,
+ Put_Image => Put_Image;
 
--  The subprograms defined for Super_String are similar to those
--  defined for Bounded_String, except that they have different names, so
@@ -2695,6 +2697,10 @@ package Ada.Strings.Superbounded with SPARK_Mode is
- (Item.Max_Length - K) mod Super_Length (Item,
  Global => null;
 
+   procedure Put_Image
+ (S  : in out Ada.Strings.Text_Buffers.Root_Buffer_Type'Class;
+  Source : Super_String);
+
 private
   --  Pragma Inline declarations
 
diff --git a/gcc/ada/libgnat/a-sttebu.adb b/gcc/ada/libgnat/a-sttebu.adb
index d992fee9f04..acca2923443 100644
--- a/gcc/ada/libgnat/a-sttebu.adb
+++ b/gcc/ada/libgnat/a-sttebu.adb
@@ -29,6 +29,7 @@
 --  --
 --
 
+with Ada.Strings.UTF_Encoding.Strings;
 with Ada.Strings.UTF_Encoding.Wide_Strings;
 with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
 
@@ -59,7 +60,7 @@ package body Ada.Strings.Text_Buffers is
 
   procedure Put (Buffer : in out Buffer_Type; Item : String) is
   begin
- Put_UTF_8 (Buffer, Item);
+ Put_UTF_8 (Buffer, UTF_Encoding.Strings.Encode (Item));
   end Put;
 
   procedure Wide_Put (Buffer : in out Buffer_Type; Item : Wide_String) is
diff --git a/gcc/ada/libgnat/a-stwisu.adb b/gcc/ada/libgnat/a-stwisu.adb
index d325676edf9..cf27ca9f190 100644
--- a/gcc/ada/libgnat/a-stwisu.adb
+++ b/gcc/ada/libgnat/a-stwisu.adb
@@ -297,6 +297,17 @@ package body Ada.Strings.Wide_Superbounded is
   return Left <= Right.Data (1 .. Right.Current_Length);
end Less_Or_Equal;
 
+  

[COMMITTED] ada: Fix gnatmake's parsing of adc files

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

Before this patch, gnatmake's parser for adc files failed to ignore
semicolons located inside comments. This patch fixes that behavior.

gcc/ada/

* sfn_scan.adb (Scan_SFN_Pragmas): Improve handling of comments.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sfn_scan.adb | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ada/sfn_scan.adb b/gcc/ada/sfn_scan.adb
index b428ed7f0cf..8ee9a0968f3 100644
--- a/gcc/ada/sfn_scan.adb
+++ b/gcc/ada/sfn_scan.adb
@@ -583,6 +583,7 @@ package body SFN_Scan is
 
  else
 Skip_Loop : loop
+   Skip_WS;
exit Main_Scan_Loop when At_EOF;
exit Skip_Loop when S (P) = ';';
 
-- 
2.34.1



[COMMITTED] ada: Tweak error messages on misplaced with keywords

2022-11-21 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

Before this patch, with clauses placed in declarative sections were
interpreted by the compiler as incorrect aspect specifications, which
led to confusing error messages.

This patch makes it so more syntax errors involving the with keyword
are diagnosed as intended with clauses instead of aspect
specifications.

gcc/ada/

* par-ch3.adb (P_Declarative_Item): Tweak handling of with keyword.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/par-ch3.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/par-ch3.adb b/gcc/ada/par-ch3.adb
index aac45890c97..483f96d041f 100644
--- a/gcc/ada/par-ch3.adb
+++ b/gcc/ada/par-ch3.adb
@@ -4682,7 +4682,7 @@ package body Ch3 is
  when Tok_With =>
 Check_Bad_Layout;
 
-if Aspect_Specifications_Present then
+if Aspect_Specifications_Present (Strict => True) then
 
--  If we are after a semicolon, complain that it was ignored.
--  But we don't really ignore it, since we dump the aspects,
-- 
2.34.1



Re: [PATCH][PING][sanitizer/106558] asan: fix unsafe optimization of Asan checks.

2022-11-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 21, 2022 at 12:57:15PM +0300, Yuri Gribov wrote:
> From 4729f2db3f1b6b40ef0124e4a645788d7f66f426 Mon Sep 17 00:00:00 2001
> From: Yuri Gribov 
> Date: Sun, 14 Aug 2022 08:42:44 +0300
> Subject: [PATCH] asan: fix unsafe optimization of Asan checks.
> 
> gcc/
> PR sanitizer/106558
> * sanopt.c: Do not optimize out checks for non-SSA addresses.
> 
> gcc/testsuite/
> PR sanitizer/106558
> * c-c++-common/asan/pr106558.c: New test.
> ---
>  gcc/sanopt.cc  | 40 +-
>  gcc/testsuite/c-c++-common/asan/pr106558.c | 23 +
>  2 files changed, 54 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/asan/pr106558.c
> 
> diff --git a/gcc/sanopt.cc b/gcc/sanopt.cc
> index e9d188d7889..13942a0b1da 100644
> --- a/gcc/sanopt.cc
> +++ b/gcc/sanopt.cc
> @@ -80,16 +80,16 @@ struct sanopt_info
>  
>  /* If T has a single definition of form T = T2, return T2.  */
>  
> -static tree
> +static gimple *
>  maybe_get_single_definition (tree t)
>  {
>if (TREE_CODE (t) == SSA_NAME)
>  {
>gimple *g = SSA_NAME_DEF_STMT (t);
>if (gimple_assign_single_p (g))
> - return gimple_assign_rhs1 (g);
> + return g;
>  }
> -  return NULL_TREE;
> +  return NULL;
>  }
>  
>  /* Tree triplet for vptr_check_map.  */
> @@ -618,11 +618,30 @@ maybe_optimize_ubsan_vptr_ifn (class sanopt_ctx *ctx, 
> gimple *stmt)
>return true;
>  }
>  
> +/* Checks whether value of T in CHECK and USE is the same.  */
> +
> +static bool same_value_p (gimple *check, gimple *use, tree t)

Formatting.  Function name should be on another line:
static bool
same_value_p (gimple *check, gimple *use, tree t)

Otherwise LGTM.  Thanks and sorry for the review delay.

Jakub



[PATCH][PING][sanitizer/106558] asan: fix unsafe optimization of Asan checks.

2022-11-21 Thread Yuri Gribov via Gcc-patches
Hi,

This patch fixes incorrect Asan optimization in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106558 . It successfully
passes bootstrap-asan, regular bootstrap and regression testing (on
x86/amd64).

With this patch number of optimizations has reduced only slightly
(146062 -> 145824 on bootstrap-asan) so I decided to skip the more
complicated alias oracle-based approach that was suggested by Jakub in
the PR.

Best regards,
Yuri
From 4729f2db3f1b6b40ef0124e4a645788d7f66f426 Mon Sep 17 00:00:00 2001
From: Yuri Gribov 
Date: Sun, 14 Aug 2022 08:42:44 +0300
Subject: [PATCH] asan: fix unsafe optimization of Asan checks.

gcc/
PR sanitizer/106558
* sanopt.c: Do not optimize out checks for non-SSA addresses.

gcc/testsuite/
PR sanitizer/106558
* c-c++-common/asan/pr106558.c: New test.
---
 gcc/sanopt.cc  | 40 +-
 gcc/testsuite/c-c++-common/asan/pr106558.c | 23 +
 2 files changed, 54 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/pr106558.c

diff --git a/gcc/sanopt.cc b/gcc/sanopt.cc
index e9d188d7889..13942a0b1da 100644
--- a/gcc/sanopt.cc
+++ b/gcc/sanopt.cc
@@ -80,16 +80,16 @@ struct sanopt_info
 
 /* If T has a single definition of form T = T2, return T2.  */
 
-static tree
+static gimple *
 maybe_get_single_definition (tree t)
 {
   if (TREE_CODE (t) == SSA_NAME)
 {
   gimple *g = SSA_NAME_DEF_STMT (t);
   if (gimple_assign_single_p (g))
-	return gimple_assign_rhs1 (g);
+	return g;
 }
-  return NULL_TREE;
+  return NULL;
 }
 
 /* Tree triplet for vptr_check_map.  */
@@ -618,11 +618,30 @@ maybe_optimize_ubsan_vptr_ifn (class sanopt_ctx *ctx, gimple *stmt)
   return true;
 }
 
+/* Checks whether value of T in CHECK and USE is the same.  */
+
+static bool same_value_p (gimple *check, gimple *use, tree t)
+{
+  tree check_vuse = gimple_vuse (check);
+  tree use_vuse = gimple_vuse (use);
+
+  if (TREE_CODE (t) == SSA_NAME
+  || is_gimple_min_invariant (t)
+  || ! use_vuse)
+return true;
+
+  if (check_vuse == use_vuse)
+return true;
+
+  return false;
+}
+
 /* Returns TRUE if ASan check of length LEN in block BB can be removed
if preceded by checks in V.  */
 
 static bool
-can_remove_asan_check (auto_vec , tree len, basic_block bb)
+can_remove_asan_check (auto_vec , tree len, basic_block bb,
+		   gimple *base_stmt, tree base_addr)
 {
   unsigned int i;
   gimple *g;
@@ -674,8 +693,10 @@ can_remove_asan_check (auto_vec , tree len, basic_block bb)
 
 	  last_bb = imm;
 	}
-  if (last_bb == gbb)
-	remove = true;
+  if (last_bb != gbb)
+	break;
+  // In case of base_addr residing in memory we also need to check aliasing
+  remove = ! base_addr || same_value_p (g, base_stmt, base_addr);
   break;
 }
 
@@ -718,7 +739,8 @@ maybe_optimize_asan_check_ifn (class sanopt_ctx *ctx, gimple *stmt)
 
   auto_vec *ptr_checks = >asan_check_map.get_or_insert (ptr);
 
-  tree base_addr = maybe_get_single_definition (ptr);
+  gimple *base_stmt = maybe_get_single_definition (ptr);
+  tree base_addr = base_stmt ? gimple_assign_rhs1 (base_stmt) : NULL_TREE;
   auto_vec *base_checks = NULL;
   if (base_addr)
 {
@@ -747,11 +769,11 @@ maybe_optimize_asan_check_ifn (class sanopt_ctx *ctx, gimple *stmt)
   bool remove = false;
 
   if (ptr_checks)
-remove = can_remove_asan_check (*ptr_checks, len, bb);
+remove = can_remove_asan_check (*ptr_checks, len, bb, NULL, NULL);
 
   if (!remove && base_checks)
 /* Try with base address as well.  */
-remove = can_remove_asan_check (*base_checks, len, bb);
+remove = can_remove_asan_check (*base_checks, len, bb, base_stmt, base_addr);
 
   if (!remove)
 {
diff --git a/gcc/testsuite/c-c++-common/asan/pr106558.c b/gcc/testsuite/c-c++-common/asan/pr106558.c
new file mode 100644
index 000..d82b2dc7a83
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/pr106558.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-w -fpermissive" } */
+/* { dg-shouldfail "asan" } */
+
+int a;
+int *b = 
+int **c = 
+int d[1];
+int *e = [1];
+
+static int f(int *g) {
+  *b = e;
+  *c = e;
+  *b = 2;
+  *g = 2;
+}
+
+int main() {
+f(b);
+return *b;
+}
+
+/* { dg-output "AddressSanitizer: global-buffer-overflow on address" } */
-- 
2.17.1



[r13-4180 Regression] FAIL: g++.dg/other/pr39060.C -std=c++14 (test for excess errors) on Linux/x86_64

2022-11-21 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

183db4fb73a64bc4641604c30cdbbd9d9e8a6ed6 is the first bad commit
commit 183db4fb73a64bc4641604c30cdbbd9d9e8a6ed6
Author: liuhongt 
Date:   Thu Nov 17 16:19:31 2022 +0800

define builtins for "shared" avxneconvert-avx512bf16vl builtins.

caused

FAIL: g++.dg/other/pr39060.C  -std=c++14 (internal compiler error: canonical 
types differ for identical types 'void (A::)(void*)' and 'void (A::)(void*)')
FAIL: g++.dg/other/pr39060.C  -std=c++14 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-4180/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/other/pr39060.C 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: [PATCH] changelog: Fix extra space after tab. fix extra spaces after tab

2022-11-21 Thread Martin Liška
On 11/21/22 10:23, Andreas Schwab wrote:
> That should be refilled, every file entry on one line.

Sure, please send a patch for it.

Martin


  1   2   >