Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required

2022-05-05 Thread Alexandre Oliva via Gcc-patches
On May  5, 2022, Segher Boessenkool  wrote:

> On Thu, May 05, 2022 at 03:52:01AM -0300, Alexandre Oliva wrote:
>> +  else if (reg && MEM_P (reg)
>> +           && STRICT_ALIGNMENT && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
>> +    return false;

> Please fix the line breaks?  Either do a break before every &&, or put
> as many things as possible on one line?

I was going for conceptual grouping of alignment-related subexprs,
but I don't care enough to fight for it.

> Note that you should never have paradoxical subregs of mem on rs6000 or
> any other target with INSN_SCHEDULING.

Great, that alleviates some of my concerns about overreaching in this patch.

>> +#include "../../gcc.c-torture/compile/pr100106.c"

> It is better to copy the 11 lines of code.

'k

> Please comment what the ilp32 is for (namely, the -mcpu= will barf
> without it)..

Ack

> The testcase is okay with those changes, thanks!

Thanks.  Here's the revised patch.

I'm now testing on several platforms a follow-up patch that introduces
TARGET_ALLOW_SUBREG_OF_MEM.


[PR100106] Reject unaligned subregs when strict alignment is required

From: Alexandre Oliva 

The testcase for pr100106, compiled with optimization for 32-bit
powerpc -mcpu=604 with -mstrict-align expands the initialization of a
union from a float _Complex value into a load from an SCmode
constant pool entry, aligned to 4 bytes, into a DImode pseudo,
requiring 8-byte alignment.

The patch that introduced the testcase modified simplify_subreg to
avoid changing the MEM to outermode, but simplify_gen_subreg still
creates a SUBREG or a MEM that would require stricter alignment than
MEM's, and lra_constraints appears to get confused by that, repeatedly
creating unsatisfiable reloads for the SUBREG until it exceeds the
insn count.

Avoiding the unaligned SUBREG, expand splits the DImode dest into
SUBREGs and loads each SImode word of the constant pool with the
proper alignment.
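
The check described above can be sketched outside of GCC as a small predicate. This is only an illustration under stated assumptions: ToyMem, kSImodeAlignBits, kDImodeAlignBits and subreg_of_mem_allowed are hypothetical names standing in for the MEM rtx, GET_MODE_ALIGNMENT and the new branch in validate_subreg; the alignment values (in bits) are taken from the description above, not from the target headers.

```cpp
#include <cassert>

// Hypothetical stand-in for a MEM rtx: only its known alignment, in bits.
struct ToyMem {
  unsigned align_bits;
};

// Assumed mode alignments for the 32-bit powerpc case described above.
constexpr unsigned kSImodeAlignBits = 32;
constexpr unsigned kDImodeAlignBits = 64;

// Mirrors the shape of the new test: on a strict-alignment target, a SUBREG
// of a MEM is rejected when the outer mode needs more alignment than the MEM
// is known to have.
constexpr bool subreg_of_mem_allowed(bool strict_alignment, ToyMem mem,
                                     unsigned outer_mode_align_bits) {
  return !(strict_alignment && mem.align_bits < outer_mode_align_bits);
}

// The SCmode constant pool entry is 4-byte aligned, so a DImode view is
// rejected while each SImode word is still acceptable.
inline void toy_alignment_demo() {
  const ToyMem pool_entry{32};
  assert(!subreg_of_mem_allowed(true, pool_entry, kDImodeAlignBits));
  assert(subreg_of_mem_allowed(true, pool_entry, kSImodeAlignBits));
}
```

Without strict alignment the predicate accepts everything, which matches the intent that the patch only changes behavior when STRICT_ALIGNMENT is in effect.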


for  gcc/ChangeLog

PR target/100106
* emit-rtl.cc (validate_subreg): Reject a SUBREG of a MEM that
requires stricter alignment than MEM's.

for  gcc/testsuite/ChangeLog

PR target/100106
* gcc.target/powerpc/pr100106-sa.c: New.
---
 gcc/emit-rtl.cc                                |    4 ++++
 gcc/testsuite/gcc.target/powerpc/pr100106-sa.c |   15 +++++++++++++++
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr100106-sa.c

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 1e02ae254d012..9c03e27894fff 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -982,6 +982,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
 
       return subreg_offset_representable_p (regno, imode, offset, omode);
     }
+  /* Do not allow SUBREG with stricter alignment than the inner MEM.  */
+  else if (reg && MEM_P (reg) && STRICT_ALIGNMENT
+           && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
+    return false;
 
   /* The outer size must be ordered wrt the register size, otherwise
  we wouldn't know at compile time how many registers the outer
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
new file mode 100644
index 0..87634efa8d0b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
@@ -0,0 +1,15 @@
+/* Require ilp32 because -mcpu=604 won't do 64 bits.  */
+/* { dg-do compile { target { ilp32 } } } */
+/* { dg-options "-mcpu=604 -O -mstrict-align" } */
+
+union a {
+  float _Complex b;
+  long long c;
+};
+
+void g(union a);
+
+void e() {
+  union a f = {1.0f};
+  g(f);
+}


-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] Remove non-ANSI C path in ansidecl.h.

2022-05-05 Thread Eric Gallager
On Thu, May 5, 2022 at 8:27 AM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, May 5, 2022 at 2:19 PM Martin Liška  wrote:
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
> > Thanks,
> > Martin
> >
> > include/ChangeLog:
> >
> > * ansidecl.h (PTR): Remove Not ANSI C part.
> > ---
> >  include/ansidecl.h | 16 +---
> >  1 file changed, 1 insertion(+), 15 deletions(-)
> >
> > diff --git a/include/ansidecl.h b/include/ansidecl.h
> > index 4275c9b9cbd..f42c6afc7e9 100644
> > --- a/include/ansidecl.h
> > +++ b/include/ansidecl.h
> > @@ -89,21 +89,7 @@ So instead we use the macro below and test it against specific values.  */
> >  # endif
> >  #endif
> >
> > -#else  /* Not ANSI C.  */
> > -
> > -#define PTR char *
> > -
> > -/* some systems define these in header files for non-ansi mode */
> > -#undef const
> > -#undef volatile
> > -#undef signed
> > -#undef inline
> > -#define const
> > -#define volatile
> > -#define signed
> > -#define inline
> > -
> > -#endif /* ANSI C.  */
>
> You'd have to ask the sourceware side as well (binutils), but for sure
> either the guarding #if should be removed or the #else path should
> contain an #error.

Maybe just make it a #warning for one release, and then if no one
complains, turn it into an #error in the following release?
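
A rough sketch of what the two-step approach could look like, for illustration only. The guard condition below is a simplified assumption (the real condition in ansidecl.h is longer), and DEMO_PTR is a hypothetical name used instead of PTR so that the sketch does not claim to reproduce the header. Note that #warning is a widely supported extension that was only standardized recently, so very old compilers, which are the ones that would take the #else path, may not accept it.

```cpp
// Simplified, assumed guard; the real ansidecl.h condition lists more cases.
#if defined(__STDC__) || defined(__cplusplus)

// ANSI path: a generic pointer is spelled "void *".
#define DEMO_PTR void *

#else /* Not ANSI C.  */

// First release: only warn, so that anyone still relying on the old path
// has a chance to report it, e.g.
//   #warning "non-ANSI C support in ansidecl.h is deprecated"
// Following release: refuse to compile.
#error "ansidecl.h requires an ANSI C (C89 or later) compiler"

#endif /* ANSI C.  */
```

With any hosted C or C++ compiler from the last few decades only the first branch is ever seen, which is why the #else path can carry a diagnostic instead of fallback definitions.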

>
> Richard.
>
> > +#endif
> >
> >  /* Define macros for some gcc attributes.  This permits us to use the
> > macros freely, and know that they will come into play for the
> > --
> > 2.36.0
> >


Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-05 Thread Yonghong Song via Gcc-patches




On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

    #define __typetag1 __attribute__((btf_type_tag("tag1")))
    #define __typetag2 __attribute__((btf_type_tag("tag2")))
    #define __typetag3 __attribute__((btf_type_tag("tag3")))

    int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and 'tag3',
to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer with
tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.
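
The "just as if you had a type qualifier there" rule can be checked mechanically with ordinary qualifiers. The sketch below is an analogy, not BTF code: 'const' stands in for __typetag1 and 'volatile' for __typetag2 __typetag3, and the alias names G, Pointee, Innermost and H are made up for the illustration. It mirrors the qualifier (and C2x) reading; as noted above, GNU attribute syntax treats the leading attribute differently.

```cpp
#include <type_traits>

// Analogue of:  int __typetag1 * __typetag2 __typetag3 * g;
using G = int const * volatile *;

using Pointee = std::remove_pointer<G>::type;          // int const * volatile
using Innermost = std::remove_pointer<Pointee>::type;  // int const

// The qualifier written before the last '*' does not apply to g's own type...
static_assert(!std::is_volatile<G>::value, "g's pointer type is unqualified");
// ...it applies to the type g points to.
static_assert(std::is_volatile<Pointee>::value, "pointee is volatile");
// The qualifier next to 'int' applies to int.
static_assert(std::is_const<Innermost>::value, "innermost type is const int");

// To qualify g's own pointer type, the qualifier goes after the last '*'.
using H = int * * volatile;
static_assert(std::is_volatile<H>::value, "H itself is volatile");
```

All four static_asserts hold, so a tag placed where 'volatile' sits in G would, by the same reading, describe the inner pointer type rather than 'g' itself.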



Joseph, thank you! This is very helpful. My understanding of the syntax
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the
discussion of it in the series cover letter. But, the reason why it is
incorrect is the same.)

Yonghong, is the specific ordering an expectation in BPF programs or
other users of the tags?


This is probably a wording issue. We have been saying that tags only
apply to pointers. We probably should say that they only apply to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it is
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)




This example comes from my testing against clang to check that the BTF
generated by both toolchains is compatible. In this case we get
different results when using the GNU attribute syntax.

To avoid confusion, here is the full example (from the cover letter).
The difference in the results is clear in the DWARF.



Consider the following example:

  #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
  #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
  #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))

  int __typetag1 * __typetag2 __typetag3 * g;

[GCC tree dump of 'g'; the angle-bracketed node headers were lost in the
archive. What can still be recovered appears to be:]

  pointed-to type (pointer to int, canonical-type 0x77450888):
      asm_written unsigned DI, align:64 warn_if_not_align:0 symtab:0 alias-set -1
      attributes: btf_type_tag "type-tag-3", chained to btf_type_tag "type-tag-2"
  type of 'g' (canonical-type 0x77509930):
      asm_written unsigned DI, align:64 warn_if_not_align:0 symtab:0 alias-set -1
      attributes: btf_type_tag "type-tag-1"
  declaration of 'g':
      public static unsigned DI defer-output
      /home/dfaust/playpen/btf/annotate.c:29:42, align:64 warn_if_not_align:0




The current implementation produces the following DWARF:

 <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
    <1f>   DW_AT_name    : g
    <21>   DW_AT_decl_file   : 1
    <22>   DW_AT_decl_line   : 6
    <23>   DW_AT_decl_column : 42
    <24>   DW_AT_type   

Re: Ping #5: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Peter Bergner via Gcc-patches
On 5/5/22 5:51 PM, Michael Meissner wrote:
> On Thu, May 05, 2022 at 02:35:34PM -0500, Segher Boessenkool wrote:
>> A patch like that is pre-approved, even for trunk.
> 
> And as I said, logically we should do the same for p10 fusion.  I.e.
> 
>    callee_isa &= ~(OPTION_MASK_P8_FUSION
>                    | OPTION_MASK_P10_FUSION);
>    explicit_isa &= ~(OPTION_MASK_P8_FUSION
>                      | OPTION_MASK_P10_FUSION);
> 

I can add that to the simple patch while we wait for the bigger patch.

Peter


Re: Ping #5: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Michael Meissner via Gcc-patches
On Thu, May 05, 2022 at 02:35:34PM -0500, Segher Boessenkool wrote:
> On Thu, May 05, 2022 at 01:59:05PM -0500, Peter Bergner wrote:
> > If we cannot get this in soonish, maybe we can at least get approval for
> > applying Mike's simpler patch to the release branches, specifically GCC 10?
> > 
> >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059#c31
> 
> Just an unconditional
> 
>   callee_isa &= ~OPTION_MASK_P8_FUSION;
>   explicit_isa &= ~OPTION_MASK_P8_FUSION;
> 
> will do, no?  That is fine since these options should never have been
> used to determine if anything can be inlined, in the first place.
> 
> A patch like that is pre-approved, even for trunk.

And as I said, logically we should do the same for p10 fusion.  I.e.

   callee_isa &= ~(OPTION_MASK_P8_FUSION
                   | OPTION_MASK_P10_FUSION);
   explicit_isa &= ~(OPTION_MASK_P8_FUSION
                     | OPTION_MASK_P10_FUSION);
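
To make the effect of that masking concrete, here is a minimal, self-contained model. It is a sketch under assumptions, not the rs6000 code: the bit values are invented, kMaskVsx is just an example of a "real" ISA bit, and the inlining rule is reduced to "every ISA bit the callee needs must also be enabled in the caller". The explicit_isa handling, which would get the same masking, is left out.

```cpp
#include <cstdint>

using IsaFlags = std::uint64_t;

// Hypothetical bit assignments, for illustration only.
constexpr IsaFlags kMaskP8Fusion = IsaFlags{1} << 0;
constexpr IsaFlags kMaskP10Fusion = IsaFlags{1} << 1;
constexpr IsaFlags kMaskVsx = IsaFlags{1} << 2;

constexpr IsaFlags kFusionMasks = kMaskP8Fusion | kMaskP10Fusion;

// True if every bit set in callee is also set in caller.
constexpr bool is_subset(IsaFlags callee, IsaFlags caller) {
  return (callee & ~caller) == 0;
}

// Simplified inlining rule with the fusion bits cleared first, so that a
// pure tuning choice can no longer block inlining.
constexpr bool can_inline(IsaFlags caller_isa, IsaFlags callee_isa) {
  return is_subset(callee_isa & ~kFusionMasks, caller_isa);
}

// A callee built for power8 (fusion on) and a caller built for a newer CPU
// (fusion off): the raw subset test fails, the masked one succeeds.
static_assert(!is_subset(kMaskVsx | kMaskP8Fusion, kMaskVsx),
              "fusion bit alone makes the raw test fail");
static_assert(can_inline(kMaskVsx, kMaskVsx | kMaskP8Fusion),
              "masking the fusion bits allows inlining");
```

A callee that needs a genuine ISA feature the caller lacks is still rejected, since only the fusion bits are cleared.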

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[COMMITTED] Update libgomp docs to reflect Fortran support for non-rectangular loops

2022-05-05 Thread Sandra Loosemore
I've checked in this one-liner to note that OpenMP support for 
non-rectangular loops is now complete in the feature checklist.  Thanks 
to Tobias for pointing me at this.


-Sandra

commit 2d8752c5923e2ed4dc33b95038fed82b46526feb
Author: Sandra Loosemore 
Date:   Thu May 5 14:45:29 2022 -0700

libgomp: Update docs to reflect Fortran support for non-rectangular loops

libgomp/
	* libgomp.texi (OpenMP 5.0): Feature is now fully supported.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 38e0337..414cc50 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -190,7 +190,7 @@ The OpenMP 4.5 specification is fully supported.
   @tab Only fulfillable requirement are @code{atomic_default_mem_order}
   and @code{dynamic_allocators}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
-@item Non-rectangular loop nests @tab P @tab Only C/C++
+@item Non-rectangular loop nests @tab Y @tab
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
 @item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
   constructs @tab Y @tab


Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Segher Boessenkool
On Thu, May 05, 2022 at 04:21:21PM -0400, Michael Meissner wrote:
> On Thu, May 05, 2022 at 02:12:43PM -0500, Segher Boessenkool wrote:
> > On Tue, Apr 12, 2022 at 09:14:55PM -0400, Michael Meissner wrote:
> > > This is V4 of the patch.  Compared to V3 of the patch, GCC will just
> > > ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.
> > 
> > But incorrectly :-(
> > 
> > > The splitting of signed halfword and word loads into unsigned load and
> > > sign extension is now suppressed with -Os, but it is done normally if we
> > > are not optimizing for space.
> > 
> > I have no idea what that means.  Other than that I asked to remove that.
> > 
> > > This code makes the -mpower8-fusion option a nop.  It is accepted without
> > > warning, but it does nothing.  Power8 fusion is only enabled if we are
> > > tuning for a power8.
> > 
> > It should *delete* the option, and have
> > ;; This option existed in the past, but now is always off.
> > mno-power8-fusion
> > Target RejectNegative Undocumented Ignore
> > 
> > > The undocumented -mpower8-fusion-sign option is also made into a nop.
> > 
> > That one should be deleted.
> 
> Sure, but note the customer that asked for the patch, explicitly is using
> -mno-power8-fusion.  I don't want to break their makefiles.

-mno-power8-fusion is retained, and ignored.  -mpower8-fusion is
retained (but as separate option), ignored, and warned for.  Both
-mpower8-fusion-sign and -mno-power8-fusion-sign can be removed, since
nothing uses those options.

> > > +  /* The Power8 fusion option was removed.  We ignore using it in #pragma and
> > > +     attribute target.  Users may have used the options to suppress errors if
> > > +     they declare an inline function to be specifically power8 and the function
> > > +     was included by power9 or power10 which turned off the power8 fusion
> > > +     support.  */
> > > +  { "power8-fusion", 0,  false, true  },
> > 
> > What does the comment mean?
> 
> One of the suggestions that I made to the customer was to change their code
> from:
> 
> static inline int
> __attribute__ ((always_inline,target("cpu=power8")))
> foo (...)
> {
>   ...
> }
> 
> to:
> 
> static inline int
> __attribute__ ((always_inline,target("cpu=power8,no-power8-fusion")))
> foo (...)
> {
>   ...
> }
> 
> If they used this, it would avoid the issue, and not need to use a new switch.
> Whether they used it, I don't know.  Whether some other customer who ran into
> the problem used it, again I don't know.  If we remove the support for it in
> target pragma and attribute, we can break code.

There is still a -mno-power8-fusion option after you implemented what I
suggested all of this time.  This code will still just work, no?

> > > +  /* Don't print options that exist for backwards compatibility, but are
> > > +     ignored now like -mpower8-fusion.  */
> > > +  if (!mask)
> > > +    continue;
> > 
> > No.  Such options should not be in the mask at all.
> 
> It is just to support ignoring setting no-power8-fusion in the target pragma
> and attribute options.

If you need to do that separately, there is something else wrong.  And
even then, if you need to do it manually, you should *do* it manually,
not make all kinds of complications all over the place.

> > > +/* Power8 has special fusion operations that are enabled if we are tuning for
> > > +   power8.  This used to be settable with an option (-mpower8-fusion), but that
> > > +   option has been removed.  */
> > > +#define TARGET_P8_FUSION (rs6000_tune == PROCESSOR_POWER8)
> > 
> > The plan was to not have p8 fusion at all.  GCC never implemented any of
> > the more useful p8 fusion things anyway, and those were only marginally
> > beneficial anyway.
> 
> No, no, no, no.  That is incorrect.

It was the plan I agreed with.

> And by the way, we likely will have a similar issue with power10 fusion.  I
> specifically have not done anything for that to avoid clouding the issue for
> this bug.  But I imagine we may need to look at that in the future.

Whether anything is fused or not should never be of any influence on
what is inlined or not in what or whatnot.  Fusion is comparable to
core-specific scheduling in almost all ways.

> > >  mpower8-fusion-sign
> > > -Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags)
> > > -Allow sign extension in fusion operations.
> > > +Target Undocumented Ignore
> > 
> > And this one should be completely removed, since no one ever used it.
> 
> Well like all tuning flags, it was used quite a lot when I was doing the
> initial work.

But it should arguably never have made it into any release branch.


Segher


Re: Ping #5: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Peter Bergner via Gcc-patches
On 5/5/22 4:27 PM, Segher Boessenkool wrote:
> On Thu, May 05, 2022 at 02:59:07PM -0500, Peter Bergner wrote:
>> On 5/5/22 2:35 PM, Segher Boessenkool wrote:
>>> A patch like that is pre-approved, even for trunk.
>>
>> That works for me!  I will apply this directly to GCC 10 and regtest and
>> push if clean so we can unblock our customer.
>>
>> As for trunk, GCC 12 & 11, I think we can wait for the backport of Mike's
>> patch that removes the option altogether.
> 
> Please put it on trunk and 12 and 11 as well.  To keep things sane.

Will do.

Peter




Re: Ping #5: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Segher Boessenkool
On Thu, May 05, 2022 at 02:59:07PM -0500, Peter Bergner wrote:
> On 5/5/22 2:35 PM, Segher Boessenkool wrote:
> > On Thu, May 05, 2022 at 01:59:05PM -0500, Peter Bergner wrote:
> >> If we cannot get this in soonish, maybe we can at least get approval for
> >> applying Mike's simpler patch to the release branches, specifically GCC 10?
> >>
> >>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059#c31
> > 
> > Just an unconditional
> > 
> >   callee_isa &= ~OPTION_MASK_P8_FUSION;
> >   explicit_isa &= ~OPTION_MASK_P8_FUSION;
> > 
> > will do, no?  That is fine since these options should never have been
> > used to determine if anything can be inlined, in the first place.
> >
> > A patch like that is pre-approved, even for trunk.
> 
> That works for me!  I will apply this directly to GCC 10 and regtest and
> push if clean so we can unblock our customer.
> 
> As for trunk, GCC 12 & 11, I think we can wait for the backport of Mike's
> patch that removes the option altogether.

Please put it on trunk and 12 and 11 as well.  To keep things sane.

Thanks,


Segher


Re: [PATCH v2] c++: wrong error with MVP and pushdecl [PR64679]

2022-05-05 Thread Jason Merrill via Gcc-patches

On 5/5/22 16:57, Marek Polacek wrote:

On Wed, May 04, 2022 at 09:29:46PM -0400, Jason Merrill wrote:

On 5/4/22 19:20, Marek Polacek wrote:

On Wed, May 04, 2022 at 05:44:45PM -0400, Jason Merrill wrote:

On 5/4/22 16:03, Marek Polacek wrote:

This patch fixes the second half of 64679.  Here we issue a wrong
"redefinition of 'int x'" for the following:

 struct Bar {
   Bar(int, int, int);
 };

 int x = 1;
 Bar bar(int(x), int(x), int{x}); // #1

cp_parser_parameter_declaration_list does pushdecl every time it sees
a named parameter, so the second "int(x)" causes the error.  That's
premature, since this turns out to be a constructor call after the
third argument!

If the first parameter is parenthesized, we can't push until we've
established we're looking at a function declaration.  Therefore this
could be fixed by some kind of lookahead.  I thought about introducing
a lightweight variant of cp_parser_parameter_declaration_list that would
not have any side effects and would return as soon as it figures out
whether it's looking at a declaration or expression.  Since that would
require fairly nontrivial changes, I wanted something simpler.  Something
like delaying the pushdecl until we've reached the ')' following the
parameter-declaration-clause.  But that doesn't quite cut it: we must
have pushed the parameters before processing a default argument, as in:

 Bar bar(int(a), int(b), int c = sizeof(a));  // valid
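
For readers who want to see the two readings side by side, here is a small sketch that compiles with released compilers. It deliberately avoids the exact '#1' line, since that is the case older GCC versions reject; instead it uses the default-argument example, which can only be a function declaration, and an all-braces call, which can only be an object definition. The name 'obj' and the static_asserts are additions for the illustration.

```cpp
#include <type_traits>

struct Bar {
  Bar(int, int, int) {}
};

// Parenthesized declarators plus a default argument that names an earlier
// parameter: this declares a function 'bar', not an object.
Bar bar(int(a), int(b), int c = sizeof(a));

static_assert(std::is_same<decltype(bar), Bar(int, int, int)>::value,
              "bar is a function taking three ints and returning Bar");

int x = 1;

// A braced initializer cannot start a parameter declaration, so this is an
// object initialized through Bar's constructor.
Bar obj(int{x}, int{x}, int{x});

static_assert(std::is_same<decltype(obj), Bar>::value,
              "obj is an object of type Bar");
```

The hard case for the parser is the mix: arguments that look like parameters first, and only later one that rules out a declaration.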


I wondered how this would affect

void f(int (i), decltype(i) j = 42);

interestingly, clang and EDG both reject this, but they accept

void f(int (i), bool b = true, decltype(i) j = 42);

which suggests a similar implementation strategy.  MSVC accepts both.


Sigh, C++.

So the former would be rejected with this patch because decltype(i)
is parsed as the declspec.  And I can't play any cute games with decltype
because a decltype doesn't necessarily mean a parameter:

struct F {
  F(int, int);
};

void
g ()
{
  int x = 42;

  F v1(int(x), decltype(x)(42));

  F f1(int(i), decltype(i) j = 42);
  F f2(int(i), decltype(i) j);
  F f3(int(i), decltype(i)(j));
  F f4(int(i), decltype(i)(j) = 42);
  F f5(int (i), bool b = true, decltype(i) j = 42);
  F f6(int(i), decltype(x)(x));
}


But I think there's a way out: we could pushdecl the parameters as we
go and only stash when there would be a clash, if parsing tentatively.
And then push the pending parameters only at the end of the clause, solely
to get the redefinition/redeclaration error.  That is:

Bar b(int(x), int(x), int{x});

would mean:
push x
store x
it's not a decl -> discard it, parse as an expression

Bar b(int(x), int(x), int);

would mean:
push x
store x
it's a decl -> push pending parameters -> error

And then I don't need to push when about to commit, avoiding the need to
change cp_parser_parameter_declaration.  WDYT?
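
The two traces above can be modeled with a toy function. This is not the parser code and none of these names exist in GCC: parse_clause, Outcome and the 'turns_out_to_be_decl' flag are hypothetical, and the set of strings merely stands in for the parameter scope. The point is only the ordering: push as you go, stash on a clash, and replay the stash once the clause is known to be a declaration.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Outcome { Declaration, Expression, RedefinitionError };

// names: the parameter names seen while tentatively parsing the clause.
// turns_out_to_be_decl: whether the construct is finally a declaration.
inline Outcome parse_clause(const std::vector<std::string>& names,
                            bool turns_out_to_be_decl) {
  std::set<std::string> scope;       // stands in for the parameter scope
  std::vector<std::string> pending;  // parameters whose push was deferred

  for (const std::string& name : names) {
    // "push x": succeeds unless the name is already in this scope.
    if (!scope.insert(name).second)
      pending.push_back(name);  // "store x": stash instead of erroring out
  }

  if (!turns_out_to_be_decl)
    return Outcome::Expression;  // discard the stash, reparse as expression

  // It is a declaration: replay the stash to get the redefinition error.
  for (const std::string& name : pending)
    if (!scope.insert(name).second)
      return Outcome::RedefinitionError;

  return Outcome::Declaration;
}
```

For example, parse_clause({"x", "x"}, false) models the constructor call and yields Expression, while parse_clause({"x", "x"}, true) models the function declaration with a repeated parameter name and yields RedefinitionError.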


Sounds good to me.


Thanks!  Here's that approach in a patch form:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks.


-- >8 --
This patch fixes the second half of 64679.  Here we issue a wrong
"redefinition of 'int x'" for the following:

   struct Bar {
     Bar(int, int, int);
   };

   int x = 1;
   Bar bar(int(x), int(x), int{x}); // #1

cp_parser_parameter_declaration_list does pushdecl every time it sees
a named parameter, so the second "int(x)" causes the error.  That's
premature, since this turns out to be a constructor call after the
third argument!

If the first parameter is parenthesized, we can't push until we've
established we're looking at a function declaration.  Therefore this
could be fixed by some kind of lookahead.  I thought about introducing a
lightweight variant of cp_parser_parameter_declaration_list that would
not have any side effects and would return as soon as it figures out
whether it's looking at a declaration or expression.  Since that would
require fairly nontrivial changes, I wanted something simpler.

Something like delaying the pushdecl until we've reached the ')'
following the parameter-declaration-clause.  But we must push the
parameters before processing a default argument, as in:

   Bar bar(int(a), int(b), int c = sizeof(a));  // valid

Moreover, this code should still be accepted

   Bar f(int(i), decltype(i) j = 42);

so this patch stashes parameters into a vector when parsing tentatively
only when pushdecl-ing a parameter would result in a clash and an error
about redefinition/redeclaration.  The stashed parameters are pushed at
the end of a parameter-declaration-clause if it's followed by a ')', so
that we still diagnose redefining a parameter.

PR c++/64679

gcc/cp/ChangeLog:

* parser.cc (cp_parser_parameter_declaration_clause): Maintain
a vector of parameters that haven't been pushed yet.  Push them at the
end of a valid parameter-declaration-clause.
(cp_parser_parameter_declaration_list): Take a new auto_vec parameter.
Do not pushdecl while 

[PATCH] libsanitizer: cherry-pick commit b226894d475b from upstream

2022-05-05 Thread H.J. Lu via Gcc-patches
cherry-pick:

b226894d475b [sanitizer] [sanitizer] Correct GetTls for x32
---
 libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
index d966d857a76..620267cdd02 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
@@ -462,7 +462,11 @@ static void GetTls(uptr *addr, uptr *size) {
 #elif SANITIZER_GLIBC && defined(__x86_64__)
   // For aarch64 and x86-64, use an O(1) approach which requires relatively
   // precise ThreadDescriptorSize. g_tls_size was initialized in InitTlsSize.
+#  if SANITIZER_X32
+  asm("mov %%fs:8,%0" : "=r"(*addr));
+#  else
   asm("mov %%fs:16,%0" : "=r"(*addr));
+#  endif
   *size = g_tls_size;
   *addr -= *size;
   *addr += ThreadDescriptorSize();
-- 
2.35.1



[PATCH v2] c++: wrong error with MVP and pushdecl [PR64679]

2022-05-05 Thread Marek Polacek via Gcc-patches
On Wed, May 04, 2022 at 09:29:46PM -0400, Jason Merrill wrote:
> On 5/4/22 19:20, Marek Polacek wrote:
> > On Wed, May 04, 2022 at 05:44:45PM -0400, Jason Merrill wrote:
> > > On 5/4/22 16:03, Marek Polacek wrote:
> > > > This patch fixes the second half of 64679.  Here we issue a wrong
> > > > "redefinition of 'int x'" for the following:
> > > > 
> > > > struct Bar {
> > > >   Bar(int, int, int);
> > > > };
> > > > 
> > > > int x = 1;
> > > > Bar bar(int(x), int(x), int{x}); // #1
> > > > 
> > > > cp_parser_parameter_declaration_list does pushdecl every time it sees
> > > > a named parameter, so the second "int(x)" causes the error.  That's
> > > > premature, since this turns out to be a constructor call after the
> > > > third argument!
> > > > 
> > > > If the first parameter is parenthesized, we can't push until we've
> > > > established we're looking at a function declaration.  Therefore this
> > > > could be fixed by some kind of lookahead.  I thought about introducing
> > > > a lightweight variant of cp_parser_parameter_declaration_list that would
> > > > not have any side effects and would return as soon as it figures out
> > > > whether it's looking at a declaration or expression.  Since that would
> > > > > require fairly nontrivial changes, I wanted something simpler.  Something
> > > > > like delaying the pushdecl until we've reached the ')' following the
> > > > parameter-declaration-clause.  But that doesn't quite cut it: we must
> > > > have pushed the parameters before processing a default argument, as in:
> > > > 
> > > > Bar bar(int(a), int(b), int c = sizeof(a));  // valid
> > > 
> > > I wondered how this would affect
> > > 
> > >void f(int (i), decltype(i) j = 42);
> > > 
> > > interestingly, clang and EDG both reject this, but they accept
> > > 
> > >void f(int (i), bool b = true, decltype(i) j = 42);
> > > 
> > > which suggests a similar implementation strategy.  MSVC accepts both.
> > 
> > Sigh, C++.
> > 
> > So the former would be rejected with this patch because decltype(i)
> > is parsed as the declspec.  And I can't play any cute games with decltype
> > because a decltype doesn't necessarily mean a parameter:
> > 
> > struct F {
> >   F(int, int);
> > };
> > 
> > void
> > g ()
> > {
> >   int x = 42;
> > 
> >   F v1(int(x), decltype(x)(42));
> > 
> >   F f1(int(i), decltype(i) j = 42);
> >   F f2(int(i), decltype(i) j);
> >   F f3(int(i), decltype(i)(j));
> >   F f4(int(i), decltype(i)(j) = 42);
> >   F f5(int (i), bool b = true, decltype(i) j = 42);
> >   F f6(int(i), decltype(x)(x));
> > }
> > 
> > 
> > But I think there's a way out: we could pushdecl the parameters as we
> > go and only stash when there would be a clash, if parsing tentatively.
> > And then push the pending parameters only at the end of the clause, solely
> > to get the redefinition/redeclaration error.  That is:
> > 
> >Bar b(int(x), int(x), int{x});
> > 
> > would mean:
> > push x
> > store x
> > it's not a decl -> discard it, parse as an expression
> > 
> >Bar b(int(x), int(x), int);
> > 
> > would mean:
> > push x
> > store x
> > it's a decl -> push pending parameters -> error
> > 
> > And then I don't need to push when about to commit, avoiding the need to
> > change cp_parser_parameter_declaration.  WDYT?
> 
> Sounds good to me.

Thanks!  Here's that approach in a patch form:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch fixes the second half of 64679.  Here we issue a wrong
"redefinition of 'int x'" for the following:

  struct Bar {
    Bar(int, int, int);
  };

  int x = 1;
  Bar bar(int(x), int(x), int{x}); // #1

cp_parser_parameter_declaration_list does pushdecl every time it sees
a named parameter, so the second "int(x)" causes the error.  That's
premature, since this turns out to be a constructor call after the
third argument!

If the first parameter is parenthesized, we can't push until we've
established we're looking at a function declaration.  Therefore this
could be fixed by some kind of lookahead.  I thought about introducing a
lightweight variant of cp_parser_parameter_declaration_list that would
not have any side effects and would return as soon as it figures out
whether it's looking at a declaration or expression.  Since that would
require fairly nontrivial changes, I wanted something simpler.

Something like delaying the pushdecl until we've reached the ')'
following the parameter-declaration-clause.  But we must push the
parameters before processing a default argument, as in:

  Bar bar(int(a), int(b), int c = sizeof(a));  // valid

Moreover, this code should still be accepted

  Bar f(int(i), decltype(i) j = 42);

so this patch stashes parameters into a vector when parsing tentatively
only when pushdecl-ing a parameter would result in a clash and an error
about redefinition/redeclaration.  The stashed parameters are pushed at
the end of a parameter-declaration-clause if it's followed by a ')', so

Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Michael Meissner via Gcc-patches
On Thu, May 05, 2022 at 02:12:43PM -0500, Segher Boessenkool wrote:
> On Tue, Apr 12, 2022 at 09:14:55PM -0400, Michael Meissner wrote:
> > This is V4 of the patch.  Compared to V3 of the patch, GCC will just
> > ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.
> 
> But incorrectly :-(
> 
> > The splitting of signed halfword and word loads into unsigned load and
> > sign extension is now suppressed with -Os, but it is done normally if we
> > are not optimizing for space.
> 
> I have no idea what that means.  Other than that I asked to remove that.
> 
> > This code makes the -mpower8-fusion option a nop.  It is accepted without
> > warning, but it does nothing.  Power8 fusion is only enabled if we are
> > tuning for a power8.
> 
> It should *delete* the option, and have
> ;; This option existed in the past, but now is always off.
> mno-power8-fusion
> Target RejectNegative Undocumented Ignore
> 
> > The undocumented -mpower8-fusion-sign option is also made into a nop.
> 
> That one should be deleted.

Sure, but note that the customer who asked for the patch is explicitly using
-mno-power8-fusion.  I don't want to break their makefiles.

> 
> > +  /* The Power8 fusion option was removed.  We ignore using it in #pragma and
> > + attribute target.  Users may have used the options to suppress errors if
> > + they declare an inline function to be specifically power8 and the function
> > + was included by power9 or power10 which turned off the power8 fusion
> > + support.  */
> > +  { "power8-fusion",   0,  false, true  },
> 
> What does the comment mean?

One of the suggestions that I made to the customer was to change their code
from:

static inline int
__attribute__ ((always_inline,target("cpu=power8")))
foo (...)
{
  ...
}

to:

static inline int
__attribute__ ((always_inline,target("cpu=power8,no-power8-fusion")))
foo (...)
{
  ...
}

If they used this, it would avoid the issue, and not need to use a new switch.
Whether they used it, I don't know.  Whether some other customer who ran into
the problem used it, again I don't know.  If we remove the support for it in
target pragma and attribute, we can break code.

> 
> > +  /* Don't print options that exist for backwards compatibility, but are
> > +ignored now like -mpower8-fusion.  */
> > +  if (!mask)
> > +   continue;
> 
> No.  Such options should not be in the mask at all.

It is just to support ignoring setting no-power8-fusion in the target pragma
and attribute options.
> 
> > +/* Power8 has special fusion operations that are enabled if we are tuning for
> > +   power8.  This used to be settable with an option (-mpower8-fusion), but that
> > +   option has been removed.  */
> > +#define TARGET_P8_FUSION   (rs6000_tune == PROCESSOR_POWER8)
> 
> The plan was to not have p8 fusion at all.  GCC never implemented any of
> the more useful p8 fusion things anyway, and those were only marginally
> beneficial anyway.

No, no, no, no.  That is incorrect.

In the power8 days, we (mostly me) specifically spent a lot of time to add
fusion support.  I ultimately ran out of time in the initial GCC release to do
all of the optimizations planned.  Later, when it became apparent that power9
would not implement this fusion (originally it had been in the plan), the code
kind of withered, and it left some warts.

In particular, the fusion work was presented at the 2014 GNU Cauldron (slides
24-27 in my deck) among the other power8 changes.

And by the way, we likely will have a similar issue with power10 fusion.  I
specifically have not done anything for that to avoid clouding the issue for
this bug.  But I imagine we may need to look at that in the future.

> 
> > +/* Power8 fusion does not fuse loads with sign extends.  If we are not
> > +   optimizing for space, split loads with sign extension to loads with zero
> > +   extension and an explicit sign extend operation, so that the zero extending
> > +   load can be fused.  */
> > +#define TARGET_P8_FUSION_SIGN  (TARGET_P8_FUSION \
> > +&& !optimize_function_for_size_p (cfun))
> 
> As I said before, don't do this.  Just remove the whole thing.
> 
> > +; The -mpower8-fusion and -mpower8-fusion-sign options existed in the past, but
> > +; they are ignored now.
> 
> Don't put them together.  It is much easier for everything if they are
> separate, boring, and exactly like everything else.
> 
> >  mpower8-fusion
> > -Target Mask(P8_FUSION) Var(rs6000_isa_flags)
> > -Fuse certain integer operations together for better performance on power8.
> > +Target Undocumented Ignore
> 
> It should be deleted, instead, and be replaced with
> 
> ;; This option existed in the past, but now is always off.
> mno-power8-fusion
> Target RejectNegative Undocumented Ignore
> 
> mpower8-fusion
> Target RejectNegative Undocumented WarnRemoved
> 
> i.e. just like all 

[committed] libstdc++: Fixes for tests that fail with -fno-rtti

2022-05-05 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux -frtti/-fno-rtti, pushed to trunk.

-- >8 --

This disables a use of dynamic_cast that is not valid for -fno-rtti and
adjusts some tests so they don't FAIL with -fno-rtti. Some tests are
skipped completely, and others just make use of typeid conditional on
the __cpp_rtti macro. A couple of tests were using typeid to verify
typedefs denote the right type, which can be done at compile-time using
templates instead.
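
A minimal sketch of the two techniques described above, assuming a made-up typedef (`example_size_type`) rather than anything from the actual tests: a compile-time type check that needs no RTTI, and a `typeid` check that is only compiled when `__cpp_rtti` is defined.

```cpp
#include <cassert>
#include <type_traits>
#ifdef __cpp_rtti
# include <typeinfo>
#endif

// Hypothetical typedef standing in for one that a test wants to verify.
typedef unsigned long example_size_type;

// Compile-time check: works with -frtti and with -fno-rtti.
static_assert(std::is_same<example_size_type, unsigned long>::value,
              "example_size_type must denote unsigned long");

// Run-time check: typeid is only used when RTTI is available.
bool runtime_type_check()
{
#ifdef __cpp_rtti
  return typeid(example_size_type) == typeid(unsigned long);
#else
  return true;  // Nothing to check at run time without RTTI.
#endif
}
```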

libstdc++-v3/ChangeLog:

* include/experimental/memory_resource [!__cpp_rtti]
(__resource_adaptor_imp::do_is_equal): Do not use dynamic_cast
when RTTI is disabled.
* testsuite/17_intro/freestanding.cc: Require RTTI.
* testsuite/18_support/exception/38732.cc: Likewise.
* testsuite/18_support/exception_ptr/rethrow_exception.cc:
Likewise.
* testsuite/18_support/nested_exception/68139.cc: Likewise.
* testsuite/18_support/nested_exception/rethrow_if_nested.cc:
Likewise.
* testsuite/18_support/type_info/103240.cc: Likewise.
* testsuite/18_support/type_info/fundamental.cc: Likewise.
* testsuite/18_support/type_info/hash_code.cc: Likewise.
* testsuite/20_util/any/assign/emplace.cc: Likewise.
* testsuite/20_util/any/cons/in_place.cc: Likewise.
* testsuite/20_util/any/misc/any_cast.cc: Likewise.
* testsuite/20_util/any/observers/type.cc: Likewise.
* testsuite/20_util/function/1.cc: Likewise.
* testsuite/20_util/function/2.cc: Likewise.
* testsuite/20_util/function/3.cc: Likewise.
* testsuite/20_util/function/4.cc: Likewise.
* testsuite/20_util/function/5.cc: Likewise.
* testsuite/20_util/function/6.cc: Likewise.
* testsuite/20_util/function/7.cc: Likewise.
* testsuite/20_util/function/8.cc: Likewise.
* testsuite/20_util/polymorphic_allocator/resource.cc: Likewise.
* testsuite/20_util/shared_ptr/casts/1.cc: Likewise.
* testsuite/20_util/shared_ptr/casts/rval.cc: Likewise.
* testsuite/20_util/shared_ptr/cons/unique_ptr_deleter_ref_2.cc:
Likewise.
* testsuite/20_util/shared_ptr/misc/get_deleter.cc: Likewise.
* testsuite/20_util/typeindex/comparison_operators.cc: Likewise.
* testsuite/20_util/typeindex/comparison_operators_c++20.cc:
Likewise.
* testsuite/20_util/typeindex/hash.cc: Likewise.
* testsuite/20_util/typeindex/hash_code.cc: Likewise.
* testsuite/20_util/typeindex/name.cc: Likewise.
* testsuite/22_locale/ctype/is/string/89728_neg.cc: Likewise.
* testsuite/22_locale/global_templates/standard_facet_hierarchies.cc:
Likewise.
* testsuite/22_locale/global_templates/user_facet_hierarchies.cc:
Likewise.
* testsuite/22_locale/locale/13630.cc: Check type without using
RTTI.
* testsuite/23_containers/array/requirements/non_default_constructible.cc:
Require RTTI.
* testsuite/27_io/basic_ostream/emit/1.cc: Likewise.
* testsuite/27_io/fpos/14320-1.cc: Check type without using RTTI.
* testsuite/27_io/fpos/mbstate_t/12065.cc: Require RTTI.
* testsuite/27_io/ios_base/failure/dual_abi.cc: Likewise.
* testsuite/experimental/any/misc/any_cast.cc: Likewise.
* testsuite/experimental/any/observers/type.cc: Likewise.
* testsuite/experimental/memory_resource/resource_adaptor.cc:
Likewise.
* testsuite/lib/libstdc++.exp (check_effective_target_rtti):
Define new proc.
* testsuite/tr1/3_function_objects/function/1.cc: Likewise.
* testsuite/tr1/3_function_objects/function/2.cc: Likewise.
* testsuite/tr1/3_function_objects/function/3.cc: Likewise.
* testsuite/tr1/3_function_objects/function/4.cc: Likewise.
* testsuite/tr1/3_function_objects/function/5.cc: Likewise.
* testsuite/tr1/3_function_objects/function/6.cc: Likewise.
* testsuite/tr1/3_function_objects/function/7.cc: Likewise.
* testsuite/tr1/3_function_objects/function/8.cc: Likewise.
* testsuite/tr2/bases/value.cc: Likewise.
* testsuite/tr2/direct_bases/value.cc: Likewise.
* testsuite/util/exception/safety.h [!__cpp_rtti]: Don't print
types without RTTI.
---
 .../include/experimental/memory_resource  |  5 ++
 .../testsuite/17_intro/freestanding.cc|  4 +-
 .../testsuite/18_support/exception/38732.cc   |  6 ++
 .../exception_ptr/rethrow_exception.cc|  2 +
 .../18_support/nested_exception/68139.cc  |  1 +
 .../nested_exception/rethrow_if_nested.cc |  7 ++-
 .../testsuite/18_support/type_info/103240.cc  |  1 +
 .../18_support/type_info/fundamental.cc   |  9 +--
 .../18_support/type_info/hash_code.cc |  1 +
 .../testsuite/20_util/any/assign/emplace.cc   |  2 +
 .../testsuite/20_util/any/cons/in_place.cc|  2 +
 .../testsuite/20_util/any/misc/any_cast.cc|  6 ++
 

Re: Ping #5: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Peter Bergner via Gcc-patches
On 5/5/22 2:35 PM, Segher Boessenkool wrote:
> On Thu, May 05, 2022 at 01:59:05PM -0500, Peter Bergner wrote:
>> If we cannot get this in soonish, maybe we can at least get approval for
>> applying Mike's simpler patch to the release branches, specifically GCC 10?
>>
>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059#c31
> 
> Just an unconditional
> 
>   callee_isa &= ~OPTION_MASK_P8_FUSION;
>   explicit_isa &= ~OPTION_MASK_P8_FUSION;
> 
> will do, no?  That is fine since these options should never have been
> used to determine if anything can be inlined, in the first place.
>
> A patch like that is pre-approved, even for trunk.

That works for me!  I will apply this directly to GCC 10 and regtest and
push if clean so we can unblock our customer.

As for trunk, GCC 12 & 11, I think we can wait for the backport of Mike's
patch that removes the option altogether.

Peter




[PATCH] i386: Cleanup -m32 usage in the testsuite.

2022-05-05 Thread Uros Bizjak via Gcc-patches
Use conditional compilation for ia32 target instead.

2022-05-05  Uroš Bizjak  

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103611-2.c (dg-do): Compile for target ia32.
(dg-options): Remove -m32.
* gcc.target/i386/pr105032.c (dg-do): Compile for target ia32.
(dg-additional-options): Remove.
* gcc.target/i386/pr104732.c (dg-options): Remove -m32.
* gcc.target/i386/pr99753.c (dg-options): Ditto.

Tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr103611-2.c b/gcc/testsuite/gcc.target/i386/pr103611-2.c
index 1555e997ec8..d966a41f03e 100644
--- a/gcc/testsuite/gcc.target/i386/pr103611-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr103611-2.c
@@ -1,5 +1,6 @@
-/* { dg-do compile } */
-/* { dg-options "-m32 -O2 -msse4" } */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -msse4" } */
+
 typedef int __v4si __attribute__ ((__vector_size__ (16)));
 
 long long test1(__v4si v) {
diff --git a/gcc/testsuite/gcc.target/i386/pr104732.c b/gcc/testsuite/gcc.target/i386/pr104732.c
index c8954366c6d..baa52573c7a 100644
--- a/gcc/testsuite/gcc.target/i386/pr104732.c
+++ b/gcc/testsuite/gcc.target/i386/pr104732.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target ia32 } } */
-/* { dg-options "-O2 -m32 -msse -march=pentiumpro" } */
+/* { dg-options "-O2 -msse -march=pentiumpro" } */
 
 typedef long long v2di __attribute__((vector_size (16)));
 
diff --git a/gcc/testsuite/gcc.target/i386/pr105032.c b/gcc/testsuite/gcc.target/i386/pr105032.c
index 57b21d3cd7a..a45e7555f8f 100644
--- a/gcc/testsuite/gcc.target/i386/pr105032.c
+++ b/gcc/testsuite/gcc.target/i386/pr105032.c
@@ -1,6 +1,5 @@
-/* { dg-do compile } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options "-w" } */
-/* { dg-additional-options "-m32" { target x86_64-*-* } } */
 
 typedef unsigned int size_t;   
 __extension__ typedef long int __off_t;
diff --git a/gcc/testsuite/gcc.target/i386/pr99753.c b/gcc/testsuite/gcc.target/i386/pr99753.c
index 1b000bd56b6..95ce5916392 100644
--- a/gcc/testsuite/gcc.target/i386/pr99753.c
+++ b/gcc/testsuite/gcc.target/i386/pr99753.c
@@ -1,5 +1,5 @@
 /* PR target/99753 */
 
 /* { dg-do compile } */
-/* { dg-options "-march=amd -m32" } */
+/* { dg-options "-march=amd" } */
 /* { dg-error "bad value 'amd' for '-march=' switch"  "" { target *-*-* } 0 } */


Re: Ping #5: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Segher Boessenkool
On Thu, May 05, 2022 at 01:59:05PM -0500, Peter Bergner wrote:
> If we cannot get this in soonish, maybe we can at least get approval for
> applying Mike's simpler patch to the release branches, specifically GCC 10?
> 
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059#c31

Just an unconditional

  callee_isa &= ~OPTION_MASK_P8_FUSION;
  explicit_isa &= ~OPTION_MASK_P8_FUSION;

will do, no?  That is fine since these options should never have been
used to determine if anything can be inlined, in the first place.

A patch like that is pre-approved, even for trunk.
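
A minimal sketch of what clearing that bit achieves, under stated assumptions: the mask values below are made up, and the inline check is assumed to be a plain "callee flags are a subset of caller flags" test, which is a simplification and not the actual rs6000 code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the real OPTION_MASK_* values live in GCC's
// rs6000 back end and are not reproduced here.
constexpr std::uint64_t MASK_VSX       = 1u << 0;
constexpr std::uint64_t MASK_P8_FUSION = 1u << 1;

// Assumed shape of the check: the callee may be inlined when every ISA
// flag it needs is also enabled in the caller.
bool can_inline(std::uint64_t caller_isa, std::uint64_t callee_isa,
                bool ignore_p8_fusion)
{
  if (ignore_p8_fusion)
    callee_isa &= ~MASK_P8_FUSION;  // Drop the fusion bit before comparing.
  return (caller_isa & callee_isa) == callee_isa;
}
```

With the bit cleared, a callee that only differs from its caller by the fusion flag no longer fails the subset test.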

Thanks,


Segher


[PING^2] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)

2022-05-05 Thread Thomas Schwinge
Hi!

Ping^2.


Regards
 Thomas


On 2022-04-28T15:48:13+0200, I wrote:
> Hi!
>
> Ping.
>
> On 2022-04-06T11:20:47+0200, I wrote:
>> On 2021-01-14T15:50:23+0100, I wrote:
>>> I'm raising here an issue with HSA libgomp plugin code changes from a
>>> while ago.  While HSA is now no longer relevant for GCC master branch,
>>> the same code has also been copied into the GCN libgomp plugin.
>>
>> Here is another small clean-up patch (to enable further clean-up):
>>
>>> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
>>> build dependence on HSA run-time":
>>>
>>> On 2016-11-22T14:27:44+0100, Martin Jambor  wrote:
>>>> --- a/libgomp/plugin/configfrag.ac
>>>> +++ b/libgomp/plugin/configfrag.ac
>>>
>>>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>>> tgt_name=hsa
>>>> PLUGIN_HSA=$tgt
>>>> PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>>>> -   PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>>>> -   PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>>>> +   PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>>>> +   PLUGIN_HSA_LIBS="-ldl"
>>>
>>> So this switched from directly linking against 'libhsa-runtime64.so' to a
>>> 'libdl'-based runtime linking variant.
>>
>> (Not intending to change anything regarding that.)
>>
>>> For avoidance of doubt, [an earlier] change doesn't affect (build-tree) testsuite
>>> usage, where we have:
>>>
>>> libgomp/testsuite/libgomp-test-support.exp.in:set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
>>>
>>> libgomp/testsuite/lib/libgomp.exp:  append always_ld_library_path ":$hsa_runtime_lib"
>>
>> But, as I argue in the attached "libgomp testsuite: Don't amend
>> 'LD_LIBRARY_PATH' for system-provided HSA Runtime library", we should
>> actually clean this up as well.  OK to push that?
>>
>>
>> Regards
>>  Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (Gesellschaft mit beschränkter Haftung);
Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München;
Court of registration: München, HRB 106955
>From 364d01339883f5276ef09d68a5d9a2e0010ab641 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 6 Apr 2022 10:39:56 +0200
Subject: [PATCH] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for
 system-provided HSA Runtime library

This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can
tell.

'libgomp/testsuite/lib/libgomp.exp:libgomp_init' states:

# For build-tree testing, also consider the library paths used for builing.
# For installed testing, we assume all that to be provided in the sysroot.
if { $blddir != "" } {
[...]
global hsa_runtime_lib
if { $hsa_runtime_lib != "" } {
append always_ld_library_path ":$hsa_runtime_lib"
}
}

However, the libgomp GCN plugin is unconditionally built against the
GCC-shipped 'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
library "used for builing".  It thus doesn't make sense to amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library.
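
A minimal sketch of the run-time loading pattern referred to above, assuming a POSIX system with `dlopen`.  The helper name is hypothetical and this is not the plugin's code; on older glibc versions the program has to be linked with `-ldl`.

```cpp
#include <cassert>
#include <dlfcn.h>

// Returns true if a shared library with the given name can be loaded at
// run time.  No link-time dependency on that library is needed, so nothing
// about it has to be known when the caller is built.
bool runtime_library_available(const char *soname)
{
  void *handle = dlopen(soname, RTLD_LAZY);
  if (handle == nullptr)
    return false;
  dlclose(handle);
  return true;
}
```

For example, `runtime_library_available("libhsa-runtime64.so.1")` would report whether the dynamic loader can find the HSA Runtime library through its usual search path.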

	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_init): Don't
	'append always_ld_library_path ":$hsa_runtime_lib"'.
	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib): Don't set.
---
 libgomp/testsuite/lib/libgomp.exp | 4 
 libgomp/testsuite/libgomp-test-support.exp.in | 1 -
 2 files changed, 5 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 8c5ecfff0ac..0aaa58f19c5 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -202,10 +202,6 @@ proc libgomp_init { args } {
 	lappend ALWAYS_CFLAGS "additional_flags=-L$cuda_driver_lib"
 	append always_ld_library_path ":$cuda_driver_lib"
 	}
-	global hsa_runtime_lib
-	if { $hsa_runtime_lib != "" } {
-	append always_ld_library_path ":$hsa_runtime_lib"
-	}
 }
 
 # We use atomic operations in the testcases to validate results.
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
index 98fb442b537..3c88d1d5a62 100644
--- a/libgomp/testsuite/libgomp-test-support.exp.in
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -1,6 +1,5 @@
 set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
 set cuda_driver_lib "@CUDA_DRIVER_LIB@"
-set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
 
 set offload_plugins "@offload_plugins@"
 set offload_targets "@offload_targets@"
-- 
2.35.1



[PING] libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'

2022-05-05 Thread Thomas Schwinge
Hi!

Ping.


Regards
 Thomas


On 2022-04-28T15:45:20+0200, I wrote:
> Hi Tom!
>
> On 2022-04-08T09:35:44+0200, Tom de Vries  wrote:
>> On 4/8/22 00:27, Thomas Schwinge wrote:
>>> On 2017-01-13T19:11:23+0100, Jakub Jelinek  wrote:
>>>> Especially for distributions it is undesirable to need to have proprietary
>>>> CUDA libraries and headers installed when building GCC.
>>>
>>>> --- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.0 +0100
>>>> +++ libgomp/plugin/configfrag.ac  2017-01-13 17:33:26.608240936 +0100
>>>
>>>> +   PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
>>>> +   PLUGIN_NVPTX_LIBS='-ldl'
>>>> +   PLUGIN_NVPTX_DYNAMIC=1
>>>
>>>> +AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
>>>> +  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
>>>
>>> Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
>>> control two orthogonal aspects; OK to disentangle that with the attached
>>> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
>>> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
>
>> we discussed dropping --with-cuda, so do I understand it correctly that
>> you now propose to drop --with-cuda and --with-cuda-driver-lib but
>> intend to keep --with-cuda-driver-include ?
>
> No, I think you're reading too much into this first patch.  ;-)
>
> The goal with this patch is just to help disentangle two orthogonal
> concepts (as described in the commit log), and then...
>
>> Can you explain what user or maintainer scenario is served by this?
>
> ... in a next step, we may indeed remove the current user-visible
> '--with-cuda-driver' etc., but keep the underlying functionality
> available for the developers.  That's to address the point you'd made in
> the "Proposal to remove '--with-cuda-driver'" thread: that it still
> "could be useful for debugging / comparison purposes" -- and especially
> for development purposes, in my opinion: if you develop CUDA API-level
> changes in the libgomp nvptx plugin, it's likely to be easier to just use
> the full CUDA toolkit 'cuda.h' and directly link against libcuda (so that
> you've got all symbols etc. available), and only once you know what
> exactly you need, update GCC's 'include/cuda/cuda.h' and
> 'libgomp/plugin/cuda-lib.def'.
>
> With that hopefully clarified, OK to push the re-attached
> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
>
>> Is
>> there a problem with using gcc's cuda.h?
>
> No, all good.
>
>
> Regards
>  Thomas


>From c455522ac5d8ab41e5d11f8997678e042ff48e87 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Apr 2022 23:10:16 +0200
Subject: [PATCH] libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'

Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h> and
'dlopen'ing the CUDA Driver library vs. linking it are separate concerns.
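
A minimal sketch of the split into two independent switches.  The two macro names come from the patch title; the functions and the strings they return are hypothetical and only make the two choices visible.

```cpp
#include <cassert>
#include <cstring>

// Concern 1: which header declares the CUDA Driver API.
const char *cuda_header_choice()
{
#ifdef PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
  return "system <cuda.h>";
#else
  return "GCC-shipped include/cuda/cuda.h";
#endif
}

// Concern 2: how the CUDA Driver library is bound.
const char *cuda_library_binding()
{
#ifdef PLUGIN_NVPTX_LINK_LIBCUDA
  return "linked against libcuda";
#else
  return "dlopen of libcuda.so.1";
#endif
}
```

With neither macro defined, the sketch reports the GCC-shipped header and the `dlopen` binding; each macro can then be toggled without touching the other.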

	libgomp/
	* plugin/Makefrag.am: Handle 'PLUGIN_NVPTX_DYNAMIC'.
	* plugin/configfrag.ac (PLUGIN_NVPTX_DYNAMIC): Change
	'AC_DEFINE_UNQUOTED' into 'AM_CONDITIONAL'.
	* plugin/plugin-nvptx.c: Split 'PLUGIN_NVPTX_DYNAMIC' into
	'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and
	'PLUGIN_NVPTX_LINK_LIBCUDA'.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
---
 libgomp/Makefile.in   | 26 +++---
 libgomp/config.h.in   |  4 
 libgomp/configure | 21 +++--
 libgomp/plugin/Makefrag.am| 16 +++-
 libgomp/plugin/configfrag.ac  |  3 +--
 libgomp/plugin/plugin-nvptx.c |  4 ++--
 6 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 22cb2136a08..d43c584a32d 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -119,8 +119,16 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 @PLUGIN_NVPTX_TRUE@am__append_1 = libgomp-plugin-nvptx.la
-@PLUGIN_GCN_TRUE@am__append_2 = libgomp-plugin-gcn.la
-@USE_FORTRAN_TRUE@am__append_3 = openacc.f90
+
+# Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h>.
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__append_2 = -DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H \
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@	-DPLUGIN_NVPTX_LINK_LIBCUDA
+
+# 'dlopen'ing the CUDA Driver library vs. linking it.
+@PLUGIN_NVPTX_DYNAMIC_TRUE@@PLUGIN_NVPTX_TRUE@am__append_3 = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__append_4 = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_GCN_TRUE@am__append_5 = libgomp-plugin-gcn.la

Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Segher Boessenkool
On Tue, Apr 12, 2022 at 09:14:55PM -0400, Michael Meissner wrote:
> This is V4 of the patch.  Compared to V3 of the patch, GCC will just
> ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.

But incorrectly :-(

> The splitting of signed halfword and word loads into unsigned load and
> sign extension is now suppressed with -Os, but it is done normally if we
> are not optimizing for space.

I have no idea what that means.  Other than that I asked to remove that.

> This code makes the -mpower8-fusion option a nop.  It is accepted without
> warning, but it does nothing.  Power8 fusion is only enabled if we are tuning
> for a power8.

It should *delete* the option, and have
;; This option existed in the past, but now is always off.
mno-power8-fusion
Target RejectNegative Undocumented Ignore

> The undocumented -mpower8-fusion-sign option is also made into a nop.

That one should be deleted.

> +  /* The Power8 fusion option was removed.  We ignore using it in #pragma and
> + attribute target.  Users may have used the options to suppress errors if
> + they declare an inline function to be specifically power8 and the function
> + was included by power9 or power10 which turned off the power8 fusion
> + support.  */
> +  { "power8-fusion", 0,  false, true  },

What does the comment mean?

> +  /* Don't print options that exist for backwards compatibility, but are
> +  ignored now like -mpower8-fusion.  */
> +  if (!mask)
> + continue;

No.  Such options should not be in the mask at all.

> +/* Power8 has special fusion operations that are enabled if we are tuning for
> +   power8.  This used to be settable with an option (-mpower8-fusion), but that
> +   option has been removed.  */
> +#define TARGET_P8_FUSION (rs6000_tune == PROCESSOR_POWER8)

The plan was to not have p8 fusion at all.  GCC never implemented any of
the more useful p8 fusion things anyway, and those were only marginally
beneficial anyway.

> +/* Power8 fusion does not fuse loads with sign extends.  If we are not
> +   optimizing for space, split loads with sign extension to loads with zero
> +   extension and an explicit sign extend operation, so that the zero extending
> +   load can be fused.  */
> +#define TARGET_P8_FUSION_SIGN (TARGET_P8_FUSION \
> +  && !optimize_function_for_size_p (cfun))

As I said before, don't do this.  Just remove the whole thing.

> +; The -mpower8-fusion and -mpower8-fusion-sign options existed in the past, but
> +; they are ignored now.

Don't put them together.  It is much easier for everything if they are
separate, boring, and exactly like everything else.

>  mpower8-fusion
> -Target Mask(P8_FUSION) Var(rs6000_isa_flags)
> -Fuse certain integer operations together for better performance on power8.
> +Target Undocumented Ignore

It should be deleted, instead, and be replaced with

;; This option existed in the past, but now is always off.
mno-power8-fusion
Target RejectNegative Undocumented Ignore

mpower8-fusion
Target RejectNegative Undocumented WarnRemoved

i.e. just like all other removed flags.  If someone explicitly tries to
enable it he/she *should* get a warning.

>  mpower8-fusion-sign
> -Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags)
> -Allow sign extension in fusion operations.
> +Target Undocumented Ignore

And this one should be completely removed, since no one ever used it.


Segher


[GCC 12][committed] d: Merge upstream dmd 88de5e369.

2022-05-05 Thread Iain Buclaw via Gcc-patches
Hi,

This patch merges the D front-end with upstream dmd 88de5e369,
synchronizing the latest regression fixes from the stable v2.100.0
branch that were found in production and industry codebases.

D front-end changes:

- Merge regression fixes in v2.100.0 branch.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to the releases/gcc-12 branch.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 88de5e369.
---
 gcc/d/dmd/MERGE   |  2 +-
 gcc/d/dmd/traits.d|  2 +-
 gcc/d/dmd/typesem.d   |  8 ++--
 gcc/testsuite/gdc.test/compilable/test23087.d |  9 +
 gcc/testsuite/gdc.test/compilable/test23089.d |  7 +++
 gcc/testsuite/gdc.test/runnable/test23083.d   | 16 
 6 files changed, 40 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gdc.test/compilable/test23087.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/test23089.d
 create mode 100644 gcc/testsuite/gdc.test/runnable/test23083.d

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index 984e375479b..73697fba8e6 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-081d61e157f0064dc93c757d61cd998d3cb5288f
+88de5e369b2c322e55174ae4f3bef5ad0c0c0930
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/traits.d b/gcc/d/dmd/traits.d
index 04e1c47d16e..db77107e4ae 100644
--- a/gcc/d/dmd/traits.d
+++ b/gcc/d/dmd/traits.d
@@ -1515,7 +1515,7 @@ Expression semanticTraits(TraitsExp e, Scope* sc)
 
 if (tf)
 {
-link = fd ? fd.linkage : tf.linkage;
+link = fd ? fd.toAliasFunc().linkage : tf.linkage;
 }
 else
 {
diff --git a/gcc/d/dmd/typesem.d b/gcc/d/dmd/typesem.d
index f63b17752ed..5db7d43371e 100644
--- a/gcc/d/dmd/typesem.d
+++ b/gcc/d/dmd/typesem.d
@@ -3637,12 +3637,16 @@ Expression dotExp(Type mt, Scope* sc, Expression e, Identifier ident, int flag)
 }
 else
 {
+Expression e0;
+Expression ev = e;
+ev = extractSideEffect(sc, "__tup", e0, ev);
+
 const length = cast(size_t)mt.dim.toUInteger();
 auto exps = new Expressions();
 exps.reserve(length);
 foreach (i; 0 .. length)
-exps.push(new IndexExp(e.loc, e, new IntegerExp(e.loc, i, Type.tsize_t)));
-e = new TupleExp(e.loc, exps);
+exps.push(new IndexExp(e.loc, ev, new IntegerExp(e.loc, i, Type.tsize_t)));
+e = new TupleExp(e.loc, e0, exps);
 }
 }
 else
diff --git a/gcc/testsuite/gdc.test/compilable/test23087.d b/gcc/testsuite/gdc.test/compilable/test23087.d
new file mode 100644
index 00000000000..6927ddf04df
--- /dev/null
+++ b/gcc/testsuite/gdc.test/compilable/test23087.d
@@ -0,0 +1,9 @@
+// https://issues.dlang.org/show_bug.cgi?id=23087
+struct S
+{
+this(bool) {}
+this(bool, int) {}
+}
+
+static foreach (ctor; __traits(getOverloads, S, "__ctor"))
+static assert(__traits(getLinkage, ctor) == "D");
diff --git a/gcc/testsuite/gdc.test/compilable/test23089.d b/gcc/testsuite/gdc.test/compilable/test23089.d
new file mode 100644
index 00000000000..1bc29138573
--- /dev/null
+++ b/gcc/testsuite/gdc.test/compilable/test23089.d
@@ -0,0 +1,7 @@
+// https://issues.dlang.org/show_bug.cgi?id=23089
+extern(System) int i23089;
+
+extern(System):
+
+alias F23089 = void function(int);
+F23089 f23089;
diff --git a/gcc/testsuite/gdc.test/runnable/test23083.d b/gcc/testsuite/gdc.test/runnable/test23083.d
new file mode 100644
index 00000000000..41c881f30a5
--- /dev/null
+++ b/gcc/testsuite/gdc.test/runnable/test23083.d
@@ -0,0 +1,16 @@
+// https://issues.dlang.org/show_bug.cgi?id=23083
+int calls = 0;
+
+int[2] f()
+{
+calls++;
+return [123, 456];
+}
+
+void g(int a, int b) {}
+
+void main()
+{
+g(f().tupleof);
+assert(calls == 1);
+}
-- 
2.34.1



Re: Ping #5: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-05-05 Thread Peter Bergner via Gcc-patches
On 5/2/22 8:06 PM, Michael Meissner wrote:
> Ping #5:
> 
> | Date: Tue, 12 Apr 2022 21:14:55 -0400
> | From: Michael Meissner 
> | Subject: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
> | Message-ID: 
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593153.html
> 
> We really need closure on this so I can do the backport to GCC 10 that the
> customer is asking for.
> 
If we cannot get this in soonish, maybe we can at least get approval for
applying Mike's simpler patch to the release branches, specifically GCC 10?

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059#c31


Peter


Re: [PATCH] Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.

2022-05-05 Thread Iain Sandoe via Gcc-patches



> On 5 May 2022, at 19:50, Martin Liška  wrote:
> 
> On 5/5/22 20:35, Andrew Pinski wrote:
>> GCC_VERSION will be 0 if GCC is not being used.
>> So you need to audit these better really.
> 
> Ah, I see. So it basically means all the non-GCC conditional
> code needs to remain and I can replace
> #if GCC_VERSION >= X_Y_Z with #ifdef __GNUC__

I think several non-GCC compilers define __GNUC__ so that depends on what
you intend.
Iain

> 
> Am I correct?
> Martin



Re: [PATCH] Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.

2022-05-05 Thread Martin Liška
On 5/5/22 20:35, Andrew Pinski wrote:
> GCC_VERSION will be 0 if GCC is not being used.
> So you need to audit these better really.

Ah, I see. So it basically means all the non-GCC conditional
code needs to remain and I can replace
#if GCC_VERSION >= X_Y_Z with #ifdef __GNUC__

Am I correct?
Martin


Re: [PATCH] testsuite: Update Wconversion testcase check type.

2022-05-05 Thread Marek Polacek via Gcc-patches
On Thu, May 05, 2022 at 06:33:20PM +0800, jiawei wrote:
> Some targets, such as arm-linux, riscv, power, s390x and xtensa, treat plain
> char as unsigned char, so no warning is emitted there and the test FAILs.
> Change the type from char to an explicit signed char to keep the behavior
> consistent across targets.
> 
> gcc/testsuite/ChangeLog:
> 
> * c-c++-common/Wconversion-1.c: Update type.

Ok, and sorry for introducing this problem!
 
> ---
>  gcc/testsuite/c-c++-common/Wconversion-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/c-c++-common/Wconversion-1.c 
> b/gcc/testsuite/c-c++-common/Wconversion-1.c
> index ed65918c70f..7053f6b5dbb 100644
> --- a/gcc/testsuite/c-c++-common/Wconversion-1.c
> +++ b/gcc/testsuite/c-c++-common/Wconversion-1.c
> @@ -10,5 +10,5 @@ void g()
>signed char sc = 300; /* { dg-warning "conversion from .int. to .signed 
> char. changes value from .300. to .44." } */
>unsigned char uc = 300; /* { dg-warning "conversion from .int. to 
> .unsigned char. changes value from .300. to .44." } */
>unsigned char uc2 = 300u; /* { dg-warning "conversion from .unsigned int. 
> to .unsigned char. changes value from .300. to .44." } */
> -  char c2 = (double)1.0 + 200; /* { dg-warning "overflow in conversion from 
> .double. to .char. changes value from .2.01e\\+2. to .127." } */
> +  signed char c2 = (double)1.0 + 200; /* { dg-warning "overflow in 
> conversion from .double. to .signed char. changes value from .2.01e\\+2. to 
> .127." } */
>  }
> -- 
> 2.25.1
> 

Marek
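
For readers who want to check what their own target does, a small sketch
follows. It is not part of the testsuite change; both function names are
hypothetical. It only shows that the signedness of plain char is a property
of the target, while signed char is signed everywhere.

```cpp
#include <cassert>
#include <climits>
#include <limits>

/* True when plain char is a signed type on this target.  */
inline bool
plain_char_is_signed ()
{
  return std::numeric_limits<char>::is_signed;
}

/* signed char is signed on every target, so a test that spells the type
   out does not depend on the target's choice for plain char.  */
inline bool
explicit_signed_char_is_signed ()
{
  return std::numeric_limits<signed char>::is_signed;
}
```

On a target where plain_char_is_signed () returns false, the value 201
fits in plain char, which is why the original test saw no overflow warning
there.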



Re: [PATCH] libsanitizer: cherry-pick commit f52e365092aa from upstream

2022-05-05 Thread H.J. Lu via Gcc-patches
On Thu, May 5, 2022 at 11:28 AM Martin Liška  wrote:
>
> On 5/5/22 18:21, H.J. Lu wrote:
> > On Thu, May 5, 2022 at 4:24 AM Martin Liška  wrote:
> >>
> >> On 5/5/22 01:07, H.J. Lu wrote:
> >>> On Wed, May 4, 2022 at 1:59 AM Martin Liška  wrote:
> >>>>
> >>>> Hello.
> >>>>
> >>>> I'm going to do merge from upstream.
> >>>>
> >>>> Patch can bootstrap on x86_64-linux-gnu and survives regression
> >>>> tests. I've also tested on ppc64le-linux-gnu and verified the ABI.
> >>>>
> >>>> The only real change is a small change in
> >>>> gcc/testsuite/c-c++-common/asan/alloca_loop_unpoisoning.c where we
> >>>> need --param=asan-use-after-return=0.
> >>>>
> >>>> I'm going to push the patches.
> >>>
> >>> Hi,
> >>>
> >>> I am checking in this patch to cherry-pick
> >>>
> >>> f52e365092aa [sanitizer] Use newfstatat for x32
> >>>
> >>> to restore x32 build.
> >>>
> >>
> >> I'm going to do one more merge from upstream
> >> (75f9e83ace52773af65dcebca543005ec8a2705d) as we want to include Tobias's
> >> revision 6f095babc2b7d564168c7afc5bf6afb2188fd6b4 and my
> >> revision f1b9245199f3457a4d06d32d1bc6e44573c166e3.
> >
> > I am testing a patch for
> >
> > https://github.com/llvm/llvm-project/issues/55288

I submitted:

https://reviews.llvm.org/D125025

> > to fix:
> >
> > https://gcc.gnu.org/pipermail/gcc-regression/2022-May/076571.html
>
> Interesting. How did you run these tests that the error shows up?

Just normal GCC bootstrap and check with x32 enabled.

> >
> > The same bug is also in GCC 12.  But somehow, it doesn't show up in
> > GCC tests.
>
> So please backport it once it's merged.
>

Will do after GCC 12 is released.

Thanks.

-- 
H.J.


Re: [PATCH] Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.

2022-05-05 Thread Andrew Pinski via Gcc-patches
On Thu, May 5, 2022 at 5:19 AM Martin Liška  wrote:
>
> Right now, the minimum required version of GCC is 4.8.x,
> which supports C++11 well.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

No. This is broken as GCC_VERSION is used for two things. First it is
used to say which GCC version it is being compiled with and second it
also says if GCC is being used.
GCC_VERSION will be 0 if GCC is not being used.
So you need to audit these better really.

Thanks,
Andrew Pinski

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * bitmap.cc (bitmap_popcount):
> Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
> (bitmap_count_bits_in_word): Likewise.
> (bitmap_single_bit_set_p): Likewise.
> (bitmap_first_set_bit): Likewise.
> (bitmap_last_set_bit): Likewise.
> * bitmap.h (if): Likewise.
> * config/ia64/ia64.cc (RWS_FIELD_TYPE): Likewise.
> * config/rs6000/rs6000.h (if): Likewise.
> * defaults.h: Likewise.
> * diagnostic-core.h (if): Likewise.
> (ATTRIBUTE_GCC_DIAG): Likewise.
> * dwarf2cfi.cc (if): Likewise.
> * dwarf2out.cc (if): Likewise.
> (DWARF2_ASM_LINE_DEBUG_INFO): Likewise.
> (DWARF2_ASM_VIEW_DEBUG_INFO): Likewise.
> * gcc.cc (if): Likewise.
> * genautomata.cc (struct state_ainsn_table): Likewise.
> (regexp_mode_check_failed): Likewise.
> (REGEXP_ONEOF): Likewise.
> * genconditions.cc (write_header): Likewise.
> (write_writer): Likewise.
> * genmatch.cc: Likewise.
> * genmodes.cc (GCC_INSN_MODES_INLINE_H): Likewise.
> * genoutput.cc (output_insn_data): Likewise.
> * ggc-page.cc (if): Likewise.
> (prefetch): Likewise.
> (ggc_internal_alloc): Likewise.
> * ggc-tests.cc (test_finalization): Likewise.
> * ggc.h (need_finalization_p): Likewise.
> * hwint.cc (floor_log2): Likewise.
> (ceil_log2): Likewise.
> (exact_log2): Likewise.
> (ctz_hwi): Likewise.
> (clz_hwi): Likewise.
> (ffs_hwi): Likewise.
> (popcount_hwi): Likewise.
> * hwint.h (HAVE_LONG_LONG): Likewise.
> (SIZEOF_LONG_LONG): Likewise.
> (sizeof_long_long_must_be_8[sizeof): Likewise.
> (clz_hwi): Likewise.
> (ctz_hwi): Likewise.
> (ffs_hwi): Likewise.
> (popcount_hwi): Likewise.
> (exact_log2): Likewise.
> (floor_log2): Likewise.
> (ceil_log2): Likewise.
> * ira-int.h: Likewise.
> * machmode.h (mode_to_bytes): Likewise.
> (mode_to_inner): Likewise.
> (mode_to_unit_size): Likewise.
> (mode_to_unit_precision): Likewise.
> (mode_to_nunits): Likewise.
> * output.h (ATTRIBUTE_ASM_FPRINTF): Likewise.
> * pretty-print.h (ATTRIBUTE_GCC_PPDIAG): Likewise.
> * rtl.cc (dump_rtx_statistics): Likewise.
> * rtl.h (test): Likewise.
> (RTX_FLAG): Likewise.
> (enum label_kind): Likewise.
> * sbitmap.cc (sbitmap_popcount): Likewise.
> (bitmap_count_bits): Likewise.
> * stringpool.h (get_identifier_with_length): Likewise.
> * system.h (HAVE_DESIGNATED_INITIALIZERS): Likewise.
> (HAVE_DESIGNATED_UNION_INITIALIZERS): Likewise.
> (if): Likewise.
> (__FUNCTION__): Likewise.
> (__builtin_expect): Likewise.
> (elif): Likewise.
> (gcc_assert): Likewise.
> (ALWAYS_INLINE): Likewise.
> (WARN_UNUSED_RESULT): Likewise.
> (STATIC_CONSTANT_P): Likewise.
> (defined): Likewise.
> (BROKEN_VALUE_INITIALIZATION): Likewise.
> (DEBUG_FUNCTION): Likewise.
> (DEBUG_VARIABLE): Likewise.
> * tree-vrp.cc (vrp_asserts::find_switch_asserts): Likewise.
> * tree.cc (get_file_function_name): Likewise.
> * tree.h (as_internal_fn): Likewise.
> (if): Likewise.
> (DECL_RTL_KNOWN_SET): Likewise.
> (prepare_target_option_nodes_for_pch): Likewise.
> (tree_operand_length): Likewise.
> (tree_to_poly_uint64): Likewise.
> * var-tracking.cc (int_mem_offset): Likewise.
> * vec.h (if): Likewise.
> * wide-int.cc (defined): Likewise.
> (if): Likewise.
>
> gcc/cp/ChangeLog:
>
> * cp-tree.h (BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK):
> Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
> (STRIP_TEMPLATE): Likewise.
> * tree.cc (cp_tree_c_finish_parsing): Likewise.
>
> gcc/fortran/ChangeLog:
>
> * gfortran.h (ATTRIBUTE_GCC_GFC):
> Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
>
> gcc/jit/ChangeLog:
>
> * jit-common.h (GNU_PRINTF):
> Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
> ---
>  gcc/bitmap.cc  |  73 +-
>  

Re: [PATCH] libsanitizer: cherry-pick commit f52e365092aa from upstream

2022-05-05 Thread Martin Liška
On 5/5/22 18:21, H.J. Lu wrote:
> On Thu, May 5, 2022 at 4:24 AM Martin Liška  wrote:
>>
>> On 5/5/22 01:07, H.J. Lu wrote:
>>> On Wed, May 4, 2022 at 1:59 AM Martin Liška  wrote:

>>>> Hello.
>>>>
>>>> I'm going to do merge from upstream.
>>>>
>>>> Patch can bootstrap on x86_64-linux-gnu and survives regression
>>>> tests. I've also tested on ppc64le-linux-gnu and verified the ABI.
>>>>
>>>> The only real change is a small change in
>>>> gcc/testsuite/c-c++-common/asan/alloca_loop_unpoisoning.c where we
>>>> need --param=asan-use-after-return=0.
>>>>
>>>> I'm going to push the patches.
>>>
>>> Hi,
>>>
>>> I am checking in this patch to cherry-pick
>>>
>>> f52e365092aa [sanitizer] Use newfstatat for x32
>>>
>>> to restore x32 build.
>>>
>>
>> I'm going to do one more merge from upstream
>> (75f9e83ace52773af65dcebca543005ec8a2705d) as we want to include Tobias's
>> revision 6f095babc2b7d564168c7afc5bf6afb2188fd6b4 and my
>> revision f1b9245199f3457a4d06d32d1bc6e44573c166e3.
> 
> I am testing a patch for
> 
> https://github.com/llvm/llvm-project/issues/55288
> 
> to fix:
> 
> https://gcc.gnu.org/pipermail/gcc-regression/2022-May/076571.html

Interesting. How did you run these tests that the error shows up?

> 
> The same bug is also in GCC 12.  But somehow, it doesn't show up in
> GCC tests.

So please backport it once it's merged.

Martin



Re: [PATCH] Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.

2022-05-05 Thread Martin Liška
On 5/5/22 14:24, Richard Biener wrote:
> Hmm, but we support C++11 host compilers that are not GCC but
> may claim to be, with GCC_VERSION 4.2.x for example.  Are we sure
> all those liars implement what we guard with the version checks?

Do you know of any real examples of such liars?
Why should we even care about them?

Martin

> 
> I suppose to be "correct" we'd at least need to preserve
> #if __GNUC__
> in places where we might use the host compiler?  (if compilers then lie
> it's their own fault)



Re: [PATCH] Add operators / and * for profile_{count,probability}.

2022-05-05 Thread Martin Liška
On 5/5/22 15:49, Jan Hubicka wrote:
> Hi,
>> The patch simplifies usage of the profile_{count,probability} types.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> 
> The reason I intentionally did not add * and / to the original API was
> to detect situations where values that should be
> profile_count/profile_probability are stored into integers, since
> previous code used integers for everything.
> 
> Having to add apply_scale made one (mostly me :) think about whether the
> value is really just a fixed scale or if it should rather be converted
> to a proper data type (count or probability).
> 
> I guess now we completed the conversion so risk of this creeping in is
> relatively low and the code indeed looks better.

Yes, my impression as well is that the profiling code has settled down.

> It will make it bit
> harder for me to backport jump threading profile updating fixes I plan
> for 12.2 but it should not be hard.

You'll manage ;)

>> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
>> index b4357c03e86..a1ac1146445 100644
>> --- a/gcc/cfgloopmanip.cc
>> +++ b/gcc/cfgloopmanip.cc
>> @@ -563,8 +563,7 @@ scale_loop_profile (class loop *loop, 
>> profile_probability p,
>>  
>>/* Probability of exit must be 1/iterations.  */
>>count_delta = e->count ();
>> -  e->probability = profile_probability::always ()
>> -.apply_scale (1, iteration_bound);
>> +  e->probability = profile_probability::always () / iteration_bound;
> However this is kind of example of the problem. 
> iteration_bound is gcov_type so we can get overflow here.

typedef int64_t gcov_type;

and apply_scale takes int64_t arguments, as do the newly added operators,
so how can that change anything?

> I guess we want to downgrade iteration_bound since it is always either 0
> or int.
>> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
>> index e14b4e6c94a..cef26a9878e 100644
>> --- a/gcc/tree-switch-conversion.cc
>> +++ b/gcc/tree-switch-conversion.cc
>> @@ -1782,7 +1782,7 @@ switch_decision_tree::analyze_switch_statement ()
>>tree high = CASE_HIGH (elt);
>>  
>>profile_probability p
>> -= case_edge->probability.apply_scale (1, (intptr_t) (case_edge->aux));
>> += case_edge->probability / ((intptr_t) (case_edge->aux));
> 
> I think the switch ranges may be also in risk of overflow?
> 
> We could make operators to accept gcov_type or int64_t.

As explained, they do.

Cheers,
Martin

> 
> Thanks,
> Honza
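
The point about argument types can be illustrated with a small sketch. This
is a hypothetical stand-in, not GCC's profile_probability: the struct name,
the 2^29 fixed-point scale and the rounding are assumptions made only for
the example. What it shows is that an operator/ written as a thin wrapper
around apply_scale accepts the same int64_t operand and so behaves
identically.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical stand-in for a fixed-point probability type.  */
struct sketch_probability
{
  /* Fixed-point representation of probability 1.0 (assumed scale).  */
  static constexpr uint32_t max_value = 1u << 29;

  uint32_t value;

  static sketch_probability
  always ()
  {
    return {max_value};
  }

  /* Scale by NUM / DEN with rounding, saturating at max_value.  This
     sketch does not guard against overflow of value * NUM for very
     large NUM.  */
  sketch_probability
  apply_scale (int64_t num, int64_t den) const
  {
    assert (num >= 0 && den > 0);
    uint64_t scaled
      = ((uint64_t) value * (uint64_t) num + (uint64_t) den / 2)
        / (uint64_t) den;
    if (scaled > max_value)
      scaled = max_value;
    return {(uint32_t) scaled};
  }

  /* Same operand type as apply_scale, so the same range of inputs.  */
  sketch_probability
  operator/ (int64_t den) const
  {
    return apply_scale (1, den);
  }
};
```

With this shape, `sketch_probability::always () / n` and
`sketch_probability::always ().apply_scale (1, n)` are the same computation
for every n, which is the argument made above.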



Re: [PATCH] Come up with {,UN}LIKELY macros.

2022-05-05 Thread Martin Liška
On 5/5/22 17:31, Segher Boessenkool wrote:
> On Thu, May 05, 2022 at 09:06:45AM -0400, Marek Polacek via Gcc-patches wrote:
>> On Thu, May 05, 2022 at 02:31:05PM +0200, Martin Liška wrote:
>>> Some parts of the compiler already define:
>>> #define likely(cond) __builtin_expect ((cond), 1)
>>>
>>> So the patch should unify it.
> 
>> That's funny, yesterday I added another one: 
>> cp/parser.cc:cp_parser_init_declarator
>> which is not replaced in this patch.
>>
>> I would've preferred the name gcc_{,un}likely but I don't want to start
>> a long bikeshedding...
> 
> GCC_LIKELY is fine with me.  A bare LIKELY isn't though.  We have much
> more common macros having LIKELY in the name already (PROB_*LIKELY,
> CLASS_LIKELY_SPILLED, the various IPA things, loop versioning, etc.),
> but also we have LIKELY and UNLIKELY as function arguments in various
> places.

Well, of the two suggested names (GCC_LIKELY and gcc_likely), I prefer
GCC_LIKELY. You are right that a bare LIKELY may confuse people.

Is the community fine with the suggested name?

Martin

> 
> 
> Segher



[PATCH RFA] attribs: fix typedefs in generic code [PR105492]

2022-05-05 Thread Jason Merrill via Gcc-patches
In my patch for PR100545 I added an assert to check for broken typedefs in
set_underlying_type, and it found one in this case:
rs6000_handle_altivec_attribute had the same problem as
handle_mode_attribute.  So let's move the fixup into decl_attributes.

Tested that this fixes the ICE on a cross compiler, regression tested
x86_64-pc-linux-gnu, OK for trunk?

PR c/105492

gcc/ChangeLog:

* attribs.cc (decl_attributes): Fix broken typedefs here.

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_mode_attribute): Don't fix broken typedefs
here.
---
 gcc/attribs.cc| 15 +++
 gcc/c-family/c-attribs.cc | 10 --
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index b219f878042..0648391f0c6 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -872,6 +872,21 @@ decl_attributes (tree *node, tree attributes, int flags,
  tree ret = (spec->handler) (cur_and_last_decl, name, args,
  flags|cxx11_flag, &no_add_attrs);
 
+ /* Fix up typedefs clobbered by attribute handlers.  */
+ if (TREE_CODE (*node) == TYPE_DECL
+ && anode == &TREE_TYPE (*node)
+ && DECL_ORIGINAL_TYPE (*node)
+ && TYPE_NAME (*anode) == *node
+ && TYPE_NAME (cur_and_last_decl[0]) != *node)
+   {
+ tree t = cur_and_last_decl[0];
+ DECL_ORIGINAL_TYPE (*node) = t;
+ tree tt = build_variant_type_copy (t);
+ cur_and_last_decl[0] = tt;
+ TREE_TYPE (*node) = tt;
+ TYPE_NAME (tt) = *node;
+   }
+
  *anode = cur_and_last_decl[0];
  if (ret == error_mark_node)
{
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index b1953a45f9b..a280987c111 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -2204,16 +2204,6 @@ handle_mode_attribute (tree *node, tree name, tree args,
 TYPE_QUALS (type));
   if (TYPE_USER_ALIGN (type))
*node = build_aligned_type (*node, TYPE_ALIGN (type));
-
-  tree decl = node[2];
-  if (decl && TYPE_NAME (type) == decl)
-   {
- /* Set up the typedef all over again.  */
- DECL_ORIGINAL_TYPE (decl) = NULL_TREE;
- TREE_TYPE (decl) = *node;
- set_underlying_type (decl);
- *node = TREE_TYPE (decl);
-   }
 }
 
   return NULL_TREE;

base-commit: 000f4480005035d0811e009a7cb25b42721f0a6e
-- 
2.27.0
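
For context, here is a sketch of user-level code in which an attribute
handler changes the type named by a typedef. It is not the PR105492
testcase (that one involves the AltiVec attribute); it uses the GNU mode
attribute, on the assumption that this is the path through
handle_mode_attribute mentioned above. sketch_di_int and sketch_di_size
are hypothetical names.

```cpp
#include <cassert>

#ifdef __GNUC__
/* GNU extension: ask for the 8-byte integer machine mode (DImode), so the
   typedef ends up naming a type other than plain int.  */
typedef int sketch_di_int __attribute__ ((mode (DI)));
#else
/* Fallback for compilers without the attribute (assumed to be 64-bit).  */
typedef long long sketch_di_int;
#endif

/* Size in bytes of the type the typedef finally names.  */
inline unsigned
sketch_di_size ()
{
  return sizeof (sketch_di_int);
}
```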



[PATCH][_Hashtable] Fix insertion of range of type convertible to value_type PR 56112

2022-05-05 Thread François Dumont via Gcc-patches

Hi

I am renewing my patch to fix PR 56112, this time for the insert methods. I
have changed it completely, and it now also works with move-only key types.


Jonathan, as usual I will let you find a better name than
_ValueTypeEnforcer :-)

libstdc++: [_Hashtable] Insert range of types convertible to value_type
PR 56112


Fix insertion of a range of types convertible to value_type. Also fix the
case where value_type has a move-only key_type, which additionally allows
converted values to be moved.

libstdc++-v3/ChangeLog:

    PR libstdc++/56112
    * include/bits/hashtable_policy.h (_ValueTypeEnforcer): New.
    * include/bits/hashtable.h (_Hashtable<>::_M_insert_unique_aux): New.
    (_Hashtable<>::_M_insert(_Arg&&, const _NodeGenerator&, true_type)): Use
    latter.
    (_Hashtable<>::_M_insert(_Arg&&, const _NodeGenerator&, false_type)):
    Likewise.
    (_Hashtable(_InputIterator, _InputIterator, size_type, const _Hash&,
    const _Equal&, const allocator_type&, true_type)): Use this.insert range.
    (_Hashtable(_InputIterator, _InputIterator, size_type, const _Hash&,
    const _Equal&, const allocator_type&, false_type)): Use _M_insert.
    * testsuite/23_containers/unordered_map/cons/56112.cc: Check how many
    times conversion is done.
    (test02): New test case.
    * testsuite/23_containers/unordered_set/cons/56112.cc: New test.

Tested under Linux x86_64.

Ok to commit ?

François
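
As a reminder of what kind of user code is concerned, here is a minimal
sketch (not one of the testsuite files; build_from_convertible_range is a
hypothetical name). The source range holds std::pair<const char*, int>,
which is convertible to the map's value_type but is not the same type.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

/* Build a map from a range whose element type is only convertible to
   std::pair<const std::string, int>.  */
inline std::unordered_map<std::string, int>
build_from_convertible_range ()
{
  std::vector<std::pair<const char*, int>> source
    = { {"one", 1}, {"two", 2} };

  std::unordered_map<std::string, int> table;
  /* Range insert: each element has to be converted to value_type.  */
  table.insert (source.begin (), source.end ());
  return table;
}
```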

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 5e1a417f7cd..cd42d3c9ba0 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -898,21 +898,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 	std::pair
-	_M_insert(_Arg&& __arg, const _NodeGenerator& __node_gen,
-		  true_type /* __uks */)
+	_M_insert_unique_aux(_Arg&& __arg, const _NodeGenerator& __node_gen)
 	{
 	  return _M_insert_unique(
 	_S_forward_key(_ExtractKey{}(std::forward<_Arg>(__arg))),
 	std::forward<_Arg>(__arg), __node_gen);
 	}
 
+  template
+	std::pair
+	_M_insert(_Arg&& __arg, const _NodeGenerator& __node_gen,
+		  true_type /* __uks */)
+	{
+	  using __to_value
+	= __detail::_ValueTypeEnforcer<_ExtractKey, value_type>;
+	  return _M_insert_unique_aux(
+	__to_value{}(std::forward<_Arg>(__arg)), __node_gen);
+	}
+
   template
 	iterator
 	_M_insert(_Arg&& __arg, const _NodeGenerator& __node_gen,
 		  false_type __uks)
 	{
-	  return _M_insert(cend(), std::forward<_Arg>(__arg), __node_gen,
-			   __uks);
+	  using __to_value
+	= __detail::_ValueTypeEnforcer<_ExtractKey, value_type>;
+	  return _M_insert(cend(),
+	__to_value{}(std::forward<_Arg>(__arg)), __node_gen, __uks);
 	}
 
   // Insert with hint, not used when keys are unique.
@@ -1184,10 +1196,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		 const _Hash& __h, const _Equal& __eq,
 		 const allocator_type& __a, true_type /* __uks */)
   : _Hashtable(__bkt_count_hint, __h, __eq, __a)
-  {
-	for (; __f != __l; ++__f)
-	  this->insert(*__f);
-  }
+  { this->insert(__f, __l); }
 
   templateinsert(*__f);
+	  _M_insert(*__f, __node_gen, __uks);
   }
 
   template(__x).first; }
   };
 
+  template
+struct _ValueTypeEnforcer;
+
+  template
+struct _ValueTypeEnforcer<_Identity, _Value>
+{
+  template
+	constexpr _Kt&&
+	operator()(_Kt&& __k) noexcept
+	{ return std::forward<_Kt>(__k); }
+};
+
+  template
+struct _ValueTypeEnforcer<_Select1st, _Value>
+{
+  constexpr _Value&&
+  operator()(_Value&& __x) noexcept
+  { return std::move(__x); }
+
+  constexpr const _Value&
+  operator()(const _Value& __x) noexcept
+  { return __x; }
+
+  using __fst_type = typename _Value::first_type;
+
+  template
+	using __mutable_value_t =
+	  std::pair<
+	typename std::remove_const::type,
+	typename _Pair::second_type>;
+
+  constexpr __enable_if_t::value,
+			  __mutable_value_t<_Value>&&>
+  operator()(__mutable_value_t<_Value>&& __x) noexcept
+  { return std::move(__x); }
+
+  constexpr __enable_if_t::value,
+			  const __mutable_value_t<_Value>&>
+  operator()(const __mutable_value_t<_Value>& __x) noexcept
+  { return __x; }
+
+  template
+	constexpr std::pair<_Kt, _Val>&&
+	operator()(std::pair<_Kt, _Val>&& __x) noexcept
+	{ return std::move(__x); }
+
+  template
+	constexpr const std::pair<_Kt, _Val>&
+	operator()(const std::pair<_Kt, _Val>& __x) noexcept
+	{ return __x; }
+};
+
   template
 struct _NodeBuilder;
 
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/cons/56112.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/cons/56112.cc
index c4cdeee234c..4476103c986 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_map/cons/56112.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_map/cons/56112.cc
@@ -20,30 +20,108 @@
 #include 
 #include 
 
+#include 
+
 struct Key
 {
   explicit Key(const int* p) : value(p) 

Re: [PATCH] Skip constant folding for fmin/max when either argument is sNaN [PR105414]

2022-05-05 Thread Joseph Myers
On Thu, 5 May 2022, Richard Biener via Gcc-patches wrote:

> MIN/MAX_EXPR shouldn't even appear with -fsignaling-nans for this
> reason, at least that's what I thought.  But yes, you might have a point
> here (but maybe it's also not strictly enough specified).  One option would
> be to do (minmax == MAX_EXPR || minmax == MIN_EXPR || !tree_expr ...)
> 
> Joseph - are MIN_EXPR and MAX_EXPR supposed to turn sNaN into qNaN
> and the 'undefinedness' is merely as to which operand is chosen?

I don't know what MIN_EXPR and MAX_EXPR are supposed to do with sNaN 
arguments.  As noted, the fmax and fmin functions should produce a qNaN 
result with the "invalid" exception raised (so with an sNaN argument, it's 
definitely not valid to fold them in the absence of -fno-trapping-math; 
with -fsignaling-nans -fno-trapping-math, if an argument is known to be 
sNaN it would be valid to fold to qNaN, but I doubt there's much use of 
that option combination).

C never attempts to define which qNaN result (choice of payload or sign 
bit) is produced by an operation and I don't think our optimizations 
should be trying to define that (with any command-line options currently 
supported) either (other than for non-computational operations such as 
fabs and copysign, but even there there is scope for 
implementation-defined handling of assignment as a convertFormat operation 
rather than a copy operation).  Note that while some architectures 
propagate NaN payloads from a NaN operand to an instruction, others (e.g. 
RISC-V) do not, and when payloads are propagated there would still be the 
matter of which payload is chosen when there is more than one NaN operand 
(on x86, that is handled differently for SSE and x87 instructions).

-- 
Joseph S. Myers
jos...@codesourcery.com
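
A small probe for experimenting with this at run time is sketched below.
fmax_probe and probe_fmax are hypothetical names. The volatile temporaries
are only an attempt to keep the compiler from folding the call, and without
FENV_ACCESS support the ordering of the flag operations is not strictly
guaranteed, so treat the result as indicative.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

/* Result of one fmax call plus whether "invalid" was raised by it.  */
struct fmax_probe
{
  double result;
  bool invalid_raised;
};

inline fmax_probe
probe_fmax (double a, double b)
{
  /* Volatile copies discourage compile-time evaluation of the call.  */
  volatile double va = a;
  volatile double vb = b;
  std::feclearexcept (FE_INVALID);
  volatile double r = std::fmax (va, vb);
  fmax_probe p = { r, std::fetestexcept (FE_INVALID) != 0 };
  return p;
}
```

Calling probe_fmax with std::numeric_limits<double>::signaling_NaN () as
one argument shows what the local libm and compiler options actually do.
No expected output is given for that case, since it depends on both.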


Re: [PATCH] libsanitizer: cherry-pick commit f52e365092aa from upstream

2022-05-05 Thread H.J. Lu via Gcc-patches
On Thu, May 5, 2022 at 4:24 AM Martin Liška  wrote:
>
> On 5/5/22 01:07, H.J. Lu wrote:
> > On Wed, May 4, 2022 at 1:59 AM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> I'm going to do merge from upstream.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression
> >> tests. I've also tested on ppc64le-linux-gnu and verified the ABI.
> >>
> >> The only real change is a small change in
> >> gcc/testsuite/c-c++-common/asan/alloca_loop_unpoisoning.c where we
> >> need --param=asan-use-after-return=0.
> >>
> >> I'm going to push the patches.
> >
> > Hi,
> >
> > I am checking in this patch to cherry-pick
> >
> > f52e365092aa [sanitizer] Use newfstatat for x32
> >
> > to restore x32 build.
> >
>
> I'm going to do one more merge from upstream
> (75f9e83ace52773af65dcebca543005ec8a2705d) as we want to include Tobias's
> revision 6f095babc2b7d564168c7afc5bf6afb2188fd6b4 and my
> revision f1b9245199f3457a4d06d32d1bc6e44573c166e3.

I am testing a patch for

https://github.com/llvm/llvm-project/issues/55288

to fix:

https://gcc.gnu.org/pipermail/gcc-regression/2022-May/076571.html

The same bug is also in GCC 12.  But somehow, it doesn't show up in
GCC tests.

-- 
H.J.


Re: [PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections

2022-05-05 Thread Julian Brown
On Thu, 5 May 2022 14:40:38 +0200
Jakub Jelinek via Gcc-patches  wrote:

> On Thu, May 05, 2022 at 12:46:29PM +0100, Julian Brown wrote:
> > All the above (at least) has been done as part of the patch series
> > posted here:
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591973.html  
> 
> Ah, ok, so is this patch superseded by that series, or do you want to
> apply it just to be removed again?

Regarding this one (Chung-Lin's, not mine), I'm not sure -- I don't
know if it addresses a problem that is still present with my patch
series applied, nor do I know if my series and this patch have been
tested together. We might need to do some work to integrate the bits,
one way or the other.

Thanks,

Julian


Re: [PATCH] Come up with {,UN}LIKELY macros.

2022-05-05 Thread Segher Boessenkool
On Thu, May 05, 2022 at 09:06:45AM -0400, Marek Polacek via Gcc-patches wrote:
> On Thu, May 05, 2022 at 02:31:05PM +0200, Martin Liška wrote:
> > Some parts of the compiler already define:
> > #define likely(cond) __builtin_expect ((cond), 1)
> > 
> > So the patch should unify it.

> That's funny, yesterday I added another one: 
> cp/parser.cc:cp_parser_init_declarator
> which is not replaced in this patch.
> 
> I would've preferred the name gcc_{,un}likely but I don't want to start
> a long bikeshedding...

GCC_LIKELY is fine with me.  A bare LIKELY isn't though.  We have much
more common macros having LIKELY in the name already (PROB_*LIKELY,
CLASS_LIKELY_SPILLED, the various IPA things, loop versioning, etc.),
but also we have LIKELY and UNLIKELY as function arguments in various
places.


Segher
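
For illustration only, this is roughly what a prefixed pair of macros could
look like. The names follow the suggestion above but nothing here is from
the patch. The non-GNU fallback and the checked_div helper are assumptions
added for the example.

```cpp
#include <cassert>

#if defined (__GNUC__)
/* Hint the expected truth value of COND to the optimizer.  */
# define GCC_LIKELY(cond)   __builtin_expect (!!(cond), 1)
# define GCC_UNLIKELY(cond) __builtin_expect (!!(cond), 0)
#else
/* Without the builtin, just evaluate the condition.  */
# define GCC_LIKELY(cond)   (!!(cond))
# define GCC_UNLIKELY(cond) (!!(cond))
#endif

/* Hypothetical user: division by zero is treated as the cold path and
   yields 0.  */
inline int
checked_div (int num, int den)
{
  if (GCC_UNLIKELY (den == 0))
    return 0;
  return num / den;
}
```

The prefix keeps the macros from colliding with existing identifiers that
already contain LIKELY.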


Re: [PATCH] ipa-visibility: Optimize TLS access [PR99619]

2022-05-05 Thread Alexander Monakov
On Thu, 5 May 2022, Jan Hubicka wrote:

> > It follows from how local-dynamic model is defined: we call __tls_get_addr
> > with an argument that identifies the current DSO (not the individual
> > thread-local variable), and then compute the address of the variable with
> > a simple addition, so when there are two or more TLS variables, we can
> > call __tls_get_addr just once (but at -O0 we will end up with redundant
> > calls).
> 
> Thanks for explanation.
> So this is something that really depends on optimization flags of the
> function referring the variable rather than on optimization flags of the
> variable itself and only makes difference if there is -O0 function that
> contains more than one reference to a TLS var?

Well, for an -O0 function it doesn't matter how many different TLS variables
it is referencing. The interesting case is an -O2 function referencing one
local-dynamic TLS variable.

> I guess then a correct answer would be to search for such references.

Presumably at RTL generation time, i.e. let the middle end discover the
most specific TLS access model, and then selectively downgrade local-dynamic
to global-dynamic when lowering an -O0 function.

> What happens when there are multiple object files with a hidden TLS var
> where some get LOCAL_DYNAMIC and others GLOBAL_DYNAMIC? (Which may
> happen when linking together object files compiled with different
> versions of compiler if we go ahead with this patch on hidden symbols).

They have different relocations, so there's an increase in number of GOT
entries, but otherwise I don't think there's any problem.

Alexander
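
A minimal sketch of the situation under discussion, for readers who want to
look at the generated code themselves. All names are hypothetical. The
tls_model attribute is a GNU extension and is applied here only on ELF
targets; elsewhere the macro expands to nothing.

```cpp
#include <cassert>

#if defined (__GNUC__) && defined (__ELF__)
/* Request the local-dynamic TLS model explicitly.  */
# define SKETCH_TLS_LD __attribute__ ((tls_model ("local-dynamic")))
#else
# define SKETCH_TLS_LD
#endif

/* Two thread-local variables that are local to this translation unit.  */
static thread_local int counter_a SKETCH_TLS_LD = 1;
static thread_local int counter_b SKETCH_TLS_LD = 2;

/* References both variables.  With local-dynamic and optimization
   enabled, one __tls_get_addr call can serve both accesses.  */
inline int
sum_counters ()
{
  return counter_a + counter_b;
}
```

Compiling this with -fPIC at -O0 and at -O2 and comparing the number of
__tls_get_addr calls in the assembly is one way to observe the difference
described above.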


Re: [PATCH] [PR100106] Reject unaligned subregs when strict alignment is required

2022-05-05 Thread Segher Boessenkool
On Thu, May 05, 2022 at 03:52:01AM -0300, Alexandre Oliva wrote:
> The testcase for pr100106, compiled with optimization for 32-bit
> powerpc -mcpu=604 with -mstrict-align expands the initialization of a
> union from a float _Complex value into a load from an SCmode
> constant pool entry, aligned to 4 bytes, into a DImode pseudo,
> requiring 8-byte alignment.

> +  else if (reg && MEM_P (reg)
> +&& STRICT_ALIGNMENT && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
> +return false;

Please fix the line breaks?  Either do a break before every &&, or put
as many things as possible on one line?

Note that you should never have paradoxical subregs of mem on rs6000 or
any other target with INSN_SCHEDULING.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile { target { ilp32 } } } */
> +/* { dg-options "-mcpu=604 -O -mstrict-align" } */
> +
> +#include "../../gcc.c-torture/compile/pr100106.c"

It is better to copy the 11 lines of code.

Please comment what the ilp32 is for (namely, the -mcpu= will barf
without it).  The testcase is okay with those changes, thanks!


Segher
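
To show the requested layout, here is the condition restated as a
standalone predicate with a break before every &&. This is a hypothetical
sketch, not GCC code: the parameters stand in for MEM_P (reg),
STRICT_ALIGNMENT, MEM_ALIGN (reg) and GET_MODE_ALIGNMENT (omode).

```cpp
#include <cassert>

/* Return true when a subreg of a MEM would need stricter alignment than
   the MEM provides on a strict-alignment target.  */
inline bool
sketch_reject_unaligned_subreg (bool is_mem, bool strict_alignment,
                                unsigned mem_align_bits,
                                unsigned outer_mode_align_bits)
{
  return (is_mem
          && strict_alignment
          && mem_align_bits < outer_mode_align_bits);
}
```

With a 32-bit aligned MEM and a 64-bit aligned outer mode, as in the
SCmode/DImode case described above, the predicate returns true on a
strict-alignment target.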


Re: [PATCH] Use more ARRAY_SIZE.

2022-05-05 Thread Martin Liška
On 5/5/22 14:58, Iain Buclaw wrote:
> This D front-end change doesn't look right to me, besides the slight

Hello.

Sorry, I've re-read the patch and fixed some places where the macro usage
was wrong.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
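
For reference, the idiom being substituted has this general shape. This is
a sketch under a hypothetical name (SKETCH_ARRAY_SIZE) so it does not clash
with a real ARRAY_SIZE; sketch_names and sketch_count_names are made up for
the example.

```cpp
#include <cassert>
#include <cstddef>

/* Number of elements in array A.  Only valid for a real array; applied
   to a pointer it silently yields the wrong value.  */
#define SKETCH_ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

static const char *const sketch_names[] = { "alpha", "beta", "gamma" };

/* Element count of the table above.  */
inline std::size_t
sketch_count_names ()
{
  return SKETCH_ARRAY_SIZE (sketch_names);
}
```

The pointer caveat in the comment is the kind of misuse that has to be
checked for when replacing open-coded sizeof divisions.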

Martin

From 8d9630e411321c8584dd83ff64ec6fefad48813e Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 13 Jan 2022 18:46:26 +0100
Subject: [PATCH] Use more ARRAY_SIZE.

gcc/ada/ChangeLog:

	* locales.c (iso_639_1_to_639_3): Use ARRAY_SIZE.
	(language_name_to_639_3): Likewise.
	(country_name_to_3166): Likewise.

gcc/analyzer/ChangeLog:

	* engine.cc (exploded_node::get_dot_fillcolor): Use ARRAY_SIZE.
	* function-set.cc (test_stdio_example): Likewise.
	* sm-file.cc (get_file_using_fns): Likewise.
	* sm-malloc.cc (malloc_state_machine::unaffected_by_call_p): Likewise.
	* sm-signal.cc (get_async_signal_unsafe_fns): Likewise.

gcc/ChangeLog:

	* attribs.cc (diag_attr_exclusions): Use ARRAY_SIZE.
	(decls_mismatched_attributes): Likewise.
	* builtins.cc (c_strlen): Likewise.
	* cfg.cc (DEF_BASIC_BLOCK_FLAG): Likewise.
	* common/config/aarch64/aarch64-common.cc (aarch64_option_init_struct): Likewise.
	* config/aarch64/aarch64-builtins.cc (aarch64_lookup_simd_builtin_type): Likewise.
	(aarch64_init_simd_builtin_types): Likewise.
	(aarch64_init_builtin_rsqrt): Likewise.
	* config/aarch64/aarch64.cc (is_madd_op): Likewise.
	* config/arm/arm-builtins.cc (arm_lookup_simd_builtin_type): Likewise.
	(arm_init_simd_builtin_types): Likewise.
	* config/avr/gen-avr-mmcu-texi.cc (mcus[ARRAY_SIZE): Likewise.
	(c_prefix): Likewise.
	(main): Likewise.
	* config/c6x/c6x.cc (N_SAVE_ORDER): Likewise.
	* config/darwin-c.cc (darwin_register_frameworks): Likewise.
	* config/gcn/mkoffload.cc (process_obj): Likewise.
	* config/i386/i386-builtins.cc (get_builtin_code_for_version): Likewise.
	(fold_builtin_cpu): Likewise.
	* config/m32c/m32c.cc (PUSHM_N): Likewise.
	* config/nvptx/mkoffload.cc (process): Likewise.
	* config/rs6000/driver-rs6000.cc (host_detect_local_cpu): Likewise.
	* config/s390/s390.cc (NR_C_MODES): Likewise.
	* config/tilepro/gen-mul-tables.cc (find_sequences): Likewise.
	(create_insn_code_compression_table): Likewise.
	* config/vms/vms.cc (NBR_CRTL_NAMES): Likewise.
	* diagnostic-format-json.cc (json_from_expanded_location): Likewise.
	* dwarf2out.cc (ARRAY_SIZE): Likewise.
	* genhooks.cc (emit_documentation): Likewise.
	(emit_init_macros): Likewise.
	* gimple-ssa-sprintf.cc (format_floating): Likewise.
	* gimple-ssa-warn-access.cc (memmodel_name): Likewise.
	* godump.cc (keyword_hash_init): Likewise.
	* hash-table.cc (hash_table_higher_prime_index): Likewise.
	* input.cc (for_each_line_table_case): Likewise.
	* ipa-free-lang-data.cc (free_lang_data): Likewise.
	* ipa-inline.cc (sanitize_attrs_match_for_inline_p): Likewise.
	* optc-save-gen.awk: Likewise.
	* spellcheck.cc (test_metric_conditions): Likewise.
	* tree-vect-slp-patterns.cc (sizeof): Likewise.
	(ARRAY_SIZE): Likewise.
	* tree.cc (build_common_tree_nodes): Likewise.

gcc/c-family/ChangeLog:

	* c-common.cc (ARRAY_SIZE): Use ARRAY_SIZE.
	(c_common_nodes_and_builtins): Likewise.
	* c-format.cc (check_tokens): Likewise.
	(check_plain): Likewise.
	* c-pragma.cc (c_pp_lookup_pragma): Likewise.
	(init_pragma): Likewise.
	* known-headers.cc (get_string_macro_hint): Likewise.
	(get_stdlib_header_for_name): Likewise.
	* c-attribs.cc: Likewise.

gcc/c/ChangeLog:

	* c-decl.cc (match_builtin_function_types): Use ARRAY_SIZE.

gcc/cp/ChangeLog:

	* module.cc (depset::entity_kind_name): Use ARRAY_SIZE.
	* name-lookup.cc (get_std_name_hint): Likewise.
	* parser.cc (cp_parser_new): Likewise.

gcc/fortran/ChangeLog:

	* frontend-passes.cc (gfc_code_walker): Use ARRAY_SIZE.
	* openmp.cc (gfc_match_omp_context_selector_specification): Likewise.
	* trans-intrinsic.cc (conv_intrinsic_ieee_builtin): Likewise.
	* trans-types.cc (gfc_get_array_descr_info): Likewise.

gcc/jit/ChangeLog:

	* jit-builtins.cc (find_builtin_by_name): Use ARRAY_SIZE.
	(get_string_for_type_id): Likewise.
	* jit-recording.cc (recording::context::context): Likewise.

gcc/lto/ChangeLog:

	* lto-common.cc (lto_resolution_read): Use ARRAY_SIZE.
	* lto-lang.cc (lto_init): Likewise.
---
 gcc/ada/locales.c   |  6 +++---
 gcc/analyzer/engine.cc  |  2 +-
 gcc/analyzer/function-set.cc|  2 +-
 gcc/analyzer/sm-file.cc |  3 +--
 gcc/analyzer/sm-malloc.cc   |  3 +--
 gcc/analyzer/sm-signal.cc   |  3 +--
 gcc/attribs.cc  |  4 ++--
 gcc/builtins.cc |  2 +-
 gcc/c-family/c-attribs.cc   |  3 +--
 gcc/c-family/c-common.cc|  7 ++-
 gcc/c-family/c-format.cc| 12 ++--
 gcc/c-family/c-pragma.cc|  9 -
 gcc/c-family/known-headers.cc   |  5 ++---
 gcc/c/c-decl.cc  

Re: [PATCH] OpenMP, C++: Add template support for the has_device_addr clause.

2022-05-05 Thread Jakub Jelinek via Gcc-patches
On Wed, Feb 23, 2022 at 05:01:45PM +0100, Marcel Vollweiler wrote:
> gcc/cp/ChangeLog:
> 
>   * pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_HAS_DEVICE_ADDR.
>   * semantics.cc (finish_omp_clauses): Handle PARM_DECL and
>   NON_LVALUE_EXPR.
> 
> gcc/ChangeLog:
> 
>   * gimplify.cc (gimplify_scan_omp_clauses): Handle NON_LVALUE_EXPR.
>   (gimplify_adjust_omp_clauses): Likewise.
>   * omp-low.cc (scan_sharing_clauses): Likewise.
>   (lower_omp_target): Likewise.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.c++/target-has-device-addr-7.C: New test.
>   * testsuite/libgomp.c++/target-has-device-addr-8.C: New test.
>   * testsuite/libgomp.c++/target-has-device-addr-9.C: New test.
> 
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -17652,6 +17652,7 @@ tsubst_omp_clauses (tree clauses, enum 
> c_omp_region_type ort,
>   case OMP_CLAUSE_USE_DEVICE_PTR:
>   case OMP_CLAUSE_USE_DEVICE_ADDR:
>   case OMP_CLAUSE_IS_DEVICE_PTR:
> + case OMP_CLAUSE_HAS_DEVICE_ADDR:
>   case OMP_CLAUSE_INCLUSIVE:
>   case OMP_CLAUSE_EXCLUSIVE:
> OMP_CLAUSE_DECL (nc)
> @@ -17797,6 +17798,7 @@ tsubst_omp_clauses (tree clauses, enum 
> c_omp_region_type ort,
> case OMP_CLAUSE_USE_DEVICE_PTR:
> case OMP_CLAUSE_USE_DEVICE_ADDR:
> case OMP_CLAUSE_IS_DEVICE_PTR:
> +   case OMP_CLAUSE_HAS_DEVICE_ADDR:
> case OMP_CLAUSE_INCLUSIVE:
> case OMP_CLAUSE_EXCLUSIVE:
> case OMP_CLAUSE_ALLOCATE:

This part is ok.

> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 0cb17a6..452ecfd 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -8534,11 +8534,14 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   {
> if (handle_omp_array_sections (c, ort))
>   remove = true;
> +   else if (TREE_CODE (TREE_CHAIN (t)) == PARM_DECL)
> + t = TREE_CHAIN (t);
> else
>   {
> t = OMP_CLAUSE_DECL (c);
> while (TREE_CODE (t) == INDIRECT_REF
> -  || TREE_CODE (t) == ARRAY_REF)
> +  || TREE_CODE (t) == ARRAY_REF
> +  || TREE_CODE (t) == NON_LVALUE_EXPR)
>   t = TREE_OPERAND (t, 0);
>   }
>   }

This is wrong.
When processing_template_decl, handle_omp_array_sections often punts, keeps
things as is because if something is dependent, we can't do much about it.
The else if (TREE_CODE (TREE_CHAIN (t)) == PARM_DECL) is obviously wrong,
there is really nothing specific about PARM_DECLs (just that you used
exactly that in the testcase), nor about array section with exactly one
dimension.  What is done elsewhere is look through all TREE_LISTs to find
the base expression, and if that expression is a VAR_DECL/PARM_DECL, nice,
we can do further processing, if processing_template_decl and it is
something different, just defer and otherwise error out.

So I think you want:
--- gcc/cp/semantics.cc.jj  2022-05-05 11:56:16.160443828 +0200
+++ gcc/cp/semantics.cc 2022-05-05 15:52:39.651211448 +0200
@@ -8553,14 +8553,23 @@ finish_omp_clauses (tree clauses, enum c
  else
{
  t = OMP_CLAUSE_DECL (c);
+ if (TREE_CODE (t) == TREE_LIST)
+   {
+ while (TREE_CODE (t) == TREE_LIST)
+   t = TREE_CHAIN (t);
+   }
  while (TREE_CODE (t) == INDIRECT_REF
 || TREE_CODE (t) == ARRAY_REF)
t = TREE_OPERAND (t, 0);
}
}
- bitmap_set_bit (&is_on_device_head, DECL_UID (t));
  if (VAR_P (t) || TREE_CODE (t) == PARM_DECL)
-   cxx_mark_addressable (t);
+   {
+ bitmap_set_bit (&is_on_device_head, DECL_UID (t));
+ if (!processing_template_decl
+ && !cxx_mark_addressable (t))
+   remove = true;
+   }
  goto check_dup_generic_t;
 
case OMP_CLAUSE_USE_DEVICE_ADDR:
instead, as I said look through the TREE_LISTs, then only use DECL_UID
on actual VAR_DECLs/PARM_DECLs not random other expressions and
never call cxx_mark_addressable when processing_template_decl (and remove
clause if cxx_mark_addressable fails).
Note, check_dup_generic_t will do among other things:
  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL
  && (!field_ok || TREE_CODE (t) != FIELD_DECL))
{
  if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
break;
  ... error ...
}
so with processing_template_decl it will just defer it for later,
but otherwise if t is something invalid it will diagnose it.
But one really shouldn't rely on t being VAR_DECL/PARM_DECL before
that checking is done...

With your pt.cc change and my semantics.cc change, all your new testcases
look fine.
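
For readers following the logic above, here is a minimal toy model of the
decision being described: strip every TREE_LIST (array section) and reference
wrapper to reach the base expression, process it if it is a variable or
parameter, defer if we are in a template, and diagnose otherwise.  All names
(toy_code, toy_node, toy_base, toy_classify) are hypothetical stand-ins, not
GCC's real tree API, and the FIELD_DECL and OVERLOAD special cases that
check_dup_generic_t handles are left out.

```cpp
#include <cassert>

// Toy stand-ins for tree codes; these are NOT GCC's real types.
enum class toy_code
{
  tree_list, indirect_ref, array_ref, var_decl, parm_decl, other
};

struct toy_node
{
  toy_code code;
  const toy_node *inner;  // plays the role of TREE_CHAIN / TREE_OPERAND (t, 0)
};

enum class toy_action { process_decl, defer, diagnose };

// Look through all array-section wrappers, then through reference wrappers,
// to find the base expression, regardless of how many dimensions there are.
inline const toy_node *
toy_base (const toy_node *t)
{
  assert (t != nullptr);
  while (t->code == toy_code::tree_list)
    t = t->inner;
  while (t->code == toy_code::indirect_ref || t->code == toy_code::array_ref)
    t = t->inner;
  return t;
}

// Only a variable or parameter base is processed further.  Anything else is
// deferred while in a template and diagnosed otherwise.
inline toy_action
toy_classify (const toy_node *clause_decl, bool in_template)
{
  const toy_node *base = toy_base (clause_decl);
  if (base->code == toy_code::var_decl || base->code == toy_code::parm_decl)
    return toy_action::process_decl;
  return in_template ? toy_action::defer : toy_action::diagnose;
}
```

The point of the sketch is that nothing depends on the base being a PARM_DECL
specifically, nor on the section having exactly one dimension.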

> diff 

Re: [PATCH] [PR100106] Reject unaligned subregs when strict alignment is required

2022-05-05 Thread Segher Boessenkool
On Thu, May 05, 2022 at 08:59:21AM +0100, Richard Sandiford wrote:
> Alexandre Oliva via Gcc-patches  writes:
> I know this is the best being the enemy of the good, but given
> that we're at the start of stage 1, would it be feasible to try
> to get rid of (subreg (mem)) altogether for GCC 13?

Yes please!

> We could do
> it target-by-target, with a target macro (yes, macro :-)) that opts
> in to keeping the existing behaviour.  (subreg (mem)) would then be
> unconditionally invalid when the macro isn't defined.  (Even in
> debug expressions, since those ought to narrow to a mem anyway.)

Or we can simply threaten to drop all unconverted targets.  That way at
least there is a *chance* (a slim chance, but still) that the conversion
will ever be finished.

Paradoxical subregs of memory are already not allowed on targets with
instruction scheduling, btw.


Segher


Re: [PATCH] Add operators / and * for profile_{count,probability}.

2022-05-05 Thread Jan Hubicka via Gcc-patches
Hi,
> The patch simplifies usage of the profile_{count,probability} types.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?

The reason I intentionally did not add * and / to the original API was
to detect situations where values that should be
profile_count/profile_probability are stored into integers, since
previous code used integers for everything.

Having to add apply_scale made one (mostly me :) think about whether the
value is really just a fixed scale or whether it should be converted
to a proper data type (count or probability).

I guess now that we have completed the conversion, the risk of this creeping
in is relatively low, and the code indeed looks better. It will make it a bit
harder for me to backport the jump threading profile updating fixes I plan
for 12.2, but it should not be hard.
> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> index b4357c03e86..a1ac1146445 100644
> --- a/gcc/cfgloopmanip.cc
> +++ b/gcc/cfgloopmanip.cc
> @@ -563,8 +563,7 @@ scale_loop_profile (class loop *loop, profile_probability 
> p,
>  
> /* Probability of exit must be 1/iterations.  */
> count_delta = e->count ();
> -   e->probability = profile_probability::always ()
> - .apply_scale (1, iteration_bound);
> +   e->probability = profile_probability::always () / iteration_bound;
However, this is an example of the kind of problem I mean:
iteration_bound is gcov_type, so we can get overflow here.
I guess we want to downgrade iteration_bound since it is always either 0
or int.
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index e14b4e6c94a..cef26a9878e 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -1782,7 +1782,7 @@ switch_decision_tree::analyze_switch_statement ()
>tree high = CASE_HIGH (elt);
>  
>profile_probability p
> - = case_edge->probability.apply_scale (1, (intptr_t) (case_edge->aux));
> + = case_edge->probability / ((intptr_t) (case_edge->aux));

I think the switch ranges may also be at risk of overflow?

We could make the operators accept gcov_type or int64_t.

Thanks,
Honza
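
The overflow concern can be sketched in isolation.  The following is a toy
model only, assuming a fixed-point probability where toy_always stands for
100%; toy_div_narrow and toy_div_wide are hypothetical helpers, not the
profile_probability API.  It shows what happens to a 64-bit divisor if it is
squeezed through int on the way, compared with dividing by the full-width
value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point probability: toy_always stands for 100%.
static const uint32_t toy_always = 1u << 30;

// Divisor narrowed to int first, as would happen if a division operator
// only accepted int.  On the usual two's-complement targets the high bits
// are simply dropped.
inline uint32_t
toy_div_narrow (uint32_t p, int64_t divisor)
{
  int narrowed = static_cast<int> (divisor);
  assert (narrowed != 0);
  return p / static_cast<uint32_t> (narrowed);
}

// Divisor kept at full width, as with an operator taking int64_t.
inline uint32_t
toy_div_wide (uint32_t p, int64_t divisor)
{
  assert (divisor > 0);
  return static_cast<uint32_t> (p / divisor);
}
```

With a divisor of 2^32 + 2, the narrow variant divides by 2 and returns half
of toy_always, while the wide variant returns 0, which is the sensible answer
for a divisor that large.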


Re: [PATCH] Skip constant folding for fmin/max when either argument is sNaN [PR105414]

2022-05-05 Thread Segher Boessenkool
Hi!

On Thu, May 05, 2022 at 05:30:58PM +0800, HAO CHEN GUI wrote:
> On 5/5/2022 4:30 PM, Kewen.Lin wrote:
> > on 2022/5/5 16:09, Richard Biener via Gcc-patches wrote:
> >> On Thu, May 5, 2022 at 10:07 AM HAO CHEN GUI via Gcc-patches
> >>  wrote:
> >>>This patch skips constant folding for fmin/max when either argument
> >>> is sNaN. According to C standard,
> >>>fmin(sNaN, sNaN)= qNaN, fmin(sNaN, NaN) = qNaN
> >>>So signaling NaN should be tested and skipped for fmin/max in match.pd.
> >>>
> >>>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.

The C standard does not talk about sNaNs *at all*, in any released
version of the standard.  And the C2x drafts do not talk about signaling
arguments for fmin/fmax specifically, so it should just raise an error
like any other floating operation with an sNaN arg will.  This means we
have to make sure to not optimise away all operations if there may be
an sNaN (and we have HONOR_SNANS for the mode in use).

You never have to convert to a qNaN manually.   Instead, any normal
operation on the sNaN will raise the exception, and convert to the qNaN.
There is no sane way you can raise the exception manually, so you should
make sure we end up with a real operation in the RTL, and then generate
proper machine code for it as well.  See rs6000 extendsfdf2 for example,
for that last part.

And of course both the gimple min_expr and the RTL smin are not defined
for NaN inputs at all anyway :-P


Segher
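
To illustrate the point that an ordinary operation on an sNaN both raises the
exception and yields a qNaN, here is a minimal C++ sketch.  It assumes an
IEEE 754 target where plain loads and stores of a double do not themselves
quiet the NaN (typical x86-64 or AArch64); add_to_snan and snan_result are
made-up names.  The volatile qualifiers are only there to keep the compiler
from folding the addition at compile time, which is the very thing the thread
is trying to avoid.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <limits>

struct snan_result
{
  bool invalid_raised;  // was FE_INVALID set by the addition?
  bool result_is_nan;   // is the sum a NaN?
};

inline snan_result
add_to_snan ()
{
  assert (std::numeric_limits<double>::has_signaling_NaN);

  // Volatile so the addition below is a real run-time operation.
  volatile double s = std::numeric_limits<double>::signaling_NaN ();

  std::feclearexcept (FE_ALL_EXCEPT);
  volatile double r = s + 1.0;   // any normal arithmetic operation will do
  bool raised = std::fetestexcept (FE_INVALID) != 0;

  return { raised, std::isnan (r) };
}
```

No manual conversion to a qNaN appears anywhere; the hardware addition does
both jobs, provided the operation survives into the generated code.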


Re: [PATCH] Remove conditional STATIC_ASSERT.

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 2:41 PM Martin Liška  wrote:
>
> On 5/5/22 14:29, Richard Biener wrote:
> > Can we then use static_assert (...) instead and remove the
> > macro?
>
> Oh yes, we can ;)
>
> > Do we have C compiled code left (I think we might,
> > otherwise we'd not have __cplusplus guards in system.h),
> > in which case the #if should change to #ifdef __cplusplus?
>
> No, there's no such consumer of the macro.

OK, but for C uses it should still be different so my suggestion
to change to #ifdef __cplusplus remains.  OTOH then the change
is somewhat pointless.

> What about the updated version of the patch?
>
> Cheers,
> Martin


Re: [PATCH] Come up with {,UN}LIKELY macros.

2022-05-05 Thread Marek Polacek via Gcc-patches
On Thu, May 05, 2022 at 02:31:05PM +0200, Martin Liška wrote:
> Some parts of the compiler already define:
> #define likely(cond) __builtin_expect ((cond), 1)
> 
> So the patch should unify it.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/c/ChangeLog:
> 
>   * c-parser.cc (c_parser_conditional_expression): Use {,UN}LIKELY
>   macros.
>   (c_parser_binary_expression): Likewise.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-gimplify.cc (cp_genericize_r): Use {,UN}LIKELY
>   macros.
>   * parser.cc (cp_finalize_omp_declare_simd): Likewise.
>   (cp_finalize_oacc_routine): Likewise.

That's funny, yesterday I added another one: 
cp/parser.cc:cp_parser_init_declarator
which is not replaced in this patch.

I would've preferred the name gcc_{,un}likely but I don't want to start
a long bikeshedding...

Thanks,

Marek
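
For context, macros of this kind are usually thin wrappers around
__builtin_expect, in the same way as the existing likely() definition quoted
above.  The definitions in the patch itself are not shown in this message, so
the spelling below is an assumption, and checked_div is a hypothetical helper
used only to show a call site.

```cpp
#include <cassert>

// Assumed spelling; the patch may define these differently.
// !!(x) normalizes any scalar condition to 0 or 1 before the hint.
#define LIKELY(x)   (__builtin_expect (!!(x), 1))
#define UNLIKELY(x) (__builtin_expect (!!(x), 0))

// Hypothetical call site: the zero-divisor branch is marked as rare so the
// compiler can lay out the common path as the fall-through.
inline int
checked_div (int num, int den)
{
  if (UNLIKELY (den == 0))
    return 0;
  return num / den;
}
```

The hint affects only code layout and branch prediction heuristics; the value
of the condition is unchanged.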



Re: [PATCH] Remove loop-incremented dead code.

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 2:44 PM Martin Liška  wrote:
>
> The code is dead and can be removed.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * genautomata.cc (create_composed_state): Remove dead code.
> * graphite-poly.cc (print_pdrs): Likewise.
> * lto-wrapper.cc (run_gcc): Likewise.
> * tree-switch-conversion.cc 
> (switch_decision_tree::balance_case_nodes):
> Likewise.
> ---
>  gcc/genautomata.cc| 21 +
>  gcc/graphite-poly.cc  | 10 --
>  gcc/lto-wrapper.cc|  7 +--
>  gcc/tree-switch-conversion.cc | 10 +++---
>  4 files changed, 13 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/genautomata.cc b/gcc/genautomata.cc
> index e43314e4ea3..389c1c3e0ed 100644
> --- a/gcc/genautomata.cc
> +++ b/gcc/genautomata.cc
> @@ -5661,7 +5661,6 @@ create_composed_state (state_t original_state, arc_t 
> arcs_marked_by_insn,
>state_t state_in_table;
>state_t temp_state;
>alt_state_t canonical_alt_states_list;
> -  int alts_number;
>int new_state_p = 0;
>
>if (arcs_marked_by_insn == NULL)
> @@ -5731,17 +5730,15 @@ create_composed_state (state_t original_state, arc_t 
> arcs_marked_by_insn,
>   || (curr_arc->insn->insn_reserv_decl
>   != DECL_INSN_RESERV (advance_cycle_insn_decl)))
> add_arc (state, curr_arc->to_state, curr_arc->insn);
> -}
> -  arcs_marked_by_insn->to_state = state;
> -  for (alts_number = 0,
> -  curr_arc = arcs_marked_by_insn->next_arc_marked_by_insn;
> -   curr_arc != NULL;
> -   curr_arc = next_arc)
> -{
> -  next_arc = curr_arc->next_arc_marked_by_insn;
> -  remove_arc (original_state, curr_arc);
> - alts_number++;
> -}
> +   }
> + arcs_marked_by_insn->to_state = state;
> + for (curr_arc = arcs_marked_by_insn->next_arc_marked_by_insn;
> +  curr_arc != NULL;
> +  curr_arc = next_arc)
> +   {
> + next_arc = curr_arc->next_arc_marked_by_insn;
> + remove_arc (original_state, curr_arc);
> +   }
>  }
>  }
>if (!state->it_was_placed_in_stack_for_DFA_forming)
> diff --git a/gcc/graphite-poly.cc b/gcc/graphite-poly.cc
> index 42ed038768e..173aae07442 100644
> --- a/gcc/graphite-poly.cc
> +++ b/gcc/graphite-poly.cc
> @@ -341,20 +341,10 @@ dump_gbb_conditions (FILE *file, gimple_poly_bb_p gbb)
>  void
>  print_pdrs (FILE *file, poly_bb_p pbb)
>  {
> -  int nb_reads = 0;
> -  int nb_writes = 0;
> -
>if (PBB_DRS (pbb).is_empty ())
>  return;
>
>fprintf (file, "Data references (\n");
> -
> -  for (poly_dr_p pdr : PBB_DRS (pbb))
> -if (PDR_TYPE (pdr) == PDR_READ)
> -  nb_reads++;
> -else
> -  nb_writes++;
> -
>fprintf (file, "Read data references (\n");
>
>for (poly_dr_p pdr : PBB_DRS (pbb))
> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> index 285e6e96af5..26e06e77be4 100644
> --- a/gcc/lto-wrapper.cc
> +++ b/gcc/lto-wrapper.cc
> @@ -1428,7 +1428,6 @@ run_gcc (unsigned argc, char *argv[])
>char **lto_argv, **ltoobj_argv;
>bool linker_output_rel = false;
>bool skip_debug = false;
> -  unsigned n_debugobj;
>const char *incoming_dumppfx = dumppfx = NULL;
>static char current_dir[] = { '.', DIR_SEPARATOR, '\0' };
>
> @@ -1871,7 +1870,6 @@ cont1:
>
>/* Copy the early generated debug info from the objects to temporary
>   files and append those to the partial link commandline.  */
> -  n_debugobj = 0;
>early_debug_object_names = NULL;
>if (! skip_debug)
>  {
> @@ -1881,10 +1879,7 @@ cont1:
> {
>   const char *tem;
>   if ((tem = debug_objcopy (ltoobj_argv[i], !linker_output_rel)))
> -   {
> - early_debug_object_names[i] = tem;
> - n_debugobj++;
> -   }
> +   early_debug_object_names[i] = tem;
> }
>  }
>
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index e14b4e6c94a..50a17927f39 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -2039,18 +2039,14 @@ switch_decision_tree::balance_case_nodes 
> (case_tree_node **head,
>if (np)
>  {
>int i = 0;
> -  int ranges = 0;
>case_tree_node **npp;
>case_tree_node *left;
>profile_probability prob = profile_probability::never ();
>
> -  /* Count the number of entries on branch.  Also count the ranges.  */
> +  /* Count the number of entries on branch.  */
>
>while (np)
> {
> - if (!tree_int_cst_equal (np->m_c->get_low (), np->m_c->get_high ()))
> -   ranges++;
> -
>   i++;
>   prob += np->m_c->m_prob;
>   np = np->m_right;
> @@ 

Re: [PATCH] Use more ARRAY_SIZE.

2022-05-05 Thread Iain Buclaw via Gcc-patches
Excerpts from Martin Liška's message of May 5, 2022 2:16 pm:
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/d/ChangeLog:
> 
>   * longdouble.h: Use ARRAY_SIZE.
> 
> diff --git a/gcc/d/longdouble.h b/gcc/d/longdouble.h
> index 1e457ae04d6..2d9695a4309 100644
> --- a/gcc/d/longdouble.h
> +++ b/gcc/d/longdouble.h
> @@ -117,7 +117,7 @@ public:
>  private:
>/* Including gcc/real.h presents too many problems, so just
>   statically allocate enough space for REAL_VALUE_TYPE.  */
> -  long realvalue[(2 + (16 + sizeof (long)) / sizeof (long))];
> +  long realvalue[(2 + (16 + ARRAY_SIZE (long))];
>  };
>  
>  /* Declared, but "volatile" is not required.  */
> 

Hi,

This D front-end change doesn't look right to me.  Besides the slight
difference in parentheses, which means the calculation would be off by some
measure - (2 + (16 + N) / N) => 7 vs. (2 + 16 + N) => 22 - the
ARRAY_SIZE macro, if I understand it correctly, wouldn't even generate
valid code here.

Regards,
Iain.
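
A small sketch of both points, assuming the usual definition of ARRAY_SIZE as
sizeof (a) / sizeof ((a)[0]).  The names toy_table, original_len and
patched_len are hypothetical.  ARRAY_SIZE needs an array object as its
argument, so ARRAY_SIZE (long) would expand to an expression that indexes a
type name and cannot compile.  The two helper functions reproduce the
arithmetic in the message, whose figures correspond to N = 4.

```cpp
#include <cstddef>

// Assumed to match the usual definition of the macro.
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

// Valid use: the argument is an array object.
static const int toy_table[] = { 1, 2, 3, 4, 5 };
constexpr std::size_t toy_table_len = ARRAY_SIZE (toy_table);

// Original expression, with n standing for sizeof (long).
constexpr std::size_t
original_len (std::size_t n)
{
  return 2 + (16 + n) / n;
}

// Expression as read from the patched line, with the division gone.
constexpr std::size_t
patched_len (std::size_t n)
{
  return 2 + 16 + n;
}

static_assert (toy_table_len == 5, "five elements in toy_table");
static_assert (original_len (4) == 7, "matches the first figure above");
static_assert (patched_len (4) == 22, "matches the second figure above");
```

On a target where sizeof (long) is 8 the original expression gives 5, so the
exact numbers differ, but the two forms still disagree.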


Re: [PATCH] Remove conditional STATIC_ASSERT.

2022-05-05 Thread Martin Liška
On 5/5/22 14:51, Pedro Alves wrote:
> On 2022-05-05 13:41, Martin Liška wrote:
>> On 5/5/22 14:29, Richard Biener wrote:
>>> Can we then use static_assert (...) instead and remove the
>>> macro?
>>
>> Oh yes, we can ;)
>>
>>> Do we have C compiled code left (I think we might,
>>> otherwise we'd not have __cplusplus guards in system.h),
>>> in which case the #if should change to #ifdef __cplusplus?
>>
>> No, there's no such consumer of the macro.
>>
>> What about the updated version of the patch?
> 
> static_assert without the second/message parameter requires C++17:
> 
>   https://en.cppreference.com/w/cpp/language/static_assert

Oh, you are correct :) Thanks:

/home/marxin/Programming/gcc/gcc/wide-int.h: In static member function ‘static wide_int wi::int_traits::get_binary_result(const T1&, const T2&)’:
/home/marxin/Programming/gcc/gcc/wide-int.h:1205:60: warning: ‘static_assert’ without a message only available with ‘-std=c++17’ or ‘-std=gnu++17’ [-Wpedantic]
 1205 |  || wi::int_traits ::precision_type != FLEXIBLE_PRECISION);

> 
> The macro expanded to always have a message argument.


That said, we should go with the original version of the patch.

Cheers,
Martin


[PATCH][pushed] profile: Unify identifier names for profiling

2022-05-05 Thread Martin Liška
The patch unifies names used for profiling.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thanks,
Martin

gcc/ChangeLog:

* tree-profile.cc (gimple_gen_ic_profiler): Prefix names with
PROF_*.
(gimple_gen_time_profiler): Likewise.
---
 gcc/tree-profile.cc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 6d40401f86f..97aab8801d0 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -383,8 +383,8 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag)
 Example:
   f_1 = foo;
   __gcov_indirect_call.counters = &__gcov4.main[0];
-  PROF_9 = f_1;
-  __gcov_indirect_call.callee = PROF_9;
+  PROF_fn_9 = f_1;
+  __gcov_indirect_call.callee = PROF_fn_9;
   _4 = f_1 ();
*/
 
@@ -394,7 +394,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag)
 ic_tuple_var, ic_tuple_counters_field, NULL_TREE);
 
   stmt1 = gimple_build_assign (counter_ref, ref_ptr);
-  tmp1 = make_temp_ssa_name (ptr_type_node, NULL, "PROF");
+  tmp1 = make_temp_ssa_name (ptr_type_node, NULL, "PROF_fn");
   stmt2 = gimple_build_assign (tmp1, unshare_expr (value->hvalue.value));
   tree callee_ref = build3 (COMPONENT_REF, ptr_type_node,
 ic_tuple_var, ic_tuple_callee_field, NULL_TREE);
@@ -520,7 +520,7 @@ gimple_gen_time_profiler (unsigned tag)
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
 {
   tree ptr = make_temp_ssa_name (build_pointer_type (type), NULL,
-"time_profiler_counter_ptr");
+"PROF_time_profiler_counter_ptr");
   tree addr = build1 (ADDR_EXPR, TREE_TYPE (ptr),
  tree_time_profiler_counter);
   gassign *assign = gimple_build_assign (ptr, NOP_EXPR, addr);
@@ -532,10 +532,10 @@ gimple_gen_time_profiler (unsigned tag)
   build_int_cst (integer_type_node,
  MEMMODEL_RELAXED));
   tree result_type = TREE_TYPE (TREE_TYPE (f));
-  tree tmp = make_temp_ssa_name (result_type, NULL, "time_profile");
+  tree tmp = make_temp_ssa_name (result_type, NULL, "PROF_time_profile");
   gimple_set_lhs (stmt, tmp);
   gsi_insert_after (, stmt, GSI_NEW_STMT);
-  tmp = make_temp_ssa_name (type, NULL, "time_profile");
+  tmp = make_temp_ssa_name (type, NULL, "PROF_time_profile");
   assign = gimple_build_assign (tmp, NOP_EXPR,
gimple_call_lhs (stmt));
   gsi_insert_after (, assign, GSI_NEW_STMT);
@@ -544,11 +544,11 @@ gimple_gen_time_profiler (unsigned tag)
 }
   else
 {
-  tree tmp = make_temp_ssa_name (type, NULL, "time_profile");
+  tree tmp = make_temp_ssa_name (type, NULL, "PROF_time_profile");
   gassign *assign = gimple_build_assign (tmp, tree_time_profiler_counter);
   gsi_insert_before (, assign, GSI_NEW_STMT);
 
-  tmp = make_temp_ssa_name (type, NULL, "time_profile");
+  tmp = make_temp_ssa_name (type, NULL, "PROF_time_profile");
   assign = gimple_build_assign (tmp, PLUS_EXPR, gimple_assign_lhs (assign),
one);
   gsi_insert_after (, assign, GSI_NEW_STMT);
-- 
2.36.0



Re: [PATCH] Remove conditional STATIC_ASSERT.

2022-05-05 Thread Pedro Alves
On 2022-05-05 13:41, Martin Liška wrote:
> On 5/5/22 14:29, Richard Biener wrote:
>> Can we then use static_assert (...) instead and remove the
>> macro?
> 
> Oh yes, we can ;)
> 
>> Do we have C compiled code left (I think we might,
>> otherwise we'd not have __cplusplus guards in system.h),
>> in which case the #if should change to #ifdef __cplusplus?
> 
> No, there's no such consumer of the macro.
> 
> What about the updated version of the patch?

static_assert without the second/message parameter requires C++17:

  https://en.cppreference.com/w/cpp/language/static_assert

The macro expanded to always have a message argument.
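
A minimal sketch of the difference, assuming the macro is defined along the
lines of static_assert ((X), #X); small_info and large_info are made-up
types.  Because the macro stringizes its argument into the message, it stays
valid C++11, whereas the bare single-argument form is only standard from
C++17 on.

```cpp
// Assumed shape of the macro: the condition text doubles as the message.
#define STATIC_ASSERT(X) static_assert ((X), #X)

struct small_info { int a; };
struct large_info { int a; long b; };

// Valid in C++11: the macro always supplies the second argument.
STATIC_ASSERT (sizeof (large_info) >= sizeof (small_info));

// Also valid in C++11: explicit message.
static_assert (sizeof (large_info) >= sizeof (small_info),
	       "large_info must not be smaller than small_info");

// Needs C++17 (GCC warns under -Wpedantic in earlier modes):
//   static_assert (sizeof (large_info) >= sizeof (small_info));
```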


Re: [PATCH] lto-plugin: add support for feature detection

2022-05-05 Thread Martin Liška
On 5/5/22 12:52, Alexander Monakov wrote:
> Feels a bit weird to ask, but before entertaining such an API extension,
> can we step back and understand the v3 variant of get_symbols? It is not
> documented, and from what little I saw I did not get the "motivation" for
> its existence (what it is doing that couldn't be done with the v2 api).

Please see here:
https://github.com/rui314/mold/issues/181#issuecomment-1037927757

> 
> To me lack of documentation looks like a serious issue :/

Yes, documentation is missing. This is what can be seen from gold's 
implementation:

// Get the symbol resolution info for a plugin-claimed input file.

static enum ld_plugin_status
get_symbols(const void* handle, int nsyms, ld_plugin_symbol* syms)
...

// Version 2 of the above.  The only difference is that this version
// is allowed to return the resolution code LDPR_PREVAILING_DEF_IRONLY_EXP.


// Version 3 of the above.  The only difference from v2 is that it
// returns LDPS_NO_SYMS instead of LDPS_OK for the objects we never
// decided to include.

static enum ld_plugin_status
get_symbols_v3(const void* handle, int nsyms, ld_plugin_symbol* syms)

Which is something like documentation :(

Martin


[PATCH][pushed] Remove sanity checking in stream_out_histogram_value.

2022-05-05 Thread Martin Liška
The patch is pre-approved by Honza.

Cheers,
Martin

gcc/ChangeLog:

* value-prof.cc (stream_out_histogram_value): Remove sanity
checking.
---
 gcc/value-prof.cc | 12 
 1 file changed, 12 deletions(-)

diff --git a/gcc/value-prof.cc b/gcc/value-prof.cc
index c240a186336..9656ce5870d 100644
--- a/gcc/value-prof.cc
+++ b/gcc/value-prof.cc
@@ -331,18 +331,6 @@ stream_out_histogram_value (struct output_block *ob, 
histogram_value hist)
   /* When user uses an unsigned type with a big value, constant converted
 to gcov_type (a signed type) can be negative.  */
   gcov_type value = hist->hvalue.counters[i];
-  if (hist->type == HIST_TYPE_TOPN_VALUES
- || hist->type == HIST_TYPE_IOR)
-   /* Note that the IOR counter tracks pointer values and these can have
-  sign bit set.  */
-   ;
-  else if (hist->type == HIST_TYPE_INDIR_CALL && i == 0)
-   /* 'all' counter overflow is stored as a negative value. Individual
-  counters and values are expected to be non-negative.  */
-   ;
-  else
-   gcc_assert (value >= 0);
-
   streamer_write_gcov_count (ob, value);
 }
   if (hist->hvalue.next)
-- 
2.36.0



[PATCH] Remove loop-incremented dead code.

2022-05-05 Thread Martin Liška
The code is dead and can be removed.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* genautomata.cc (create_composed_state): Remove dead code.
* graphite-poly.cc (print_pdrs): Likewise.
* lto-wrapper.cc (run_gcc): Likewise.
* tree-switch-conversion.cc (switch_decision_tree::balance_case_nodes):
Likewise.
---
 gcc/genautomata.cc| 21 +
 gcc/graphite-poly.cc  | 10 --
 gcc/lto-wrapper.cc|  7 +--
 gcc/tree-switch-conversion.cc | 10 +++---
 4 files changed, 13 insertions(+), 35 deletions(-)

diff --git a/gcc/genautomata.cc b/gcc/genautomata.cc
index e43314e4ea3..389c1c3e0ed 100644
--- a/gcc/genautomata.cc
+++ b/gcc/genautomata.cc
@@ -5661,7 +5661,6 @@ create_composed_state (state_t original_state, arc_t 
arcs_marked_by_insn,
   state_t state_in_table;
   state_t temp_state;
   alt_state_t canonical_alt_states_list;
-  int alts_number;
   int new_state_p = 0;
 
   if (arcs_marked_by_insn == NULL)
@@ -5731,17 +5730,15 @@ create_composed_state (state_t original_state, arc_t 
arcs_marked_by_insn,
  || (curr_arc->insn->insn_reserv_decl
  != DECL_INSN_RESERV (advance_cycle_insn_decl)))
add_arc (state, curr_arc->to_state, curr_arc->insn);
-}
-  arcs_marked_by_insn->to_state = state;
-  for (alts_number = 0,
-  curr_arc = arcs_marked_by_insn->next_arc_marked_by_insn;
-   curr_arc != NULL;
-   curr_arc = next_arc)
-{
-  next_arc = curr_arc->next_arc_marked_by_insn;
-  remove_arc (original_state, curr_arc);
- alts_number++;
-}
+   }
+ arcs_marked_by_insn->to_state = state;
+ for (curr_arc = arcs_marked_by_insn->next_arc_marked_by_insn;
+  curr_arc != NULL;
+  curr_arc = next_arc)
+   {
+ next_arc = curr_arc->next_arc_marked_by_insn;
+ remove_arc (original_state, curr_arc);
+   }
 }
 }
   if (!state->it_was_placed_in_stack_for_DFA_forming)
diff --git a/gcc/graphite-poly.cc b/gcc/graphite-poly.cc
index 42ed038768e..173aae07442 100644
--- a/gcc/graphite-poly.cc
+++ b/gcc/graphite-poly.cc
@@ -341,20 +341,10 @@ dump_gbb_conditions (FILE *file, gimple_poly_bb_p gbb)
 void
 print_pdrs (FILE *file, poly_bb_p pbb)
 {
-  int nb_reads = 0;
-  int nb_writes = 0;
-
   if (PBB_DRS (pbb).is_empty ())
 return;
 
   fprintf (file, "Data references (\n");
-
-  for (poly_dr_p pdr : PBB_DRS (pbb))
-if (PDR_TYPE (pdr) == PDR_READ)
-  nb_reads++;
-else
-  nb_writes++;
-
   fprintf (file, "Read data references (\n");
 
   for (poly_dr_p pdr : PBB_DRS (pbb))
diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 285e6e96af5..26e06e77be4 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -1428,7 +1428,6 @@ run_gcc (unsigned argc, char *argv[])
   char **lto_argv, **ltoobj_argv;
   bool linker_output_rel = false;
   bool skip_debug = false;
-  unsigned n_debugobj;
   const char *incoming_dumppfx = dumppfx = NULL;
   static char current_dir[] = { '.', DIR_SEPARATOR, '\0' };
 
@@ -1871,7 +1870,6 @@ cont1:
 
   /* Copy the early generated debug info from the objects to temporary
  files and append those to the partial link commandline.  */
-  n_debugobj = 0;
   early_debug_object_names = NULL;
   if (! skip_debug)
 {
@@ -1881,10 +1879,7 @@ cont1:
{
  const char *tem;
  if ((tem = debug_objcopy (ltoobj_argv[i], !linker_output_rel)))
-   {
- early_debug_object_names[i] = tem;
- n_debugobj++;
-   }
+   early_debug_object_names[i] = tem;
}
 }
 
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index e14b4e6c94a..50a17927f39 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -2039,18 +2039,14 @@ switch_decision_tree::balance_case_nodes 
(case_tree_node **head,
   if (np)
 {
   int i = 0;
-  int ranges = 0;
   case_tree_node **npp;
   case_tree_node *left;
   profile_probability prob = profile_probability::never ();
 
-  /* Count the number of entries on branch.  Also count the ranges.  */
+  /* Count the number of entries on branch.  */
 
   while (np)
{
- if (!tree_int_cst_equal (np->m_c->get_low (), np->m_c->get_high ()))
-   ranges++;
-
  i++;
  prob += np->m_c->m_prob;
  np = np->m_right;
@@ -2063,8 +2059,8 @@ switch_decision_tree::balance_case_nodes (case_tree_node 
**head,
  left = *npp;
  profile_probability pivot_prob = prob.apply_scale (1, 2);
 
- /* Find the place in the list that bisects the list's total cost,
-where ranges count as 2.  */
+ /* Find the place in the list 

Re: [PATCH] Remove conditional STATIC_ASSERT.

2022-05-05 Thread Martin Liška
On 5/5/22 14:29, Richard Biener wrote:
> Can we then use static_assert (...) instead and remove the
> macro?

Oh yes, we can ;)

> Do we have C compiled code left (I think we might,
> otherwise we'd not have __cplusplus guards in system.h),
> in which case the #if should change to #ifdef __cplusplus?

No, there's no such consumer of the macro.

What about the updated version of the patch?

Cheers,
Martin

From a610603daddd45de3a6a218006ffa6d1f8855e1c Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 4 May 2022 21:17:54 +0200
Subject: [PATCH] Start using static_assert.

As we require a c++11 compliant compiler, static_assert is always
available.

gcc/ChangeLog:

	* system.h (STATIC_ASSERT): Remove.
	* basic-block.h (STATIC_ASSERT): Use normal static_assert.

gcc/ChangeLog:

	* basic-block.h (static_assert): Use static_assert directly.
	* bitmap.h (Traits>::base_bitmap_view): Likewise.
	* common/config/i386/i386-common.cc (STATIC_ASSERT): Likewise.
	(static_assert): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (struct binary_imm_narrowb_base): Likewise.
	(struct binary_imm_narrowt_base): Likewise.
	(struct unary_narrowb_base): Likewise.
	(struct unary_narrowt_base): Likewise.
	* config/i386/i386-builtins.cc (BDESC_VERIFYS): Likewise.
	* config/i386/i386-options.cc (STATIC_ASSERT): Likewise.
	(static_assert): Likewise.
	* config/i386/i386.h (STATIC_ASSERT): Likewise.
	(static_assert): Likewise.
	* dumpfile.cc (make_item_for_dump_dec): Likewise.
	* expmed.cc (make_tree): Likewise.
	* input.h (STATIC_ASSERT): Likewise.
	(static_assert): Likewise.
	* ira-build.cc (ira_conflict_vector_profitable_p): Likewise.
	* lto-streamer.h (STATIC_ASSERT): Likewise.
	(static_assert): Likewise.
	* lto-wrapper.cc: Likewise.
	* poly-int.h (C>::poly_int): Likewise.
	(maybe_eq): Likewise.
	(print_dec): Likewise.
	* profile-count.cc (profile_count::to_frequency): Likewise.
	* rtl.h (subreg_shape::unique_id): Likewise.
	* system.h (STATIC_ASSERT): Likewise.
	* tree.h (TYPE_VECTOR_SUBPARTS): Likewise.
	(SET_TYPE_VECTOR_SUBPARTS): Likewise.
	* wide-int.h (wide_int_storage::wide_int_storage): Likewise.
	(wide_int_storage>::get_binary_result): Likewise.
	(N>::set_len): Likewise.
	(wi::mask): Likewise.
	(wi::shifted_mask): Likewise.

gcc/c-family/ChangeLog:

	* c-omp.cc (c_omp_split_clauses): Use static_assert directly.

gcc/cp/ChangeLog:

	* cp-tree.h (STATIC_ASSERT): Use static_assert directly.

gcc/fortran/ChangeLog:

	* openmp.cc (resolve_omp_clauses): Use static_assert directly.
---
 gcc/basic-block.h  |  5 +
 gcc/bitmap.h   |  2 +-
 gcc/c-family/c-omp.cc  |  2 +-
 gcc/common/config/i386/i386-common.cc  |  2 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc |  8 
 gcc/config/i386/i386-builtins.cc   |  2 +-
 gcc/config/i386/i386-options.cc|  2 +-
 gcc/config/i386/i386.h |  2 +-
 gcc/cp/cp-tree.h   |  2 +-
 gcc/dumpfile.cc|  2 +-
 gcc/expmed.cc  |  2 +-
 gcc/fortran/openmp.cc  |  2 +-
 gcc/input.h|  2 +-
 gcc/ira-build.cc   |  2 +-
 gcc/lto-streamer.h |  2 +-
 gcc/lto-wrapper.cc |  2 +-
 gcc/poly-int.h | 10 +-
 gcc/profile-count.cc   |  2 +-
 gcc/rtl.h  |  6 +++---
 gcc/system.h   |  9 -
 gcc/tree.h |  4 ++--
 gcc/wide-int.h | 18 +-
 22 files changed, 39 insertions(+), 51 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index e3fff1f6975..fe81b6d90cc 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -158,10 +158,7 @@ struct GTY((chain_next ("%h.next_bb"), chain_prev ("%h.prev_bb"))) basic_block_d
 /* This ensures that struct gimple_bb_info is smaller than
struct rtl_bb_info, so that inlining the former into basic_block_def
is the better choice.  */
-typedef int __assert_gimple_bb_smaller_rtl_bb
-  [(int) sizeof (struct rtl_bb_info)
-   - (int) sizeof (struct gimple_bb_info)];
-
+static_assert (sizeof (rtl_bb_info) >= sizeof (gimple_bb_info));
 
 #define BB_FREQ_MAX 1
 
diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index e7bf67a5474..3c61c6dcfb7 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -1005,7 +1005,7 @@ base_bitmap_view::base_bitmap_view (const T ,
   /* The code currently assumes that each element of ARRAY corresponds
  to exactly one bitmap_element.  */
   const size_t array_element_bits = CHAR_BIT * sizeof (array_element_type);
-  STATIC_ASSERT (BITMAP_ELEMENT_ALL_BITS % array_element_bits == 0);
+  static_assert (BITMAP_ELEMENT_ALL_BITS % 

[PATCH] tree-optimization/104162 - CSE of &MEM[ptr].a[i] and ptr + CST

2022-05-05 Thread Richard Biener via Gcc-patches
This teaches value-numbering to treat complex address expressions
whose offset becomes invariant as equal to a POINTER_PLUS_EXPR.
This restores CSE that is now prevented by the early lowering of
&MEM[ptr + CST] to a POINTER_PLUS_EXPR.
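
As a rough illustration of the equivalence described above (this is not GCC code), the sketch below computes the same address once as an array reference and once as base pointer plus a constant byte offset. struct S mirrors the new testcase; the helper names via_array_ref, via_pointer_plus and offset_of_elem are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors struct S from the new testcase.
struct S { int a[4]; };

// Address computed as a component/array reference: &p->a[i].
inline const char *via_array_ref (const S *p, int i)
{
  return reinterpret_cast<const char *> (&p->a[i]);
}

// The same address expressed as "pointer plus constant byte offset",
// which is the shape a POINTER_PLUS_EXPR has.
inline const char *via_pointer_plus (const S *p, std::size_t byte_off)
{
  return reinterpret_cast<const char *> (p) + byte_off;
}

// Byte offset of a[i] inside S once i is a known constant.
inline std::size_t offset_of_elem (int i)
{
  return offsetof (S, a) + static_cast<std::size_t> (i) * sizeof (int);
}
```

Once i is known to be 1, both forms denote the same address, which is what lets the two computations be CSEd.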

Unfortunately this regresses gcc.dg/asan/pr99673.c again, so
the testcase is adjusted accordingly.

Re-bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2022-01-26  Richard Biener  

PR tree-optimization/104162
* tree-ssa-sccvn.cc (vn_reference_lookup): Handle
&MEM[_1 + 5].a[i] like a POINTER_PLUS_EXPR if the offset
becomes invariant.
(vn_reference_insert): Likewise.

* gcc.dg/tree-ssa/ssa-fre-99.c: New testcase.
* gcc.dg/asan/pr99673.c: Adjust.
---
 gcc/testsuite/gcc.dg/asan/pr99673.c|  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-99.c | 27 +
 gcc/tree-ssa-sccvn.cc  | 66 +-
 3 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-99.c

diff --git a/gcc/testsuite/gcc.dg/asan/pr99673.c b/gcc/testsuite/gcc.dg/asan/pr99673.c
index 05857fd46c7..a1e9631af2e 100644
--- a/gcc/testsuite/gcc.dg/asan/pr99673.c
+++ b/gcc/testsuite/gcc.dg/asan/pr99673.c
@@ -1,4 +1,6 @@
 /* { dg-do compile } */
+/* Skip XPASS cases.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-O2 -flto" } { "" } } */
 /* { dg-additional-options "-Wstringop-overread" } */
 
 struct B {
@@ -22,6 +24,6 @@ void g (struct C *pc, struct D *pd, int i)
   pd->i = pb->i;
 
   const short *psa = pb->a[i].sa;
-  if (f (psa))
+  if (f (psa)) /* { dg-bogus "from a region of size" "pr99673" { xfail *-*-* } } */
 return;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-99.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-99.c
new file mode 100644
index 000..101d0d63f7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-99.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* Disable FRE1 because that for the sake of __builtin_object_size
+   will not consider the equality but still valueize 'i', defeating
+   the purpose of the check.  */
+/* { dg-options "-O -fdump-tree-fre3 -fdisable-tree-fre1" } */
+
+struct S { int a[4]; };
+
+int i;
+int bar (struct S *p)
+{
+  char *q = (char *)p + 4;
+  i = 1;
+  int *r = &((struct S *)p)->a[i];
+  return q == (char *)r;
+}
+int baz (struct S *p)
+{
+  i = 1;
+  int *r = &((struct S *)p)->a[i];
+  char *q = (char *)p + 4;
+  return q == (char *)r;
+}
+
+/* Verify FRE can handle valueizing &p->a[i] and value-numbering it
+   equal to a POINTER_PLUS_EXPR.  */
+/* { dg-final { scan-tree-dump-times "return 1;" 2 "fre3" } } */
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 3c90c1e23e6..76587632312 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -3666,6 +3666,38 @@ vn_reference_lookup (tree op, tree vuse, vn_lookup_kind kind,
   vr1.vuse = vuse_ssa_val (vuse);
   vr1.operands = operands
 = valueize_shared_reference_ops_from_ref (op, &valueized_anything);
+
+  /* Handle &MEM[ptr + 5].b[1].c as POINTER_PLUS_EXPR.  Avoid doing
+ this before the pass folding __builtin_object_size had a chance to run.  */
+  if ((cfun->curr_properties & PROP_objsz)
+  && operands[0].opcode == ADDR_EXPR
+  && operands.last ().opcode == SSA_NAME)
+{
+  poly_int64 off = 0;
+  vn_reference_op_t vro;
+  unsigned i;
+  for (i = 1; operands.iterate (i, &vro); ++i)
+   {
+ if (vro->opcode == SSA_NAME)
+   break;
+ else if (known_eq (vro->off, -1))
+   break;
+ off += vro->off;
+   }
+  if (i == operands.length () - 1)
+   {
+ gcc_assert (operands[i-1].opcode == MEM_REF);
+ tree ops[2];
+ ops[0] = operands[i].op0;
+ ops[1] = wide_int_to_tree (sizetype, off);
+ tree res = vn_nary_op_lookup_pieces (2, POINTER_PLUS_EXPR,
+  TREE_TYPE (op), ops, NULL);
+ if (res)
+   return res;
+ return NULL_TREE;
+   }
+}
+
   vr1.type = TREE_TYPE (op);
   ao_ref op_ref;
   ao_ref_init (&op_ref, op);
@@ -3757,13 +3789,45 @@ vn_reference_insert (tree op, tree result, tree vuse, tree vdef)
   vn_reference_t vr1;
   bool tem;
 
+  vec<vn_reference_op_s> operands
+= valueize_shared_reference_ops_from_ref (op, &tem);
+  /* Handle &MEM[ptr + 5].b[1].c as POINTER_PLUS_EXPR.  Avoid doing this
+ before the pass folding __builtin_object_size had a chance to run.  */
+  if ((cfun->curr_properties & PROP_objsz)
+  && operands[0].opcode == ADDR_EXPR
+  && operands.last ().opcode == SSA_NAME)
+{
+  poly_int64 off = 0;
+  vn_reference_op_t vro;
+  unsigned i;
+  for (i = 1; operands.iterate (i, &vro); ++i)
+   {
+ if (vro->opcode == SSA_NAME)
+   break;
+ else if (known_eq (vro->off, -1))
+   break;
+ off += vro->off;
+   }
+  if (i == operands.length () - 1)
+   {
+ 

Re: [PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections

2022-05-05 Thread Jakub Jelinek via Gcc-patches
On Thu, May 05, 2022 at 12:46:29PM +0100, Julian Brown wrote:
> All the above (at least) has been done as part of the patch series
> posted here:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591973.html

Ah, ok, so is this patch superseded by that series, or do you want to apply
it just to be removed again?

> > At least for the C FE maybe we'll
> > need to arrange for less folding to be done because C still folds too
> > much stuff prematurely. Then when finishing clauses verify that
> > OMP_ARRAY_SECTION trees appear only where we allow them and not
> > elsewhere (say foo (1, 2, 3)[:36]
> > would be ok if foo returns a pointer, but
> > foo (ptr[0:13], 2, 3)
> > would not) and then need to differentiate between the cases listed in
> > the standard which we handle for each . -> [idx] when starting from a
> > var (in such a case I vaguely recall there are rules for pointer
> > attachments etc.) or other arbitrary expressions (in that case we
> > just evaluate those expressions and e.g. in the foo (1, 2, 3)[:36]
> > case basically do tmp = foo (1, 2, 3);
> > and mapping of tmp[:36].
> 
> ...which also changes/refactors quite a lot regarding how lowering
> clauses into mapping nodes works (the "address inspector" bits).
> "Weird" cases like mapping the return value from functions doesn't
> necessarily DTRT yet -- it wasn't entirely clear how that should/could
> work, I don't think.

I vaguely remember that the ./->/[] handling applies only when it starts
from a variable and as soon as one triggers something else, perhaps
including just *& or similar stuff then only the final lvalue is mapped and
nothing else, but dunno if it is from just discussions about the topic or
what is actually written in the spec, will need to look it up.

Jakub



[PATCH] Come up with {,UN}LIKELY macros.

2022-05-05 Thread Martin Liška
Some parts of the compiler already define their own variants, such as:
#define likely(cond) __builtin_expect ((cond), 1)

This patch unifies them under common LIKELY and UNLIKELY macros.
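
For readers unfamiliar with the idiom, here is a minimal sketch of what such macros usually look like. The exact spelling that lands in system.h may differ; the non-GNU fallback branch and the checked_div helper are assumptions added only for this example.

```cpp
#include <cassert>

// Assumed spelling, modelled on the __builtin_expect uses in the diff.
// The #else branch is an illustrative fallback, not part of the patch.
#if defined (__GNUC__)
# define LIKELY(x)   (__builtin_expect (!!(x), 1))
# define UNLIKELY(x) (__builtin_expect (!!(x), 0))
#else
# define LIKELY(x)   (!!(x))
# define UNLIKELY(x) (!!(x))
#endif

// Hypothetical helper: the zero-divisor path is marked as the cold one.
inline int checked_div (int num, int den)
{
  if (UNLIKELY (den == 0))
    return 0;
  return num / den;
}
```

The macros only pass a hint to the optimizer about the expected branch; the value of the condition itself is unchanged.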

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/c/ChangeLog:

* c-parser.cc (c_parser_conditional_expression): Use {,UN}LIKELY
macros.
(c_parser_binary_expression): Likewise.

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_genericize_r): Use {,UN}LIKELY
macros.
* parser.cc (cp_finalize_omp_declare_simd): Likewise.
(cp_finalize_oacc_routine): Likewise.

gcc/ChangeLog:

* system.h (LIKELY): Define.
(UNLIKELY): Likewise.
* domwalk.cc (sort_bbs_postorder): Use {,UN}LIKELY
macros.
* dse.cc (set_position_unneeded): Likewise.
(set_all_positions_unneeded): Likewise.
(any_positions_needed_p): Likewise.
(all_positions_needed_p): Likewise.
* expmed.cc (flip_storage_order): Likewise.
* genmatch.cc (dt_simplify::gen_1): Likewise.
* ggc-common.cc (gt_pch_save): Likewise.
* print-rtl.cc: Likewise.
* rtl-iter.h (T>::array_type::~array_type): Likewise.
(T>::next): Likewise.
* rtl-ssa/internals.inl: Likewise.
* rtl-ssa/member-fns.inl: Likewise.
* rtlanal.cc (T>::add_subrtxes_to_queue): Likewise.
(rtx_properties::try_to_add_dest): Likewise.
* rtlanal.h (growing_rtx_properties::repeat): Likewise.
(vec_rtx_properties_base::~vec_rtx_properties_base): Likewise.
* simplify-rtx.cc (simplify_replace_fn_rtx): Likewise.
* sort.cc (likely): Likewise.
(mergesort): Likewise.
* wide-int.h (wi::eq_p): Likewise.
(wi::ltu_p): Likewise.
(wi::cmpu): Likewise.
(wi::bit_and): Likewise.
(wi::bit_and_not): Likewise.
(wi::bit_or): Likewise.
(wi::bit_or_not): Likewise.
(wi::bit_xor): Likewise.
(wi::add): Likewise.
(wi::sub): Likewise.
---
 gcc/c/c-parser.cc  |  4 ++--
 gcc/cp/cp-gimplify.cc  |  6 +++---
 gcc/cp/parser.cc   |  4 ++--
 gcc/domwalk.cc |  4 ++--
 gcc/dse.cc |  8 
 gcc/expmed.cc  |  4 ++--
 gcc/genmatch.cc|  4 ++--
 gcc/ggc-common.cc  |  4 ++--
 gcc/print-rtl.cc   |  2 +-
 gcc/rtl-iter.h |  8 
 gcc/rtl-ssa/internals.inl  |  2 +-
 gcc/rtl-ssa/member-fns.inl |  4 ++--
 gcc/rtlanal.cc | 12 ++--
 gcc/rtlanal.h  |  4 ++--
 gcc/simplify-rtx.cc|  2 +-
 gcc/sort.cc| 28 +---
 gcc/system.h   |  5 -
 gcc/wide-int.h | 20 ++--
 18 files changed, 63 insertions(+), 62 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 129dd727ef3..d431d5fb4c1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -7669,7 +7669,7 @@ c_parser_conditional_expression (c_parser *parser, struct c_expr *after,
   c_inhibit_evaluation_warnings -= cond.value == truthvalue_true_node;
   location_t loc1 = make_location (exp1.get_start (), exp1.src_range);
   location_t loc2 = make_location (exp2.get_start (), exp2.src_range);
-  if (__builtin_expect (omp_atomic_lhs != NULL, 0)
+  if (UNLIKELY (omp_atomic_lhs != NULL)
   && (TREE_CODE (cond.value) == GT_EXPR
  || TREE_CODE (cond.value) == LT_EXPR
  || TREE_CODE (cond.value) == EQ_EXPR)
@@ -7865,7 +7865,7 @@ c_parser_binary_expression (c_parser *parser, struct c_expr *after,
 stack[sp].expr   \
   = convert_lvalue_to_rvalue (stack[sp].loc, \
  stack[sp].expr, true, true);\
-if (__builtin_expect (omp_atomic_lhs != NULL_TREE, 0) && sp == 1 \
+if (UNLIKELY (omp_atomic_lhs != NULL_TREE) && sp == 1\
&& ((c_parser_next_token_is (parser, CPP_SEMICOLON)   \
 && ((1 << stack[sp].prec)\
 & ((1 << PREC_BITOR) | (1 << PREC_BITXOR)\
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index b52d9cb5754..7ab44efa058 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1178,7 +1178,7 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
   hash_set<tree> *p_set = wtd->p_set;
 
   /* If in an OpenMP context, note var uses.  */
-  if (__builtin_expect (wtd->omp_ctx != NULL, 0)
+  if (UNLIKELY (wtd->omp_ctx != NULL)
   && (VAR_P (stmt)
  || TREE_CODE (stmt) == PARM_DECL
  || TREE_CODE (stmt) == RESULT_DECL)
@@ -1242,7 +1242,7 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
   if (is_invisiref_parm (TREE_OPERAND (stmt, 0)))
{
  /* If in an OpenMP context, note var uses.  */
- if (__builtin_expect 

Re: [PATCH] Remove conditional STATIC_ASSERT.

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 2:20 PM Martin Liška  wrote:
>
> Since we require a C++11-compliant compiler, the #if __cplusplus >= 201103L
> condition is always true.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Can we then use static_assert (...) instead and remove the
macro?  Do we have C compiled code left (I think we might,
otherwise we'd not have __cplusplus guards in system.h),
in which case the #if should change to #ifdef __cplusplus?

Thanks,
Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * basic-block.h (STATIC_ASSERT): Use normal STATIC_ASSERT.
> * system.h (STATIC_ASSERT): Define always as static_assert.
> ---
>  gcc/basic-block.h | 5 +
>  gcc/system.h  | 9 +
>  2 files changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/basic-block.h b/gcc/basic-block.h
> index e3fff1f6975..21a9b24dbf9 100644
> --- a/gcc/basic-block.h
> +++ b/gcc/basic-block.h
> @@ -158,10 +158,7 @@ struct GTY((chain_next ("%h.next_bb"), chain_prev ("%h.prev_bb"))) basic_block_d
>  /* This ensures that struct gimple_bb_info is smaller than
> struct rtl_bb_info, so that inlining the former into basic_block_def
> is the better choice.  */
> -typedef int __assert_gimple_bb_smaller_rtl_bb
> -  [(int) sizeof (struct rtl_bb_info)
> -   - (int) sizeof (struct gimple_bb_info)];
> -
> +STATIC_ASSERT (sizeof (rtl_bb_info) >= sizeof (gimple_bb_info));
>
>  #define BB_FREQ_MAX 1
>
> diff --git a/gcc/system.h b/gcc/system.h
> index 1688b763ef5..48145951337 100644
> --- a/gcc/system.h
> +++ b/gcc/system.h
> @@ -801,14 +801,7 @@ extern void fancy_abort (const char *, int, const char *)
>
>  #define STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))
>
> -/* static_assert (COND, MESSAGE) is available in C++11 onwards.  */
> -#if __cplusplus >= 201103L
> -#define STATIC_ASSERT(X) \
> -  static_assert ((X), #X)
> -#else
> -#define STATIC_ASSERT(X) \
> -  typedef int assertion1[(X) ? 1 : -1] ATTRIBUTE_UNUSED
> -#endif
> +#define STATIC_ASSERT(X) static_assert ((X), #X)
>
>  /* Provide a fake boolean type.  We make no attempt to use the
> C99 _Bool, as it may not be available in the bootstrap compiler,
> --
> 2.36.0
>


[Committed] PR testsuite/105486: Use "signed char" in gcc.dg/pr102950.c

2022-05-05 Thread Roger Sayle

Although the automated regression testing scripts for powerpc64 appear
to be somewhat garbled at the moment, they've correctly identified that
my new test case for pr102950.c is failing on powerpc64, as char by
default is unsigned on this target.  This patch tweaks the new testcase
by explicitly using "signed char" so that it's testing the intended EVRP
behaviour portably.  Committed as obvious.
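
A small sketch of the portability issue, for illustration only (the function names are hypothetical and not taken from the testcase): narrowing the same value through plain char gives a different result depending on the target's default signedness, while signed char behaves the same everywhere.

```cpp
#include <cassert>
#include <limits>

// True where plain char is signed (typical for x86_64 Linux), false
// where it is unsigned (as reported above for powerpc64).
inline bool plain_char_is_signed ()
{
  return std::numeric_limits<char>::is_signed;
}

// Narrow through an explicit signed char, as the fixed testcase does.
// The modular result is guaranteed from C++20 on; GCC documents the
// same behaviour for earlier standards.
inline int narrow_to_signed_char (unsigned e)
{
  signed char b = static_cast<signed char> (e);
  return b;
}

// Narrow through plain char: the result depends on the target.
inline int narrow_to_plain_char (unsigned e)
{
  char b = static_cast<char> (e);
  return b;
}
```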


2022-05-05  Roger Sayle  

gcc/testsuite/ChangeLog
PR testsuite/105486
* gcc.dg/pr102950.c: Use explicit "signed char" in test case.


Roger
--

diff --git a/gcc/testsuite/gcc.dg/pr102950.c b/gcc/testsuite/gcc.dg/pr102950.c
index 0ab23bd4dbc..317568370d4 100644
--- a/gcc/testsuite/gcc.dg/pr102950.c
+++ b/gcc/testsuite/gcc.dg/pr102950.c
@@ -3,9 +3,9 @@
 
 extern void link_error(void);
 
-static char a;
+static signed char a;
 static short d(unsigned e) {
-  char b;
+  signed char b;
   short c;
   a = b = e;
   if (b)


Re: [PATCH] Remove non-ANSI C path in ansidecl.h.

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 2:19 PM Martin Liška  wrote:
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> include/ChangeLog:
>
> * ansidecl.h (PTR): Remove the non-ANSI C part.
> ---
>  include/ansidecl.h | 16 +---
>  1 file changed, 1 insertion(+), 15 deletions(-)
>
> diff --git a/include/ansidecl.h b/include/ansidecl.h
> index 4275c9b9cbd..f42c6afc7e9 100644
> --- a/include/ansidecl.h
> +++ b/include/ansidecl.h
> @@ -89,21 +89,7 @@ So instead we use the macro below and test it against specific values.  */
>  # endif
>  #endif
>
> -#else  /* Not ANSI C.  */
> -
> -#define PTR char *
> -
> -/* some systems define these in header files for non-ansi mode */
> -#undef const
> -#undef volatile
> -#undef signed
> -#undef inline
> -#define const
> -#define volatile
> -#define signed
> -#define inline
> -
> -#endif /* ANSI C.  */

You'd have to ask the sourceware side as well (binutils), but for sure
either the guarding #if should be removed or the #else path should
contain an #error.

Richard.

> +#endif
>
>  /* Define macros for some gcc attributes.  This permits us to use the
> macros freely, and know that they will come into play for the
> --
> 2.36.0
>


Re: [PATCH] Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 2:19 PM Martin Liška  wrote:
>
> Right now, the minimum required version of GCC is 4.8.x,
> a version that supports C++11 well.

Hmm, but we support C++11 host compilers that are not GCC but
may claim to be, with GCC_VERSION 4.2.x for example.  Are we sure
all those liars implement what we guard with the version checks?

I suppose to be "correct" we'd at least need to preserve
#if __GNUC__
in places where we might use the host compiler?  (if compilers then lie
it's their own fault)
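
To make the concern concrete, here is a sketch of the kind of guard meant above: the builtin is used only when the host compiler claims GNU compatibility via __GNUC__, with a portable fallback otherwise. The names popcount_sketch and popcount_portable are hypothetical and are not GCC's popcount_hwi.

```cpp
#include <cassert>

// Portable fallback: clear the lowest set bit until none remain.
inline int popcount_portable (unsigned long long x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// Use the builtin only when the compiler claims to be GNU-compatible;
// no check against a specific GCC_VERSION is made here.
inline int popcount_sketch (unsigned long long x)
{
#if defined (__GNUC__)
  return __builtin_popcountll (x);
#else
  return popcount_portable (x);
#endif
}
```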

Richard.

> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * bitmap.cc (bitmap_popcount):
> Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
> (bitmap_count_bits_in_word): Likewise.
> (bitmap_single_bit_set_p): Likewise.
> (bitmap_first_set_bit): Likewise.
> (bitmap_last_set_bit): Likewise.
> * bitmap.h (if): Likewise.
> * config/ia64/ia64.cc (RWS_FIELD_TYPE): Likewise.
> * config/rs6000/rs6000.h (if): Likewise.
> * defaults.h: Likewise.
> * diagnostic-core.h (if): Likewise.
> (ATTRIBUTE_GCC_DIAG): Likewise.
> * dwarf2cfi.cc (if): Likewise.
> * dwarf2out.cc (if): Likewise.
> (DWARF2_ASM_LINE_DEBUG_INFO): Likewise.
> (DWARF2_ASM_VIEW_DEBUG_INFO): Likewise.
> * gcc.cc (if): Likewise.
> * genautomata.cc (struct state_ainsn_table): Likewise.
> (regexp_mode_check_failed): Likewise.
> (REGEXP_ONEOF): Likewise.
> * genconditions.cc (write_header): Likewise.
> (write_writer): Likewise.
> * genmatch.cc: Likewise.
> * genmodes.cc (GCC_INSN_MODES_INLINE_H): Likewise.
> * genoutput.cc (output_insn_data): Likewise.
> * ggc-page.cc (if): Likewise.
> (prefetch): Likewise.
> (ggc_internal_alloc): Likewise.
> * ggc-tests.cc (test_finalization): Likewise.
> * ggc.h (need_finalization_p): Likewise.
> * hwint.cc (floor_log2): Likewise.
> (ceil_log2): Likewise.
> (exact_log2): Likewise.
> (ctz_hwi): Likewise.
> (clz_hwi): Likewise.
> (ffs_hwi): Likewise.
> (popcount_hwi): Likewise.
> * hwint.h (HAVE_LONG_LONG): Likewise.
> (SIZEOF_LONG_LONG): Likewise.
> (sizeof_long_long_must_be_8[sizeof): Likewise.
> (clz_hwi): Likewise.
> (ctz_hwi): Likewise.
> (ffs_hwi): Likewise.
> (popcount_hwi): Likewise.
> (exact_log2): Likewise.
> (floor_log2): Likewise.
> (ceil_log2): Likewise.
> * ira-int.h: Likewise.
> * machmode.h (mode_to_bytes): Likewise.
> (mode_to_inner): Likewise.
> (mode_to_unit_size): Likewise.
> (mode_to_unit_precision): Likewise.
> (mode_to_nunits): Likewise.
> * output.h (ATTRIBUTE_ASM_FPRINTF): Likewise.
> * pretty-print.h (ATTRIBUTE_GCC_PPDIAG): Likewise.
> * rtl.cc (dump_rtx_statistics): Likewise.
> * rtl.h (test): Likewise.
> (RTX_FLAG): Likewise.
> (enum label_kind): Likewise.
> * sbitmap.cc (sbitmap_popcount): Likewise.
> (bitmap_count_bits): Likewise.
> * stringpool.h (get_identifier_with_length): Likewise.
> * system.h (HAVE_DESIGNATED_INITIALIZERS): Likewise.
> (HAVE_DESIGNATED_UNION_INITIALIZERS): Likewise.
> (if): Likewise.
> (__FUNCTION__): Likewise.
> (__builtin_expect): Likewise.
> (elif): Likewise.
> (gcc_assert): Likewise.
> (ALWAYS_INLINE): Likewise.
> (WARN_UNUSED_RESULT): Likewise.
> (STATIC_CONSTANT_P): Likewise.
> (defined): Likewise.
> (BROKEN_VALUE_INITIALIZATION): Likewise.
> (DEBUG_FUNCTION): Likewise.
> (DEBUG_VARIABLE): Likewise.
> * tree-vrp.cc (vrp_asserts::find_switch_asserts): Likewise.
> * tree.cc (get_file_function_name): Likewise.
> * tree.h (as_internal_fn): Likewise.
> (if): Likewise.
> (DECL_RTL_KNOWN_SET): Likewise.
> (prepare_target_option_nodes_for_pch): Likewise.
> (tree_operand_length): Likewise.
> (tree_to_poly_uint64): Likewise.
> * var-tracking.cc (int_mem_offset): Likewise.
> * vec.h (if): Likewise.
> * wide-int.cc (defined): Likewise.
> (if): Likewise.
>
> gcc/cp/ChangeLog:
>
> * cp-tree.h (BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK):
> Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
> (STRIP_TEMPLATE): Likewise.
> * tree.cc (cp_tree_c_finish_parsing): Likewise.
>
> gcc/fortran/ChangeLog:
>
> * gfortran.h (ATTRIBUTE_GCC_GFC):
> Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
>
> gcc/jit/ChangeLog:
>
> * jit-common.h (GNU_PRINTF):
> Fold GCC_VERSION >= $old_version to TRUE, otherwise 

[PATCH] Remove conditional STATIC_ASSERT.

2022-05-05 Thread Martin Liška
Since we require a C++11-compliant compiler, the #if __cplusplus >= 201103L
condition is always true.
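
For context, a minimal sketch contrasting the pre-C++11 typedef trick with static_assert. The two structs are hypothetical stand-ins for gimple_bb_info and rtl_bb_info, and the typedef name is made up for this example.

```cpp
#include <cassert>

// Hypothetical stand-ins for gimple_bb_info and rtl_bb_info.
struct small_info { int a; };
struct large_info { int a; int b; };

// Pre-C++11 idiom: a false condition yields a negative array size,
// which is a compile-time error.
typedef int assert_small_fits_old
  [(sizeof (large_info) >= sizeof (small_info)) ? 1 : -1];

// C++11 form with an explicit message.
static_assert (sizeof (large_info) >= sizeof (small_info),
	       "large_info must be at least as big as small_info");

// The macro as defined unconditionally by this patch: the stringified
// condition doubles as the diagnostic message.
#define STATIC_ASSERT(X) static_assert ((X), #X)
STATIC_ASSERT (sizeof (small_info) <= sizeof (large_info));
```

Note that the single-argument static_assert (COND) form is standard only from C++17 on, which is why the macro supplies #X as the message.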

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* basic-block.h (STATIC_ASSERT): Use normal STATIC_ASSERT.
* system.h (STATIC_ASSERT): Define always as static_assert.
---
 gcc/basic-block.h | 5 +
 gcc/system.h  | 9 +
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index e3fff1f6975..21a9b24dbf9 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -158,10 +158,7 @@ struct GTY((chain_next ("%h.next_bb"), chain_prev ("%h.prev_bb"))) basic_block_d
 /* This ensures that struct gimple_bb_info is smaller than
struct rtl_bb_info, so that inlining the former into basic_block_def
is the better choice.  */
-typedef int __assert_gimple_bb_smaller_rtl_bb
-  [(int) sizeof (struct rtl_bb_info)
-   - (int) sizeof (struct gimple_bb_info)];
-
+STATIC_ASSERT (sizeof (rtl_bb_info) >= sizeof (gimple_bb_info));
 
 #define BB_FREQ_MAX 1
 
diff --git a/gcc/system.h b/gcc/system.h
index 1688b763ef5..48145951337 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -801,14 +801,7 @@ extern void fancy_abort (const char *, int, const char *)
 
 #define STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))
 
-/* static_assert (COND, MESSAGE) is available in C++11 onwards.  */
-#if __cplusplus >= 201103L
-#define STATIC_ASSERT(X) \
-  static_assert ((X), #X)
-#else
-#define STATIC_ASSERT(X) \
-  typedef int assertion1[(X) ? 1 : -1] ATTRIBUTE_UNUSED
-#endif
+#define STATIC_ASSERT(X) static_assert ((X), #X)
 
 /* Provide a fake boolean type.  We make no attempt to use the
C99 _Bool, as it may not be available in the bootstrap compiler,
-- 
2.36.0



[PATCH] Remove non-ANSI C path in ansidecl.h.

2022-05-05 Thread Martin Liška
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

include/ChangeLog:

* ansidecl.h (PTR): Remove the non-ANSI C part.
---
 include/ansidecl.h | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/include/ansidecl.h b/include/ansidecl.h
index 4275c9b9cbd..f42c6afc7e9 100644
--- a/include/ansidecl.h
+++ b/include/ansidecl.h
@@ -89,21 +89,7 @@ So instead we use the macro below and test it against specific values.  */
 # endif
 #endif
 
-#else  /* Not ANSI C.  */
-
-#define PTR char *
-
-/* some systems define these in header files for non-ansi mode */
-#undef const
-#undef volatile
-#undef signed
-#undef inline
-#define const
-#define volatile
-#define signed
-#define inline
-
-#endif /* ANSI C.  */
+#endif
 
 /* Define macros for some gcc attributes.  This permits us to use the
macros freely, and know that they will come into play for the
-- 
2.36.0



[PATCH] Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.

2022-05-05 Thread Martin Liška
Right now, the minimum required version of GCC is 4.8.x,
a version that supports C++11 well.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* bitmap.cc (bitmap_popcount):
Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
(bitmap_count_bits_in_word): Likewise.
(bitmap_single_bit_set_p): Likewise.
(bitmap_first_set_bit): Likewise.
(bitmap_last_set_bit): Likewise.
* bitmap.h (if): Likewise.
* config/ia64/ia64.cc (RWS_FIELD_TYPE): Likewise.
* config/rs6000/rs6000.h (if): Likewise.
* defaults.h: Likewise.
* diagnostic-core.h (if): Likewise.
(ATTRIBUTE_GCC_DIAG): Likewise.
* dwarf2cfi.cc (if): Likewise.
* dwarf2out.cc (if): Likewise.
(DWARF2_ASM_LINE_DEBUG_INFO): Likewise.
(DWARF2_ASM_VIEW_DEBUG_INFO): Likewise.
* gcc.cc (if): Likewise.
* genautomata.cc (struct state_ainsn_table): Likewise.
(regexp_mode_check_failed): Likewise.
(REGEXP_ONEOF): Likewise.
* genconditions.cc (write_header): Likewise.
(write_writer): Likewise.
* genmatch.cc: Likewise.
* genmodes.cc (GCC_INSN_MODES_INLINE_H): Likewise.
* genoutput.cc (output_insn_data): Likewise.
* ggc-page.cc (if): Likewise.
(prefetch): Likewise.
(ggc_internal_alloc): Likewise.
* ggc-tests.cc (test_finalization): Likewise.
* ggc.h (need_finalization_p): Likewise.
* hwint.cc (floor_log2): Likewise.
(ceil_log2): Likewise.
(exact_log2): Likewise.
(ctz_hwi): Likewise.
(clz_hwi): Likewise.
(ffs_hwi): Likewise.
(popcount_hwi): Likewise.
* hwint.h (HAVE_LONG_LONG): Likewise.
(SIZEOF_LONG_LONG): Likewise.
(sizeof_long_long_must_be_8[sizeof): Likewise.
(clz_hwi): Likewise.
(ctz_hwi): Likewise.
(ffs_hwi): Likewise.
(popcount_hwi): Likewise.
(exact_log2): Likewise.
(floor_log2): Likewise.
(ceil_log2): Likewise.
* ira-int.h: Likewise.
* machmode.h (mode_to_bytes): Likewise.
(mode_to_inner): Likewise.
(mode_to_unit_size): Likewise.
(mode_to_unit_precision): Likewise.
(mode_to_nunits): Likewise.
* output.h (ATTRIBUTE_ASM_FPRINTF): Likewise.
* pretty-print.h (ATTRIBUTE_GCC_PPDIAG): Likewise.
* rtl.cc (dump_rtx_statistics): Likewise.
* rtl.h (test): Likewise.
(RTX_FLAG): Likewise.
(enum label_kind): Likewise.
* sbitmap.cc (sbitmap_popcount): Likewise.
(bitmap_count_bits): Likewise.
* stringpool.h (get_identifier_with_length): Likewise.
* system.h (HAVE_DESIGNATED_INITIALIZERS): Likewise.
(HAVE_DESIGNATED_UNION_INITIALIZERS): Likewise.
(if): Likewise.
(__FUNCTION__): Likewise.
(__builtin_expect): Likewise.
(elif): Likewise.
(gcc_assert): Likewise.
(ALWAYS_INLINE): Likewise.
(WARN_UNUSED_RESULT): Likewise.
(STATIC_CONSTANT_P): Likewise.
(defined): Likewise.
(BROKEN_VALUE_INITIALIZATION): Likewise.
(DEBUG_FUNCTION): Likewise.
(DEBUG_VARIABLE): Likewise.
* tree-vrp.cc (vrp_asserts::find_switch_asserts): Likewise.
* tree.cc (get_file_function_name): Likewise.
* tree.h (as_internal_fn): Likewise.
(if): Likewise.
(DECL_RTL_KNOWN_SET): Likewise.
(prepare_target_option_nodes_for_pch): Likewise.
(tree_operand_length): Likewise.
(tree_to_poly_uint64): Likewise.
* var-tracking.cc (int_mem_offset): Likewise.
* vec.h (if): Likewise.
* wide-int.cc (defined): Likewise.
(if): Likewise.

gcc/cp/ChangeLog:

* cp-tree.h (BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK):
Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
(STRIP_TEMPLATE): Likewise.
* tree.cc (cp_tree_c_finish_parsing): Likewise.

gcc/fortran/ChangeLog:

* gfortran.h (ATTRIBUTE_GCC_GFC):
Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.

gcc/jit/ChangeLog:

* jit-common.h (GNU_PRINTF):
Fold GCC_VERSION >= $old_version to TRUE, otherwise to FALSE.
---
 gcc/bitmap.cc  |  73 +-
 gcc/bitmap.h   |  18 ++-
 gcc/config/ia64/ia64.cc|   5 +-
 gcc/config/rs6000/rs6000.h |   2 -
 gcc/cp/cp-tree.h   |   4 +-
 gcc/cp/tree.cc |   2 +-
 gcc/defaults.h |   2 +-
 gcc/diagnostic-core.h  |   4 --
 gcc/dwarf2cfi.cc   |   4 +-
 gcc/dwarf2out.cc   |  16 ++
 gcc/fortran/gfortran.h |   4 --
 gcc/gcc.cc |   2 -
 gcc/genautomata.cc |   6 +--
 gcc/genconditions.cc   |   9 +---
 gcc/genmatch.cc|  12 -
 gcc/genmodes.cc|   4 +-
 

[PATCH] Add operators / and * for profile_{count,probability}.

2022-05-05 Thread Martin Liška
The patch simplifies usage of the profile_{count,probability} types.
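
As a rough sketch of the idea (a toy type, not GCC's profile_count, which also tracks quality and an uninitialized state), the operators can be thought of as thin wrappers around apply_scale. The class name toy_count and its members are assumptions made for this example, and a nonzero divisor is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a profile count; illustration only.
class toy_count
{
public:
  explicit toy_count (std::uint64_t v) : m_val (v) {}
  std::uint64_t value () const { return m_val; }

  // Old style: scale by the ratio NUM / DEN (DEN must be nonzero).
  toy_count apply_scale (std::uint64_t num, std::uint64_t den) const
  {
    return toy_count (m_val * num / den);
  }

  // New style: count * 2 instead of count.apply_scale (2, 1).
  toy_count operator* (std::uint64_t num) const
  {
    return apply_scale (num, 1);
  }

  // New style: count / 10 instead of count.apply_scale (1, 10).
  toy_count operator/ (std::uint64_t den) const
  {
    return apply_scale (1, den);
  }

private:
  std::uint64_t m_val;
};
```

With this, an expression such as best_count.apply_scale (1, 10) reads as best_count / 10, matching the replacements in the diff below.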

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* bb-reorder.cc (find_traces_1_round): Add operators / and * and
use them.
(better_edge_p): Likewise.
* cfgloop.cc (find_subloop_latch_edge_by_profile): Likewise.
* cfgloopmanip.cc (scale_loop_profile): Likewise.
* cfgrtl.cc (force_nonfallthru_and_redirect): Likewise.
* cgraph.cc (cgraph_edge::maybe_hot_p): Likewise.
* config/sh/sh.cc (expand_cbranchdi4): Likewise.
* dojump.cc (do_compare_rtx_and_jump): Likewise.
* final.cc (compute_alignments): Likewise.
* ipa-cp.cc (update_counts_for_self_gen_clones): Likewise.
(decide_about_value): Likewise.
* ipa-inline-analysis.cc (do_estimate_edge_time): Likewise.
* loop-unroll.cc (unroll_loop_runtime_iterations): Likewise.
* modulo-sched.cc (sms_schedule): Likewise.
* omp-expand.cc (extract_omp_for_update_vars): Likewise.
(expand_omp_ordered_sink): Likewise.
(expand_omp_for_ordered_loops): Likewise.
(expand_omp_for_static_nochunk): Likewise.
* predict.cc (maybe_hot_count_p): Likewise.
(probably_never_executed): Likewise.
(set_even_probabilities): Likewise.
(handle_missing_profiles): Likewise.
(expensive_function_p): Likewise.
* profile-count.h: Likewise.
* profile.cc (compute_branch_probabilities): Likewise.
* stmt.cc (emit_case_dispatch_table): Likewise.
* symtab-thunks.cc (expand_thunk): Likewise.
* tree-ssa-loop-manip.cc (tree_transform_and_unroll_loop): Likewise.
* tree-ssa-sink.cc (select_best_block): Likewise.
* tree-switch-conversion.cc (switch_decision_tree::analyze_switch_statement): Likewise.
(switch_decision_tree::balance_case_nodes): Likewise.
(switch_decision_tree::emit_case_nodes): Likewise.
* tree-vect-loop.cc (scale_profile_for_vect_loop): Likewise.
---
 gcc/bb-reorder.cc |  6 ++--
 gcc/cfgloop.cc|  2 +-
 gcc/cfgloopmanip.cc   |  5 ++--
 gcc/cfgrtl.cc |  4 +--
 gcc/cgraph.cc |  5 ++--
 gcc/config/sh/sh.cc   |  2 +-
 gcc/dojump.cc |  2 +-
 gcc/final.cc  | 12 +++-
 gcc/ipa-cp.cc | 10 +++
 gcc/ipa-inline-analysis.cc|  2 +-
 gcc/loop-unroll.cc|  8 +++---
 gcc/modulo-sched.cc   | 20 ++---
 gcc/omp-expand.cc | 24 ++--
 gcc/predict.cc| 17 ++-
 gcc/profile-count.h   | 46 --
 gcc/profile.cc|  5 ++--
 gcc/stmt.cc   |  5 ++--
 gcc/symtab-thunks.cc  | 10 +++
 gcc/tree-ssa-loop-manip.cc| 11 
 gcc/tree-ssa-sink.cc  |  3 +-
 gcc/tree-switch-conversion.cc | 53 +--
 gcc/tree-vect-loop.cc |  5 ++--
 22 files changed, 137 insertions(+), 120 deletions(-)

diff --git a/gcc/bb-reorder.cc b/gcc/bb-reorder.cc
index d20ccb83aa6..6600f44d4d7 100644
--- a/gcc/bb-reorder.cc
+++ b/gcc/bb-reorder.cc
@@ -761,7 +761,7 @@ find_traces_1_round (int branch_th, profile_count count_th,
& EDGE_CAN_FALLTHRU)
&& !(single_succ_edge (e->dest)->flags & EDGE_COMPLEX)
&& single_succ (e->dest) == best_edge->dest
-   && (e->dest->count.apply_scale (2, 1)
+   && (e->dest->count * 2
>= best_edge->count () || for_size))
  {
best_edge = e;
@@ -944,7 +944,7 @@ better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
 
   /* The BEST_* values do not have to be best, but can be a bit smaller than
  maximum values.  */
-  profile_probability diff_prob = best_prob.apply_scale (1, 10);
+  profile_probability diff_prob = best_prob / 10;
 
   /* The smaller one is better to keep the original order.  */
   if (optimize_function_for_size_p (cfun))
@@ -966,7 +966,7 @@ better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
 is_better_edge = false;
   else
 {
-  profile_count diff_count = best_count.apply_scale (1, 10);
+  profile_count diff_count = best_count / 10;
   if (count < best_count - diff_count
  || (!best_count.initialized_p ()
  && count.nonzero_p ()))
diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index 5ffcc77d93f..57bf7b1855d 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -619,7 +619,7 @@ find_subloop_latch_edge_by_profile (vec latches)
 }
 
   if (!tcount.initialized_p () || !(tcount.ipa () > HEAVY_EDGE_MIN_SAMPLES)
-  || (tcount - mcount).apply_scale (HEAVY_EDGE_RATIO, 1) > tcount)
+  || (tcount - mcount) 

[PATCH] Use more ARRAY_SIZE.

2022-05-05 Thread Martin Liška
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
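
For readers skimming the ChangeLog that follows: the idiom being replaced
is the open-coded "sizeof (a) / sizeof (a[0])" element count.  A minimal
sketch in C, assuming the conventional definition of the macro (it is not
quoted from the patch); the table and both helper functions are made up
for illustration only.

```c
#include <stddef.h>

/* Conventional element-count macro; assumed definition, not copied
   from the patch.  */
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical table, used only for illustration.  */
static const char *const lang_names[] = { "ada", "c", "fortran", "lto" };

/* Before: the element count is spelled out by hand.  */
size_t
count_names_open_coded (void)
{
  return sizeof (lang_names) / sizeof (lang_names[0]);
}

/* After: the same value obtained through ARRAY_SIZE.  */
size_t
count_names (void)
{
  return ARRAY_SIZE (lang_names);
}
```

Both helpers return the same count; the macro form only works on true
arrays, not on pointers an array has decayed to.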

gcc/ada/ChangeLog:

* locales.c (iso_639_1_to_639_3): Use ARRAY_SIZE.
(language_name_to_639_3): Likewise.
(country_name_to_3166): Likewise.

gcc/analyzer/ChangeLog:

* engine.cc (exploded_node::get_dot_fillcolor): Use ARRAY_SIZE.
* function-set.cc (test_stdio_example): Likewise.
* sm-file.cc (get_file_using_fns): Likewise.
* sm-malloc.cc (malloc_state_machine::unaffected_by_call_p): Likewise.
* sm-signal.cc (get_async_signal_unsafe_fns): Likewise.

gcc/ChangeLog:

* attribs.cc (diag_attr_exclusions): Use ARRAY_SIZE.
(decls_mismatched_attributes): Likewise.
* builtins.cc (c_strlen): Likewise.
* cfg.cc (DEF_BASIC_BLOCK_FLAG): Likewise.
* common/config/aarch64/aarch64-common.cc (aarch64_option_init_struct): 
Likewise.
* config/aarch64/aarch64-builtins.cc 
(aarch64_lookup_simd_builtin_type): Likewise.
(aarch64_init_simd_builtin_types): Likewise.
(aarch64_init_builtin_rsqrt): Likewise.
* config/aarch64/aarch64.cc (is_madd_op): Likewise.
* config/arm/arm-builtins.cc (arm_lookup_simd_builtin_type): Likewise.
(arm_init_simd_builtin_types): Likewise.
* config/avr/gen-avr-mmcu-texi.cc (mcus[ARRAY_SIZE): Likewise.
(c_prefix): Likewise.
(main): Likewise.
* config/c6x/c6x.cc (N_SAVE_ORDER): Likewise.
* config/darwin-c.cc (darwin_register_frameworks): Likewise.
* config/gcn/mkoffload.cc (process_obj): Likewise.
* config/i386/i386-builtins.cc (get_builtin_code_for_version): Likewise.
(fold_builtin_cpu): Likewise.
* config/m32c/m32c.cc (PUSHM_N): Likewise.
* config/nvptx/mkoffload.cc (process): Likewise.
* config/rs6000/driver-rs6000.cc (host_detect_local_cpu): Likewise.
* config/rs6000/rs6000.cc (rs6000_hash_constant): Likewise.
* config/s390/s390.cc (NR_C_MODES): Likewise.
* config/tilepro/gen-mul-tables.cc (find_sequences): Likewise.
(create_insn_code_compression_table): Likewise.
* config/vms/vms.cc (NBR_CRTL_NAMES): Likewise.
* diagnostic-format-json.cc (json_from_expanded_location): Likewise.
* dwarf2out.cc (ARRAY_SIZE): Likewise.
* genautomata.cc (output_get_cpu_unit_code_func): Likewise.
* genhooks.cc (emit_documentation): Likewise.
(emit_init_macros): Likewise.
* gimple-ssa-sprintf.cc (format_floating): Likewise.
* gimple-ssa-warn-access.cc (memmodel_name): Likewise.
* godump.cc (keyword_hash_init): Likewise.
* hash-table.cc (hash_table_higher_prime_index): Likewise.
* input.cc (for_each_line_table_case): Likewise.
* ipa-free-lang-data.cc (free_lang_data): Likewise.
* ipa-inline.cc (sanitize_attrs_match_for_inline_p): Likewise.
* optc-save-gen.awk: Likewise.
* spellcheck.cc (test_metric_conditions): Likewise.
* sreal.cc (sreal_verify_arithmetics): Likewise.
(sreal_verify_shifting): Likewise.
* tree-vect-slp-patterns.cc (sizeof): Likewise.
(ARRAY_SIZE): Likewise.
* tree.cc (build_common_tree_nodes): Likewise.

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_access_attribute): Use ARRAY_SIZE.
* c-common.cc (ARRAY_SIZE): Likewise.
(c_common_nodes_and_builtins): Likewise.
* c-format.cc (check_tokens): Likewise.
(check_plain): Likewise.
* c-pragma.cc (c_pp_lookup_pragma): Likewise.
(init_pragma): Likewise.
* c-warn.cc (warn_parm_array_mismatch): Likewise.
* known-headers.cc (get_string_macro_hint): Likewise.
(get_stdlib_header_for_name): Likewise.

gcc/c/ChangeLog:

* c-decl.cc (match_builtin_function_types): Use ARRAY_SIZE.

gcc/cp/ChangeLog:

* module.cc (depset::entity_kind_name): Use ARRAY_SIZE.
* name-lookup.cc (get_std_name_hint): Likewise.
* parser.cc (cp_parser_new): Likewise.

gcc/d/ChangeLog:

* longdouble.h: Use ARRAY_SIZE.

gcc/fortran/ChangeLog:

* frontend-passes.cc (gfc_code_walker): Use ARRAY_SIZE.
* openmp.cc (gfc_match_omp_context_selector_specification): Likewise.
* trans-intrinsic.cc (conv_intrinsic_ieee_builtin): Likewise.
* trans-types.cc (gfc_get_array_descr_info): Likewise.

gcc/jit/ChangeLog:

* jit-builtins.cc (find_builtin_by_name): Use ARRAY_SIZE.
(get_string_for_type_id): Likewise.
* jit-recording.cc (recording::context::context): Likewise.

gcc/lto/ChangeLog:

* lto-common.cc (lto_resolution_read): Use ARRAY_SIZE.
* lto-lang.cc (lto_init): Likewise.
---
 gcc/ada/locales.c   |  6 +++---
 gcc/analyzer/engine.cc  |  2 +-
 gcc/analyzer/function-set.cc|  2 +-
 gcc/analyzer/sm-file.cc |  3 +--
 

Re: [PATCH] ipa-visibility: Optimize TLS access [PR99619]

2022-05-05 Thread Jan Hubicka via Gcc-patches
> On Thu, 5 May 2022, Jan Hubicka wrote:
> 
> > Also note that visibility pass is run twice (once at compile time before
> > early optimizations and then again at LTO). Since LTO linking may
> > promote public symbols to local/hidden, perhaps we want to do this only
> > second time the pass is executed?
> 
> The first pass appears to be redundant and could be avoided, yes.
> 
> > What is the reason we avoid using LOCAL_DYNAMIC with !optimize while we
> > are happy to use LOCAL_EXEC with !optimize !flag_shlib?
> 
> It follows from how local-dynamic model is defined: we call __tls_get_addr
> with an argument that identifies the current DSO (not the individual
> thread-local variable), and then compute the address of the variable with
> a simple addition, so when there are two or more TLS variables, we can
> call __tls_get_addr just once (but at -O0 we will end up with redundant
> calls).

Thanks for the explanation.
So this is something that really depends on the optimization flags of the
function referring to the variable rather than on the optimization flags of
the variable itself, and it only makes a difference if there is an -O0
function that contains more than one reference to a TLS var?

I guess then a correct answer would be to search for such references.
What happens when there are multiple object files with a hidden TLS var
where some get LOCAL_DYNAMIC and others GLOBAL_DYNAMIC? (Which may
happen when linking together object files compiled with different
versions of the compiler if we go ahead with this patch on hidden symbols).

Honza
> 
> There's no such concern for local-exec vs initial-exec promotion.
> 
> Alexander


Re: [PATCH] ipa-visibility: Optimize TLS access [PR99619]

2022-05-05 Thread Alexander Monakov
On Thu, 5 May 2022, Jan Hubicka wrote:

> Also note that visibility pass is run twice (once at compile time before
> early optimizations and then again at LTO). Since LTO linking may
> promote public symbols to local/hidden, perhaps we want to do this only
> second time the pass is executed?

The first pass appears to be redundant and could be avoided, yes.

> What is the reason we avoid using LOCAL_DYNAMIC with !optimize while we
> are happy to use LOCAL_EXEC with !optimize !flag_shlib?

It follows from how local-dynamic model is defined: we call __tls_get_addr
with an argument that identifies the current DSO (not the individual
thread-local variable), and then compute the address of the variable with
a simple addition, so when there are two or more TLS variables, we can
call __tls_get_addr just once (but at -O0 we will end up with redundant
calls).

There's no such concern for local-exec vs initial-exec promotion.

Alexander
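
To make the local-dynamic point above concrete, here is a minimal sketch
in C.  It is not taken from the patch: the variable and function names
are hypothetical, and the tls_model attribute is used only to request the
model explicitly.  The expectation, following the explanation above, is
that an optimizing compile of a shared library needs one __tls_get_addr
call for the module and reaches both variables by offset, while at -O0
each access would typically get its own call.

```c
/* Two thread-local variables local to this module; the attribute asks
   for the local-dynamic model explicitly (names are hypothetical).  */
static __thread int counter_a __attribute__ ((tls_model ("local-dynamic")));
static __thread int counter_b __attribute__ ((tls_model ("local-dynamic")));

/* Touches both variables in one function, which is the situation where
   sharing a single module-base lookup pays off.  */
int
bump_both (void)
{
  counter_a += 1;
  counter_b += 2;
  return counter_a + counter_b;
}
```

Comparing the assembly for this function at -O0 and -O2 with -fPIC is one
way to check the claim on a given target.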


Re: [PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections

2022-05-05 Thread Julian Brown
On Thu, 5 May 2022 10:52:57 +0200
Jakub Jelinek via Gcc-patches  wrote:

> On Mon, Feb 21, 2022 at 11:18:57PM +0800, Chung-Lin Tang wrote:
> > as encountered in cases where a program constructs its own
> > deep-copying for arrays-of-pointers, e.g:
> > 
> >#pragma omp target enter data map(to:level->vectors[:N])
> >for (i = 0; i < N; i++)
> >  #pragma omp target enter data map(to:level->vectors[i][:N])
> > 
> > We need to treat the part of the array reference before the array
> > section as a base-pointer (here 'level->vectors[i]'), providing
> > pointer-attachment behavior.
> > 
> > This patch adds this inside handle_omp_array_sections(), tracing
> > the whole sequence of array dimensions, creating a whole
> > base-pointer reference iteratively using build_array_ref(). The
> > conditions are that each of the "absorbed" dimensions must be
> > length==1, and the final reference must be of pointer-type (so that
> > pointer attachment makes sense).
> > 
> > There's also a little patch in gimplify_scan_omp_clauses(), to make
> > sure the array-ref base-pointer goes down the right path.
> > 
> > This case was encountered when working to make 534.hpgmgfv_t from
> > SPEChpc 2021 properly compile. Tested without regressions on trunk.
> > Okay to go in once stage1 opens?  
> 
> I'm afraid this is going in the wrong direction.  The OpenMP 5.0
> change that:
> "A locator list item is any lvalue expression, including variables,
> or an array section."
> is much more general than just allowing -> in the expressions, there
> can be function calls and many other.  So, the more code like this
> patch we add, the more we'll need to throw away again.  And as it is
> in OpenMP 5.0, we really need to throw it away in the GCC 13 cycle.
> 
> So, what we really need is add OMP_ARRAY_SECTION tree code, some
> parser flag to know that we are inside of an OpenMP clause that
> allows array sections, and just where we normally parse ARRAY_REFs if
> that flag is on also parse array sections and parse the map/to/from
> clauses just as normal expressions.

All the above (at least) has been done as part of the patch series
posted here:

https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591973.html

> At least for the C FE maybe we'll
> need to arrange for less folding to be done because C still folds too
> much stuff prematurely. Then when finishing clauses verify that
> OMP_ARRAY_SECTION trees appear only where we allow them and not
> elsewhere (say foo (1, 2, 3)[:36]
> would be ok if foo returns a pointer, but
> foo (ptr[0:13], 2, 3)
> would not) and then need to differentiate between the cases listed in
> the standard which we handle for each . -> [idx] when starting from a
> var (in such a case I vaguely recall there are rules for pointer
> attachments etc.) or other arbitrary expressions (in that case we
> just evaluate those expressions and e.g. in the foo (1, 2, 3)[:36]
> case basically do tmp = foo (1, 2, 3);
> and mapping of tmp[:36].

...which also changes/refactors quite a lot regarding how lowering
clauses into mapping nodes works (the "address inspector" bits).
"Weird" cases like mapping the return value from functions don't
necessarily DTRT yet -- it wasn't entirely clear how that should/could
work, I don't think.

HTH,

Julian
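
The arbitrary-expression case quoted above, where foo (1, 2, 3)[:36] is
treated as "tmp = foo (1, 2, 3);" followed by a mapping of tmp[:36], is
something a user can already spell out by hand.  A minimal sketch in C,
assuming a hypothetical foo that returns a pointer to at least 36 ints;
without -fopenmp the pragmas are ignored and the code runs on the host.

```c
/* Backing storage for the hypothetical foo below.  */
static int storage[36];

/* Hypothetical function returning a pointer; stands in for the
   "foo (1, 2, 3)" of the quoted text.  */
int *
foo (int a, int b, int c)
{
  storage[0] = a + b + c;
  return storage;
}

/* Hand-written form of the lowering described in the quote: evaluate
   the expression once into a temporary, then map a section of it.  */
int
map_result (void)
{
  int *tmp = foo (1, 2, 3);
  int first;
#pragma omp target enter data map(to: tmp[:36])
  first = tmp[0];
#pragma omp target exit data map(release: tmp[:36])
  return first;
}
```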


Re: [PATCH] libsanitizer: cherry-pick commit f52e365092aa from upstream

2022-05-05 Thread Martin Liška
On 5/5/22 01:07, H.J. Lu wrote:
> On Wed, May 4, 2022 at 1:59 AM Martin Liška  wrote:
>> 
>> Hello.
>> 
>> I'm going to do merge from upstream.
>> 
>> Patch can bootstrap on x86_64-linux-gnu and survives regression
>> tests. I've also tested on ppc64le-linux-gnu and verified the ABI.
>> 
>> The only real change is a small change in 
>> gcc/testsuite/c-c++-common/asan/alloca_loop_unpoisoning.c where we
>> need --param=asan-use-after-return=0.
>> 
>> I'm going to push the patches.
> 
> Hi,
> 
> I am checking in this patch to cherry-pick
> 
> f52e365092aa [sanitizer] Use newfstatat for x32
> 
> to restore x32 build.
> 

I'm going to do one more merge from upstream
(75f9e83ace52773af65dcebca543005ec8a2705d) as we want to include Tobias's
revision 6f095babc2b7d564168c7afc5bf6afb2188fd6b4 and my
revision f1b9245199f3457a4d06d32d1bc6e44573c166e3.

Martin


[PATCH v2][GCC] arm: Add support for dwarf debug directives and pseudo hard-register for PAC feature.

2022-05-05 Thread Srinath Parvathaneni via Gcc-patches
Hello,

This patch teaches the DWARF support in GCC about the RA_AUTH_CODE pseudo
hard-register and adds the .save {ra_auth_code} and .cfi_offset ra_auth_code
DWARF directives for the PAC feature in the Armv8.1-M architecture.

The RA_AUTH_CODE register number is 107 and its DWARF register number is 143.

When compiled with the "-march=armv8.1-m.main
-mbranch-protection=pac-ret+leaf+bti -mthumb -mfloat-abi=soft
-fasynchronous-unwind-tables -g -O2 -S" command-line options, the assembly
output after this patch looks like this:

...
.cfi_startproc
pacbti  ip, lr, sp
movs    r1, #40
push    {ip, lr}
.save {ra_auth_code, lr}
.cfi_def_cfa_offset 8
.cfi_offset 143, -8
.cfi_offset 14, -4
...
pop {ip, lr}
.cfi_restore 14
.cfi_restore 143
.cfi_def_cfa_offset 0
movs    r0, #0
aut ip, lr, sp
bx  lr
.cfi_endproc
...

This patch can be committed after the patch at 
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583407.html
is committed.

Regression tested on arm-none-eabi target and found no regressions.

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2022-04-06  Srinath Parvathaneni  

* config/arm/aout.h (ra_auth_code): Add to enum.
* config/arm/arm.cc (emit_multi_reg_push): Add RA_AUTH_CODE register to
dwarf frame expression.
(arm_emit_multi_reg_pop): Restore RA_AUTH_CODE register.
(arm_expand_prologue): Mark as frame related insn.
(arm_regno_class): Check for pac pseudo register.
(arm_dbx_register_number): Assign ra_auth_code register number in dwarf.
(arm_unwind_emit_sequence): Print .save directive with ra_auth_code
register.
(arm_conditional_register_usage): Mark ra_auth_code in fixed registers.
* config/arm/arm.h (FIRST_PSEUDO_REGISTER): Modify.
(IS_PAC_Pseudo_REGNUM): Define.
(enum reg_class): Add PAC_REG entry.
* config/arm/arm.md (RA_AUTH_CODE): Define.

gcc/testsuite/ChangeLog:

2022-04-06  Srinath Parvathaneni  

* g++.target/arm/pac-1.C: New test.
* gcc.target/arm/pac-9.c: Likewise.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index b918ad3782fbee82320febb8b6e72ad615780261..ffeed45a678f17c63d5b42c21f020ca416cbf23f 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -74,7 +74,8 @@
   "wr8",   "wr9",   "wr10",  "wr11",   \
   "wr12",  "wr13",  "wr14",  "wr15",   \
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",  \
-  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0" \
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0",\
+  "ra_auth_code"   \
 }
 #endif
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3495ab857eac38ecdf37e55f1d201b1c35cbde0b..c7067819f6785e44d30d8e5365505ab98682 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -816,7 +816,8 @@ extern const int arm_arch_cde_coproc_bits[];
s16-s31   S VFP variable (aka d8-d15).
vfpcc   Not a real register.  Represents the VFP condition
code flags.
-   vpr Used to represent MVE VPR predication.  */
+   vpr Used to represent MVE VPR predication.
+   ra_auth_code Pseudo register to save PAC.  */
 
 /* The stack backtrace structure is as follows:
   fp points to here:  |  save code pointer  |  [fp]
@@ -857,7 +858,7 @@ extern const int arm_arch_cde_coproc_bits[];
   1,1,1,1,1,1,1,1, \
   1,1,1,1, \
   /* Specials.  */ \
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
 }
 
 /* 1 for registers not available across function calls.
@@ -887,7 +888,7 @@ extern const int arm_arch_cde_coproc_bits[];
   1,1,1,1,1,1,1,1, \
   1,1,1,1, \
   /* Specials.  */ \
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
 }
 
 #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE
@@ -1063,10 +1064,10 @@ extern const int arm_arch_cde_coproc_bits[];
&& (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
 
 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
+   + 1 APSRQ + 1 APSRGE + 1 VPR + 1 Pseudo register to save PAC.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
-#define FIRST_PSEUDO_REGISTER   107
+#define FIRST_PSEUDO_REGISTER   108
 
 #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO)
 
@@ -1253,12 +1254,15 @@ extern int arm_regs_in_sequence[];
   CC_REGNUM, VFPCC_REGNUM, \
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,\
   SP_REGNUM, 

Re: [PATCH] lto-plugin: add support for feature detection

2022-05-05 Thread Alexander Monakov
On Thu, 5 May 2022, Richard Biener via Gcc-patches wrote:

> > I think they should simply try to not register LDPT_GET_SYMBOLS or
> > LDPT_GET_SYMBOLS_V2 with the plugin in the onload hook and if
> > that fails they will know the plugin doesn't support V3 only.  I suppose
> > it should work to call onload() multiple times (when only increasing the
> > set of supported functions) until it returns LDPS_OK without intermediately
> > dlclosing it (maybe call cleanup_handler in between).  This should work
> > for old plugin versions.
> >
> > That said, a better API extension compared to adding some random
> > symbol like you propose is to enhance the return value from onload(),
> > maybe returning an alternate transfer vector specifying symbol entries
> > that will not be used (or return a transfer vector that will be used).
> > We've been mostly versioning the symbol related hooks here.
> >
> > That said, I do not like at all this proposed add of a special symbol
> > to flag exclusive v3 use.  That's a hack and not extensible at all.
> 
> Speaking of which, onload_v2 would be in order and should possibly
> return some instantiation handle of the plugin that the linker could
> instruct the plugin to dispose (reset ()?).  I see the GCC implementation
> of the plugin just has a single global state and it doesn't seem that it's
> prepared for multiple onload() calls (but it might work by accident if
> you never remove things from the support vector but only add).
> 
> Without revamping the whole API onload_v2 could set the current
> global state for the plugin based on the transfer vector and the reset()
> API would discard the state (might also be redundant and implicitly
> performed by the next onload_v2 call).
> 
> onload_v2 could then also have an "output" transfer vector where the
> plugin simply copies the entries it picked and dropped those it will
> never call.  We'd document the plugin may only pick _one_ of the versioned
> API variants.
> 
> That said, the heuristic outlined above might still work with the present
> onload() API and existing implementations.

Feels a bit weird to ask, but before entertaining such an API extension,
can we step back and understand the v3 variant of get_symbols? It is not
documented, and from what little I saw I did not get the "motivation" for
its existence (what it is doing that couldn't be done with the v2 api).

To me lack of documentation looks like a serious issue :/

Likewise, I don't really understand why mold cannot be flexible and
efficiently service both the v2 and v3 variants without committing 
to one of those in advance.

Alexander


Re: [PATCH] ipa-visibility: Optimize TLS access [PR99619]

2022-05-05 Thread Jan Hubicka via Gcc-patches
> > @@ -872,6 +872,22 @@ function_and_variable_visibility (bool whole_program)
> > }
> > }
> >  }
> > +  FOR_EACH_VARIABLE (vnode)
> > +{
> > +  tree decl = vnode->decl;
> > +  
> > +  /* Optimize TLS model based on visibility (taking into account
> > + optimizations done in the preceding loop), unless it was
> > + specified explicitly.  */
> > +  
> > +  if (DECL_THREAD_LOCAL_P (decl)
> > +  && !lookup_attribute ("tls_model", DECL_ATTRIBUTES (decl)))
> > +{
> > +  enum tls_model new_model = decl_default_tls_model (decl);
> > +  gcc_checking_assert (new_model >= decl_tls_model (decl));
> > +  set_decl_tls_model (decl, new_model);
> > +}
> > +}
> >  
> 
> decl_default_tls_model depends on the global optimize flag, which is
> almost always problematic in IPA passes.  I was able to make your patch
> ICE using the vis-attr-hidden.c testcase from your patch with:
> 
>   mjambor@virgil:~/gcc/small/tests/tls$ ~/gcc/small/inst/bin/gcc -O2 -fPIC -flto -c vis-attr-hidden.c
>   mjambor@virgil:~/gcc/small/tests/tls$ ~/gcc/small/inst/bin/gcc -fPIC -O0 -shared -flto vis-attr-hidden.o
>   during IPA pass: whole-program
>   lto1: internal compiler error: in function_and_variable_visibility, at ipa-visibility.cc:888
>   0x25f48c0 function_and_variable_visibility
>   /home/mjambor/gcc/small/src/gcc/ipa-visibility.cc:888
>   0x25f4b33 whole_program_function_and_variable_visibility
>   /home/mjambor/gcc/small/src/gcc/ipa-visibility.cc:938
>   0x25f4bda execute
>   /home/mjambor/gcc/small/src/gcc/ipa-visibility.cc:986
>   Please submit a full bug report, with preprocessed source (by using -freport-bug).
>   Please include the complete backtrace with any bug report.
>   See  for instructions.
>   lto-wrapper: fatal error: /home/mjambor/gcc/small/inst/bin/gcc returned 1 exit status
>   compilation terminated.
>   /usr/bin/ld: error: lto-wrapper failed
>   collect2: error: ld returned 1 exit status
> 
> Note the use of LTO, mismatching -O flags and the -shared flag in the
> link step.

Also note that the visibility pass is run twice (once at compile time before
early optimizations and then again at LTO). Since LTO linking may
promote public symbols to local/hidden, perhaps we want to do this only
the second time the pass is executed?
> 
> A simple but somewhat lame way to avoid the ICE would be to run your
> loop over variables only from pass_ipa_function_and_variable_visibility
> and not from pass_ipa_whole_program_visibility.
> 
> I am afraid a real solution would involve copying relevant entries from
> global_options to the symtab node representing the variable when it is
> created/finalized, properly streaming them for LTO, and modifying
> decl_default_tls_model to rely on those rather than global_options
> itself.
> 
> But maybe Honza has some other clever idea.

Hmm, I think it is similar to the semantic_interposition flag added
recently (and causing some interesting issues).

What is the reason we avoid using LOCAL_DYNAMIC with !optimize while we
are happy to use LOCAL_EXEC with !optimize !flag_shlib?

Honza
> 
> Also, please be careful not to unnecessarily commit trailing blank
> spaces, the empty lines in your patch are not really empty.
> 
> Martin


[PATCH] testsuite: Update Wconversion testcase check type.

2022-05-05 Thread jiawei
Some targets, such as arm-linux, riscv, power, s390x and xtensa, treat
plain char as unsigned char, so the expected warning is not emitted and
the test FAILs.  Change the type from char to an explicit signed char so
that the test behaves consistently across targets.
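
The target dependence described above can be observed directly.  A
minimal sketch in C (helper names are hypothetical and not part of the
testcase): whether plain char is signed is implementation-defined, while
signed char behaves the same everywhere.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type on this target.  */
int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Target dependent: with an unsigned plain char, -1 becomes 255 and
   the comparison is false.  */
int
minus_one_negative_as_plain_char (void)
{
  char c = -1;
  return c < 0;
}

/* Always true: signed char is signed on every target.  */
int
minus_one_negative_as_signed_char (void)
{
  signed char c = -1;
  return c < 0;
}
```

On GCC, -fsigned-char and -funsigned-char override the target default,
which makes it easy to try both behaviors on one machine.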

gcc/testsuite/ChangeLog:

* c-c++-common/Wconversion-1.c: Update type.

---
 gcc/testsuite/c-c++-common/Wconversion-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/Wconversion-1.c b/gcc/testsuite/c-c++-common/Wconversion-1.c
index ed65918c70f..7053f6b5dbb 100644
--- a/gcc/testsuite/c-c++-common/Wconversion-1.c
+++ b/gcc/testsuite/c-c++-common/Wconversion-1.c
@@ -10,5 +10,5 @@ void g()
   signed char sc = 300; /* { dg-warning "conversion from .int. to .signed char. changes value from .300. to .44." } */
   unsigned char uc = 300; /* { dg-warning "conversion from .int. to .unsigned char. changes value from .300. to .44." } */
   unsigned char uc2 = 300u; /* { dg-warning "conversion from .unsigned int. to .unsigned char. changes value from .300. to .44." } */
-  char c2 = (double)1.0 + 200; /* { dg-warning "overflow in conversion from .double. to .char. changes value from .2.01e\\+2. to .127." } */
+  signed char c2 = (double)1.0 + 200; /* { dg-warning "overflow in conversion from .double. to .signed char. changes value from .2.01e\\+2. to .127." } */
 }
-- 
2.25.1



Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async

2022-05-05 Thread Tobias Burnus

On 05.05.22 10:30, Jakub Jelinek via Fortran wrote:

+  memcpy_t *a = args;
+  int ret = omp_target_memcpy_copy (a->dst, a->src, a->length, a->dst_offset,
+a->src_offset, a->dst_devicep,
+a->src_devicep);
+  if (ret)
+gomp_fatal ("asynchronous memcpy failed");


I wonder whether that should be 'omp_target_memcpy_async failed' or
similar to make clear that it comes from a user's API call.

Or "asynchronous memcpy API routine failed" to avoid a bit the issue of
...memcpy_async vs. ..._memcpy_rect_async?


I'm not really sure killing the whole program if the copying failed is the
best action.  Has it been discussed on omp-lang?  Perhaps the APIs should
have a way how to propagate the result to the caller when it completes
somehow?


I think it hasn't been discussed – but the question is how to handle it
best with the current API. Namely, should it simply continue at the
taskwait? Having some way to communicate back that it failed would be
useful – either by a by-reference argument or some other more indirect
means.

I think aborting is bad – but not aborting and silently continuing is
likely to break as well. IMO, the fatal error is fine for now, but we might
need to come up with something on the spec side.

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Commercial register:
München, HRB 106955


Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.

2022-05-05 Thread Jakub Jelinek via Gcc-patches
On Thu, May 05, 2022 at 11:45:19AM +0200, Tobias Burnus wrote:
> > On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote:
> > > +interface
> > > +  function omp_target_is_accessible (ptr, size, device_num) bind(c)
> > > +use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int
> > > +integer(c_int) :: omp_target_is_accessible
> > The function returning integer(c_int) rather than logical seems like
> > a screw up in the standard, but too late to fix that :(.
> 
> I think the idea is that it can directly call the C function without
> needing a wrapper. And as default-kind 'logical' != 'integer(c_int)' in
> general, it cannot return logical. (In case of GCC, just claiming that
> it is logical would work. But some Fortran compilers use -1 for .true.
> and only flip a single bit for .not. For those,
> "if(.not.omp_target_is_accessible(..)) will not work properly, if the C
> function returns 1.
> 
> But I concur that requiring "/= 0" is ugly!

Yeah, but for the APIs that don't have any iso_c_binding arguments
we just use wrappers rather than bind(c) and it allows for more Fortran-like
callers.  So, if omp_target_is_accessible had the *_ wrapper (or alias if
we determine logical is the same as c_int in the ABI passing), people could
avoid the /= 0 stuff.
Anyway, that is just a thought for future APIs that if they return
false/true only bind(c) isn't always a good idea.

Jakub



Re: [PATCH] rewrite undefined overflow to defined in ifcombine

2022-05-05 Thread Jakub Jelinek via Gcc-patches
On Thu, May 05, 2022 at 11:44:59AM +0200, Richard Biener wrote:
> When we make stmts to execute unconditionally in ifcombine we have
> to make sure to rewrite stmts that can invoke undefined behavior
> on overflow into a form with defined overflow.  That's possible
> for all but signed division for which we have to avoid the transform.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> This was developed as not working solution to fix PR105142 and
> I do not have a testcase that experiences wrong-code (but I only
> tried for about 10 minutes to construct one).  Still the problem
> is obviously latent and we should fix it.
> 
> Thus - OK for trunk?
> 
> Thanks,
> Richard.
> 
> 2022-04-04  Richard Biener  
> 
>   * tree-ssa-ifcombine.cc (bb_no_side_effects_p): Avoid executing
>   divisions with undefined overflow unconditionally.
>   (pass_tree_ifcombine::execute): Rewrite stmts with undefined
>   overflow to defined.

LGTM.

Jakub



Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.

2022-05-05 Thread Tobias Burnus

Hi,

On 05.05.22 11:33, Jakub Jelinek via Gcc-patches wrote:

On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote:

+interface
+  function omp_target_is_accessible (ptr, size, device_num) bind(c)
+use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int
+integer(c_int) :: omp_target_is_accessible

The function returning integer(c_int) rather than logical seems like
a screw up in the standard, but too late to fix that :(.


I think the idea is that it can directly call the C function without
needing a wrapper. And as default-kind 'logical' != 'integer(c_int)' in
general, it cannot return logical. (In case of GCC, just claiming that
it is logical would work. But some Fortran compilers use -1 for .true.
and only flip a single bit for .not. For those,
"if(.not.omp_target_is_accessible(..))" will not work properly, if the C
function returns 1.)

But I concur that requiring "/= 0" is ugly!


OT, tried to look how libomptarget implements it and they don't at least
on llvm-project trunk, but while looking at that, noticed that for
omp_target_is_present they do return false from omp_target_is_present
while we return true.  It is unclear if NULL has corresponding storage
on the device (NULL always corresponds to NULL on the device) or not.


Regarding NULL: no idea what's the best semantic – we could ask
for clarification.

Regarding target:
I think "false" when called on the device makes more sense in general,
especially if the device number points to a different device. It might work
in some cases – but false simply plays it safe. Note that the spec states:

"When called from within a target region the effect is unspecified."

Thus, either behavior is fine.

Tobias



[PATCH] rewrite undefined overflow to defined in ifcombine

2022-05-05 Thread Richard Biener via Gcc-patches
When we make stmts execute unconditionally in ifcombine we have
to make sure to rewrite stmts that can invoke undefined behavior
on overflow into a form with defined overflow.  That's possible
for all but signed division, for which we have to avoid the transform.
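
As a source-level analogy for the rewrite described above (the pass works
on GIMPLE, not on C source), here is a minimal sketch in C.  The function
names are hypothetical.  It assumes a 32-bit int and relies on GCC's
documented modulo behavior when an out-of-range unsigned value is
converted back to int, which ISO C leaves implementation-defined.

```c
#include <limits.h>

/* Signed addition: overflow is undefined, so it must not be executed
   unconditionally with operands the original condition guarded.  */
int
add_signed (int a, int b)
{
  return a + b;
}

/* The same addition carried out in unsigned arithmetic, where
   wraparound is defined; the result matches whenever the signed
   addition does not overflow.  */
int
add_wrapping (int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}

/* Division has no such rewrite: unsigned division yields a different
   quotient for negative operands.  Returns nonzero when the two
   disagree.  */
int
unsigned_division_differs (int a, int b)
{
  return (a / b) != (int) ((unsigned int) a / (unsigned int) b);
}
```

For example, -6 / 3 is -2 as a signed division but a large positive value
when both operands are first converted to unsigned, which is why the
unsigned trick cannot stand in for signed division.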

Bootstrapped and tested on x86_64-unknown-linux-gnu.

This was developed as a (non-working) attempt to fix PR105142 and
I do not have a testcase that exhibits wrong-code (but I only
tried for about 10 minutes to construct one).  Still, the problem
is obviously latent and we should fix it.

Thus - OK for trunk?

Thanks,
Richard.

2022-04-04  Richard Biener  

* tree-ssa-ifcombine.cc (bb_no_side_effects_p): Avoid executing
divisions with undefined overflow unconditionally.
(pass_tree_ifcombine::execute): Rewrite stmts with undefined
overflow to defined.
---
 gcc/tree-ssa-ifcombine.cc | 29 +
 1 file changed, 29 insertions(+)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 3a4ab694b71..cb86cc1ea5f 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -125,10 +125,26 @@ bb_no_side_effects_p (basic_block bb)
   if (is_gimple_debug (stmt))
continue;
 
+  gassign *ass;
+  enum tree_code rhs_code;
   if (gimple_has_side_effects (stmt)
  || gimple_uses_undefined_value_p (stmt)
  || gimple_could_trap_p (stmt)
  || gimple_vuse (stmt)
+ /* We need to rewrite stmts with undefined overflow to use
+unsigned arithmetic but cannot do so for signed division.  */
+ || ((ass = dyn_cast <gassign *> (stmt))
+ && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (ass)))
+ && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (gimple_assign_lhs (ass)))
+ && ((rhs_code = gimple_assign_rhs_code (ass)), true)
+ && (rhs_code == TRUNC_DIV_EXPR
+ || rhs_code == CEIL_DIV_EXPR
+ || rhs_code == FLOOR_DIV_EXPR
+ || rhs_code == ROUND_DIV_EXPR)
+ /* We cannot use expr_not_equal_to since we'd have to restrict
+flow-sensitive info to whats known at the outer if.  */
+ && (TREE_CODE (gimple_assign_rhs2 (ass)) != INTEGER_CST
+ || !integer_minus_onep (gimple_assign_rhs2 (ass))))
  /* const calls don't match any of the above, yet they could
 still have some side-effects - they could contain
 gimple_could_trap_p statements, like floating point
@@ -847,6 +863,19 @@ pass_tree_ifcombine::execute (function *fun)
/* Clear range info from all stmts in BB which is now executed
   conditional on a always true/false condition.  */
reset_flow_sensitive_info_in_bb (bb);
+   for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+gsi_next (&gsi))
+ {
+   gassign *ass = dyn_cast <gassign *> (gsi_stmt (gsi));
+   if (!ass)
+ continue;
+   tree lhs = gimple_assign_lhs (ass);
+   if ((INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+|| POINTER_TYPE_P (TREE_TYPE (lhs)))
+   && arith_code_with_undefined_signed_overflow
+(gimple_assign_rhs_code (ass)))
+ rewrite_to_defined_overflow (ass, true);
+ }
cfg_changed |= true;
  }
 }
-- 
2.35.3


Re: [PATCH] testsuite: add missing dg-require-effective-target fpic

2022-05-05 Thread Marc Poulhies via Gcc-patches
Marc Poulhiès  writes:

> Require effective target fpic for newly added test.
>
> gcc/testsuite/
>   * g++.dg/ext/visibility/visibility-local-extern1.C: Add missing
>   dg-require-effective-target fpic.
>
> Tested on x86_64-linux. Ok for master?
>
> ---
>  gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C b/gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C
> index 40c20199d0c..6fb1cc7f381 100644
> --- a/gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C
> +++ b/gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C
> @@ -1,6 +1,7 @@
>  // PR c++/103291
>  // { dg-additional-options -fpic }
>  // { dg-final { scan-assembler-not "@GOTPCREL" } }
> +// { dg-require-effective-target fpic }
>  
>  #pragma GCC visibility push(hidden)

ping ?

Marc


Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.

2022-05-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote:
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -226,6 +226,11 @@ OMP_5.1 {
>   omp_get_teams_thread_limit_;
>  } OMP_5.0.2;
>  
> +OMP_5.1.1 {
> +  global:
> + omp_target_is_accessible;
> +} OMP_5.1;
> +

You've already added another OMP_5.1.1 symbol, so this hunk will need to be
adjusted.  Keep the names in there alphabetically sorted.

> --- a/libgomp/omp_lib.f90.in
> +++ b/libgomp/omp_lib.f90.in
> @@ -835,6 +835,16 @@
>end function omp_target_disassociate_ptr
>  end interface
>  
> +interface
> +  function omp_target_is_accessible (ptr, size, device_num) bind(c)
> +use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int
> +integer(c_int) :: omp_target_is_accessible

The function returning integer(c_int) rather than logical seems like
a screw up in the standard, but too late to fix that :(.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -3666,6 +3666,24 @@ omp_target_disassociate_ptr (const void *ptr, int device_num)
>  }
>  
>  int
> +omp_target_is_accessible (const void *ptr, size_t size, int device_num)
> +{
> +  if (device_num < 0 || device_num > gomp_get_num_devices ())
> +return false;
> +
> +  if (device_num == gomp_get_num_devices ())
> +return true;
> +
> +  struct gomp_device_descr *devicep = resolve_device (device_num);
> +  if (devicep == NULL)
> +return false;
> +
> +  /* TODO: Unified shared memory must be handled when available.  */
> +
> +  return devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM;

I guess for now it is reasonable, but I wonder if even without
GOMP_OFFLOAD_CAP_SHARED_MEM one can't for CUDA or GCN allocate host
memory (not all, but just some subset) that will be accessible on the
device (I bet that means accessible through the same address on the host and
device, aka partial shared mem).
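The decision logic of the quoted hunk can be summarized in a standalone sketch. Everything here is hypothetical scaffolding for illustration: CAP_SHARED_MEM stands in for GOMP_OFFLOAD_CAP_SHARED_MEM, the capability table replaces resolve_device, and the device count is fixed at two. It is not libgomp code.

```c
#include <assert.h>

/* Hypothetical stand-in for GOMP_OFFLOAD_CAP_SHARED_MEM.  */
#define CAP_SHARED_MEM 1u

/* Hypothetical model of the routine: caps[i] holds the capability bits
   of device i, num_devices is the number of non-host devices, and the
   value num_devices itself denotes the host, as in the quoted hunk.  */
int
is_accessible_sketch (int device_num, int num_devices,
		      const unsigned int *caps)
{
  if (device_num < 0 || device_num > num_devices)
    return 0;			/* invalid device number */
  if (device_num == num_devices)
    return 1;			/* host memory is accessible from the host */
  return (caps[device_num] & CAP_SHARED_MEM) != 0;
}

/* Demo with two made-up devices: device 0 shares memory, device 1 does not.  */
int
accessible_demo (int device_num)
{
  static const unsigned int caps[2] = { CAP_SHARED_MEM, 0u };
  return is_accessible_sketch (device_num, 2, caps);
}
```

With this table, device 0 and the host (number 2) are reported as accessible, while device 1 and out-of-range numbers are not. Memory that is only partially shared, as speculated above, would need a check on ptr and size that this sketch does not attempt.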

So, ok for trunk.

OT: I tried to look at how libomptarget implements it, and it doesn't, at least
on llvm-project trunk.  While looking at that, I noticed that their
omp_target_is_present returns false for a NULL pointer
while ours returns true.  It is unclear whether NULL has corresponding storage
on the device (NULL always corresponds to NULL on the device) or not.

Jakub



Re: [PATCH] Skip constant folding for fmin/max when either argument is sNaN [PR105414]

2022-05-05 Thread HAO CHEN GUI via Gcc-patches



On 5/5/2022 4:30 PM, Kewen.Lin wrote:
> on 2022/5/5 16:09, Richard Biener via Gcc-patches wrote:
>> On Thu, May 5, 2022 at 10:07 AM HAO CHEN GUI via Gcc-patches
>>  wrote:
>>>
>>> Hi,
>>>This patch skips constant folding for fmin/max when either argument
>>> is sNaN. According to C standard,
>>>fmin(sNaN, sNaN)= qNaN, fmin(sNaN, NaN) = qNaN
>>>So signaling NaN should be tested and skipped for fmin/max in match.pd.
>>>
>>>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
>>> Is this okay for trunk? Any recommendations? Thanks a lot.
>>
>> OK.
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog
>>>
>>> 2022-05-05 Haochen Gui 
>>>
>>> gcc/
>>> PR target/105414
>>> * match.pd (minmax): Skip constant folding for fmin/fmax when both
>>> arguments are sNaN or one is sNaN and another is NaN.
>>>
>>> gcc/testsuite/
>>> PR target/105414
>>> * gcc.dg/pr105414.c: New.
>>>
>>> patch.diff
>>>
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index cad61848daa..f256bcbb483 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -3093,7 +3093,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>>  (for minmax (min max FMIN_ALL FMAX_ALL)
>>>   (simplify
>>>(minmax @0 @0)
>>> -  @0))
>>> +  /* if both are sNaN, it should return qNaN.  */
>>> +  (if (!tree_expr_maybe_signaling_nan_p (@0))
>>> +@0)))
> 
> Sorry for chiming in.
> 
> IIUC this patch is mainly for libc function fmin/fmax and the iterator here
> covers min/max and fmin/fmax.  I wonder if it's intent to make this change
> for min/max as well?
> 
> As tree.def says, "if either operand is NaN, then it is unspecified", so the
> optimization for min/max still seems acceptable?

For MIN/MAX_EXPR, the result is undefined with NaN, so I think we shouldn't do
constant folding.  We should let the target decide how to deal with it.  As far
as I understand, "undefined" here means that the result depends on the target.
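To make the "result depends on which operand is chosen" point concrete, here is a small sketch of a comparison-based minimum in plain C. It is only an analogy for a MIN_EXPR-style operation, not GCC's implementation, and the name min_lt is made up. It uses a quiet NaN only; signaling-NaN behavior is not exercised.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical comparison-based minimum.  Any ordered comparison with
   a NaN is false, so the second operand is returned whenever either
   operand is a NaN.  */
double
min_lt (double a, double b)
{
  return a < b ? a : b;
}

/* Returns 1 if x is a NaN, using the fact that a NaN compares unequal
   to itself.  */
int
is_nan (double x)
{
  return x != x;
}
```

Here min_lt (NAN, 1.0) yields 1.0 but min_lt (1.0, NAN) yields NaN, so the outcome depends purely on operand order. That is the kind of latitude the tree.def wording leaves open.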
> 
> BR,
> Kewen


Re: [PATCH, OpenMP, C++] Allow classes with static members to be mappable

2022-05-05 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 09, 2022 at 07:04:24PM +0800, Chung-Lin Tang wrote:
> Now in OpenMP 5.x, static members are supposed to be not a barrier for a class
> to be target-mapped.
> 
> There is the related issue of actually providing access to static
> const/constexpr members on the GPU (probably a case of
> https://github.com/OpenMP/spec/issues/2158) but that is for later.
> 
> This patch basically just removes the check for static members inside
> cp_omp_mappable_type_1, and adjusts a testcase. Not sure if more tests are
> needed.
> Tested on trunk without regressions, okay when stage1 reopens?
> 
> Thanks,
> Chung-Lin
> 
> 2022-03-09  Chung-Lin Tang  
> 
> gcc/cp/ChangeLog:
> 
>   * decl2.cc (cp_omp_mappable_type_1): Remove requirement that all
>   members must be non-static; remove check for static fields.

I don't see anything useful left in cp_omp_mappable_type{,_1}.
In particular, starting with OpenMP 5.0, for both C and C++ we just say
that a mappable type is a complete type.  True, for C++ there is also the
"All member functions accessed in any target region must appear in a
declare target directive."
and similarly for Fortran, but that isn't something we really can check when
we ask whether something is a mappable type, that isn't a property of a
type, but a property of the target region.
In OpenMP 4.5 the special C++ mappable_type langhook was useful, both for
the non-static data members and for virtual methods.

So, I think instead of your patch, we should just throw away
cp_omp_mappable_type{,_1}, and as C++ was the only one that had a special
langhook, I think we should kill the langhook altogether too, move
lhd_omp_mappable_type from langhooks.cc to omp-general.cc or so
as omp_mappable_type and use that instead of the langhooks or
cp_omp_mappable_type.
Now, the C++ FE also has cp_omp_emit_unmappable_type_notes while other FEs
don't; either we can just say that the type is not a mappable type,
like the C FE does, or perhaps we can emit a note that it isn't a mappable
type because it is incomplete (but then it would be nice to do the same
thing in the C FE too).

Jakub



Re: [PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections

2022-05-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 21, 2022 at 11:18:57PM +0800, Chung-Lin Tang wrote:
> as encountered in cases where a program constructs its own deep-copying
> for arrays-of-pointers, e.g:
> 
>#pragma omp target enter data map(to:level->vectors[:N])
>for (i = 0; i < N; i++)
>  #pragma omp target enter data map(to:level->vectors[i][:N])
> 
> We need to treat the part of the array reference before the array section
> as a base-pointer (here 'level->vectors[i]'), providing pointer-attachment
> behavior.
> 
> This patch adds this inside handle_omp_array_sections(), tracing the whole
> sequence of array dimensions, creating a whole base-pointer reference
> iteratively using build_array_ref(). The conditions are that each of the
> "absorbed" dimensions must be length==1, and the final reference must be
> of pointer-type (so that pointer attachment makes sense).
> 
> There's also a little patch in gimplify_scan_omp_clauses(), to make sure
> the array-ref base-pointer goes down the right path.
> 
> This case was encountered when working to make 534.hpgmgfv_t from
> SPEChpc 2021 properly compile. Tested without regressions on trunk.
> Okay to go in once stage1 opens?

I'm afraid this is going in the wrong direction.  The OpenMP 5.0 change
that:
"A locator list item is any lvalue expression, including variables, or an array
section."
is much more general than just allowing -> in the expressions; there can be
function calls and many other things.  So, the more code like this patch we add,
the more we'll need to throw away again.  And as it is in OpenMP 5.0, we
really need to throw it away in the GCC 13 cycle.

So, what we really need is to add an OMP_ARRAY_SECTION tree code and some parser
flag to know that we are inside an OpenMP clause that allows array sections.
Where we normally parse ARRAY_REFs, if that flag is on we also parse
array sections, and we parse the map/to/from clauses just as normal expressions.
At least for the C FE we may need to arrange for less folding to be
done, because C still folds too much stuff prematurely.
Then, when finishing clauses, verify that OMP_ARRAY_SECTION trees appear only
where we allow them and not elsewhere (say
foo (1, 2, 3)[:36]
would be ok if foo returns a pointer, but
foo (ptr[0:13], 2, 3)
would not).  After that we need to differentiate between the cases listed in the
standard which we handle for each . -> [idx] when starting from a var
(in such a case I vaguely recall there are rules for pointer attachments
etc.) and other arbitrary expressions (in that case we just evaluate those
expressions and e.g. in the foo (1, 2, 3)[:36] case basically do
tmp = foo (1, 2, 3);
and map tmp[:36]).

> 2022-02-21  Chung-Lin Tang  
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (handle_omp_array_sections): Add handling for
>   creating array-reference base-pointer attachment clause.
> 
> gcc/cp/ChangeLog:
> 
>   * semantics.cc (handle_omp_array_sections): Add handling for
>   creating array-reference base-pointer attachment clause.
> 
> gcc/ChangeLog:
> 
>   * gimplify.cc (gimplify_scan_omp_clauses): Add case for
>   attach/detach map kind for ARRAY_REF of POINTER_TYPE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/target-enter-data-1.c: Adjust testcase.
> 
> libgomp/testsuite/ChangeLog:
> 
>   * libgomp.c-c++-common/ptr-attach-2.c: New test.

Jakub



[PATCH] testsuite/105486 - adjust testcase to avoid misaligned accesses

2022-05-05 Thread Richard Biener via Gcc-patches
This properly aligns data, increasing test coverage.
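For readers unfamiliar with the builtin used in this change, here is a minimal standalone sketch of how __builtin_assume_aligned is typically used. The function cmp4 mirrors the shape of the testcase body; the fourth comparison, the cmp4_demo driver and its input values are made up for illustration. The caller is responsible for actually providing suitably aligned storage, which the demo does with _Alignas.

```c
#include <assert.h>

/* The pointers are promised to be aligned to __BIGGEST_ALIGNMENT__,
   a macro predefined by GCC and Clang.  Passing a pointer that is not
   aligned this way would make the promise false.  */
void
cmp4 (int *c, float *x, float *y)
{
  c = __builtin_assume_aligned (c, __BIGGEST_ALIGNMENT__);
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
  c[0] = x[0] < y[0];
  c[1] = y[1] > x[1];
  c[2] = x[2] < y[2];
  c[3] = y[3] > x[3];	/* assumed continuation of the pattern */
}

/* Hypothetical driver: packs the four comparison results into a bitmask.  */
int
cmp4_demo (void)
{
  static _Alignas (__BIGGEST_ALIGNMENT__) int c[4];
  static _Alignas (__BIGGEST_ALIGNMENT__) float x[4] = { 1, 5, 3, 7 };
  static _Alignas (__BIGGEST_ALIGNMENT__) float y[4] = { 2, 4, 4, 6 };
  cmp4 (c, x, y);
  return c[0] | c[1] << 1 | c[2] << 2 | c[3] << 3;
}
```

With these inputs only elements 0 and 2 satisfy x < y, so cmp4_demo returns 5.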

Tested on x86_64-unknown-linux-gnu, pushed.

2022-05-05  Richard Biener  

PR testsuite/105486
* gcc.dg/vect/bb-slp-pr104240.c: Align all data.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr104240.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr104240.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr104240.c
index 78905a468e0..1045f31d4d0 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr104240.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr104240.c
@@ -5,6 +5,9 @@
 
 void foo (int *c, float *x, float *y)
 {
+  c = __builtin_assume_aligned (c, __BIGGEST_ALIGNMENT__);
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
   c[0] = x[0] < y[0];
   c[1] = y[1] > x[1];
   c[2] = x[2] < y[2];
-- 
2.35.3


Re: [PATCH] Skip constant folding for fmin/max when either argument is sNaN [PR105414]

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 10:30 AM Kewen.Lin  wrote:
>
> on 2022/5/5 16:09, Richard Biener via Gcc-patches wrote:
> > On Thu, May 5, 2022 at 10:07 AM HAO CHEN GUI via Gcc-patches
> >  wrote:
> >>
> >> Hi,
> >>This patch skips constant folding for fmin/max when either argument
> >> is sNaN. According to C standard,
> >>fmin(sNaN, sNaN)= qNaN, fmin(sNaN, NaN) = qNaN
> >>So signaling NaN should be tested and skipped for fmin/max in match.pd.
> >>
> >>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> >> Is this okay for trunk? Any recommendations? Thanks a lot.
> >
> > OK.
> >
> > Thanks,
> > Richard.
> >
> >> ChangeLog
> >>
> >> 2022-05-05 Haochen Gui 
> >>
> >> gcc/
> >> PR target/105414
> >> * match.pd (minmax): Skip constant folding for fmin/fmax when both
> >> arguments are sNaN or one is sNaN and another is NaN.
> >>
> >> gcc/testsuite/
> >> PR target/105414
> >> * gcc.dg/pr105414.c: New.
> >>
> >> patch.diff
> >>
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index cad61848daa..f256bcbb483 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -3093,7 +3093,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>  (for minmax (min max FMIN_ALL FMAX_ALL)
> >>   (simplify
> >>(minmax @0 @0)
> >> -  @0))
> >> +  /* if both are sNaN, it should return qNaN.  */
> >> +  (if (!tree_expr_maybe_signaling_nan_p (@0))
> >> +@0)))
>
> Sorry for chiming in.
>
> IIUC this patch is mainly for libc function fmin/fmax and the iterator here
> covers min/max and fmin/fmax.  I wonder if it's intent to make this change
> for min/max as well?
>
> As tree.def says, "if either operand is NaN, then it is unspecified", so the
> optimization for min/max still seems acceptable?

MIN/MAX_EXPR shouldn't even appear with -fsignaling-nans for this
reason, at least that's what I thought.  But yes, you might have a point
here (but maybe it's also not strictly enough specified).  One option would
be to do (minmax == MAX_EXPR || minmax == MIN_EXPR || !tree_expr ...)

Joseph - are MIN_EXPR and MAX_EXPR supposed to turn sNaN into qNaN
and the 'undefinedness' is merely as to which operand is chosen?

Richard.

>
> BR,
> Kewen


[PATCH] tree-optimization/105484 - VEC_SET and EH

2022-05-05 Thread Richard Biener via Gcc-patches
When the IL representation of VEC_SET is marked as throwing
(unnecessarily), we need to clean that when replacing it with
the .VEC_SET internal function call which cannot throw.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

2022-05-05  Richard Biener  

PR tree-optimization/105484
* gimple-isel.cc (gimple_expand_vec_set_expr): Clean EH, return
whether the CFG changed.
(gimple_expand_vec_exprs): When the CFG changed, clean it up.

* gcc.dg/torture/pr105484.c: New testcase.
---
 gcc/gimple-isel.cc  | 19 ---
 gcc/testsuite/gcc.dg/torture/pr105484.c | 15 +++
 2 files changed, 27 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr105484.c

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index a8f7a0d25d0..4b309a05a9a 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -49,22 +49,23 @@ along with GCC; see the file COPYING3.  If not see
  _8 = .VEC_SET (_7, i_4(D), _1);
  u = _8;  */
 
-static gimple *
+static bool
 gimple_expand_vec_set_expr (struct function *fun, gimple_stmt_iterator *gsi)
 {
   enum tree_code code;
   gcall *new_stmt = NULL;
   gassign *ass_stmt = NULL;
+  bool cfg_changed = false;
 
   /* Only consider code == GIMPLE_ASSIGN.  */
   gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
   if (!stmt)
-return NULL;
+return false;
 
   tree lhs = gimple_assign_lhs (stmt);
   code = TREE_CODE (lhs);
   if (code != ARRAY_REF)
-return NULL;
+return false;
 
   tree val = gimple_assign_rhs1 (stmt);
   tree op0 = TREE_OPERAND (lhs, 0);
@@ -98,12 +99,15 @@ gimple_expand_vec_set_expr (struct function *fun, gimple_stmt_iterator *gsi)
  gimple_set_location (ass_stmt, loc);
  gsi_insert_before (gsi, ass_stmt, GSI_SAME_STMT);
 
+ basic_block bb = gimple_bb (stmt);
  gimple_move_vops (ass_stmt, stmt);
- gsi_remove (gsi, true);
+ if (gsi_remove (gsi, true)
+ && gimple_purge_dead_eh_edges (bb))
+   cfg_changed = true;
}
 }
 
-  return ass_stmt;
+  return cfg_changed;
 }
 
 /* Expand all VEC_COND_EXPR gimple assignments into calls to internal
@@ -297,6 +301,7 @@ gimple_expand_vec_exprs (struct function *fun)
   basic_block bb;
   hash_map<tree, unsigned int> vec_cond_ssa_name_uses;
   auto_bitmap dce_ssa_names;
+  bool cfg_changed = false;
 
   FOR_EACH_BB_FN (bb, fun)
 {
@@ -311,7 +316,7 @@ gimple_expand_vec_exprs (struct function *fun)
  gsi_replace (&gsi, g, false);
}
 
- gimple_expand_vec_set_expr (fun, &gsi);
+ cfg_changed |= gimple_expand_vec_set_expr (fun, &gsi);
  if (gsi_end_p (gsi))
break;
}
@@ -323,7 +328,7 @@ gimple_expand_vec_exprs (struct function *fun)
 
   simple_dce_from_worklist (dce_ssa_names);
 
-  return 0;
+  return cfg_changed ? TODO_cleanup_cfg : 0;
 }
 
 namespace {
diff --git a/gcc/testsuite/gcc.dg/torture/pr105484.c b/gcc/testsuite/gcc.dg/torture/pr105484.c
new file mode 100644
index 000..f2a5eb8a7ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr105484.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop" } */
+/* { dg-additional-options "-march=cannonlake" { target x86_64-*-* i?86-*-* } } */
+
+typedef int __attribute__((__vector_size__ (16))) V;
+
+void bar (int i);
+
+void
+foo (int i)
+{
+  V v;
+  __builtin_mul_overflow (7, i, &v[i]);
+  bar ((V){}[3]);
+}
-- 
2.35.3


Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async

2022-05-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 21, 2022 at 12:19:20PM +0100, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
>   * omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
>   target_memcpy_rect_async to omp_runtime_apis array.
> 
> libgomp/ChangeLog:
> 
>   * libgomp.map: Added omp_target_memcpy_async and
>   omp_target_memcpy_rect_async.
>   * libgomp.texi: Both functions are now supported.
>   * omp.h.in: Added omp_target_memcpy_async and
>   omp_target_memcpy_rect_async.
>   * omp_lib.f90.in: Added interfaces for both new functions.
>   * omp_lib.h.in: Likewise.
>   * target.c (omp_target_memcpy): Restructured into check and copy part.
>   (omp_target_memcpy_check): New helper function for omp_target_memcpy and
>   omp_target_memcpy_async that checks requirements.
>   (omp_target_memcpy_copy): New helper function for omp_target_memcpy and
>   omp_target_memcpy_async that performs the memcpy.
>   (omp_target_memcpy_async_helper): New helper function that is used in
>   omp_target_memcpy_async for the asynchronous task.
>   (omp_target_memcpy_async): Added.
>   (omp_target_memcpy_rect): Restructured into check and copy part.
>   (omp_target_memcpy_rect_check): New helper function for
>   omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
>   requirements.
>   (omp_target_memcpy_rect_copy): New helper function for
>   omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
>   the memcpy.
>   (omp_target_memcpy_rect_async_helper): New helper function that is used
>   in omp_target_memcpy_rect_async for the asynchronous task.
>   (omp_target_memcpy_rect_async): Added.
>   * testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
>   * testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
>   * testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
>   * testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
>   * testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
>   * testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
>   * testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
>   * testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
> 
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -224,6 +224,8 @@ OMP_5.1 {
>   omp_set_teams_thread_limit_8_;
>   omp_get_teams_thread_limit;
>   omp_get_teams_thread_limit_;
> + omp_target_memcpy_async;
> + omp_target_memcpy_rect_async;
>  } OMP_5.0.2;

These should be added to OMP_5.1.1, not here.

> --- a/libgomp/omp.h.in
> +++ b/libgomp/omp.h.in
> @@ -272,6 +272,10 @@ extern int omp_target_is_present (const void *, int) __GOMP_NOTHROW;
>  extern int omp_target_memcpy (void *, const void *, __SIZE_TYPE__,
> __SIZE_TYPE__, __SIZE_TYPE__, int, int)
>__GOMP_NOTHROW;
> +extern int omp_target_memcpy_async (void *, const void *, __SIZE_TYPE__,
> + __SIZE_TYPE__, __SIZE_TYPE__, int, int,
> + int, omp_depend_t*)

Formatting, space before *.

> +  __GOMP_NOTHROW;
>  extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int,
>  const __SIZE_TYPE__ *,
>  const __SIZE_TYPE__ *,
> @@ -279,6 +283,14 @@ extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int,
>  const __SIZE_TYPE__ *,
>  const __SIZE_TYPE__ *, int, int)
>__GOMP_NOTHROW;
> +extern int omp_target_memcpy_rect_async (void *, const void *, __SIZE_TYPE__,
> +  int, const __SIZE_TYPE__ *,
> +  const __SIZE_TYPE__ *,
> +  const __SIZE_TYPE__ *,
> +  const __SIZE_TYPE__ *,
> +  const __SIZE_TYPE__ *, int, int, int,
> +  omp_depend_t*)

Likewise.

> -int
> -omp_target_memcpy (void *dst, const void *src, size_t length,
> -size_t dst_offset, size_t src_offset, int dst_device_num,
> -int src_device_num)
> +static int
> +omp_target_memcpy_check (void *dst, const void *src, int dst_device_num,
> +  int src_device_num,
> +  struct gomp_device_descr **dst_devicep,
> +  struct gomp_device_descr **src_devicep)
>  {

Why does omp_target_memcpy_check need the dst and src arguments?  From what
I can see, they aren't used by it.

> +typedef struct
> +{
> +  void *dst;
> +  const void *src;
> +  size_t length;
> +  size_t dst_offset;
> +  size_t src_offset;
> +  struct gomp_device_descr *dst_devicep;
> +  struct gomp_device_descr *src_devicep;
> +} memcpy_t;

Please come up with 

Re: [PATCH] Skip constant folding for fmin/max when either argument is sNaN [PR105414]

2022-05-05 Thread Kewen.Lin via Gcc-patches
on 2022/5/5 16:09, Richard Biener via Gcc-patches wrote:
> On Thu, May 5, 2022 at 10:07 AM HAO CHEN GUI via Gcc-patches
>  wrote:
>>
>> Hi,
>>This patch skips constant folding for fmin/max when either argument
>> is sNaN. According to C standard,
>>fmin(sNaN, sNaN)= qNaN, fmin(sNaN, NaN) = qNaN
>>So signaling NaN should be tested and skipped for fmin/max in match.pd.
>>
>>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
>> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> OK.
> 
> Thanks,
> Richard.
> 
>> ChangeLog
>>
>> 2022-05-05 Haochen Gui 
>>
>> gcc/
>> PR target/105414
>> * match.pd (minmax): Skip constant folding for fmin/fmax when both
>> arguments are sNaN or one is sNaN and another is NaN.
>>
>> gcc/testsuite/
>> PR target/105414
>> * gcc.dg/pr105414.c: New.
>>
>> patch.diff
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index cad61848daa..f256bcbb483 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -3093,7 +3093,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  (for minmax (min max FMIN_ALL FMAX_ALL)
>>   (simplify
>>(minmax @0 @0)
>> -  @0))
>> +  /* if both are sNaN, it should return qNaN.  */
>> +  (if (!tree_expr_maybe_signaling_nan_p (@0))
>> +@0)))

Sorry for chiming in.

IIUC this patch is mainly for libc function fmin/fmax and the iterator here
covers min/max and fmin/fmax.  I wonder if it's intent to make this change
for min/max as well?

As tree.def says, "if either operand is NaN, then it is unspecified", so the
optimization for min/max still seems acceptable?

BR,
Kewen


Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OI/TImode.

2022-05-05 Thread Uros Bizjak via Gcc-patches
On Thu, May 5, 2022 at 10:23 AM Hongtao Liu  wrote:
>
> On Thu, May 5, 2022 at 4:09 PM Uros Bizjak via Gcc-patches
>  wrote:
> >
> > On Thu, May 5, 2022 at 9:50 AM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Thu, May 5, 2022 at 9:37 AM liuhongt via Gcc-patches
> > >  wrote:
> > > >
> > > > > Enable the optimization for TImode only on 32-bit targets; on 64-bit
> > > > > targets there could be an extra integer <-> SSE move due to the psABI,
> > > > > which is not efficient.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > > Ok for trunk?
> > >
> > > > I wonder if this is better done in STV where we could assess this extra
> > > > cost?
> >
> > Yes, this should be handled via STV, Roger Sayle (CC'd) has proposed a
> > patch that does just that.
> >
> My patch also handles OImode, I think that part could be a separate patch.

Yes, OImode (and TImode on x86_32) can't be implemented using integer registers.

Uros.


Re: [PATCH] Strip of a vector load which is only used partially.

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 7:04 AM liuhongt  wrote:
>
> Optimize
>
>   _1 = *srcp_3(D);
>   _4 = VEC_PERM_EXPR <_1, _1, { 4, 5, 6, 7, 4, 5, 6, 7 }>;
>   _5 = BIT_FIELD_REF <_4, 128, 0>;
>
> to
>
>   _1 = *srcp_3(D);
>   _5 = BIT_FIELD_REF <_1, 128, 128>;
>
> The above will finally be optimized to
>
> _5 = BIT_FIELD_REF <*srcp_3(D), 128, 128>;
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{m32,}.
> Ok for trunk?

Hmm, tree-ssa-forwprop.cc:simplify_bitfield_ref should already
handle this in the

  if (code == VEC_PERM_EXPR
  && constant_multiple_p (bit_field_offset (op), size, &idx))
{

part of the code - maybe that needs to be enhanced to cover
a contiguous stride in the VEC_PERM_EXPR.  I see
we have

  size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
  if (maybe_ne (bit_field_size (op), size))
return false;

where it will currently bail, so adjust that to check for a
constant multiple.  I also think we should only handle the
case where the new bit_field_offset alignment is not
worse than the original one.
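The transform under discussion can be modeled in plain C to show why it is valid and what the two suggested checks amount to. This is only a sketch on arrays, not forwprop or match.pd code; all function names are made up, the vector is assumed to have eight 32-bit elements, and both VEC_PERM_EXPR inputs are assumed to be the same vector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NELTS = 8, ELT_BITS = 32 };

/* Models VEC_PERM_EXPR <v, v, sel> when both inputs are the same vector.  */
void
vec_perm_same (const int32_t *v, const unsigned *sel, int32_t *out)
{
  for (int i = 0; i < NELTS; i++)
    out[i] = v[sel[i] % NELTS];
}

/* Models BIT_FIELD_REF <v, size, pos> for element-aligned size and pos
   (both given in bits).  */
void
bit_field_ref (const int32_t *v, unsigned size, unsigned pos, int32_t *out)
{
  memcpy (out, v + pos / ELT_BITS, size / 8);
}

/* Returns the first source element if sel[first .. first + count) selects
   a contiguous ascending run of elements, and -1 otherwise.  */
int
contiguous_start (const unsigned *sel, unsigned first, unsigned count)
{
  for (unsigned i = first + 1; i < first + count; i++)
    if (sel[i] % NELTS != sel[i - 1] % NELTS + 1)
      return -1;
  return (int) (sel[first] % NELTS);
}

/* Returns 1 if reading the low 128 bits of the permuted vector equals
   reading 128 bits of the original vector at the computed offset.  */
int
perm_demo (void)
{
  const int32_t v[NELTS] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  const unsigned sel[NELTS] = { 4, 5, 6, 7, 4, 5, 6, 7 };
  const unsigned size = 128;
  int32_t perm[NELTS], via_perm[4], direct[4];

  vec_perm_same (v, sel, perm);
  bit_field_ref (perm, size, 0, via_perm);

  int start = contiguous_start (sel, 0, size / ELT_BITS);
  if (start < 0)
    return 0;			/* not a contiguous run: no transform */
  unsigned new_pos = (unsigned) start * ELT_BITS;
  if (new_pos % size != 0)
    return 0;			/* new offset is less aligned: no transform */

  bit_field_ref (v, size, new_pos, direct);
  return memcmp (via_perm, direct, sizeof direct) == 0;
}
```

For the selector { 4, 5, 6, 7, 4, 5, 6, 7 } the run starts at element 4, giving a new offset of 128 bits, which is a multiple of the 128-bit access size, so both reads return the same four elements.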

That said, I'd prefer if you integrate this transform with
simplify_bitfield_ref.

Richard.

>
> gcc/ChangeLog:
>
> PR tree-optimization/102583
> * gimple.h (gate_optimize_vector_load): Declare.
> * match.pd: Simplify (BIT_FIELD_REF (vec_perm *p *p { 4, 5, 6,
> 7, 4, 5, 6, 7 }) 128 0) to (BIT_FIELD_REF *p 128 128).
> * tree-ssa-forwprop.cc (gate_optimize_vector_load): New
> function.
> (pass_forwprop::execute): Put condition codes in the upper new
> function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr102583.c: New test.
> ---
>  gcc/gimple.h |  1 +
>  gcc/match.pd | 56 
>  gcc/testsuite/gcc.target/i386/pr102583.c | 30 +
>  gcc/tree-ssa-forwprop.cc | 32 +-
>  4 files changed, 109 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102583.c
>
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 6b1e89ad74e..1747dae1193 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -1638,6 +1638,7 @@ extern void maybe_remove_unused_call_args (struct function *, gimple *);
>  extern bool gimple_inexpensive_call_p (gcall *);
>  extern bool stmt_can_terminate_bb_p (gimple *);
>  extern location_t gimple_or_expr_nonartificial_location (gimple *, tree);
> +extern bool gate_optimize_vector_load (gimple *);
>
>  /* Return the disposition for a warning (or all warnings by default)
> for a statement.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 6d691d302b3..ac214310251 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6832,6 +6832,62 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> }
> (cmp @0 { res; })
>
> +#if GIMPLE
> +/* Simplify partial vector access, transform
> +
> +   V8SI A;
> +   V4SI B;
> +   A = *PA;
> +   B = VEC_PERM_EXPR (A, A, { 4, 5, 6, 7, 4, 5, 6, 7 });
> +   C = BIT_FIELD_REF (B, 128, 0)
> +
> +to
> +
> +   A = *PA;
> +   C = BIT_FIELD_REF (A, 128, 128);
> +
> +optimize_vector_load will eventually optimize the upper to
> +
> +   C = BIT_FIELD_REF (*PA, 128, 128);  */
> +
> +(simplify
> + (BIT_FIELD_REF (vec_perm@2 SSA_NAME@0 @0 VECTOR_CST@1) @rsize @rpos)
> + (if (VECTOR_TYPE_P (type)
> + && TYPE_MODE (type) != BLKmode
> + && single_use (@2)
> + && gate_optimize_vector_load (SSA_NAME_DEF_STMT (@0))
> + && types_match (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0))))
> +  (with
> +   {
> + unsigned HOST_WIDE_INT nelts = -1;
> + if (!VECTOR_CST_NELTS (@1).is_constant ())
> +   return NULL_TREE;
> + tree inner_type = TREE_TYPE (type);
> + unsigned HOST_WIDE_INT elt_w = tree_to_uhwi (TYPE_SIZE (inner_type));
> + unsigned HOST_WIDE_INT pos = tree_to_uhwi (@rpos);
> + unsigned HOST_WIDE_INT size = tree_to_uhwi (@rsize);
> + unsigned HOST_WIDE_INT start
> +   = tree_to_uhwi (vector_cst_elt (@1, pos / elt_w));
> +
> + for (unsigned HOST_WIDE_INT i = pos / elt_w + 1; i != size / elt_w; i++)
> +   {
> +/* Continuous area.  */
> +if (tree_to_uhwi (vector_cst_elt (@1, i)) - 1
> +!= tree_to_uhwi (vector_cst_elt (@1, i - 1)))
> +  return NULL_TREE;
> +   }
> +
> + /* Aligned or support movmisalign_optab.  */
> + unsigned HOST_WIDE_INT dest_align = tree_to_uhwi (TYPE_SIZE (type));
> + if ((TYPE_ALIGN (TREE_TYPE (@0)) % dest_align
> + || start * elt_w % dest_align)
> +   && (optab_handler (movmisalign_optab, TYPE_MODE (type))
> +   == CODE_FOR_nothing))
> +   return NULL_TREE;
> +   }
> +   (BIT_FIELD_REF @0 @rsize { bitsize_int (start * elt_w); }))))
> +#endif
> +
>  /* Canonicalizations of BIT_FIELD_REFs.  */
>
>  (simplify
> diff --git a/gcc/testsuite/gcc.target/i386/pr102583.c b/gcc/testsuite/gcc.target/i386/pr102583.c
> new file mode 100644
> index 000..ff2ffb5e671
> --- /dev/null
> +++ 

Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OI/TImode.

2022-05-05 Thread Hongtao Liu via Gcc-patches
On Thu, May 5, 2022 at 4:09 PM Uros Bizjak via Gcc-patches
 wrote:
>
> On Thu, May 5, 2022 at 9:50 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, May 5, 2022 at 9:37 AM liuhongt via Gcc-patches
> >  wrote:
> > >
> > > > Enable the optimization for TImode only on 32-bit targets; on 64-bit
> > > > targets there could be an extra integer <-> SSE move due to the psABI,
> > > > which is not efficient.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > Ok for trunk?
> >
> > > I wonder if this is better done in STV where we could assess this extra
> > > cost?
>
> Yes, this should be handled via STV, Roger Sayle (CC'd) has proposed a
> patch that does just that.
>
My patch also handles OImode, I think that part could be a separate patch.
>
> Uros.
>
> > > gcc/ChangeLog:
> > >
> > > PR target/104610
> > > * config/i386/i386-expand.cc (ix86_expand_branch): Use ptest
> > > for TI/QImode when code is EQ or NE.
> > > * config/i386/i386.md (SDWIM1248): New iterator.
> > > (cbranch4): Split TImode into a separate expander.
> > > (cbranchti4): New expander.
> > > * config/i386/predicates.md (timode_comparison_operator): New
> > > predicate.
> > > * config/i386/sse.md (cbranch4): Extend to OImode.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr104610.c: New test.
> > > ---
> > >  gcc/config/i386/i386-expand.cc   | 13 +++-
> > >  gcc/config/i386/i386.md  | 27 ++--
> > >  gcc/config/i386/predicates.md|  6 ++
> > >  gcc/config/i386/sse.md   | 10 +++--
> > >  gcc/testsuite/gcc.target/i386/pr104610.c | 23 
> > >  5 files changed, 74 insertions(+), 5 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104610.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > b/gcc/config/i386/i386-expand.cc
> > > index bc806ffa283..a2012a158ae 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -2264,13 +2264,24 @@ ix86_expand_branch (enum rtx_code code, rtx op0, 
> > > rtx op1, rtx label)
> > >  {
> > >machine_mode mode = GET_MODE (op0);
> > >rtx tmp;
> > > +  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
> > > +  /* Using ptest for TImode only for 32-bit target since it's splitted 
> > > into
> > > + 4 comparisons. For 64-bit target there could be extra ineteger <-> 
> > > sse
> > > + move regarding psABI, not efficient.  */
> > > +  if ((code == EQ || code == NE)
> > > +   && ((mode == OImode && TARGET_AVX)
> > > +  || (mode == TImode && !TARGET_64BIT && TARGET_SSE4_1)))
> > > +{
> > > +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > > +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > > +  mode = p_mode;
> > > +}
> > >
> > >/* Handle special case - vector comparsion with boolean result, 
> > > transform
> > >   it using ptest instruction.  */
> > >if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> > >  {
> > >rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> > > -  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : 
> > > V2DImode;
> > >
> > >gcc_assert (code == EQ || code == NE);
> > >/* Generate XOR since we can't check that one operand is zero 
> > > vector.  */
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index b321cda1f22..f91325015c9 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -1069,6 +1069,10 @@ (define_mode_iterator SDWIM [(QI 
> > > "TARGET_QIMODE_MATH")
> > >  (HI "TARGET_HIMODE_MATH")
> > >  SI DI (TI "TARGET_64BIT")])
> > >
> > > +(define_mode_iterator SDWIM1248 [(QI "TARGET_QIMODE_MATH")
> > > + (HI "TARGET_HIMODE_MATH")
> > > + SI DI])
> > > +
> > >  ;; Math-dependant single word integer modes.
> > >  (define_mode_iterator SWIM [(QI "TARGET_QIMODE_MATH")
> > > (HI "TARGET_HIMODE_MATH")
> > > @@ -1322,8 +1326,8 @@ (define_mode_iterator PTR
> > >
> > >  (define_expand "cbranch4"
> > >[(set (reg:CC FLAGS_REG)
> > > -   (compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
> > > -   (match_operand:SDWIM 2 "")))
> > > +   (compare:CC (match_operand:SDWIM1248 1 "nonimmediate_operand")
> > > +   (match_operand:SDWIM1248 2 "")))
> > > (set (pc) (if_then_else
> > >(match_operator 0 "ordered_comparison_operator"
> > > [(reg:CC FLAGS_REG) (const_int 0)])
> > > @@ -1338,6 +1342,25 @@ (define_expand "cbranch4"
> > >DONE;
> > >  })
> > >
> > > +(define_expand "cbranchti4"
> > > +  [(set (reg:CC FLAGS_REG)
> > > +   (compare:CC (match_operand:TI 1 "nonimmediate_operand")
> > > 

Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OI/TImode.

2022-05-05 Thread Uros Bizjak via Gcc-patches
On Thu, May 5, 2022 at 10:08 AM Uros Bizjak  wrote:
>
> On Thu, May 5, 2022 at 9:50 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, May 5, 2022 at 9:37 AM liuhongt via Gcc-patches
> >  wrote:
> > >
> > > > Enable the optimization for TImode only on 32-bit targets; on 64-bit
> > > > targets there could be extra integer <-> SSE moves required by the
> > > > psABI, which is not efficient.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > Ok for trunk?
> >
> > I wonder if this is better done in STV where we could assess this extra 
> > cost?
>
> Yes, this should be handled via STV, Roger Sayle (CC'd) has proposed a
> patch that does just that.

https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593174.html
Uros.

>
> > > gcc/ChangeLog:
> > >
> > > PR target/104610
> > > * config/i386/i386-expand.cc (ix86_expand_branch): Use ptest
> > > > for TI/OImode when code is EQ or NE.
> > > * config/i386/i386.md (SDWIM1248): New iterator.
> > > (cbranch4): Split TImode into a separate expander.
> > > (cbranchti4): New expander.
> > > * config/i386/predicates.md (timode_comparison_operator): New
> > > predicate.
> > > * config/i386/sse.md (cbranch4): Extend to OImode.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr104610.c: New test.
> > > ---
> > >  gcc/config/i386/i386-expand.cc   | 13 +++-
> > >  gcc/config/i386/i386.md  | 27 ++--
> > >  gcc/config/i386/predicates.md|  6 ++
> > >  gcc/config/i386/sse.md   | 10 +++--
> > >  gcc/testsuite/gcc.target/i386/pr104610.c | 23 
> > >  5 files changed, 74 insertions(+), 5 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104610.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > b/gcc/config/i386/i386-expand.cc
> > > index bc806ffa283..a2012a158ae 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -2264,13 +2264,24 @@ ix86_expand_branch (enum rtx_code code, rtx op0, 
> > > rtx op1, rtx label)
> > >  {
> > >machine_mode mode = GET_MODE (op0);
> > >rtx tmp;
> > > +  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
> > > +  /* Using ptest for TImode only for 32-bit target since it's splitted 
> > > into
> > > + 4 comparisons. For 64-bit target there could be extra ineteger <-> 
> > > sse
> > > + move regarding psABI, not efficient.  */
> > > +  if ((code == EQ || code == NE)
> > > +   && ((mode == OImode && TARGET_AVX)
> > > +  || (mode == TImode && !TARGET_64BIT && TARGET_SSE4_1)))
> > > +{
> > > +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > > +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > > +  mode = p_mode;
> > > +}
> > >
> > >/* Handle special case - vector comparsion with boolean result, 
> > > transform
> > >   it using ptest instruction.  */
> > >if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> > >  {
> > >rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> > > -  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : 
> > > V2DImode;
> > >
> > >gcc_assert (code == EQ || code == NE);
> > >/* Generate XOR since we can't check that one operand is zero 
> > > vector.  */
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index b321cda1f22..f91325015c9 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -1069,6 +1069,10 @@ (define_mode_iterator SDWIM [(QI 
> > > "TARGET_QIMODE_MATH")
> > >  (HI "TARGET_HIMODE_MATH")
> > >  SI DI (TI "TARGET_64BIT")])
> > >
> > > +(define_mode_iterator SDWIM1248 [(QI "TARGET_QIMODE_MATH")
> > > + (HI "TARGET_HIMODE_MATH")
> > > + SI DI])
> > > +
> > >  ;; Math-dependant single word integer modes.
> > >  (define_mode_iterator SWIM [(QI "TARGET_QIMODE_MATH")
> > > (HI "TARGET_HIMODE_MATH")
> > > @@ -1322,8 +1326,8 @@ (define_mode_iterator PTR
> > >
> > >  (define_expand "cbranch4"
> > >[(set (reg:CC FLAGS_REG)
> > > -   (compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
> > > -   (match_operand:SDWIM 2 "")))
> > > +   (compare:CC (match_operand:SDWIM1248 1 "nonimmediate_operand")
> > > +   (match_operand:SDWIM1248 2 "")))
> > > (set (pc) (if_then_else
> > >(match_operator 0 "ordered_comparison_operator"
> > > [(reg:CC FLAGS_REG) (const_int 0)])
> > > @@ -1338,6 +1342,25 @@ (define_expand "cbranch4"
> > >DONE;
> > >  })
> > >
> > > +(define_expand "cbranchti4"
> > > +  [(set (reg:CC FLAGS_REG)
> > > +   (compare:CC (match_operand:TI 1 "nonimmediate_operand")
> > > +   

Re: [PATCH] Skip constant folding for fmin/max when either argument is sNaN [PR105414]

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 10:07 AM HAO CHEN GUI via Gcc-patches
 wrote:
>
> Hi,
>    This patch skips constant folding for fmin/fmax when either argument
> is an sNaN. According to the C standard,
>    fmin(sNaN, sNaN) = qNaN, fmin(sNaN, NaN) = qNaN
>    So signaling NaNs should be tested for and skipped for fmin/fmax in match.pd.
>
>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.

OK.

Thanks,
Richard.

> ChangeLog
>
> 2022-05-05 Haochen Gui 
>
> gcc/
> PR target/105414
> * match.pd (minmax): Skip constant folding for fmin/fmax when both
> arguments are sNaN or one is sNaN and another is NaN.
>
> gcc/testsuite/
> PR target/105414
> * gcc.dg/pr105414.c: New.
>
> patch.diff
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cad61848daa..f256bcbb483 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3093,7 +3093,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (for minmax (min max FMIN_ALL FMAX_ALL)
>   (simplify
>(minmax @0 @0)
> -  @0))
> +  /* if both are sNaN, it should return qNaN.  */
> +  (if (!tree_expr_maybe_signaling_nan_p (@0))
> +@0)))
>  /* min(max(x,y),y) -> y.  */
>  (simplify
>   (min:c (max:c @0 @1) @1)
> @@ -3193,12 +3195,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (minmax @1 (convert @2)
>
>  (for minmax (FMIN_ALL FMAX_ALL)
> - /* If either argument is NaN, return the other one.  Avoid the
> -transformation if we get (and honor) a signalling NaN.  */
> + /* If either argument is NaN and other one is not sNaN, return the other
> +one.  Avoid the transformation if we get (and honor) a signalling NaN.  
> */
>   (simplify
>(minmax:c @0 REAL_CST@1)
> -  (if (real_isnan (TREE_REAL_CST_PTR (@1))
> -   && (!HONOR_SNANS (@1) || !TREE_REAL_CST (@1).signalling))
> +   (if (real_isnan (TREE_REAL_CST_PTR (@1))
> +   && (!HONOR_SNANS (@1) || !TREE_REAL_CST (@1).signalling)
> +   && !tree_expr_maybe_signaling_nan_p (@0))
> @0)))
>  /* Convert fmin/fmax to MIN_EXPR/MAX_EXPR.  C99 requires these
> functions to return the numeric arg if the other one is NaN.
> diff --git a/gcc/testsuite/gcc.dg/pr105414.c b/gcc/testsuite/gcc.dg/pr105414.c
> new file mode 100644
> index 000..78772700acf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr105414.c
> @@ -0,0 +1,30 @@
> +/* { dg-do run { target { *-*-linux* *-*-gnu* } } } */
> +/* { dg-options "-O1 -fsignaling-nans -lm" } */
> +/* { dg-add-options ieee } */
> +/* { dg-require-effective-target issignaling } */
> +
> +
> +#define _GNU_SOURCE
> +#include 
> +#include 
> +
> +int main()
> +{
> +  double a = __builtin_nans ("");
> +
> +  if (issignaling (fmin (a, a)))
> +__builtin_abort ();
> +
> +  if (issignaling (fmax (a, a)))
> +__builtin_abort ();
> +
> +  double b = __builtin_nan ("");
> +
> +  if (issignaling (fmin (a, b)))
> +__builtin_abort ();
> +
> +  if (issignaling (fmax (a, b)))
> +__builtin_abort ();
> +
> +  return 0;
> +}


Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OI/TImode.

2022-05-05 Thread Uros Bizjak via Gcc-patches
On Thu, May 5, 2022 at 9:50 AM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, May 5, 2022 at 9:37 AM liuhongt via Gcc-patches
>  wrote:
> >
> > Enable the optimization for TImode only on 32-bit targets; on 64-bit
> > targets there could be extra integer <-> SSE moves required by the
> > psABI, which is not efficient.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > Ok for trunk?
>
> I wonder if this is better done in STV where we could assess this extra cost?

Yes, this should be handled via STV, Roger Sayle (CC'd) has proposed a
patch that does just that.


Uros.

> > gcc/ChangeLog:
> >
> > PR target/104610
> > * config/i386/i386-expand.cc (ix86_expand_branch): Use ptest
> > for TI/OImode when code is EQ or NE.
> > * config/i386/i386.md (SDWIM1248): New iterator.
> > (cbranch4): Split TImode into a separate expander.
> > (cbranchti4): New expander.
> > * config/i386/predicates.md (timode_comparison_operator): New
> > predicate.
> > * config/i386/sse.md (cbranch4): Extend to OImode.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr104610.c: New test.
> > ---
> >  gcc/config/i386/i386-expand.cc   | 13 +++-
> >  gcc/config/i386/i386.md  | 27 ++--
> >  gcc/config/i386/predicates.md|  6 ++
> >  gcc/config/i386/sse.md   | 10 +++--
> >  gcc/testsuite/gcc.target/i386/pr104610.c | 23 
> >  5 files changed, 74 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104610.c
> >
> > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index bc806ffa283..a2012a158ae 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -2264,13 +2264,24 @@ ix86_expand_branch (enum rtx_code code, rtx op0, 
> > rtx op1, rtx label)
> >  {
> >machine_mode mode = GET_MODE (op0);
> >rtx tmp;
> > +  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
> > +  /* Using ptest for TImode only for 32-bit target since it's splitted into
> > + 4 comparisons. For 64-bit target there could be extra ineteger <-> sse
> > + move regarding psABI, not efficient.  */
> > +  if ((code == EQ || code == NE)
> > +   && ((mode == OImode && TARGET_AVX)
> > +  || (mode == TImode && !TARGET_64BIT && TARGET_SSE4_1)))
> > +{
> > +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > +  mode = p_mode;
> > +}
> >
> >/* Handle special case - vector comparsion with boolean result, transform
> >   it using ptest instruction.  */
> >if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> >  {
> >rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> > -  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : 
> > V2DImode;
> >
> >gcc_assert (code == EQ || code == NE);
> >/* Generate XOR since we can't check that one operand is zero 
> > vector.  */
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index b321cda1f22..f91325015c9 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -1069,6 +1069,10 @@ (define_mode_iterator SDWIM [(QI 
> > "TARGET_QIMODE_MATH")
> >  (HI "TARGET_HIMODE_MATH")
> >  SI DI (TI "TARGET_64BIT")])
> >
> > +(define_mode_iterator SDWIM1248 [(QI "TARGET_QIMODE_MATH")
> > + (HI "TARGET_HIMODE_MATH")
> > + SI DI])
> > +
> >  ;; Math-dependant single word integer modes.
> >  (define_mode_iterator SWIM [(QI "TARGET_QIMODE_MATH")
> > (HI "TARGET_HIMODE_MATH")
> > @@ -1322,8 +1326,8 @@ (define_mode_iterator PTR
> >
> >  (define_expand "cbranch4"
> >[(set (reg:CC FLAGS_REG)
> > -   (compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
> > -   (match_operand:SDWIM 2 "")))
> > +   (compare:CC (match_operand:SDWIM1248 1 "nonimmediate_operand")
> > +   (match_operand:SDWIM1248 2 "")))
> > (set (pc) (if_then_else
> >(match_operator 0 "ordered_comparison_operator"
> > [(reg:CC FLAGS_REG) (const_int 0)])
> > @@ -1338,6 +1342,25 @@ (define_expand "cbranch4"
> >DONE;
> >  })
> >
> > +(define_expand "cbranchti4"
> > +  [(set (reg:CC FLAGS_REG)
> > +   (compare:CC (match_operand:TI 1 "nonimmediate_operand")
> > +   (match_operand:TI 2 "x86_64_general_operand")))
> > +   (set (pc) (if_then_else
> > +  (match_operator 0 "timode_comparison_operator"
> > +   [(reg:CC FLAGS_REG) (const_int 0)])
> > +  (label_ref (match_operand 3))
> > +  (pc)))]
> > +  "TARGET_64BIT || TARGET_SSE4_1"
> > +{
> > +  if (MEM_P (operands[1]) && MEM_P (operands[2]))

[PATCH] Skip constant folding for fmin/max when either argument is sNaN [PR105414]

2022-05-05 Thread HAO CHEN GUI via Gcc-patches
Hi,
   This patch skips constant folding for fmin/fmax when either argument
is an sNaN. According to the C standard,
   fmin(sNaN, sNaN) = qNaN, fmin(sNaN, NaN) = qNaN
   So signaling NaNs should be tested for and skipped for fmin/fmax in match.pd.

   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog

2022-05-05 Haochen Gui 

gcc/
PR target/105414
* match.pd (minmax): Skip constant folding for fmin/fmax when both
arguments are sNaN or one is sNaN and another is NaN.

gcc/testsuite/
PR target/105414
* gcc.dg/pr105414.c: New.

patch.diff

diff --git a/gcc/match.pd b/gcc/match.pd
index cad61848daa..f256bcbb483 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3093,7 +3093,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for minmax (min max FMIN_ALL FMAX_ALL)
  (simplify
   (minmax @0 @0)
-  @0))
+  /* if both are sNaN, it should return qNaN.  */
+  (if (!tree_expr_maybe_signaling_nan_p (@0))
+@0)))
 /* min(max(x,y),y) -> y.  */
 (simplify
  (min:c (max:c @0 @1) @1)
@@ -3193,12 +3195,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(minmax @1 (convert @2)

 (for minmax (FMIN_ALL FMAX_ALL)
- /* If either argument is NaN, return the other one.  Avoid the
-transformation if we get (and honor) a signalling NaN.  */
+ /* If either argument is NaN and other one is not sNaN, return the other
+one.  Avoid the transformation if we get (and honor) a signalling NaN.  */
  (simplify
   (minmax:c @0 REAL_CST@1)
-  (if (real_isnan (TREE_REAL_CST_PTR (@1))
-   && (!HONOR_SNANS (@1) || !TREE_REAL_CST (@1).signalling))
+   (if (real_isnan (TREE_REAL_CST_PTR (@1))
+   && (!HONOR_SNANS (@1) || !TREE_REAL_CST (@1).signalling)
+   && !tree_expr_maybe_signaling_nan_p (@0))
@0)))
 /* Convert fmin/fmax to MIN_EXPR/MAX_EXPR.  C99 requires these
functions to return the numeric arg if the other one is NaN.
diff --git a/gcc/testsuite/gcc.dg/pr105414.c b/gcc/testsuite/gcc.dg/pr105414.c
new file mode 100644
index 000..78772700acf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr105414.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { *-*-linux* *-*-gnu* } } } */
+/* { dg-options "-O1 -fsignaling-nans -lm" } */
+/* { dg-add-options ieee } */
+/* { dg-require-effective-target issignaling } */
+
+
+#define _GNU_SOURCE
+#include 
+#include 
+
+int main()
+{
+  double a = __builtin_nans ("");
+
+  if (issignaling (fmin (a, a)))
+__builtin_abort ();
+
+  if (issignaling (fmax (a, a)))
+__builtin_abort ();
+
+  double b = __builtin_nan ("");
+
+  if (issignaling (fmin (a, b)))
+__builtin_abort ();
+
+  if (issignaling (fmax (a, b)))
+__builtin_abort ();
+
+  return 0;
+}
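
For readers following the thread, the sNaN rule stated in the message above can be sketched as a small bit-level model in C. This is only an illustration, not the match.pd change: it assumes IEEE 754 binary64 with the quiet bit in the top mantissa position (as on x86 and PowerPC), and the names bits_of, is_snan_bits and model_fmin_bits are hypothetical helpers, not GCC or libm functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is IEEE 754 binary64, and a set top mantissa bit
   marks a quiet NaN (some legacy targets use the opposite convention).  */
static_assert (sizeof (double) == sizeof (uint64_t), "binary64 assumed");

#define EXP_MASK  (UINT64_C (0x7ff) << 52)
#define MANT_MASK ((UINT64_C (1) << 52) - 1)
#define QUIET_BIT (UINT64_C (1) << 51)

/* Hypothetical helpers: move between a double and its bit pattern.  */
static uint64_t bits_of (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

static double double_of (uint64_t u)
{
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

/* NaN: all exponent bits set and a nonzero mantissa.  */
static bool is_nan_bits (uint64_t u)
{
  return (u & EXP_MASK) == EXP_MASK && (u & MANT_MASK) != 0;
}

/* Signaling NaN: a NaN whose quiet bit is clear.  */
static bool is_snan_bits (uint64_t u)
{
  return is_nan_bits (u) && (u & QUIET_BIT) == 0;
}

/* Model of the fmin behaviour described in the thread: any sNaN input
   yields a quiet NaN, so folding fmin (x, x) to x is wrong when x may
   be an sNaN.  Otherwise a quiet NaN operand yields the other operand.
   Signed zeros are not modelled.  */
static uint64_t model_fmin_bits (uint64_t a, uint64_t b)
{
  if (is_snan_bits (a))
    return a | QUIET_BIT;
  if (is_snan_bits (b))
    return b | QUIET_BIT;
  if (is_nan_bits (a))
    return b;
  if (is_nan_bits (b))
    return a;
  return double_of (a) < double_of (b) ? a : b;
}
```

Working on bit patterns rather than on double values avoids depending on whether the host quiets an sNaN when it is loaded into a floating-point register.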


Re: [PATCH] [PR100106] Reject unaligned subregs when strict alignment is required

2022-05-05 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva via Gcc-patches  writes:
> The testcase for pr100106, compiled with optimization for 32-bit
> powerpc -mcpu=604 with -mstrict-align expands the initialization of a
> union from a float _Complex value into a load from an SCmode
> constant pool entry, aligned to 4 bytes, into a DImode pseudo,
> requiring 8-byte alignment.
>
> The patch that introduced the testcase modified simplify_subreg to
> avoid changing the MEM to outermode, but simplify_gen_subreg still
> creates a SUBREG or a MEM that would require stricter alignment than
> MEM's, and lra_constraints appears to get confused by that, repeatedly
> creating unsatisfiable reloads for the SUBREG until it exceeds the
> insn count.
>
> Avoiding the unaligned SUBREG, expand splits the DImode dest into
> SUBREGs and loads each SImode word of the constant pool with the
> proper alignment.
>
>
> At the time of posting this patch, it occurred to me that maybe the test
> should allow paradoxical subregs of mems, or even that non-paradoxical
> subregs of mems should be allowed to change to a mode with stricter
> alignment, and the register allocator should deal with that somehow.
> WDYT?
>
>
> Regstrapped on x86_64-linux-gnu and ppc64le-linux-gnu, also tested
> targeting ppc- and ppc64-vx7r2.  Ok to install?
>
>
> for  gcc/ChangeLog
>
>   PR target/100106
>   * emit-rtl.c (validate_subreg): Reject a SUBREG of a MEM that
>   requires stricter alignment than MEM's.

I know this is the best being the enemy of the good, but given
that we're at the start of stage 1, would it be feasible to try
to get rid of (subreg (mem)) altogether for GCC 13?  We could do
it target-by-target, with a target macro (yes, macro :-)) that opts
in to keeping the existing behaviour.  (subreg (mem)) would then be
unconditionally invalid when the macro isn't defined.  (Even in
debug expressions, since those ought to narrow to a mem anyway.)

Thanks,
Richard

> for  gcc/testsuite/ChangeLog
>
>   PR target/100106
>   * gcc.target/powerpc/pr100106-sa.c: New.
> ---
>  gcc/emit-rtl.cc|3 +++
>  gcc/testsuite/gcc.target/powerpc/pr100106-sa.c |4 
>  2 files changed, 7 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 1e02ae254d012..642e47eada0d7 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -982,6 +982,9 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  
>return subreg_offset_representable_p (regno, imode, offset, omode);
>  }
> +  else if (reg && MEM_P (reg)
> +&& STRICT_ALIGNMENT && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
> +return false;
>  
>/* The outer size must be ordered wrt the register size, otherwise
>   we wouldn't know at compile time how many registers the outer
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c 
> b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
> new file mode 100644
> index 0..6cc29595c8b25
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile { target { ilp32 } } } */
> +/* { dg-options "-mcpu=604 -O -mstrict-align" } */
> +
> +#include "../../gcc.c-torture/compile/pr100106.c"
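
The condition added to validate_subreg in the quoted patch can be modelled outside GCC roughly as follows. This is a minimal sketch under stated assumptions: struct mem_ref and subreg_of_mem_ok are hypothetical stand-ins for the rtx, MEM_ALIGN, GET_MODE_ALIGNMENT and STRICT_ALIGNMENT machinery, and alignments are given in bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a MEM: only its known alignment matters
   here.  This is not GCC's rtx representation.  */
struct mem_ref
{
  unsigned align_bits;   /* alignment the memory reference is known to have */
};

/* Return false when, on a strict-alignment target, viewing MEM in an
   outer mode would need more alignment than MEM provides.  This mirrors
   the shape of the check in the patch; everything else validate_subreg
   does is left out.  */
static bool
subreg_of_mem_ok (const struct mem_ref *mem, unsigned outer_mode_align_bits,
                  bool strict_alignment)
{
  assert (mem);
  assert (outer_mode_align_bits != 0);

  if (strict_alignment && mem->align_bits < outer_mode_align_bits)
    return false;
  return true;
}
```

With the figures from the patch description (a constant pool entry aligned to 4 bytes, accessed in a mode requiring 8-byte alignment), the model rejects the subreg when strict alignment is on and accepts it otherwise.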


Re: [PATCH] gengtype: do not skip char after escape sequnce

2022-05-05 Thread Martin Liška
On 5/4/22 21:38, Iain Sandoe wrote:
> 
> 
>> On 4 May 2022, at 20:14, Martin Liška  wrote:
>>
>> Right now, when a \$x escape sequence occurs, the
>> next character after $x is skipped, which is bogus.
>>
>> The code has very low coverage right now.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> 
> … just curious ...
> 
> Is there no way to test this?

There is, and as mentioned, I verified that the current behavior is wrong
for one case where '\n' is being handled.

> or to identify a target where the behaviour would be changed with/without the 
> patch?
> (and confirm the expected result).

I've done that.

Martin

> 
> thanks
> Iain
> 
> 
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>>  * gengtype-state.cc (read_a_state_token): Do not skip extra
>>  character after escaped sequence.
>> ---
>> gcc/gengtype-state.cc | 10 --
>> 1 file changed, 10 deletions(-)
>>
>> diff --git a/gcc/gengtype-state.cc b/gcc/gengtype-state.cc
>> index dfd9ea52785..2dfe8edf1a5 100644
>> --- a/gcc/gengtype-state.cc
>> +++ b/gcc/gengtype-state.cc
>> @@ -473,43 +473,33 @@ read_a_state_token (void)
>>  {
>>  case 'a':
>>obstack_1grow (_obstack, '\a');
>> -  getc (state_file);
>>break;
>>  case 'b':
>>obstack_1grow (_obstack, '\b');
>> -  getc (state_file);
>>break;
>>  case 't':
>>obstack_1grow (_obstack, '\t');
>> -  getc (state_file);
>>break;
>>  case 'n':
>>obstack_1grow (_obstack, '\n');
>> -  getc (state_file);
>>break;
>>  case 'v':
>>obstack_1grow (_obstack, '\v');
>> -  getc (state_file);
>>break;
>>  case 'f':
>>obstack_1grow (_obstack, '\f');
>> -  getc (state_file);
>>break;
>>  case 'r':
>>obstack_1grow (_obstack, '\r');
>> -  getc (state_file);
>>break;
>>  case '"':
>>obstack_1grow (_obstack, '\"');
>> -  getc (state_file);
>>break;
>>  case '\\':
>>obstack_1grow (_obstack, '\\');
>> -  getc (state_file);
>>break;
>>  case ' ':
>>obstack_1grow (_obstack, ' ');
>> -  getc (state_file);
>>break;
>>  case 'x':
>>{
>> -- 
>> 2.36.0
>>
> 
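
The bug discussed in this thread can be illustrated with a small stand-alone decoder. This is a sketch only: decode_escapes is a hypothetical function that reads from a string rather than from state_file, it omits the \x case the real code handles, and it passes unknown escape letters through unchanged, which is an assumption rather than gengtype's behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Decode backslash escapes from SRC into DST, which has room for
   DST_SIZE bytes including the terminating NUL.  Returns the number of
   bytes written, not counting the NUL.  */
static size_t
decode_escapes (const char *src, char *dst, size_t dst_size)
{
  size_t n = 0;

  assert (dst_size > 0);
  while (*src != '\0' && n + 1 < dst_size)
    {
      char c = *src++;
      if (c == '\\' && *src != '\0')
        {
          /* Consume exactly one character: the escape letter.  */
          char e = *src++;
          switch (e)
            {
            case 'a': c = '\a'; break;
            case 'b': c = '\b'; break;
            case 't': c = '\t'; break;
            case 'n': c = '\n'; break;
            case 'v': c = '\v'; break;
            case 'f': c = '\f'; break;
            case 'r': c = '\r'; break;
            default:  c = e;    break;  /* covers \", \\ and "\ " */
            }
          /* An additional src++ here would reproduce the reported bug:
             the character following the escape would be dropped.  */
        }
      dst[n++] = c;
    }
  dst[n] = '\0';
  return n;
}
```

Decoding the four-character input a\nb should give three bytes, with the 'b' preserved; with the extra read, the 'b' would be lost.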



Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OI/TImode.

2022-05-05 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 9:37 AM liuhongt via Gcc-patches
 wrote:
>
> Enable the optimization for TImode only on 32-bit targets; on 64-bit
> targets there could be extra integer <-> SSE moves required by the
> psABI, which is not efficient.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?

I wonder if this is better done in STV where we could assess this extra cost?

> gcc/ChangeLog:
>
> PR target/104610
> * config/i386/i386-expand.cc (ix86_expand_branch): Use ptest
> for TI/OImode when code is EQ or NE.
> * config/i386/i386.md (SDWIM1248): New iterator.
> (cbranch4): Split TImode into a separate expander.
> (cbranchti4): New expander.
> * config/i386/predicates.md (timode_comparison_operator): New
> predicate.
> * config/i386/sse.md (cbranch4): Extend to OImode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr104610.c: New test.
> ---
>  gcc/config/i386/i386-expand.cc   | 13 +++-
>  gcc/config/i386/i386.md  | 27 ++--
>  gcc/config/i386/predicates.md|  6 ++
>  gcc/config/i386/sse.md   | 10 +++--
>  gcc/testsuite/gcc.target/i386/pr104610.c | 23 
>  5 files changed, 74 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104610.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index bc806ffa283..a2012a158ae 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -2264,13 +2264,24 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx 
> op1, rtx label)
>  {
>machine_mode mode = GET_MODE (op0);
>rtx tmp;
> +  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
> +  /* Using ptest for TImode only for 32-bit target since it's splitted into
> + 4 comparisons. For 64-bit target there could be extra ineteger <-> sse
> + move regarding psABI, not efficient.  */
> +  if ((code == EQ || code == NE)
> +   && ((mode == OImode && TARGET_AVX)
> +  || (mode == TImode && !TARGET_64BIT && TARGET_SSE4_1)))
> +{
> +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> +  mode = p_mode;
> +}
>
>/* Handle special case - vector comparsion with boolean result, transform
>   it using ptest instruction.  */
>if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
>  {
>rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> -  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
>
>gcc_assert (code == EQ || code == NE);
>/* Generate XOR since we can't check that one operand is zero vector.  
> */
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index b321cda1f22..f91325015c9 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1069,6 +1069,10 @@ (define_mode_iterator SDWIM [(QI "TARGET_QIMODE_MATH")
>  (HI "TARGET_HIMODE_MATH")
>  SI DI (TI "TARGET_64BIT")])
>
> +(define_mode_iterator SDWIM1248 [(QI "TARGET_QIMODE_MATH")
> + (HI "TARGET_HIMODE_MATH")
> + SI DI])
> +
>  ;; Math-dependant single word integer modes.
>  (define_mode_iterator SWIM [(QI "TARGET_QIMODE_MATH")
> (HI "TARGET_HIMODE_MATH")
> @@ -1322,8 +1326,8 @@ (define_mode_iterator PTR
>
>  (define_expand "cbranch4"
>[(set (reg:CC FLAGS_REG)
> -   (compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
> -   (match_operand:SDWIM 2 "")))
> +   (compare:CC (match_operand:SDWIM1248 1 "nonimmediate_operand")
> +   (match_operand:SDWIM1248 2 "")))
> (set (pc) (if_then_else
>(match_operator 0 "ordered_comparison_operator"
> [(reg:CC FLAGS_REG) (const_int 0)])
> @@ -1338,6 +1342,25 @@ (define_expand "cbranch4"
>DONE;
>  })
>
> +(define_expand "cbranchti4"
> +  [(set (reg:CC FLAGS_REG)
> +   (compare:CC (match_operand:TI 1 "nonimmediate_operand")
> +   (match_operand:TI 2 "x86_64_general_operand")))
> +   (set (pc) (if_then_else
> +  (match_operator 0 "timode_comparison_operator"
> +   [(reg:CC FLAGS_REG) (const_int 0)])
> +  (label_ref (match_operand 3))
> +  (pc)))]
> +  "TARGET_64BIT || TARGET_SSE4_1"
> +{
> +  if (MEM_P (operands[1]) && MEM_P (operands[2]))
> +operands[1] = force_reg (TImode, operands[1]);
> +
> +  ix86_expand_branch (GET_CODE (operands[0]),
> + operands[1], operands[2], operands[3]);
> +  DONE;
> +})
> +
>  (define_expand "cstore4"
>[(set (reg:CC FLAGS_REG)
> (compare:CC (match_operand:SWIM 2 "nonimmediate_operand")
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index 
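
The idea behind the quoted expander change (XOR the two operands, then test whether the result is all zeros, which is what ptest reports through ZF) can be modelled in portable C. This is a rough sketch of the logic only, not of the generated code: struct u128 and blocks16_equal are hypothetical names, and two 64-bit halves stand in for a V2DImode register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value held as two 64-bit halves.  */
struct u128
{
  uint64_t lo, hi;
};

/* Equality-only comparison of two 16-byte blocks.  The XOR of the two
   values is zero exactly when they are equal, so a single "is it all
   zero" test replaces several word-sized compares.  Only EQ/NE can be
   answered this way, which matches the restriction in the patch.  */
static bool
blocks16_equal (const void *a, const void *b)
{
  struct u128 x, y;

  assert (a && b);
  memcpy (&x, a, sizeof x);
  memcpy (&y, b, sizeof y);

  uint64_t diff = (x.lo ^ y.lo) | (x.hi ^ y.hi);
  return diff == 0;
}
```

Whether the vector form is profitable on a given target is the cost question raised in the replies; the model says nothing about that.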
