from:"David Edelsohn via Gcc\-patches"

Re: [PATCH v2 1/2] libstdc++: Implement more maintainable <version> header

2023-08-16 Thread David Edelsohn via Gcc-patches

Was the dependency added to the dependencies in contrib/gcc_update?
Otherwise the timestamp can get out of sync in a Git checkout.

Thanks, David


On Wed, Aug 16, 2023 at 6:20 PM Jonathan Wakely  wrote:

> On Wed, 16 Aug 2023 at 22:56, Jonathan Wakely  wrote:
> >
> > On Wed, 16 Aug 2023 at 22:39, David Edelsohn  wrote:
> > >
> > > Hi, Arsen
> > >
> > > This patch broke bootstrap because it has introduced a new GCC build
> > > requirement for autogen that is not a previous requirement to build GCC.
> > > Previously the repository has included post-processed files.
> >
> > The repo does include the generated bits/version.h file. autogen
> > should only be needed if you modify version.def
>
> And I've just checked again with an x86_64-pc-linux-gnu bootstrap on a
> box without autogen, and it worked.
>
> >
> > >
> > > +# AutoGen <bits/version.h>.
> > > +.PHONY: update-version
> > > +update-version:
> > > + cd ${bits_srcdir} && \
> > > + autogen version.def
> > > +
> > >
> > >
> > > Thanks, David
> > >
> > >
>
>

Re: [PATCH v2 1/2] libstdc++: Implement more maintainable <version> header

2023-08-16 Thread David Edelsohn via Gcc-patches

Hi, Arsen

This patch broke bootstrap because it has introduced a new GCC build
requirement for autogen that is not a previous requirement to build GCC.
Previously the repository has included post-processed files.

+# AutoGen <bits/version.h>.
+.PHONY: update-version
+update-version:
+   cd ${bits_srcdir} && \
+   autogen version.def
+


Thanks, David

Re: [RFC] GCC Security policy

2023-08-15 Thread David Edelsohn via Gcc-patches

On Tue, Aug 15, 2023 at 7:07 PM Alexander Monakov 
wrote:

>
> On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:
>
> > > Thanks, this is nicer (see notes below). My main concern is that we
> > > shouldn't pretend there's some method of verifying that arbitrary source
> > > code is "safe" to pass to an unsandboxed compiler, nor should we push
> > > the responsibility of doing that on users.
> >
> > But responsibility would be pushed to users, wouldn't it?
>
> Making users responsible for verifying that sources are "safe" is not okay
> (we cannot teach them how to do that since there's no general method).
> Making users responsible for sandboxing the compiler is fine (there's
> a range of sandboxing solutions, from which they can choose according
> to their requirements and threat model). Sorry about the ambiguity.
>

Alex.

The compiler should faithfully implement the algorithms described by the
programmer.  The compiler is responsible if it generates incorrect code for
a well-defined, language-conforming program.  The compiler cannot be
responsible for security issues inherent in the user code, whether that
causes the compiler to function in a manner that adversely affects the
system or to generate code that behaves in a manner that adversely affects
the system.

If "safe" is the wrong word, what word would you suggest?


> > So:
> >
> > The compiler driver processes source code, invokes other programs such as the
> > assembler and linker and generates the output result, which may be assembly
> > code or machine code.  Compiling untrusted sources can result in arbitrary
> > code execution and unconstrained resource consumption in the compiler.  As a
> > result, compilation of such code should be done inside a sandboxed environment
> > to ensure that it does not compromise the development environment.
>
> I'm happy with this, thanks for bearing with me.
>
> > >> inside a sandboxed environment to ensure that it does not compromise the
> > >> development environment.  Note that this still does not guarantee safety of
> > >> the produced output programs and that such programs should still either be
> > >> analyzed thoroughly for safety or run only inside a sandbox or an isolated
> > >> system to avoid compromising the execution environment.
> > >
> > > The last statement seems to be a new addition. It is too broad and again
> > > makes a reference to analysis that appears quite theoretical. It might be
> > > better to drop this (and instead talk in more specific terms about any
> > > guarantees that produced binary code matches security properties intended
> > > by the sources; I believe Richard Sandiford raised this previously).
> >
> > OK, so I actually cover this at the end of the section; Richard's point AFAICT
> > was about hardening, which I added another note for to make it explicit that
> > missed hardening does not constitute a CVE-worthy threat:
>
> Thanks for the reminder. To illustrate what I was talking about, let me give
> two examples:
>
> 1) safety w.r.t timing attacks: even if the source code is written in
> a manner that looks timing-safe, it might be transformed in a way that
> mounting a timing attack on the resulting machine code is possible;
>
> 2) safety w.r.t information leaks: even if the source code attempts
> to discard sensitive data (such as passwords and keys) immediately
> after use, (partial) copies of that data may be left on stack and
> in registers, to be leaked later via a different vulnerability.
>
> For both 1) and 2), GCC is not engineered to respect such properties
> during optimization and code generation, so it's not appropriate for such
> tasks (a possible solution is to isolate such sensitive functions to
> separate files, compile to assembly, inspect the assembly to check that it
> still has the required properties, and use the inspected asm in subsequent
> builds instead of the original high-level source).
>
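
Alexander's two examples above can be made concrete with a minimal C
sketch.  It is not taken from the thread: the helper names ct_equal and
wipe are hypothetical, and the code only shows what "written in a manner
that looks timing-safe" and "attempts to discard sensitive data" usually
mean at the source level.  As the quoted text says, GCC is not engineered
to preserve these properties, so the comments state intent, not guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compare two LEN-byte buffers without an early
   exit, so the source-level loop does not depend on where they differ.
   Returns 1 if equal, 0 otherwise.  Nothing obliges the compiler to
   keep this property in the generated machine code.  */
static int
ct_equal (const unsigned char *a, const unsigned char *b, size_t len)
{
  assert (a != NULL && b != NULL);
  unsigned char diff = 0;
  for (size_t i = 0; i < len; i++)
    diff |= (unsigned char) (a[i] ^ b[i]);
  return diff == 0;
}

/* Hypothetical helper: zero LEN bytes at P through a volatile pointer,
   so the stores to this buffer are not removed as dead.  Copies of the
   data that the compiler left in registers or spilled stack slots are
   not touched, which is the leak described in example 2.  */
static void
wipe (void *p, size_t len)
{
  volatile unsigned char *vp = (volatile unsigned char *) p;
  while (len-- > 0)
    *vp++ = 0;
}
```

A caller would compare a received tag with ct_equal and call wipe on the
key buffer once it is no longer needed; checking the resulting assembly
is still the step Alexander recommends.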

At some point the system tools need to respect the programmer or operator.
There is a difference between writing "Hello, World" and writing
performance-critical or safety-critical code.  It is the responsibility
of the programmer and the development team to choose the right software
engineers and the right tools, and to have the development environment and
checks in place to ensure that the results meet the requirements.

It is not the role of GCC or its security policy to tell people how to do
their job or hobby.  This isn't a safety tag required to be attached to a
new mattress.

Thanks, David


>
> Cheers.
> Alexander
>

Re: [RFC] GCC Security policy

2023-08-11 Thread David Edelsohn via Gcc-patches

On Wed, Aug 9, 2023 at 1:33 PM Siddhesh Poyarekar 
wrote:

> On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
> >> Do you have a suggestion for the language to address libgcc,
> >> libstdc++, etc. and libiberty, libbacktrace, etc.?
> >
> > I'll work on this a bit and share a draft.
>
> Hi David,
>
> Here's what I came up with for different parts of GCC, including the
> runtime libraries.  Over time we may find that specific parts of runtime
> libraries simply cannot be used safely in some contexts and flag that.
>
> Sid
>
> """
> What is a GCC security bug?
> ===========================
>
>  A security bug is one that threatens the security of a system or
>  network, or might compromise the security of data stored on it.
>  In the context of GCC there are multiple ways in which this might
>  happen and they're detailed below.
>
> Compiler drivers, programs, libgccjit and support libraries
> -----------------------------------------------------------
>
>  The compiler driver processes source code, invokes other programs
>  such as the assembler and linker and generates the output result,
>  which may be assembly code or machine code.  It is necessary that
>  all source code inputs to the compiler are trusted, since it is
>  impossible for the driver to validate input source code beyond
>  conformance to a programming language standard.
>
>  The GCC JIT implementation, libgccjit, is intended to be plugged
>  into applications to translate input source code in the application
>  context.  Limitations that apply to the compiler
>  driver, apply here too in terms of sanitizing inputs, so it is
>  recommended that inputs are either sanitized by an external program
>  to allow only trusted, safe execution in the context of the
>  application or the JIT execution context is appropriately sandboxed
>  to contain the effects of any bugs in the JIT or its generated code
>  to the sandboxed environment.
>
>  Support libraries such as libiberty, libcc1, libvtv and libcpp have
>  been developed separately to share code with other tools such as
>  binutils and gdb.  These libraries again have similar challenges to
>  compiler drivers.  While they are expected to be robust against
>  arbitrary input, they should only be used with trusted inputs.
>
>  Libraries such as zlib and libffi that are bundled into GCC to build it
>  will be treated the same as the compiler drivers and programs as far
>  as security coverage is concerned.
>
>  As a result, the only case for a potential security issue in all
>  these cases is when it ends up generating vulnerable output for
>  valid input source code.
>
> Language runtime libraries
> --------------------------
>
>  GCC also builds and distributes libraries that are intended to be
>  used widely to implement runtime support for various programming
>  languages.  These include the following:
>
>  * libada
>  * libatomic
>  * libbacktrace
>  * libcc1
>  * libcody
>  * libcpp
>  * libdecnumber
>  * libgcc
>  * libgfortran
>  * libgm2
>  * libgo
>  * libgomp
>  * libiberty
>  * libitm
>  * libobjc
>  * libphobos
>  * libquadmath
>  * libssp
>  * libstdc++
>
>  These libraries are intended to be used in arbitrary contexts and as
>  a result, bugs in these libraries may be evaluated for security
>  impact.  However, some of these libraries, e.g. libgo, libphobos,
>  etc.  are not maintained in the GCC project, due to which the GCC
>  project may not be the correct point of contact for them.  You are
>  encouraged to look at README files within those library directories
>  to locate the canonical security contact point for those projects.
>

Hi, Sid

The text above states "bugs in these libraries may be evaluated for
security impact", but there is no comment about the criteria for a security
impact, unlike the GLIBC SECURITY.md document.  The text seems to imply the
"What is a security bug?" definitions from GLIBC, but the definitions are
not explicitly stated in the GCC Security policy.

Should this "Language runtime libraries" section include some of the GLIBC
"What is a security bug?" text or should the GCC "What is a security bug?"
section earlier in this document include the text with a qualification that
issues like buffer overflow, memory leaks, information disclosure, etc.
specifically apply to "Language runtime libraries" and not all components
of GCC?

Thanks, David


>
> Diagnostic libraries
> --------------------
>
>  The sanitizer library bundled in GCC is intended to be used in
>  diagnostic cases and not intended for use in sensitive environments.
>  As a result, bugs in the sanitizer will not be considered security
>  sensitive.
>
> GCC plugins
> -----------
>
>  It should be noted that GCC may execute arbitrary code loaded by a

Re: [RFC] GCC Security policy

2023-08-09 Thread David Edelsohn via Gcc-patches

On Wed, Aug 9, 2023 at 1:33 PM Siddhesh Poyarekar 
wrote:

> On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
> >> Do you have a suggestion for the language to address libgcc,
> >> libstdc++, etc. and libiberty, libbacktrace, etc.?
> >
> > I'll work on this a bit and share a draft.
>
> Hi David,
>
> Here's what I came up with for different parts of GCC, including the
> runtime libraries.  Over time we may find that specific parts of runtime
> libraries simply cannot be used safely in some contexts and flag that.
>
> Sid
>

Hi, Sid

Thanks for iterating on this.


>
> """
> What is a GCC security bug?
> ===========================
>
>  A security bug is one that threatens the security of a system or
>  network, or might compromise the security of data stored on it.
>  In the context of GCC there are multiple ways in which this might
>  happen and they're detailed below.
>
> Compiler drivers, programs, libgccjit and support libraries
> -----------------------------------------------------------
>
>  The compiler driver processes source code, invokes other programs
>  such as the assembler and linker and generates the output result,
>  which may be assembly code or machine code.  It is necessary that
>  all source code inputs to the compiler are trusted, since it is
>  impossible for the driver to validate input source code beyond
>  conformance to a programming language standard.
>
>  The GCC JIT implementation, libgccjit, is intended to be plugged
>  into applications to translate input source code in the application
>  context.  Limitations that apply to the compiler
>  driver, apply here too in terms of sanitizing inputs, so it is
>  recommended that inputs are either sanitized by an external program
>  to allow only trusted, safe execution in the context of the
>  application or the JIT execution context is appropriately sandboxed
>  to contain the effects of any bugs in the JIT or its generated code
>  to the sandboxed environment.
>
>  Support libraries such as libiberty, libcc1, libvtv and libcpp have
>  been developed separately to share code with other tools such as
>  binutils and gdb.  These libraries again have similar challenges to
>  compiler drivers.  While they are expected to be robust against
>  arbitrary input, they should only be used with trusted inputs.
>
>  Libraries such as zlib and libffi that are bundled into GCC to build it
>  will be treated the same as the compiler drivers and programs as far
>  as security coverage is concerned.
>

Should we direct people to the upstream projects for their security
policies?


>  As a result, the only case for a potential security issue in all
>  these cases is when it ends up generating vulnerable output for
>  valid input source code.


> Language runtime libraries
> --------------------------
>
>  GCC also builds and distributes libraries that are intended to be
>  used widely to implement runtime support for various programming
>  languages.  These include the following:
>
>  * libada
>  * libatomic
>  * libbacktrace
>  * libcc1
>  * libcody
>  * libcpp
>  * libdecnumber
>  * libgcc
>  * libgfortran
>  * libgm2
>  * libgo
>  * libgomp
>  * libiberty
>  * libitm
>  * libobjc
>  * libphobos
>  * libquadmath
>  * libssp
>  * libstdc++
>
>  These libraries are intended to be used in arbitrary contexts and as
>  a result, bugs in these libraries may be evaluated for security
>  impact.  However, some of these libraries, e.g. libgo, libphobos,
>  etc.  are not maintained in the GCC project, due to which the GCC
>  project may not be the correct point of contact for them.  You are
>  encouraged to look at README files within those library directories
>  to locate the canonical security contact point for those projects.
>

As Richard mentioned, should GCC make a specific statement about the
security policy / response for issues that are discovered and fixed in the
upstream projects from which the GCC libraries are imported?


>
> Diagnostic libraries
> --------------------
>
>  The sanitizer library bundled in GCC is intended to be used in
>  diagnostic cases and not intended for use in sensitive environments.
>  As a result, bugs in the sanitizer will not be considered security
>  sensitive.
>
> GCC plugins
> -----------
>
>  It should be noted that GCC may execute arbitrary code loaded by a
>  user through the GCC plugin mechanism or through system preloading
>  mechanism.  Such custom code should be vetted by the user for safety
>  as bugs exposed through such code will not be considered security
>  issues.
>

Thanks, David

Re: [RFC] GCC Security policy

2023-08-08 Thread David Edelsohn via Gcc-patches

On Tue, Aug 8, 2023 at 1:36 PM Ian Lance Taylor  wrote:

> On Tue, Aug 8, 2023 at 7:37 AM Jakub Jelinek  wrote:
> >
> > BTW, I think we should perhaps differentiate between production ready
> > libraries (e.g. libgcc, libstdc++, libgomp, libatomic, libgfortran, libquadmath,
> > libssp) vs. e.g. the sanitizer libraries which are meant for debugging and
> > I believe it is highly risky to run them in programs with extra privileges
> > - e.g. I think they use getenv rather than *secure_getenv to get at various
> > tweaks for their behavior including where logging will happen and upstream
> > doesn't really care.
> > And not really sure what to say about lesser used language support
> > libraries, libada, libphobos, libgo, libgm2, ... nor what to say about
> > libvtv etc.
>
> libgo is a complicated case because it has a lot of components
> including a web server with TLS support, so there are a lot of
> potential security issues for programs that use libgo.  The upstream
> security policy is https://go.dev/security/policy.  I'm not sure what
> to say about libgo in GCC, since realistically the support for
> security problems is best-effort.  I guess we should at least accept
> security reports, even if we can't promise to fix them quickly.
>

 I believe that upstream projects for components that are imported into GCC
should be responsible for their security policy, including libgo,
gofrontend, libsanitizer (other than local patches), zlib, libtool,
libphobos, libcody, libffi, eventually Rust libcore, etc.

Thanks, David

Re: [RFC] GCC Security policy

2023-08-08 Thread David Edelsohn via Gcc-patches

On Tue, Aug 8, 2023 at 10:07 AM Siddhesh Poyarekar 
wrote:

> On 2023-08-08 10:04, Richard Biener wrote:
> > On Tue, Aug 8, 2023 at 3:35 PM Ian Lance Taylor  wrote:
> >>
> >> On Tue, Aug 8, 2023 at 6:02 AM Jakub Jelinek via Gcc-patches
> >>  wrote:
> >>>
> >>> On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via Gcc-patches wrote:
> >>>> There's probably external tools to do this, not sure if we should replicate
> >>>> things in the driver for this.
> >>>>
> >>>> But sure, I think the driver is the proper point to address any of such
> >>>> issues - iff we want to address them at all.  Maybe a nice little
> >>>> google summer-of-code project ;)
> >>>
> >>> What I'd really like to avoid is having all compiler bugs (primarily ICEs)
> >>> considered to be security bugs (e.g. DoS category), it would be terrible to
> >>> release every week a new compiler because of the "security" issues.
> >>> Running compiler on untrusted sources can trigger ICEs (which we want to fix
> >>> but there will always be some), or run into some compile time and/or compile
> >>> memory issue (we have various quadratic or worse spots), compiler stack
> >>> limits (deeply nested stuff e.g. during parsing but other areas as well).
> >>> So, people running fuzzers and reporting issues is great, but if they'd get
> >>> a CVE assigned for each ice-on-invalid-code, ice-on-valid-code,
> >>> each compile-time-hog and each memory-hog, that wouldn't be useful.
> >>> Runtime libraries or security issues in the code we generate for valid
> >>> sources are of course a different thing.
> >>
> >>
> >> I wonder if a security policy should say something about the -fplugin
> >> option.  I agree that an ICE is not a security issue, but I wonder how
> >> many people are aware that a poorly chosen command line option can
> >> direct the compiler to run arbitrary code.  For that matter the same
> >> is true of setting the GCC_EXEC_PREFIX environment variable, and no
> >> doubt several other environment variables.  My point is not that we
> >> should change these, but that a security policy should draw attention
> >> to the fact that there are cases in which the compiler will
> >> unexpectedly run other programs.
> >
> > Well, if you run an arbitrary commandline from the internet you get
> > what you deserve, running "echo "Hello World" | gcc -xc - -o /dev/sda"
> > as root doesn't need plugins to shoot yourself in the foot.  You need to
> > know what you're doing, otherwise you are basically executing an
> > arbitrary shell script with whatever privileges you have.
>
> I think it would be useful to mention caveats with plugins though, just
> like it would be useful to mention exceptions for libiberty and similar
> libraries that gcc builds.  It only helps make things clearer in terms
> of what security coverage the project provides.
>

I have added a line to the Note section in the proposed text:

GCC and its tools provide features and options that can run arbitrary
user code (e.g., -fplugin).

I believe that the security implication already is addressed because the
program is not tricked into a direct compromise of security.

Do you have a suggestion for the language to address libgcc, libstdc++,
etc. and libiberty, libbacktrace, etc.?

Thanks, David

[RFC] GCC Security policy

2023-08-07 Thread David Edelsohn via Gcc-patches

FOSS Best Practices recommends that projects have an official Security
policy stated in a SECURITY.md or SECURITY.txt file at the root of the
repository.  GLIBC and Binutils have added such documents.

Appended is a prototype for a Security policy file for GCC based on the
Binutils document because GCC seems to have more affinity with Binutils as
a tool. Do the runtime libraries distributed with GCC, especially libgcc,
require additional security policies?

[ ] Is it appropriate to use the Binutils SECURITY.txt as the starting
point or should GCC use GLIBC SECURITY.md as the starting point for the GCC
Security policy?

[ ] Does GCC, or some components of GCC, require additional care because of
runtime libraries like libgcc and libstdc++, and because of gcov and
profile-directed feedback?

Thoughts?

Thanks, David

GCC Security Process
====================

What is a GCC security bug?
===========================

A security bug is one that threatens the security of a system or
network, or might compromise the security of data stored on it.
In the context of GCC there are two ways in which such
bugs might occur.  In the first, the programs themselves might be
tricked into a direct compromise of security.  In the second, the
tools might introduce a vulnerability in the generated output that
was not already present in the files used as input.

Other than that, all other bugs will be treated as non-security
issues.  This does not mean that they will be ignored, just that
they will not be given the priority that is given to security bugs.

This stance applies to the creation tools in GCC (e.g.,
gcc, g++, gfortran, gccgo, gccrs, gnat, cpp, gcov, etc.) and the
libraries that they use.

Notes:
======

None of the programs in GCC need elevated privileges to operate and
it is recommended that users do not use them from accounts where such
privileges are automatically available.

Reporting private security bugs
===============================

   *All bugs reported in the GCC Bugzilla are public.*

   In order to report a private security bug that is not immediately
   public, please contact one of the downstream distributions with
   security teams.  The following teams have volunteered to handle
   such bugs:

  Debian:  secur...@debian.org
  Red Hat: secal...@redhat.com
  SUSE:    secur...@suse.de

   Please report the bug to just one of these teams.  It will be shared
   with other teams as necessary.

   The team contacted will take care of details such as vulnerability
   rating and CVE assignment (http://cve.mitre.org/about/).  It is likely
   that the team will ask to file a public bug because the issue is
   sufficiently minor and does not warrant an embargo.  An embargo is not
   a requirement for being credited with the discovery of a security
   vulnerability.

Reporting public security bugs
==============================

   It is expected that critical security bugs will be rare, and that most
   security bugs can be reported in the GCC Bugzilla, thus making
   them public immediately.  The system can be found here:

  https://gcc.gnu.org/bugzilla/

Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-25 Thread David Edelsohn via Gcc-patches

Hi, Drew

Thanks for addressing this missed optimization.

The testcase incorrectly assumes that plain char is signed, which
causes it to fail on PowerPC, where char is unsigned by default.

Should the testcase be updated to specify signed char in the function
signatures or should -fsigned-char be added to the command line
options?

Thanks, David
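
To make the two points in this message concrete, here is a minimal C
sketch.  It is not the testcase from the patch, and all function names
in it are hypothetical.  It checks the identity from the subject line,
(~X | Y) ^ X == ~(X & Y), exhaustively on 8-bit values, and it shows how
the value 0xF0 reads back through plain char versus signed char.  The
expected values in the comments assume 8-bit bytes and GCC's
implementation-defined modulo conversion to signed types.

```c
#include <assert.h>
#include <limits.h>

/* Left-hand side of the pattern, truncated to 8 bits.  */
static unsigned char
lhs (unsigned char x, unsigned char y)
{
  return (unsigned char) ((~x | y) ^ x);
}

/* Right-hand side of the pattern, truncated to 8 bits.  */
static unsigned char
rhs (unsigned char x, unsigned char y)
{
  return (unsigned char) ~(x & y);
}

/* Return 1 if both sides agree for every pair of 8-bit inputs.  */
static int
identity_holds (void)
{
  for (int x = 0; x <= UCHAR_MAX; x++)
    for (int y = 0; y <= UCHAR_MAX; y++)
      if (lhs ((unsigned char) x, (unsigned char) y)
          != rhs ((unsigned char) x, (unsigned char) y))
        return 0;
  return 1;
}

/* 0xF0 through plain char: -16 where char is signed, 240 where it is
   unsigned (the PowerPC default).  */
static int
through_plain_char (void)
{
  assert (CHAR_BIT == 8);
  char c = (char) 0xF0;
  return c;
}

/* 0xF0 through signed char: -16 regardless of the target default.  */
static int
through_signed_char (void)
{
  assert (CHAR_BIT == 8);
  signed char c = (signed char) 0xF0;
  return c;
}
```

Spelling out signed char in the function signatures removes the
dependence on the target default; -fsigned-char gets the same result by
changing the default for the whole translation unit.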

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread David Edelsohn via Gcc-patches

On Tue, Jun 13, 2023 at 2:16 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Tue, Jun 13, 2023 at 10:15:49AM +0800, Jiufu Guo wrote:
> > David Edelsohn  writes:
> > >
> > > This definitely seems to be a better solution.
> > >
> > > The TARGET_CONST_ANCHOR change should not be part of this patch.  Also
> > > there is no ChangeLog for the patch.
> >
> > Thanks a lot for your quick review!! And sorry for the sending this patch
> > in a hurry.  I would update the patch accordingly.
>
> > > This generally looks correct and consistent with other ports. I want
> > > to give Segher a chance to double check it, if he wishes.
>
> The documentation is very clear that the only thing for which you can
> have BLKmode is "mem".  Not unspec, only "mem".
>
> Let's not do this.  The existing code has clear and obvious semantics,
> which is documented as well -- there is no reason to make it worse in
> every respect.

Segher,

Unfortunately, GCC now is inconsistent and this response is incorrect.
The documentation is out of date or was ignored and the "facts on the
ground" contradict your review.

Yes, (const_int 0) is supposed to be a general no-op and BLKmode only
is supposed to be used for MEM, but other major targets (arm, aarch64,
riscv, s390) all use unspec:BLK and specifically UNSPEC_TIE.  rs6000
is the only port that does not follow this convention.  The middle-end
has adapted to the behavior of all of the other targets, whether that
conformed to the documentation or not.  The rs6000 port needs to be
fixed and Jiufu's approach is the correct one, consistent with all
other targets for stack tie.  If the documentation differs, the
documentation needs to be updated, not a different approach for the
rs6000 port.  Jiufu's patch is correct.

Thanks, David

Re: [PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-13 Thread David Edelsohn via Gcc-patches

On Mon, Jun 12, 2023 at 11:30 PM Jiufu Guo  wrote:
>
>
> Hi David,
>
> David Edelsohn  writes:
> > On Wed, Jun 7, 2023 at 9:55 PM Jiufu Guo  wrote:
> >
> >  Hi,
> >
> >  This patch checks if a constant is possible to be rotated to/from a positive
> >  or negative value from "li". If so, we could use "li;rotldi" to build it.
> >
> >  Bootstrap and regtest pass on ppc64{,le}.
> >  Is this ok for trunk?
> >
> >  BR,
> >  Jeff (Jiufu)
> >
> >  gcc/ChangeLog:
> >
> >  * config/rs6000/rs6000.cc (can_be_rotated_to_positive_li): New function.
> >  (can_be_rotated_to_negative_li): New function.
> >  (can_be_built_by_li_and_rotldi): New function.
> >  (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.
> >
> >  gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/powerpc/const-build.c: New test.
> >  ---
> >   gcc/config/rs6000/rs6000.cc   | 64 +--
> >   .../gcc.target/powerpc/const-build.c  | 54 
> >   2 files changed, 112 insertions(+), 6 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c
> >
> >  diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >  index 42f49e4a56b..1dd0072350a 100644
> >  --- a/gcc/config/rs6000/rs6000.cc
> >  +++ b/gcc/config/rs6000/rs6000.cc
> >  @@ -10258,6 +10258,48 @@ rs6000_emit_set_const (rtx dest, rtx source)
> > return true;
> >   }
> >
> >  +/* Check if C can be rotated to a positive value which 'li' instruction
> >  +   is able to load.  If so, set *ROT to the number by which C is rotated,
> >  +   and return true.  Return false otherwise.  */
> >  +
> >  +static bool
> >  +can_be_rotated_to_positive_li (HOST_WIDE_INT c, int *rot)
> >  +{
> >  +  /* 49 leading zeros and 15 low bits on the positive value
> >  + generated by 'li' instruction.  */
> >  +  return can_be_rotated_to_lowbits (c, 15, rot);
> >  +}
> >  +
> >  +/* Like can_be_rotated_to_positive_li, but check the negative value of 'li'.  */
> >  +
> >  +static bool
> >  +can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
> >  +{
> >  +  return can_be_rotated_to_lowbits (~c, 15, rot);
> >  +}
> >  +
> >  +/* Check if value C can be built by 2 instructions: one is 'li', another is
> >  +   rotldi.
> >  +
> >  +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
> >  +   is set to -1, and return true.  Return false otherwise.  */
> >  +
> >
> > I look at this feature and it's good, but I don't fully understand the
> > benefit of this level of abstraction.  Ideally all of the above functions
> > would be inlined.  They aren't reused.
> >
> >  +static bool
> >  +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
> >  +  HOST_WIDE_INT *mask)
> >  +{
> >  +  int n;
> >  +  if (can_be_rotated_to_positive_li (c, &n)
> >  +  || can_be_rotated_to_negative_li (c, &n))
> >
> > Why not
> >
> > /* Check if C or ~C can be rotated to a positive or negative value
> > which 'li' instruction is able to load.  */
> > if (can_be_rotated_to_lowbits (c, 15, &n)
> > || can_be_rotated_to_lowbits (~c, 15, &n))
>
>
> Thanks a lot for your review!!
>
> Your suggestions could also achieve my goal of using a new function:
> Using "can_be_rotated_to_positive_li" is just trying to get a
> straightforward name.  Like yours, the code's comments would also
> make it easy to understand.

I recognize that you are trying to be consistent with the other
functions that you add in later patches, but it feels like overkill in
abstraction to me.  Or maybe combine positive_li and negative_li into a
single function so that the abstraction serves a purpose other than a
tail call and creating an alias for a specific invocation of
can_be_rotated_to_lowbits.

Thanks, David
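
The combined check suggested above can be sketched as a standalone C
function.  This is an illustration under stated assumptions, not the
code in rs6000.cc: rotl64, rotates_to_lowbits and li_rotldi_candidate
are hypothetical names, and rotates_to_lowbits is a brute-force stand-in
for can_be_rotated_to_lowbits, whose real implementation is not shown in
this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by N bits, 0 <= N < 64.  */
static uint64_t
rotl64 (uint64_t x, int n)
{
  assert (n >= 0 && n < 64);
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Hypothetical stand-in for can_be_rotated_to_lowbits: return true if
   some left rotation of C leaves every set bit within the low LOWBITS
   bits, and store that rotation count in *ROT.  */
static bool
rotates_to_lowbits (uint64_t c, int lowbits, int *rot)
{
  uint64_t mask = (UINT64_C (1) << lowbits) - 1;
  for (int n = 0; n < 64; n++)
    if ((rotl64 (c, n) & ~mask) == 0)
      {
        *rot = n;
        return true;
      }
  return false;
}

/* One function covering both cases: C rotates to a positive 'li' value
   (49 leading zeros, 15 low bits), or ~C does, which means C rotates to
   a negative 'li' value.  */
static bool
li_rotldi_candidate (uint64_t c, int *rot)
{
  return rotates_to_lowbits (c, 15, rot)
         || rotates_to_lowbits (~c, 15, rot);
}
```

Rotating the 'li' value left by (64 - rot) % 64 gives back the original
constant, which corresponds to the patch setting *shift to
HOST_BITS_PER_WIDE_INT - n.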

>
> BR,
> Jeff (Jiufu Guo)
> >
> > ...
> >
> > This is a style of software engineering, but it seems overkill to me when
> > the function is a single line that tail calls another function.  Am I
> > missing something?
> >
> > The rest of this patch looks good.
> >
> > Thanks, David
> >
> >  +{
> >  +  *mask = HOST_WIDE_INT_M1;
> >  +  *shift = HOST_BITS_PER_WIDE_INT - n;
> >  +  return true;
> >  +}
> >  +
> >  +  return false;
> >  +}
> >  +
> >   /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> >  Output insns to set DEST equal to the constant C as a series of
> >  lis, ori and shl instructions.  */
> >  @@ -10266,15 +10308,14 @@ static void
> >   rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
> >   {
> > rtx temp;
> >  +  int shift;
> >  +  HOST_WIDE_INT mask;
> > HOST_WIDE_INT ud1, ud2, ud3, ud4;
> >
> > ud1 = c & 0xffff;
> >  -  c = c >> 16;
> >  -  ud2 = c & 0xffff;
> >  -  c = c >> 16;
> >  -  ud3 = c & 0xffff;
> >  -  c = c >> 16;
> >  -  ud4 = c & 0xffff;
> >  +  ud2 = (c >> 16) & 0xffff;
> >  +  ud3 = (c >> 32) & 0xffff;
> >  +  ud4 = (c >> 48) & 0xffff;
> >

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-12 Thread David Edelsohn via Gcc-patches

Hi, Jiufu

This definitely seems to be a better solution.

The TARGET_CONST_ANCHOR change should not be part of this patch.  Also
there is no ChangeLog for the patch.

This generally looks correct and consistent with other ports. I want
to give Segher a chance to double check it, if he wishes.

Thanks David

On Mon, Jun 12, 2023 at 9:19 AM Jiufu Guo  wrote:
>
> Hi,
>
> For stack_tie, currently below insn is generated:
> (insn 15 14 16 3 (parallel [
>  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>  (const_int 0 [0]))
>  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>   (nil))
>
> It is "set (mem/c:BLK (reg/f:DI 1 1)) (const_int 0 [0])".  This may
> look like "a memory block is zeroed", while actually stack_tie
> is more like a placeholder, and does not generate anything.
>
> To avoid potential misunderstanding, "UNSPEC:BLK [(const_int 0)].." could
> be used here like other ports.
>
> This patch does this.  Bootstrap pass on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
> ---
>  gcc/config/rs6000/predicates.md   | 11 +++
>  gcc/config/rs6000/rs6000-logue.cc |  4 +++-
>  gcc/config/rs6000/rs6000.cc   |  4 
>  gcc/config/rs6000/rs6000.md   | 14 ++
>  4 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index a16ee30f0c0..4748cb37ce8 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -1854,10 +1854,13 @@ (define_predicate "stmw_operation"
>  (define_predicate "tie_operand"
>(match_code "parallel")
>  {
> -  return (GET_CODE (XVECEXP (op, 0, 0)) == SET
> - && MEM_P (XEXP (XVECEXP (op, 0, 0), 0))
> - && GET_MODE (XEXP (XVECEXP (op, 0, 0), 0)) == BLKmode
> - && XEXP (XVECEXP (op, 0, 0), 1) == const0_rtx);
> +  rtx set = XVECEXP (op, 0, 0);
> +  return (GET_CODE (set) == SET
> + && MEM_P (SET_DEST (set))
> + && GET_MODE (SET_DEST (set)) == BLKmode
> + && GET_CODE (SET_SRC (set)) == UNSPEC
> + && XINT (SET_SRC (set), 1) == UNSPEC_TIE
> + && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
>  })
>
>  ;; Match a small code model toc reference (or medium and large
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index bc6b153b59f..b99f43a8282 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -1463,7 +1463,9 @@ rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
>while (--i >= 0)
>  {
>rtx mem = gen_frame_mem (BLKmode, regs[i]);
> -  RTVEC_ELT (p, i) = gen_rtx_SET (mem, const0_rtx);
> +  RTVEC_ELT (p, i)
> +   = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, 
> const0_rtx),
> +   UNSPEC_TIE));
>  }
>
>emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index d197c3f3289..0c81ebea711 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1760,6 +1760,10 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>
>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
> +
> +#undef TARGET_CONST_ANCHOR
> +#define TARGET_CONST_ANCHOR 0x8000
> +
>
>
>  /* Processor table.  */
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index b0db8ae508d..fdcf8347812 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,7 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_TIE
>])
>
>  ;;
> @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
>operands[4] = gen_frame_mem (Pmode, operands[1]);
>p = rtvec_alloc (1);
>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
> - const0_rtx);
> + gen_rtx_UNSPEC (BLKmode,
> + gen_rtvec (1, const0_rtx),
> + UNSPEC_TIE));
>operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
>  })
>
> @@ -10866,7 +10869,9 @@ (define_expand "restore_stack_nonlocal"
>operands[5] = gen_frame_mem (Pmode, operands[3]);
>p = rtvec_alloc (1);
>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
> - const0_rtx);
> + gen_rtx_UNSPEC (BLKmode,
> + gen_rtvec (1, const0_rtx),
> + UNSPEC_TIE));
>operands[6] = gen_rtx_PARALLEL (VOIDmode, p);
>  })
>
> @@ -13898,7 +13903,8 @@ (define_insn "*save_fpregs__r1"
>  ; not be moved over loads from or stores to stack memory.
>

[PATCH, AIX] Debugging does not require a stack frame.

2023-06-11 Thread David Edelsohn via Gcc-patches

The rs6000 port has allocated a stack frame when debugging is enabled
on AIX since the earliest versions of the port.  Apparently the
earliest versions of the debuggers for AIX had difficulty with
stackless frames.

Both AIX DBX and GDB support stackless frames on AIX, and IBM XLC,
OpenXL and LLVM for AIX do not generate an extraneous stack frame when
debugging is enabled.  This patch updates the rs6000 stack info
function to not set the stack frame flag when debugging is enabled for
AIX.

Bootstrapped on powerpc-ibm-aix7.2.5.0

Committed.

Thanks, David

* gcc/config/rs6000/rs6000-logue.cc (rs6000_stack_info):
Do not require a stack frame when debugging is enabled for AIX.

index bc6b153b59f..98846f781ec 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -928,9 +928,6 @@ rs6000_stack_info (void)
   else if (frame_pointer_needed)
 info->push_p = 1;

-  else if (TARGET_XCOFF && write_symbols != NO_DEBUG && !flag_compare_debug)
-info->push_p = 1;
-
   else
 info->push_p = non_fixed_size > (TARGET_32BIT ? 220 : 288);
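
For readers following along, the effect on the branches visible in this
hunk can be modelled in plain C.  This is only a sketch: needs_stack_push
and its parameters are hypothetical names, and the earlier conditions in
rs6000_stack_info that precede this hunk are not modelled.

```c
#include <stdbool.h>

/* Hypothetical model of the tail of the push_p decision after the patch.
   Only the two branches visible in the hunk are covered; whether
   debugging is enabled is no longer an input.  */
static bool
needs_stack_push (bool frame_pointer_needed, bool target_32bit,
                  long non_fixed_size)
{
  if (frame_pointer_needed)
    return true;

  /* Thresholds taken from the hunk: 220 for 32-bit, 288 for 64-bit.  */
  return non_fixed_size > (target_32bit ? 220 : 288);
}
```

With the removed branch gone, a small leaf function compiled with
debugging on AIX falls through to the size test like any other function.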

Re: [PATCH] rs6000: Guard __builtin_{un, }pack_vector_int128 with vsx [PR109932]

2023-06-11 Thread David Edelsohn via Gcc-patches

On Tue, Jun 6, 2023 at 5:19 AM Kewen.Lin  wrote:

> Hi,
>
> As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
> should be guarded under vsx rather than power7, as their
> corresponding bif patterns have the conditions TARGET_VSX
> and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode).  This patch is to
> move __builtin_{un,}pack_vector_int128 to stanza vsx to ensure
> their supports.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P7/P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
>
> I'll push this next week if no objections.
>
> BR,
> Kewen
> -
> PR target/109932
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
> __builtin_unpack_vector_int128): Move from stanza power7 to vsx.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr109932-1.c: New test.
> * gcc.target/powerpc/pr109932-2.c: New test.
>

This is okay.

Thanks, David


> ---
>  gcc/config/rs6000/rs6000-builtins.def | 14 +++---
>  gcc/testsuite/gcc.target/powerpc/pr109932-1.c | 16 
>  gcc/testsuite/gcc.target/powerpc/pr109932-2.c | 16 
>  3 files changed, 39 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr109932-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr109932-2.c
>
> diff --git a/gcc/config/rs6000/rs6000-builtins.def
> b/gcc/config/rs6000/rs6000-builtins.def
> index 92d9b46e1b9..a38184b0ef9 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2009,6 +2009,13 @@
>const vsll __builtin_vsx_xxspltd_2di (vsll, const int<1>);
>  XXSPLTD_V2DI vsx_xxspltd_v2di {}
>
> +  const vsq __builtin_pack_vector_int128 (unsigned long long, \
> +  unsigned long long);
> +PACK_V1TI packv1ti {}
> +
> +  const unsigned long __builtin_unpack_vector_int128 (vsq, const int<1>);
> +UNPACK_V1TI unpackv1ti {}
> +
>
>  ; Power7 builtins (ISA 2.06).
>  [power7]
> @@ -2030,16 +2037,9 @@
>const unsigned int __builtin_divweu (unsigned int, unsigned int);
>  DIVWEU diveu_si {}
>
> -  const vsq __builtin_pack_vector_int128 (unsigned long long, \
> -  unsigned long long);
> -PACK_V1TI packv1ti {}
> -
>void __builtin_ppc_speculation_barrier ();
>  SPECBARR speculation_barrier {}
>
> -  const unsigned long __builtin_unpack_vector_int128 (vsq, const int<1>);
> -UNPACK_V1TI unpackv1ti {}
> -
>
>  ; Power7 builtins requiring 64-bit GPRs (even with 32-bit addressing).
>  [power7-64]
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr109932-1.c
> b/gcc/testsuite/gcc.target/powerpc/pr109932-1.c
> new file mode 100644
> index 000..3e3f9eaa65e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr109932-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-maltivec -mno-vsx" } */
> +
> +/* Verify there is no ICE but one expected error message instead.  */
> +
> +#include 
> +
> +extern vector signed __int128 res_vslll;
> +extern unsigned long long aull[2];
> +
> +void
> +testVectorInt128Pack ()
> +{
> +  res_vslll = __builtin_pack_vector_int128 (aull[0], aull[1]); /* {
> dg-error "'__builtin_pack_vector_int128' requires the '-mvsx' option" } */
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr109932-2.c
> b/gcc/testsuite/gcc.target/powerpc/pr109932-2.c
> new file mode 100644
> index 000..3e3f9eaa65e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr109932-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-maltivec -mno-vsx" } */
> +
> +/* Verify there is no ICE but one expected error message instead.  */
> +
> +#include 
> +
> +extern vector signed __int128 res_vslll;
> +extern unsigned long long aull[2];
> +
> +void
> +testVectorInt128Pack ()
> +{
> +  res_vslll = __builtin_pack_vector_int128 (aull[0], aull[1]); /* {
> dg-error "'__builtin_pack_vector_int128' requires the '-mvsx' option" } */
> +}
> +
> --
> 2.25.1
>

Re: [PATCH] rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]

2023-06-10 Thread David Edelsohn via Gcc-patches

On Tue, Jun 6, 2023 at 5:20 AM Kewen.Lin  wrote:

> Hi,
>
> As PR110011 shows, when encoding a 128-bit fp constant into
> the toc, we use REAL_VALUE_TO_TARGET_LONG_DOUBLE, which
> finds the first float mode with LONG_DOUBLE_TYPE_SIZE
> bits of precision; that would be TFmode here.  But the 128-bit
> fp constant can have mode IFmode or KFmode, which
> doesn't necessarily have the same underlying float format
> as TFmode.  As this PR exposes, with option
> -mabi=ibmlongdouble TFmode has ibm_extended_format while
> KFmode has ieee_quad_format, and mixing up the formats (the
> encoding/decoding ways) causes unexpected results.
>
> This patch makes it use the constant's own mode instead
> of TFmode for the real_to_target call.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P7/P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
>
> I'll push this next week if no objections.
>
> BR,
> Kewen
> -
> PR target/110011
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.cc (output_toc): Use its own mode of the
> 128-bit float constant for real_to_target call.
>

The wording of this entry could be better.  Maybe:

Use the mode of the 128-bit floating constant itself for the
real_to_target call.

This is okay.

Thanks, David


> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr110011.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr110011.c | 42 +
>  2 files changed, 43 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr110011.c
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 3f129ea37d2..330c6a6fa5f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -17314,7 +17314,7 @@ output_toc (FILE *file, rtx x, int labelno,
> machine_mode mode)
>if (DECIMAL_FLOAT_MODE_P (GET_MODE (x)))
> REAL_VALUE_TO_TARGET_DECIMAL128 (*CONST_DOUBLE_REAL_VALUE (x), k);
>else
> -   REAL_VALUE_TO_TARGET_LONG_DOUBLE (*CONST_DOUBLE_REAL_VALUE (x), k);
> +   real_to_target (k, CONST_DOUBLE_REAL_VALUE (x), GET_MODE (x));
>
>if (TARGET_64BIT)
> {
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110011.c
> b/gcc/testsuite/gcc.target/powerpc/pr110011.c
> new file mode 100644
> index 000..5b04d3e298a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110011.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target float128_runtime } */
> +/* Force long double to be with IBM format here, to verify
> +   _Float128 constant still uses its own format (IEEE) for
> +   encoding rather than IBM format.  */
> +/* { dg-options "-mfp-in-toc -mabi=ibmlongdouble" } */
> +/* { dg-add-options float128 } */
> +
> +#define MPFR_FLOAT128_MAX 0x1.p+16383f128
> +
> +__attribute__ ((noipa))
> +_Float128 f128_max ()
> +{
> +  return MPFR_FLOAT128_MAX;
> +}
> +
> +typedef union
> +{
> +  int w[4];
> +  _Float128 f128;
> +} U;
> +
> +int main ()
> +{
> +
> +  U umax;
> +  umax.f128 = f128_max ();
> +  /* ieee float128 max:
> + 7ffe   .  */
> +  if (umax.w[1] != 0x || umax.w[2] != 0x)
> +__builtin_abort ();
> +#ifdef __LITTLE_ENDIAN__
> +  if (umax.w[0] != 0x || umax.w[3] != 0x7ffe)
> +__builtin_abort ();
> +#else
> +  if (umax.w[3] != 0x || umax.w[0] != 0x7ffe)
> +__builtin_abort ();
> +#endif
> +
> +  return 0;
> +}
> +
> --
> 2.31.1
>

Re: [PATCH 4/4] rs6000: build constant via li/lis;rldic

2023-06-10 Thread David Edelsohn via Gcc-patches

On Wed, Jun 7, 2023 at 9:56 PM Jiufu Guo  wrote:

> Hi,
>
> This patch checks if a constant is possible to be built by "li;rldic".
> We only need to take care of "negative li", other forms do not need to
> check.
> For example, "negative lis" is just a "negative li" with an additional
> shift.
>
> Bootstrap and regtest pass on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New
> function.
> (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic.
>

This is okay.

Do you have any measurement of how expensive it is to test all of these
additional methods to generate a constant?  How much does this affect the
compile time?

Thanks, David



>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/const-build.c: Add more tests.
> ---
>  gcc/config/rs6000/rs6000.cc   | 61 ++-
>  .../gcc.target/powerpc/const-build.c  | 28 +
>  2 files changed, 88 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2a3fa733b45..cd04b6b5c82 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10387,6 +10387,64 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT
> c, int *shift,
>return false;
>  }
>
> +/* Check if value C can be built by 2 instructions: one is 'li', another
> is
> +   rldic.
> +
> +   If so, *SHIFT is set to the 'shift' operand of rldic; and *MASK is set
> +   to the mask value about the 'mb' operand of rldic; and return true.
> +   Return false otherwise.  */
> +
> +static bool
> +can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT
> *mask)
> +{
> +  /* There are 49 successive ones in the negative value of 'li'.  */
> +  int ones = 49;
> +
> +  /* 1..1xx1..1: negative value of li --> 0..01..1xx0..0:
> + right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
> +  int tz = ctz_hwi (c);
> +  int lz = clz_hwi (c);
> +  int middle_ones = clz_hwi (~(c << lz));
> +  if (tz + lz + middle_ones >= ones)
> +{
> +  *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
> +  *shift = tz;
> +  return true;
> +}
> +
> +  /* 1..1xx1..1 --> 1..1xx0..01..1: some 1's(following x's) are cleaned.
> */
> +  int leading_ones = clz_hwi (~c);
> +  int tailing_ones = ctz_hwi (~c);
> +  int middle_zeros = ctz_hwi (c >> tailing_ones);
> +  if (leading_ones + tailing_ones + middle_zeros >= ones)
> +{
> +  *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
> +  *shift = tailing_ones + middle_zeros;
> +  return true;
> +}
> +
> +  /* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
> +  /* Get the position for the first bit of successive 1.
> + The 24th bit would be in successive 0 or 1.  */
> +  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
> +  int pos_first_1 = ((c & (low_mask + 1)) == 0)
> + ? clz_hwi (c & low_mask)
> + : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
> +  middle_ones = clz_hwi (~c << pos_first_1);
> +  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
> +  if (pos_first_1 < HOST_BITS_PER_WIDE_INT
> +  && middle_ones + middle_zeros < HOST_BITS_PER_WIDE_INT
> +  && middle_ones + middle_zeros >= ones)
> +{
> +  *mask = ~(((1ULL << middle_zeros) - 1LL)
> +   << (HOST_BITS_PER_WIDE_INT - pos_first_1));
> +  *shift = HOST_BITS_PER_WIDE_INT - pos_first_1 + middle_zeros;
> +  return true;
> +}
> +
> +  return false;
> +}
> +
>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> Output insns to set DEST equal to the constant C as a series of
> lis, ori and shl instructions.  */
> @@ -10435,7 +10493,8 @@ rs6000_emit_set_long_const (rtx dest,
> HOST_WIDE_INT c)
>  }
>else if (can_be_built_by_li_lis_and_rotldi (c, , )
>|| can_be_built_by_li_lis_and_rldicl (c, , )
> -  || can_be_built_by_li_lis_and_rldicr (c, , ))
> +  || can_be_built_by_li_lis_and_rldicr (c, , )
> +  || can_be_built_by_li_and_rldic (c, , ))
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>unsigned HOST_WIDE_INT imm = (c | ~mask);
> diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c
> b/gcc/testsuite/gcc.target/powerpc/const-build.c
> index 8c209921d41..b503ee31c7c 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const-build.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
> @@ -82,6 +82,29 @@ lis_rldicr_12 (void)
>return 0x5310LL;
>  }
>
> +long long NOIPA
> +li_rldic_13 (void)
> +{
> +  return 0x000f8531LL;
> +}
> +long long NOIPA
> +li_rldic_14 (void)
> +{
> +  return 0x853100ffLL;
> +}
> +
> +long long NOIPA
> +li_rldic_15 (void)
> +{
> +  return 0x8031LL;
> +}
> +
> +long long NOIPA
> +li_rldic_16 (void)
> +{
>

Re: [PATCH 3/4] rs6000: build constant via li/lis;rldicl/rldicr

2023-06-10 Thread David Edelsohn via Gcc-patches

On Wed, Jun 7, 2023 at 9:56 PM Jiufu Guo  wrote:

> Hi,
>
> This patch checks if a constant is possible left/right cleaned on a rotated
> value from a negative value of "li/lis".  If so, we can build the constant
> through "li/lis ; rldicl/rldicr".
>
> Bootstrap and regtest pass on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
> function.
> (can_be_built_by_li_lis_and_rldicr): New function.
> (rs6000_emit_set_long_const): Call
> can_be_built_by_li_lis_and_rldicr and
> can_be_built_by_li_lis_and_rldicl.
>

This is okay.  See below.

Thanks, David



>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/const-build.c: Add more tests.
> ---
>  gcc/config/rs6000/rs6000.cc   | 61 ++-
>  .../gcc.target/powerpc/const-build.c  | 44 +
>  2 files changed, 104 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 03cd9d5e952..2a3fa733b45 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10332,6 +10332,61 @@ can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT
> c, int *shift,
>return false;
>  }
>
> +/* Check if value C can be built by 2 instructions: one is 'li or lis',
> +   another is rldicl.
> +
> +   If so, *SHIFT is set to the shift operand of rldicl, and *MASK is set
> to
> +   the mask operand of rldicl, and return true.
> +   Return false otherwise.  */
> +
> +static bool
> +can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
> +  HOST_WIDE_INT *mask)
> +{
> +  /* Leading zeros may be cleaned by rldicl with a mask.  Change leading
> zeros
> + to ones and then recheck it.  */
> +  int lz = clz_hwi (c);
> +  HOST_WIDE_INT unmask_c
> += c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
> +  int n;
> +  if (can_be_rotated_to_negative_li (unmask_c, )
>

using can_be_rotated_to_lowbits (~unmask_c, 15, )

Maybe Segher would want the abstraction, but it seems more wasteful to me.


> +  || can_be_rotated_to_negative_lis (unmask_c, ))
> +{
> +  *mask = HOST_WIDE_INT_M1U >> lz;
> +  *shift = n == 0 ? 0 : HOST_BITS_PER_WIDE_INT - n;
> +  return true;
> +}
> +
> +  return false;
> +}
> +
> +/* Check if value C can be built by 2 instructions: one is 'li or lis',
> +   another is rldicr.
> +
> +   If so, *SHIFT is set to the shift operand of rldicr, and *MASK is set
> to
> +   the mask operand of rldicr, and return true.
> +   Return false otherwise.  */
> +
> +static bool
> +can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
> +  HOST_WIDE_INT *mask)
> +{
> +  /* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing
> zeros
> + to ones and then recheck it.  */
> +  int tz = ctz_hwi (c);
> +  HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
> +  int n;
> +  if (can_be_rotated_to_negative_li (unmask_c, )
> +  || can_be_rotated_to_negative_lis (unmask_c, ))
> +{
> +  *mask = HOST_WIDE_INT_M1U << tz;
> +  *shift = HOST_BITS_PER_WIDE_INT - n;
> +  return true;
> +}
> +
> +  return false;
> +}
> +
>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> Output insns to set DEST equal to the constant C as a series of
> lis, ori and shl instructions.  */
> @@ -10378,7 +10433,9 @@ rs6000_emit_set_long_const (rtx dest,
> HOST_WIDE_INT c)
>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>  GEN_INT ((ud2 ^ 0x) << 16)));
>  }
> -  else if (can_be_built_by_li_lis_and_rotldi (c, , ))
> +  else if (can_be_built_by_li_lis_and_rotldi (c, , )
> +  || can_be_built_by_li_lis_and_rldicl (c, , )
> +  || can_be_built_by_li_lis_and_rldicr (c, , ))
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>unsigned HOST_WIDE_INT imm = (c | ~mask);
> @@ -10387,6 +10444,8 @@ rs6000_emit_set_long_const (rtx dest,
> HOST_WIDE_INT c)
>emit_move_insn (temp, GEN_INT (imm));
>if (shift != 0)
> temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
> +  if (mask != HOST_WIDE_INT_M1)
> +   temp = gen_rtx_AND (DImode, temp, GEN_INT (mask));
>emit_move_insn (dest, temp);
>  }
>else if (ud3 == 0 && ud4 == 0)
> diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c
> b/gcc/testsuite/gcc.target/powerpc/const-build.c
> index c38a1dd91f2..8c209921d41 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const-build.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
> @@ -46,6 +46,42 @@ lis_rotldi_6 (void)
>return 0x5318LL;
>  }
>
> +long long NOIPA
> +li_rldicl_7 (void)
> +{
> +  return 0x3ffa1LL;
> +}
> +
> +long long NOIPA
> +li_rldicl_8 (void)
> +{
> +  return 0xff8531LL;
> +}
> +
>

Re: [PATCH 2/4] rs6000: build constant via lis;rotldi

2023-06-10 Thread David Edelsohn via Gcc-patches

On Wed, Jun 7, 2023 at 9:55 PM Jiufu Guo  wrote:

> Hi,
>
> This patch checks if a constant is possible to be rotated to/from a
> negative
> value from "lis".  If so, we could use "lis;rotldi" to build it.
> The positive value of "lis" does not need to be analyzed.  Because if a
> constant can be rotated from the positive value of "lis", it also can be
> rotated from a positive value of "li".
>
> Bootstrap and regtest pass on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): New
> function.
> (can_be_built_by_li_and_rotldi): Rename to ...
> (can_be_built_by_li_lis_and_rotldi): ... this function.
> (rs6000_emit_set_long_const): Call
> can_be_built_by_li_lis_and_rotldi.
>

This patch is okay.

Thanks, David


>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/const-build.c: Add more tests.
> ---
>  gcc/config/rs6000/rs6000.cc   | 42 ---
>  .../gcc.target/powerpc/const-build.c  | 16 ++-
>  2 files changed, 52 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 1dd0072350a..03cd9d5e952 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10278,19 +10278,51 @@ can_be_rotated_to_negative_li (HOST_WIDE_INT c,
> int *rot)
>return can_be_rotated_to_lowbits (~c, 15, rot);
>  }
>
> -/* Check if value C can be built by 2 instructions: one is 'li', another
> is
> -   rotldi.
> +/* Check if C can be rotated to a negative value which 'lis' instruction
> is
> +   able to load: 1..1xx0..0.  If so, set *ROT to the number by which C is
> +   rotated, and return true.  Return false otherwise.  */
> +
> +static bool
> +can_be_rotated_to_negative_lis (HOST_WIDE_INT c, int *rot)
> +{
> +  /* case a. 1..1xxx0..01..1: up to 15 x's, at least 16 0's.  */
> +  int leading_ones = clz_hwi (~c);
> +  int tailing_ones = ctz_hwi (~c);
> +  int middle_zeros = ctz_hwi (c >> tailing_ones);
> +  if (middle_zeros >= 16 && leading_ones + tailing_ones >= 33)
> +{
> +  *rot = HOST_BITS_PER_WIDE_INT - tailing_ones;
> +  return true;
> +}
> +
> +  /* case b. xx0..01..1xx: some of 15 x's (and some of 16 0's) are
> + rotated over the highest bit.  */
> +  int pos_one = clz_hwi ((c << 16) >> 16);
> +  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
> +  int middle_ones = clz_hwi (~(c << pos_one));
> +  if (middle_zeros >= 16 && middle_ones >= 33)
> +{
> +  *rot = pos_one;
> +  return true;
> +}
> +
> +  return false;
> +}
> +
> +/* Check if value C can be built by 2 instructions: one is 'li or lis',
> +   another is rotldi.
>
> If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
> is set to -1, and return true.  Return false otherwise.  */
>
>  static bool
> -can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
> +can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, int *shift,
>HOST_WIDE_INT *mask)
>  {
>int n;
>if (can_be_rotated_to_positive_li (c, )
> -  || can_be_rotated_to_negative_li (c, ))
> +  || can_be_rotated_to_negative_li (c, )
> +  || can_be_rotated_to_negative_lis (c, ))
>  {
>*mask = HOST_WIDE_INT_M1;
>*shift = HOST_BITS_PER_WIDE_INT - n;
> @@ -10346,7 +10378,7 @@ rs6000_emit_set_long_const (rtx dest,
> HOST_WIDE_INT c)
>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>  GEN_INT ((ud2 ^ 0x) << 16)));
>  }
> -  else if (can_be_built_by_li_and_rotldi (c, , ))
> +  else if (can_be_built_by_li_lis_and_rotldi (c, , ))
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>unsigned HOST_WIDE_INT imm = (c | ~mask);
> diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c
> b/gcc/testsuite/gcc.target/powerpc/const-build.c
> index 70f095f6bf2..c38a1dd91f2 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const-build.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
> @@ -34,14 +34,28 @@ li_rotldi_4 (void)
>return 0x2194LL;
>  }
>
> +long long NOIPA
> +lis_rotldi_5 (void)
> +{
> +  return 0x8531LL;
> +}
> +
> +long long NOIPA
> +lis_rotldi_6 (void)
> +{
> +  return 0x5318LL;
> +}
> +
>  struct fun arr[] = {
>{li_rotldi_1, 0x75310LL},
>{li_rotldi_2, 0x2164LL},
>{li_rotldi_3, 0x8531LL},
>{li_rotldi_4, 0x2194LL},
> +  {lis_rotldi_5, 0x8531LL},
> +  {lis_rotldi_6, 0x5318LL},
>  };
>
> -/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mrotldi\M} 6 } } */
>
>  int
>  main ()
> --
> 2.39.1
>
>

Re: [PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-10 Thread David Edelsohn via Gcc-patches

On Wed, Jun 7, 2023 at 9:55 PM Jiufu Guo  wrote:

> Hi,
>
> This patch checks if a constant is possible to be rotated to/from a
> positive
> or negative value from "li". If so, we could use "li;rotldi" to build it.
>
> Bootstrap and regtest pass on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.cc (can_be_rotated_to_positive_li): New
> function.
> (can_be_rotated_to_negative_li): New function.
> (can_be_built_by_li_and_rotldi): New function.
> (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/const-build.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc   | 64 +--
>  .../gcc.target/powerpc/const-build.c  | 54 
>  2 files changed, 112 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 42f49e4a56b..1dd0072350a 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10258,6 +10258,48 @@ rs6000_emit_set_const (rtx dest, rtx source)
>return true;
>  }
>
> +/* Check if C can be rotated to a positive value which 'li' instruction
> +   is able to load.  If so, set *ROT to the number by which C is rotated,
> +   and return true.  Return false otherwise.  */
> +
> +static bool
> +can_be_rotated_to_positive_li (HOST_WIDE_INT c, int *rot)
> +{
> +  /* 49 leading zeros and 15 low bits on the positive value
> + generated by 'li' instruction.  */
> +  return can_be_rotated_to_lowbits (c, 15, rot);
> +}
> +
> +/* Like can_be_rotated_to_positive_li, but check the negative value of
> 'li'.  */
> +
> +static bool
> +can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
> +{
> +  return can_be_rotated_to_lowbits (~c, 15, rot);
> +}
> +
> +/* Check if value C can be built by 2 instructions: one is 'li', another
> is
> +   rotldi.
> +
> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
> +   is set to -1, and return true.  Return false otherwise.  */
> +
>

I looked at this feature and it's good, but I don't fully understand the
benefit of this level of abstraction.  Ideally all of the above functions
would be inlined.  They aren't reused.


> +static bool
> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
> +  HOST_WIDE_INT *mask)
> +{
> +  int n;
> +  if (can_be_rotated_to_positive_li (c, )
> +  || can_be_rotated_to_negative_li (c, ))
>

Why not

/* Check if C or ~C can be rotated to a positive or negative value
which 'li' instruction is able to load.  */
if (can_be_rotated_to_lowbits (c, 15, )
|| can_be_rotated_to_lowbits (~c, 15, ))
...

This is a style of software engineering, but it seems overkill to me when
the function is a single line that tail calls another function.  Am I
missing something?

The rest of this patch looks good.
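
To illustrate the combined check outside of GCC, here is a brute-force
sketch in plain C.  It is not the rs6000.cc implementation: rotl64,
rotates_to_lowbits and rotates_to_li are hypothetical names, and the
loop over all 64 rotations stands in for whatever
can_be_rotated_to_lowbits actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by N bits, 0 <= N < 64.  */
static uint64_t
rotl64 (uint64_t x, int n)
{
  assert (n >= 0 && n < 64);
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Hypothetical brute-force stand-in for can_be_rotated_to_lowbits:
   return true if some left rotation of C fits entirely in the low
   LOWBITS bits, and store the first such rotation count in *ROT.  */
static bool
rotates_to_lowbits (uint64_t c, int lowbits, int *rot)
{
  for (int n = 0; n < 64; n++)
    if ((rotl64 (c, n) >> lowbits) == 0)
      {
        *rot = n;
        return true;
      }
  return false;
}

/* One check covering both forms: C itself for the positive 'li' value,
   ~C for the negative one.  */
static bool
rotates_to_li (uint64_t c, int *rot)
{
  return rotates_to_lowbits (c, 15, rot)
         || rotates_to_lowbits (~c, 15, rot);
}
```

Written this way, the positive and negative cases are visibly the same
test applied to C and ~C, which is the point of the suggestion above.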

Thanks, David


> +{
> +  *mask = HOST_WIDE_INT_M1;
> +  *shift = HOST_BITS_PER_WIDE_INT - n;
> +  return true;
> +}
> +
> +  return false;
> +}
> +
>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> Output insns to set DEST equal to the constant C as a series of
> lis, ori and shl instructions.  */
> @@ -10266,15 +10308,14 @@ static void
>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  {
>rtx temp;
> +  int shift;
> +  HOST_WIDE_INT mask;
>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>
>ud1 = c & 0x;
> -  c = c >> 16;
> -  ud2 = c & 0x;
> -  c = c >> 16;
> -  ud3 = c & 0x;
> -  c = c >> 16;
> -  ud4 = c & 0x;
> +  ud2 = (c >> 16) & 0x;
> +  ud3 = (c >> 32) & 0x;
> +  ud4 = (c >> 48) & 0x;
>
>if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
> @@ -10305,6 +10346,17 @@ rs6000_emit_set_long_const (rtx dest,
> HOST_WIDE_INT c)
>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>  GEN_INT ((ud2 ^ 0x) << 16)));
>  }
> +  else if (can_be_built_by_li_and_rotldi (c, , ))
> +{
> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
> +
> +  emit_move_insn (temp, GEN_INT (imm));
> +  if (shift != 0)
> +   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
> +  emit_move_insn (dest, temp);
> +}
>else if (ud3 == 0 && ud4 == 0)
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c
> b/gcc/testsuite/gcc.target/powerpc/const-build.c
> new file mode 100644
> index 000..70f095f6bf2
> --- /dev/null
> +++

Re: [PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-02 Thread David Edelsohn via Gcc-patches

Hi, Jiufu

* config/rs6000/rs6000.cc (can_be_rotated_to_possitive_li): New 
function.
(can_be_rotated_to_negative_li): New function.
(can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

In English the word "positive" contains one "s", not two.  Please
correct throughout the patches.

Also a style issue, comments before a function should be followed by a
blank line.

> +/* Check if C can be rotated to a possitive value which 'li' instruction

positive
> +   is able to load.  If so, set *ROT to the number by which C is rotated,
> +   and return true.  Return false otherwise.  */

Add a blank line here
> +static bool
> +can_be_rotated_to_possitive_li (HOST_WIDE_INT c, int *rot)

positive
> +{
> +  /* 49 leading zeros and 15 lowbits on the possitive value

low bits, positive

> + generated by 'li' instruction.  */
> +  return can_be_rotated_to_lowbits (c, 15, rot);
> +}

> +/* Check if value C can be built by 2 instructions: one is 'li', another is
> +   rotldi.
> +
> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
> +   is set to -1, and return true.  Return false otherwise.  */
> +static bool
> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
> +HOST_WIDE_INT *mask)
> +{
> +  int n;
> +  if (can_be_rotated_to_possitive_li (c, )
> +  || can_be_rotated_to_negative_li (c, ))
> +{
> +  *mask = HOST_WIDE_INT_M1;
> +  *shift = HOST_BITS_PER_WIDE_INT - n;
> +  return true;
> +}
> +
> +  return false;
> +}
> +
>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> Output insns to set DEST equal to the constant C as a series of
> lis, ori and shl instructions.  */
> @@ -10246,15 +10285,14 @@ static void
>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  {
>    rtx temp;
> +  int shift;
> +  HOST_WIDE_INT mask;
>    HOST_WIDE_INT ud1, ud2, ud3, ud4;
>
>    ud1 = c & 0x;
> -  c = c >> 16;
> -  ud2 = c & 0x;
> -  c = c >> 16;
> -  ud3 = c & 0x;
> -  c = c >> 16;
> -  ud4 = c & 0x;
> +  ud2 = (c >> 16) & 0x;
> +  ud3 = (c >> 32) & 0x;
> +  ud4 = (c >> 48) & 0x;
>
>    if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
>        || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
> @@ -10278,6 +10316,19 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>        emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>                                           GEN_INT ((ud2 ^ 0x) << 16)));
>      }
> +  else if (can_be_built_by_li_and_rotldi (c, , ))
> +    {
> +      temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +      unsigned HOST_WIDE_INT imm = (c | ~mask);
> +      imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
> +
> +      emit_move_insn (temp, GEN_INT (imm));
> +      if (shift != 0)
> +        temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
> +      if (mask != HOST_WIDE_INT_M1)

How can mask != HOST_WIDE_INT_M1?  The call to
can_be_built_by_li_and_rotldi () sets it to that value, and it is not
modified in the intervening statements.

> +        temp = gen_rtx_AND (DImode, temp, GEN_INT (mask));
> +      emit_move_insn (dest, temp);
> +    }
>    else if (ud3 == 0 && ud4 == 0)
>      {
>        temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);

Thanks, David
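
To illustrate the check discussed in this review, here is a minimal C sketch (not the patch's code) of testing whether some rotation of a 64-bit constant fits the sign-extended 16-bit immediate that 'li' can load.  The helper names rotl64 and fits_li_after_rotate are hypothetical, and a fixed 64-bit width is assumed in place of HOST_BITS_PER_WIDE_INT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits, 0 <= n < 64.  */
static uint64_t rotl64(uint64_t v, unsigned n)
{
    return n == 0 ? v : (v << n) | (v >> (64 - n));
}

/* Hypothetical helper: return true if some left rotation of C lies in
   the range of a sign-extended 16-bit immediate, [-0x8000, 0x7fff].
   On success, *ROT is the rotation that was applied to C.  */
static bool fits_li_after_rotate(uint64_t c, unsigned *rot)
{
    for (unsigned n = 0; n < 64; n++) {
        uint64_t r = rotl64(c, n);
        /* Either the top 49 bits are all zero or they are all one.  */
        if (r <= 0x7fffULL || r >= 0xffffffffffff8000ULL) {
            *rot = n;
            return true;
        }
    }
    return false;
}
```

If the check succeeds with rotation n, the constant can be rebuilt by loading the rotated value and rotating it left by (64 - n) % 64, which corresponds to the *shift value computed in the quoted patch.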

Re: ping^^: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-05-31 Thread David Edelsohn via Gcc-patches

On Tue, May 30, 2023 at 11:00 PM Jiufu Guo  wrote:

>
> Gentle ping...
>
> Jiufu Guo via Gcc-patches  writes:
>
> > Gentle ping...
> >
> > Jiufu Guo via Gcc-patches  writes:
> >
> >> Hi,
> >>
> >> I'm thinking that we may enable this patch for stage1, so ping it.
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html
> >>
> >> BR,
> >> Jeff (Jiufu)
> >>
> >> Jiufu Guo  writes:
> >>
> >>> Hi,
> >>>
> >>> cse.cc has a const_anchor feature.  It can generate new constants by
> >>> adding small offsets to an existing constant.  For example:
> >>>
> >>> void __attribute__ ((noinline)) foo (long long *a)
> >>> {
> >>>   *a++ = 0x2351847027482577LL;
> >>>   *a++ = 0x2351847027482578LL;
> >>> }
> >>> The second constant (0x2351847027482578LL) can be computed by adding '1'
> >>> to the first constant (0x2351847027482577LL).
> >>> This is profitable if more than one instruction is needed to build the
> >>> second constant.
> >>>
> >>> * For rs6000, we can enable this functionality, as the instruction
> >>> 'addi' is just for this when gap is smaller than 0x8000.
> >>>
> >>> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixes
> >>> one issue:
> >>> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement for function
> >>> "try_const_anchors".
> >>>
> >>> * One potential side effect of this patch:
> >>> Comparing with
> >>> "r101=0x2351847027482577LL
> >>> ...
> >>> r201=0x2351847027482578LL"
> >>> The new r201 will be "r201=r101+1", and then r101 will live longer,
> >>> and would increase pressure when allocating registers.
> >>> But I feel, this would be acceptable for this const_anchor feature.
> >>>
> >>> * With this patch, I checked the performance change on SPEC2017.  The
> >>> change is not significant, since this functionality is not hit on any
> >>> hot path.  There is runtime variation/noise (e.g. on
> >>> povray_r/xalancbmk_r/xz_r) that is not caused by the patch.
> >>>
> >>> With this patch, I also checked the changes in object files (from
> >>> GCC bootstrap and SPEC).  The significant changes are improvements:
> >>> "addi" vs. "2 or more insns: lis+or..".  It also exposes some other
> >>> optimization opportunities, such as combine/jump2.  Code to store/load
> >>> one more register also occurs in a few cases, but it does not impact
> >>> overall performance.
> >>>
> >>> * To refine this patch, some history discussions are referenced:
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
> >>>
> >>>
> >>> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
> >>> Is this ok for trunk?
>

Hi, Jiufu

Thanks for developing this patch and your persistence.

The rs6000.cc part of the patch (TARGET_CONST_ANCHOR) is okay for Stage 1.
This is approved.

I don't have the authority to approve the change to cse_insn.  Is the
cse_insn change a prerequisite?  Will the rs6000 change break or produce
wrong code without the cse change?  The second part of the patch should be
posted separately to the mailing list, with a cc for appropriate
maintainers, because most maintainers will not be following this specific
thread to approve the other part of the patch.

Thanks, David
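
As a rough illustration of the const_anchor idea described in the quoted patch, the following C sketch checks whether a constant can be derived from an already-loaded one with a single 'addi'.  It is a simplification, not the cse.cc implementation; the function name reachable_by_addi is hypothetical, and the only assumption taken from the thread is the 0x8000 limit given for TARGET_CONST_ANCHOR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return true if TARGET can be formed from ANCHOR
   by adding a sign-extended 16-bit immediate, i.e. the difference lies
   in [-0x8000, 0x7fff].  */
static bool reachable_by_addi(int64_t anchor, int64_t target)
{
    /* Subtract as unsigned so that wraparound is well defined.  */
    uint64_t delta = (uint64_t) target - (uint64_t) anchor;

    return delta <= 0x7fffULL || delta >= 0xffffffffffff8000ULL;
}
```

With the two constants from the example, reachable_by_addi (0x2351847027482577LL, 0x2351847027482578LL) is true, because the difference is 1.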


> >>>
> >>>
> >>> BR,
> >>> Jeff (Jiufu)
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
> >>> * cse.cc (cse_insn): Add guard condition.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> * gcc.target/powerpc/const_anchors.c: New test.
> >>> * gcc.target/powerpc/try_const_anchors_ice.c: New test.
> >>>
> >>> ---
> >>>  gcc/config/rs6000/rs6000.cc   |  4 
> >>>  gcc/cse.cc|  3 ++-
> >>>  .../gcc.target/powerpc/const_anchors.c| 20 +++
> >>>  .../powerpc/try_const_anchors_ice.c   | 16 +++
> >>>  4 files changed, 42 insertions(+), 1 deletion(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
> >>>  create mode 100644
> gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
> >>>
> >>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >>> index d2743f7bce6..80cded6dec1 100644
> >>> --- a/gcc/config/rs6000/rs6000.cc
> >>> +++ b/gcc/config/rs6000/rs6000.cc
> >>> @@ -1760,6 +1760,10 @@ static const struct attribute_spec
> rs6000_attribute_table[] =
> >>>
> >>>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
> >>>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO
> rs6000_update_ipa_fn_target_info
> >>> +
> >>> +#undef TARGET_CONST_ANCHOR
> >>> +#define TARGET_CONST_ANCHOR 0x8000
> >>> +
> >>>
> >>>
> >>>  /* Processor table.  */
> >>> diff --git a/gcc/cse.cc b/gcc/cse.cc
> >>> index b13afd4ba72..56542b91c1e 100644
> >>> --- a/gcc/cse.cc
> >>> +++ b/gcc/cse.cc
> >>> @@

Re: [PATCH 7/7] Expand directly for single bit test

2023-05-21 Thread David Edelsohn via Gcc-patches

On Sun, May 21, 2023 at 11:25 AM Andrew Pinski  wrote:

> On Sun, May 21, 2023 at 11:17 AM David Edelsohn via Gcc-patches
>  wrote:
> >
> > Hi, Andrew
> >
> > Thanks for this series of patches to improve do_store_flag.
> Unfortunately
> > this specific patch in the series has caused a bootstrap failure on
> > powerpc-aix.  I bisected this failure to this specific patch. Note that I
> > am building as 32 bit, so this could be a specific issue about bit size.
> >
> > * expr.cc (fold_single_bit_test): Rename to ...
> > (expand_single_bit_test): This and expand directly.
> > (do_store_flag): Update for the rename function.
>
> Did this include the fix I did for big-endian at
> r14-1022-g7f3df8e65c71e5 ? I had found that I broke big-endian last
> night with that patch and pushed the fix once I figured out what I did
> wrong.
> If you already tried with the fix applied, I will try to look into it as
> soon as possible.
>
>
The big-endian patch fixed the issue for Power also.

Thanks, David

Re: [PATCH 7/7] Expand directly for single bit test

2023-05-21 Thread David Edelsohn via Gcc-patches

Hi, Andrew

Thanks for this series of patches to improve do_store_flag.  Unfortunately
this specific patch in the series has caused a bootstrap failure on
powerpc-aix.  I bisected this failure to this specific patch. Note that I
am building as 32 bit, so this could be a specific issue about bit size.

* expr.cc (fold_single_bit_test): Rename to ...
(expand_single_bit_test): This and expand directly.
(do_store_flag): Update for the rename function.


Thanks, David

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread David Edelsohn via Gcc-patches

On Fri, May 5, 2023 at 11:38 AM Tamar Christina 
wrote:

> > -Original Message-
> > From: Jakub Jelinek 
> > Sent: Friday, May 5, 2023 4:33 PM
> > To: Tamar Christina 
> > Cc: Jeff Law ; David Edelsohn  >;
> > GCC Patches 
> > Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> > configurable.
> >
> > On Fri, May 05, 2023 at 03:22:11PM +, Tamar Christina wrote:
> > > > We require GNU make, so perhaps we could use something like
> > > > $(wordlist
> > > > 1,$(NUM_MATCH_SPLITS),$(check_p_numbers))
> > > > instead of
> > > > $(shell seq 1 $(NUM_MATCH_SPLITS))
> > > > provided we move the check_p_numbers definition earlier (or perhaps
> > > > better rename it to something more generic, so that it is clear
> > > > that is a variable holding numbers from 1 to .
> > >
> > > I'm currently testing
> > >
> > > NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@ -
> > MATCH_SPLITS_SEQ =
> > > $(shell seq 1 $(NUM_MATCH_SPLITS))
> > > +MATCH_SPLITS_SEQ = $(shell echo {1..$(NUM_MATCH_SPLITS)})
> > >
> > > Which seems to work since it looks like we require an sh compatible
> shell.
> > >
> > > Question is this right? From the existing
> >
> > AIX /bin/sh certainly doesn't handle that.
>
> Wow, wonder what sh version it has..
>
> >
> > But what do I know about AIX...
>
> Same..
>

AIX defaults to Korn Shell.

I always use Bash on AIX to build GCC and recommend Bash in the GCC build
instructions for AIX.

Do we want to require Bash?  Bash is a more self-contained requirement than
seq from coreutils.

Thanks, David


>
> >
> > This seems to work and we use it already in the Makefile.
> > If something else works portably, we could change both spots...
> >
> > 2023-05-05  Jakub Jelinek  
> >
> >   * Makefile.in (check_p_numbers): Rename to one_to_, move
> >   earlier with helper variables also renamed.
> >   (MATCH_SPLITS_SEQ): Use $(wordlist
> > 1,$(NUM_MATCH_SPLITS),$(one_to_))
> >   instead of $(shell seq 1 $(NUM_MATCH_SPLITS)).
> >   (check_p_subdirs): Use $(one_to_) instead of
> > $(check_p_numbers).
> >
> > --- gcc/Makefile.in.jj2023-05-05 16:02:37.180575333 +0200
> > +++ gcc/Makefile.in   2023-05-05 17:20:27.923251821 +0200
> > @@ -214,9 +214,19 @@ rtl-ssa-warn = $(STRICT_WARN)
> > GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn)
> > $(if $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN))
> > $(NOCOMMON_FLAG) $($@-warn)  GCC_WARN_CXXFLAGS =
> > $(LOOSE_WARN) $($(@D)-warn) $(NOCOMMON_FLAG) $($@-warn)
> >
> > +# 1 2 3 ... 
> > +one_to__0:=1 2 3 4 5 6 7 8 9
> > +one_to__1:=0 $(one_to__0)
> > +one_to__2:=$(foreach i,$(one_to__0),$(addprefix
> > +$(i),$(one_to__1))) one_to__3:=$(addprefix
> > 0,$(one_to__1))
> > +$(one_to__2) one_to__4:=$(foreach
> > +i,$(one_to__0),$(addprefix $(i),$(one_to__3)))
> > +one_to__5:=$(addprefix 0,$(one_to__3)) $(one_to__4)
> > +one_to__6:=$(foreach i,$(one_to__0),$(addprefix
> > +$(i),$(one_to__5)))
> > +one_to_:=$(one_to__0) $(one_to__2) $(one_to__4)
> > +$(one_to__6)
> > +
> >  # The number of splits to be made for the match.pd files.
> >  NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@ -
> > MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
> > +MATCH_SPLITS_SEQ = $(wordlist
> > 1,$(NUM_MATCH_SPLITS),$(one_to_))
> >  GIMPLE_MATCH_PD_SEQ_SRC = $(patsubst %, gimple-match-%.cc,
> > $(MATCH_SPLITS_SEQ))  GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-
> > match-%.o, $(MATCH_SPLITS_SEQ))  GENERIC_MATCH_PD_SEQ_SRC =
> > $(patsubst %, generic-match-%.cc, $(MATCH_SPLITS_SEQ)) @@ -4234,18
> > +4244,10 @@ $(patsubst %,%-subtargets,$(lang_checks)
> > check_p_tool=$(firstword $(subst _, ,$*))
> >  check_p_count=$(check_$(check_p_tool)_parallelize)
> >  check_p_subno=$(word 2,$(subst _, ,$*))
> > -check_p_numbers0:=1 2 3 4 5 6 7 8 9
> > -check_p_numbers1:=0 $(check_p_numbers0) -
> > check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix
> > $(i),$(check_p_numbers1))) -check_p_numbers3:=$(addprefix
> > 0,$(check_p_numbers1)) $(check_p_numbers2) -
> > check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix
> > $(i),$(check_p_numbers3))) -check_p_numbers5:=$(addprefix
> > 0,$(check_p_numbers3)) $(check_p_numbers4) -
> > check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix
> > $(i),$(check_p_numbers5)))
> > -check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2)
> > $(check_p_numbers4) $(check_p_numbers6)  check_p_subdir=$(subst _,,$*)
> > check_p_subdirs=$(wordlist 1,$(check_p_count),$(wordlist 1, \
> >   $(if
> > $(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),128), \
> > - $(check_p_numbers)))
> > + $(one_to_)))
>
> Thanks, If it works I'm happy, I can rebase my other patches to use this.
>
> Thank you!
>
> Regards,
> Tamar
>
> >
> >  # For parallelized check-% targets, this decides whether
> parallelization  # is
> > desirable (if -jN is

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread David Edelsohn via Gcc-patches

This patch has broken GCC bootstrap on AIX.  It appears to rely upon, or
complain about, the command "seq":

/nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
-Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -static-libstdc++
-static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \
build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
build/errors.o build/vec.o build/hash-table.o build/sort.o
../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
/usr/bin/bash: seq: command not found
/usr/bin/bash: seq: command not found
build/genmatch --gimple \
--header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
/nasfarm/edelsohn/src/src/gcc/match.pd

All of the match files are dumped to stdout.

Thanks, David

Re: [PATCH, wwwdocs] readings: Update AIX linker links

2023-01-30 Thread David Edelsohn via Gcc-patches

On Mon, Jan 30, 2023 at 4:52 PM Gerald Pfeifer  wrote:

> Hi David,
>
> I noticed the links to the AIX 4.3 and AIX 5L docs were broken and could
> not find a good replacement, though I did find one for AIX 7.2.
>
> How about the patch below?
>

Hi, Gerald

That change is fine with me.

Thanks, David


>
> Or should we omit such links? (Or do you have recommendations?)
>
> Thanks,
> Gerald
>
>
> diff --git a/htdocs/readings.html b/htdocs/readings.html
> index 3f41ef2a..0a978e8f 100644
> --- a/htdocs/readings.html
> +++ b/htdocs/readings.html
> @@ -270,8 +270,7 @@ names.
>Manufacturer: IBM, Motorola
>https://www.ibm.com/systems/power/openpower/;>Power
> ISA
>https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture;>64-Bit
> ELF V2 ABI - OpenPOWER ABI
> -  http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/43_docs/aixassem/alangref/toc.htm;>AIX
> V4.3 Assembler Language Ref.
> -  http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixassem/alangref/alangreftfrm.htm;>AIX
> 5L Assembler Language Ref.
> +  https://www.ibm.com/docs/en/ssw_aix_72/assembler/assembler_pdf.pdf;>AIX
> 7.2 Assembler Language Reference
>   
>
>   rx
>

Re: [PATCH RFC] c++: implement P1492 contracts

2022-11-21 Thread David Edelsohn via Gcc-patches

This patch broke bootstrap on AIX due to its use of strchrnul().  It is an
extension available in Linux, but not all targets.

/src/src/gcc/cp/contracts.cc:213:21: error: 'strchrnul' was not declared in
this scope; did you mean 'strchr'?
  213 |   size_t role_len = strchrnul (role, ':') - role;
  | ^
  | strchr

Would you please avoid the extension or provide an implementation?

Thanks David
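
One way to honor this request is a small fallback written with standard C functions only.  The sketch below is an assumption about what such a replacement could look like, not the fix that was committed; the names portable_strchrnul and role_name_length are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback for strchrnul: return a pointer to the first
   occurrence of C in S, or to the terminating NUL if C is absent.  */
static const char *portable_strchrnul(const char *s, int c)
{
    const char *p = strchr(s, c);

    return p ? p : s + strlen(s);
}

/* Length of the text before the first ':' (or of the whole string),
   mirroring the expression in the failing line.  */
static size_t role_name_length(const char *role)
{
    return (size_t) (portable_strchrnul(role, ':') - role);
}
```

For this particular use, strcspn (role, ":") from standard C yields the same length without any helper.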

Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-18 Thread David Edelsohn via Gcc-patches

On Fri, Nov 18, 2022 at 7:20 AM Segher Boessenkool <
seg...@kernel.crashing.org> wrote:

> On Fri, Nov 18, 2022 at 02:35:30PM +0800, HAO CHEN GUI wrote:
> > On 2022/11/17 21:24, David Edelsohn wrote:
> > > Why are you using zero_constant predicate instead of matching
> (const_int 0) for operand 2?
> > The "const_int 0" is an operand other than a predicate. We need a
> predicate here.
>
> Said differently, it is passed as an operand to this named pattern or
> optab, so you need a match_operand here.
>

Earlier versions of patterns for other targets used (const_int 0), but they
seem to have changed that, so match_operand is needed.

Thanks, David


>
> > > Why does this need the new all_branch_comparison_operator?  Can the
> ifcvt optimization correctly elide the 2 insn sequence?
> > Because rs6000 defines "*cbranch_2insn" insn, such insns are generated
> after expand.
> >
> > (jump_insn 50 47 51 11 (set (pc)
> > (if_then_else (ge (reg:CCFP 156)
> > (const_int 0 [0]))
> > (label_ref 53)
> > (pc)))
> "/home/guihaoc/gcc/gcc-mainline-base/gmp/mpz/cmpabs_d.c":80:7 884
> {*cbranch_2insn}
> >  (expr_list:REG_DEAD (reg:CCFP 156)
> > (int_list:REG_BR_PROB 633507684 (nil)))
> >  -> 53)
>
> But notice the cost of *cbranch_2insn -- ifcvt should never generate
> cbranchcc4 with such composite conditions!
>
> > In prepare_cmp_insn, the comparison is verified by insn_operand_matches.
> If
> > extra_insn_branch_comparison_operator is not included in "cbranchcc4"
> predicate,
> > it hits ICE here.
> >
> >   if (GET_MODE_CLASS (mode) == MODE_CC)
> > {
> >   enum insn_code icode = optab_handler (cbranch_optab, CCmode);
> >   test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
> >   gcc_assert (icode != CODE_FOR_nothing
> >   && insn_operand_matches (icode, 0, test));
> >   *ptest = test;
> >   return;
> > }
> >
> > The real conditional move is generated by emit_conditional_move_1.
> Commonly
> > "*cbranch_2insn" can't be optimized out and it returns NULL_RTX.
> >
> >   if (COMPARISON_P (comparison))
> > {
> >   saved_pending_stack_adjust save;
> >   save_pending_stack_adjust ();
> >   last = get_last_insn ();
> >   do_pending_stack_adjust ();
> >   machine_mode cmpmode = comp.mode;
> >   prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> > GET_CODE (comparison), NULL_RTX, unsignedp,
> > OPTAB_WIDEN, , );
> >   if (comparison)
> > {
> >rtx res = emit_conditional_move_1 (target, comparison,
> >   op2, op3, mode);
> >if (res != NULL_RTX)
> >  return res;
> > }
> >   delete_insns_since (last);
> >   restore_pending_stack_adjust ();
> >
> > I think that extra_insn_branch_comparison_operator should be included in
> > "cbranchcc4" predicates as such insns exist. And leave it to
> > emit_conditional_move which decides whether it can be optimized or not.
>
> I don't think we should pretend we have any conditional jumps the
> machine does not actually have, in cbranchcc4.  When would this ever be
> useful?  cror;beq can be quite expensive, compared to the code it would
> replace anyway.
>
> If something generates those here (which then ICEs later), that is
> wrong, fix *that*?  Is it ifcvt doing it?
>
>
> Segher
>

Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-17 Thread David Edelsohn via Gcc-patches

On Thu, Nov 17, 2022 at 1:39 AM HAO CHEN GUI  wrote:

> Hi,
>   The patch enables have_cbranchcc4, which is a flag in ifcvt.cc that
> indicates whether a branch by CC bits is valid. The new expand pattern
> "cbranchcc4" is created, which is intended to match the patterns defined in
> "*cbranch", "*cbranch_2insn" and "*creturn". The operand sequence in
> "cbranchcc4" is in line with the definition in gccint, and the operand
> sequence doesn't matter in pattern matching, so I think it should work.
>
>   Compared to last version, one new predicate and one new expander are
> created.
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-11-17  Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (all_branch_comparison_operator):
> New,
> and includes operators in branch_comparison_operator and
> extra_insn_branch_comparison_operator.
> * config/rs6000/rs6000.md (cbranchcc4): New expand pattern.
>
> gcc/testsuite/
> * gcc.target/powerpc/cbranchcc4.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md
> b/gcc/config/rs6000/predicates.md
> index b1fcc69bb60..843b6f39b84 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -1308,6 +1308,7 @@ (define_special_predicate "equality_operator"
>
>  ;; Return 1 if OP is a comparison operation that is valid for a branch
>  ;; instruction.  We check the opcode against the mode of the CC value.
> +
>  ;; validate_condition_mode is an assertion.
>  (define_predicate "branch_comparison_operator"
> (and (match_operand 0 "comparison_operator")
> @@ -1331,6 +1332,11 @@ (define_predicate
> "extra_insn_branch_comparison_operator"
>   GET_MODE (XEXP (op, 0))),
>  1")))
>
> +;; Return 1 if OP is a comparison operation that is valid for a branch.
> +(define_predicate "all_branch_comparison_operator"
> +   (ior (match_operand 0 "branch_comparison_operator")
> +   (match_operand 0 "extra_insn_branch_comparison_operator")))
> +
>  ;; Return 1 if OP is an unsigned comparison operator.
>  (define_predicate "unsigned_comparison_operator"
>(match_code "ltu,gtu,leu,geu"))
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index e9e5cd1e54d..7b7d747a85d 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -13067,6 +13067,16 @@ (define_insn_and_split "*_cc"
>  ;; Conditional branches.
>  ;; These either are a single bc insn, or a bc around a b.
>
> +(define_expand "cbranchcc4"
> +  [(set (pc)
> +   (if_then_else (match_operator 0 "all_branch_comparison_operator"
> +   [(match_operand 1 "cc_reg_operand")
> +(match_operand 2 "zero_constant")])
> + (label_ref (match_operand 3))
> + (pc)))]
> +  ""
> +  "")
> +
>

This is better, but the pattern should be near and after the existing
cbranch4 patterns earlier in the file, not the *cbranch pattern.  It
doesn't match the comment.

Why are you using zero_constant predicate instead of matching (const_int 0)
for operand 2?

Why does this need the new all_branch_comparison_operator?  Can the ifcvt
optimization correctly elide the 2 insn sequence?

Thanks, David


>  (define_insn "*cbranch"
>[(set (pc)
> (if_then_else (match_operator 1 "branch_comparison_operator"
> diff --git a/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c
> b/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c
> new file mode 100644
> index 000..528ba1a878d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ce1" } */
> +/* { dg-final { scan-rtl-dump "noce_try_store_flag_constants" "ce1" } } */
> +
> +/* The inner branch should be detected by ifcvt then be converted to a
> setcc
> +   with a plus by noce_try_store_flag_constants.  */
> +
> +int test (unsigned int a, unsigned int b)
> +{
> +return (a < b ? 0 : (a > b ? 2 : 1));
> +}
>
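
The comment in the quoted test case says the inner branch should become a flag value plus a constant.  At the source level that transformation can be pictured roughly as follows; test_flag_plus is a hypothetical hand-written equivalent, and the sketch makes no claim about the exact RTL that ifcvt emits.

```c
/* The function from the quoted test case.  */
int test(unsigned int a, unsigned int b)
{
    return (a < b ? 0 : (a > b ? 2 : 1));
}

/* Hypothetical equivalent: the inner conditional (a > b ? 2 : 1) is
   rewritten as the comparison result plus the constant 1.  */
int test_flag_plus(unsigned int a, unsigned int b)
{
    return a < b ? 0 : 1 + (a > b);
}
```

Both functions return 0, 1 or 2 for a < b, a == b and a > b respectively.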

Re: [rs6000, patch] Enable have_cbranchcc4 on rs6000

2022-11-16 Thread David Edelsohn via Gcc-patches

Hao

cbranchcc4 is a named pattern and requires a specific operand ordering.  If
you change *cbranch to cbranchcc4, you must change the order of the
operands, not a quick and dirty hack to *cbranch.  Also, you should change
*cbranch_2insn and *creturn as well so that all of the patterns are
consistent.

See, for example, the aarch64.md implementation and the documentation in
Standard Names

https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html

which mentions cbranch4 and, briefly, cbranchcc4.

You seemed to want to make the minimal change so that the pattern would
work with ifcvt without considering the impact on the existing pattern and
without understanding what a named pattern with specific operands really
means.  You changed the pattern predicate so that the operands in the wrong
positions would match the pattern.

Thanks, David

On Wed, Nov 16, 2022 at 12:56 AM HAO CHEN GUI  wrote:

> Hi David,
>   I found the definition of the operands in 'cbranch'. The arguments matter.
> I will create a new expand pattern for cbranchcc4. Thanks a lot for your
> comments.
>
> 'cbranchmode4'
> Conditional branch instruction combined with a compare instruction.
> Operand 0 is a comparison operator. Operand 1 and operand 2 are the
> first and second operands of the comparison, respectively. Operand 3
> is the code_label to jump to.
>
> Gui Haochen
> Thanks
>
> On 2022/11/16 11:04, David Edelsohn wrote:
> > It's great to add cbranchcc4 to the Power port where it definitely was
> an omission, but adapting *cbranch for that purpose is the wrong approach.
> The changes to the pattern are incorrect because they are covering up a
> difference in ordering of the operands.  One can argue that the named
> pattern only enables the functionality in ifcvt and the pattern otherwise
> is used in its previous role.  But this is a Frankenstein monster
> approach.  You're trying to twist the existing pattern so that it triggers
> as cbranchcc4, but creating a pattern that messes up its arguments and only
> works because the new, named pattern never is called.  This is too ugly.
> Please fix.
>

Re: [rs6000, patch] Enable have_cbranchcc4 on rs6000

2022-11-15 Thread David Edelsohn via Gcc-patches

On Tue, Nov 15, 2022 at 9:32 PM HAO CHEN GUI  wrote:

> Hi,
>   The patch enables have_cbranchcc4, which is a flag in ifcvt.cc that
> indicates whether a branch by CC bits is valid. As rs6000 already has
> "*cbranch" insn which does branching according to CC bits, the flag
> should be enabled and relevant branches can be optimized out. The test
> case illustrates the optimization.
>
>   "*cbranch" is an anonymous insn which can't be generated directly.
> So changing "const_int 0" to the third operand predicated by
> "zero_constant" won't cause ICEs as original patterns still can be matched.
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
>
> ChangeLog
> 2022-11-16  Haochen Gui 
>
> gcc/
> * config/rs6000/rs6000.md (*cbranch): Rename to...
> (cbranchcc4): ...this, and set const_int 0 to the third operand.
>
> gcc/testsuite/
> * gcc.target/powerpc/cbranchcc4.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index e9e5cd1e54d..ee171f21f6a 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -13067,11 +13067,11 @@ (define_insn_and_split "*_cc"
>  ;; Conditional branches.
>  ;; These either are a single bc insn, or a bc around a b.
>
> -(define_insn "*cbranch"
> +(define_insn "cbranchcc4"
>[(set (pc)
> (if_then_else (match_operator 1 "branch_comparison_operator"
>   [(match_operand 2 "cc_reg_operand"
> "y")
> -  (const_int 0)])
> +  (match_operand 3 "zero_constant")])
>   (label_ref (match_operand 0))
>   (pc)))]
>""
>

Shouldn't cbranchcc4 be a separate pattern?  This pattern has an existing
purpose and an expected ordering of operands.

cbranchcc4 is passed the comparison operator as operand 0.  Other ports
ignore the second comparison operand and use (const_int 0).  Operand 3 is
the label, which seems to be the reason that you needed to change it to
match_operand 3.

It's great to add cbranchcc4 to the Power port where it definitely was an
omission, but adapting *cbranch for that purpose is the wrong approach.
The changes to the pattern are incorrect because they are covering up a
difference in ordering of the operands.  One can argue that the named
pattern only enables the functionality in ifcvt and the pattern otherwise
is used in its previous role.  But this is a Frankenstein monster
approach.  You're trying to twist the existing pattern so that it triggers
as cbranchcc4, but creating a pattern that messes up its arguments and only
works because the new, named pattern never is called.  This is too ugly.
Please fix.

Thanks, David

diff --git a/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c
> b/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c
> new file mode 100644
> index 000..1751d274bbf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ce1" } */
> +/* { dg-final {scan-rtl-dump "noce_try_store_flag_constants" "ce1" } } */
> +
> +int test (unsigned int a, unsigned int b)
> +{
> +return (a < b ? 0 : (a > b ? 2 : 1));
> +}
>

Re: [PATCH V2] Enable small loop unrolling for O2

2022-11-09 Thread David Edelsohn via Gcc-patches

> This patch does not change rs6000/s390 since I don't have machines to
> test them, but I suppose the default behavior is the same since they
> enable flag_unroll_loops at O2.

There are Power (rs6000) systems in the Compile Farm.

Trial Linux on Z (s390x) VMs are available through the Linux Community
Cloud.
https://linuxone.cloud.marist.edu/#/register?flag=VM

Thanks, David

Re: [PATCH] libstdc++: Allow emergency EH alloc pool size to be tuned [PR68606]

2022-10-11 Thread David Edelsohn via Gcc-patches

This patch seems to have broken bootstrap on AIX.  It seems to assume
methods that aren't guaranteed to be defined.

Thanks, David

libtool: compile:  /tmp/GCC/./gcc/xgcc -B/tmp/GCC/./gcc/
-B/nasfarm/edelsohn/ins
tall/GCC/powerpc-ibm-aix7.2.5.0/bin/
-B/nasfarm/edelsohn/install/GCC/powerpc-ibm
-aix7.2.5.0/lib/ -isystem
/nasfarm/edelsohn/install/GCC/powerpc-ibm-aix7.2.5.0/i
nclude -isystem
/nasfarm/edelsohn/install/GCC/powerpc-ibm-aix7.2.5.0/sys-include
 -fno-checking -DHAVE_CONFIG_H -I..
-I/nasfarm/edelsohn/src/src/libstdc++-v3/../
libiberty -I/nasfarm/edelsohn/src/src/libstdc++-v3/../include
-D_GLIBCXX_SHARED
-I/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/powerpc-ibm-aix7.2.5.0
-I
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include
-I/nasfarm/edelsohn/src/src
/libstdc++-v3/libsupc++ -I/nasfarm/edelsohn/install/include
-I/nasfarm/edelsohn/
install/include -g -O2 -DIN_GLIBCPP_V3 -Wno-error -c cp-demangle.c  -fPIC
-DPIC -o cp-demangle.o
/nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc: In member
function 'void* {anonymous}::pool::allocate(std::size_t)':
/nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc:239:54: error:
no matching function for call to
'__gnu_cxx::__scoped_lock::__scoped_lock(int&)'
  239 |   __gnu_cxx::__scoped_lock sentry(emergency_mutex);
  |  ^
In file included from
/nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc:37:
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:240:14:
note: candidate: '__gnu_cxx::__scoped_lock::__scoped_lock(__mutex_type&)'
  240 | explicit __scoped_lock(__mutex_type& __name) : _M_device(__name)
  |  ^
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:240:42:
note:   no known conversion for argument 1 from 'int' to
'__gnu_cxx::__scoped_lock::__mutex_type&'
  240 | explicit __scoped_lock(__mutex_type& __name) : _M_device(__name)
  |~~^~
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:236:5:
note: candidate: '__gnu_cxx::__scoped_lock::__scoped_lock(const
__gnu_cxx::__scoped_lock&)'
  236 | __scoped_lock(const __scoped_lock&);
  | ^
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:236:19:
note:   no known conversion for argument 1 from 'int' to 'const
__gnu_cxx::__scoped_lock&'
  236 | __scoped_lock(const __scoped_lock&);
  |   ^~~~
make[5]: *** [Makefile:778: eh_alloc.lo] Error 1

Re: [PATCH] Set discriminators for call stmts on the same line within the same basic block

2022-10-10 Thread David Edelsohn via Gcc-patches

This patch causes a bootstrap comparison failure on AIX.  It apparently
does not cause a failure on PPC64BE Linux with the same ABI, so I suspect
that the failure may be related to the way that function aliases are
implemented on AIX, which doesn't have ELF symbol alias semantics.

"This change will also simplify call site lookups since now location with
discriminator will uniquely identify the call site (no callee function name
is needed)."

I will open a PR with more information about the comparison difference now
that I have a work-around to bring AIX back to a bootstrappable state.  Any
thoughts about what could be going wrong?

Thanks, David

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread David Edelsohn via Gcc-patches

On Fri, Sep 23, 2022 at 10:12 AM Thomas Neumann  wrote:

> >
> > +static const bool in_shutdown = false;
> >
> > I'll let Jason or others decide if this is the right solution.  It seems
> > that in_shutdown also could be declared outside the #ifdef and
> > initialized as "false".
>
> sure, either is fine. Moving it outside the #ifdef wastes one byte in
> the executable (while the compiler can eliminate the const), but it does
> not really matter.
>
> I have verified that the patch below fixes builds for both fast-path and
> non-fast-path builds. But if you prefer I will move the in_shutdown
> definition instead.
>
> Best
>
> Thomas
>
> PS: in_shutdown is an int here instead of a bool because non-fast-path
> builds do not include stdbool. Not a good reason, of course, but I
> wanted to keep the patch minimal and it makes no difference in practice.
>
>
>  When using the atomic fast path deregistering can fail during
>  program shutdown if the lookup structures are already destroyed.
>  The assert in __deregister_frame_info_bases takes that into
>  account. In the non-fast-path case however is not aware of
>  program shutdown, which caused a compiler error on such platforms.
>  We fix that by introducing a constant for in_shutdown in
>  non-fast-path builds.
>
>  libgcc/ChangeLog:
>  * unwind-dw2-fde.c: Introduce a constant for in_shutdown
>  for the non-fast-path case.
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index d237179f4ea..0bcd5061d76 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -67,6 +67,8 @@ static void
>   init_object (struct object *ob);
>
>   #else
> +/* Without fast path frame deregistration must always succeed.  */
> +static const int in_shutdown = 0;
>
>   /* The unseen_objects list contains objects that have been registered
>  but not yet categorized in any way.  The seen_objects list has had
>

Thanks for the patch.  I'll let you and Jason decide which style solution
is preferred.

Thanks, David

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread David Edelsohn via Gcc-patches

On Fri, Sep 23, 2022 at 9:38 AM Thomas Neumann  wrote:

> > This patch broke bootstrap on AIX and probably other targets.
> >
> > #ifdef ATOMIC_FDE_FAST_PATH
> > #include "unwind-dw2-btree.h"
> >
> > static struct btree registered_frames;
> > static bool in_shutdown;
> > ...
> > #else
> >
> > in_shutdown only is defined for ATOMIC_FDE_FAST_PATH but used in code /
> > asserts not protected by that macro.
> >
> >gcc_assert (in_shutdown || ob);
> >return (void *) ob;
> > }
>
> I am sorry for that, I did not consider that my test machines all use
> the fast path.
>
> I think the problem can be fixed by the trivial patch below, I will
> commit that after I have tested builds both with and without fast path.
>
> Best
>
> Thomas
>
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index d237179f4ea..d6e347c5481 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -67,6 +67,8 @@ static void
>   init_object (struct object *ob);
>
>   #else
> +/* Without fast path frame lookup must always succeed */
> +static const bool in_shutdown = false;
>
>   /* The unseen_objects list contains objects that have been registered
>  but not yet categorized in any way.  The seen_objects list has had
>

I tried the patch but it still failed because the type name "bool" is not
known.  This patch is the only use of "bool" in the libgcc source code,
which is C, not C++.
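
For readers following the thread, here is a minimal sketch of the pattern
being discussed. It is an illustration under stated assumptions, not the
libgcc source: the macro ATOMIC_FDE_FAST_PATH and the flag in_shutdown come
from the quoted code, while the helper checked_result is hypothetical. The
flag is declared as plain int so that C code which never includes
<stdbool.h> still compiles.

```c
#include <assert.h>

#ifdef ATOMIC_FDE_FAST_PATH
/* Fast path: a real flag that shutdown code may set.  */
static int in_shutdown;
#else
/* No fast path: lookups must always succeed, so the flag is a constant
   that the compiler can fold away.  Plain int avoids <stdbool.h>.  */
static const int in_shutdown = 0;
#endif

/* Hypothetical helper with the same shape as the quoted assertion:
   a missing object is tolerated only while shutting down.  */
static void *
checked_result (void *ob)
{
  assert (in_shutdown || ob);
  return ob;
}
```

With the constant definition in place, the assertion reduces to a plain
check that ob is non-null on builds without the fast path.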

Thanks, David

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread David Edelsohn via Gcc-patches

On Fri, Sep 23, 2022 at 9:38 AM Thomas Neumann  wrote:

> > This patch broke bootstrap on AIX and probably other targets.
> >
> > #ifdef ATOMIC_FDE_FAST_PATH
> > #include "unwind-dw2-btree.h"
> >
> > static struct btree registered_frames;
> > static bool in_shutdown;
> > ...
> > #else
> >
> > in_shutdown only is defined for ATOMIC_FDE_FAST_PATH but used in code /
> > asserts not protected by that macro.
> >
> >gcc_assert (in_shutdown || ob);
> >return (void *) ob;
> > }
>
> I am sorry for that, I did not consider that my test machines all use
> the fast path.
>
> I think the problem can be fixed by the trivial patch below, I will
> commit that after I have tested builds both with and without fast path.
>
> Best
>
> Thomas
>
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index d237179f4ea..d6e347c5481 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -67,6 +67,8 @@ static void
>   init_object (struct object *ob);
>
>   #else
> +/* Without fast path frame lookup must always succeed */
>
The comment should end with full stop and two spaces.


> +static const bool in_shutdown = false;
>
I'll let Jason or others decide if this is the right solution.  It seems
that in_shutdown also could be declared outside the #ifdef and initialized
as "false".

Thanks, David


>
>   /* The unseen_objects list contains objects that have been registered
>  but not yet categorized in any way.  The seen_objects list has had
>

Re: [PATCH] Restore XCOFF for DWARF on AIX.

2022-09-07 Thread David Edelsohn via Gcc-patches

On Wed, Sep 7, 2022 at 7:45 AM Martin Liška  wrote:

> Hi.
>
> The patch restores DWARF support for AIX target where XCOFF file container
> is used.
> Verified before and after the patch, gcc119 machine (AIX) could not build
> any run-time library,
> now it can.
>
> Ready to be installed?
> Thanks,
> Martin
>
> PR bootstrap/106855
>
> gcc/ChangeLog:
>
> * collect2.cc (scan_prog_file): Restore if XCOFF_DEBUGGING_INFO.
> * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Restore usage of XCOFF_DEBUGGING_INFO.
> * config/rs6000/xcoff.h (XCOFF_DEBUGGING_INFO): Restore.
> * dwarf2asm.cc (XCOFF_DEBUGGING_INFO): Restore support for
>   XCOFF_DEBUGGING_INFO.
> (dw2_asm_output_nstring): Likewise.
> (USE_LINKONCE_INDIRECT): Likewise.
> * dwarf2out.cc (XCOFF_DEBUGGING_INFO): Likewise.
> (HAVE_XCOFF_DWARF_EXTRAS): Likewise.
> (output_fde): Likewise.
> (output_call_frame_info): Likewise.
> (have_macinfo): Likewise.
> (add_AT_loc_list): Likewise.
> (add_AT_view_list): Likewise.
> (output_compilation_unit_header): Likewise.
> (output_pubnames): Likewise.
> (output_aranges): Likewise.
> (output_line_info): Likewise.
> (output_macinfo): Likewise.
> (dwarf2out_finish): Likewise.
> (dwarf2out_early_finish): Likewise.
>

Hi, Martin

Thanks for the quick patch to fix this.  This restores bootstrap, but does
not completely restore functionality.  This patch did not restore the
gcc/configure test for AIX assembler XCOFF feature support that defines
HAVE_XCOFF_DWARF_EXTRAS.  That part of the removal patch also needs to be
reverted.

Thanks, David

Re: [PATCH 1/3] STABS: remove -gstabs and -gxcoff functionality

2022-09-06 Thread David Edelsohn via Gcc-patches

* dwarf2out.cc (XCOFF_DEBUGGING_INFO): Likewise.
(HAVE_XCOFF_DWARF_EXTRAS): Likewise.
(output_fde): Likewise.
(output_call_frame_info): Likewise.
(have_macinfo): Likewise.
(add_AT_loc_list): Likewise.
(add_AT_view_list): Likewise.
(output_compilation_unit_header): Likewise.
(output_pubnames): Likewise.
(output_aranges): Likewise.
(output_line_info): Likewise.
(output_macinfo): Likewise.
(dwarf2out_finish): Likewise.
(dwarf2out_early_finish): Likewise.

These changes are not correct and break AIX bootstrap.

Those changes are not related to stabs support.  We agreed to remove stabs and
XCOFF stabs support, not GCC DWARF debugging support for AIX.

Also

* configure: Regenerate. Likewise.
* configure.ac: Likewise.

does not list that the test for HAVE_XCOFF_DWARF_EXTRAS was removed, so
the ChangeLog was not accurate.

Again, that test is required for AIX and is not part of stabs support.

Please revert this change so that AIX can continue to be bootstrapped
and tested, and we can work together to test a corrected patch.

Thanks, David

On Tue, Sep 6, 2022 at 12:31 PM David Edelsohn  wrote:

> I fully support the plan to remove stabs support, but this patch broke
> bootstrap on AIX.  It seems rather bad policy to remove support for a
> feature without ensuring that the removal does not negatively impact the
> targets touched by the patch.  I should have been explicitly copied on
> these patches and I should have been asked to test the patches before they
> were installed, if for no other reason than politeness and consideration.
>
> Thanks, David
>
>
>

Re: [PATCH 1/3] STABS: remove -gstabs and -gxcoff functionality

2022-09-06 Thread David Edelsohn via Gcc-patches

I fully support the plan to remove stabs support, but this patch broke
bootstrap on AIX.  It seems rather bad policy to remove support for a
feature without ensuring that the removal does not negatively impact the
targets touched by the patch.  I should have been explicitly copied on
these patches and I should have been asked to test the patches before they
were installed, if for no other reason than politeness and consideration.

Thanks, David

Re: [PATCH, rs6000] Change insn condition from TARGET_64BIT to TARGET_POWERPC64 for VSX scalar extract/insert instructions

2022-08-26 Thread David Edelsohn via Gcc-patches

On Thu, Aug 25, 2022 at 10:42 PM HAO CHEN GUI  wrote:
>
> Hi David,
>
> On 25/8/2022 10:01 PM, David Edelsohn wrote:
> > On Thu, Aug 25, 2022 at 1:22 AM Kewen.Lin  wrote:
> >>
> >> on 2022/8/25 11:37, HAO CHEN GUI wrote:
> >>> Hi,
> >>>
> >>> On 24/8/2022 1:24 PM, Kewen.Lin wrote:
>  Could you try to test with dg-options "-mdejagnu-cpu=power9 -mpowerpc64" 
>  all the time, but still
>  having that has_arch_ppc64 effective target on aix?
> 
>  I'd expect has_arch_ppc64 check to fail on aix 32bit, the error will not 
>  be a problem (turning
>  into an UNSUPPORTED then)?
> >>>
> >>> I tested it on AIX. "has_arch_ppc64" fails with dg-options 
> >>> "-mdejagnu-cpu=power9 -mpowerpc64" on
> >>> 32-bit AIX environment. It works as we expected.
> >>
> >> Nice, thanks for your time on testing.
> >>
> >>>
> >>> Also I found that AIX and Darwin are skipped for bfp test. So in 
> >>> testcase, it's no need to care
> >>> about them. Not sure if it's intention.
> >>>
> >>> In bfp.exp
> >>>
> >>> # Exit immediately if this isn't a PowerPC target or if the target is
> >>> # aix or Darwin.
> >>> if { (![istarget powerpc*-*-*] && ![istarget rs6000-*-*])
> >>>  || [istarget "powerpc*-*-aix*"]
> >>>  || [istarget "powerpc*-*-darwin*"]  } then {
> >>>   return
> >>> }
> >>
> >> I can't find a hint about why we wanted to disable bfp testing on aix, it 
> >> looks like overkill to me.
> >>
> >> Could you help to further test if all test cases in this small bucket 
> >> available on aix?
> >>
> >> Maybe it can give us some evidences on why it's intentional or not.
> >>
> >> Hi David & Segher,
> >>
> >> Do you have some insights on this?
> >
> > AIX (and Darwin) are not Linux and not ELF.  There is no support for
> > BPF.  All of the tests fail, so they are skipped.
>
> Thanks so much for your info.
>
> Here are test results on P7 AIX7.1. I tested all scalar-extract-sig-* and 
> scalar-insert-exp-* cases in
> "testsuite/powerpc/bfp" fold. All compiling cases pass except those use 
> __ieee128. The runnable cases
> fail as expected. p9vector is not supported on P7 servers.
>
> So the __ieee128 blocks Binary floating-point on AIX?

AIX does not support IEEE 128-bit floating point, only the IBM
double-double format.  Also GCC for AIX does not recognize the
attributes and options for other formats, although there are some
patches from Mike to address that.
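
As an aside, a small sketch of how a test program could tell which long
double format it was built with. This is an assumption-laden illustration,
not something from the thread: it relies on the mantissa width reported by
<float.h>, where 106 bits is what one would expect for IBM double-double
and 113 bits for IEEE binary128. The helper name long_double_format is
hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: names the long double format based on the number
   of mantissa bits that <float.h> reports for this compilation.  */
static const char *
long_double_format (void)
{
  /* The C standard requires long double to be at least as precise as
     double, so this should hold on any conforming target.  */
  assert (LDBL_MANT_DIG >= DBL_MANT_DIG);

#if LDBL_MANT_DIG == 113
  return "IEEE binary128";
#elif LDBL_MANT_DIG == 106
  return "IBM double-double";
#elif LDBL_MANT_DIG == 64
  return "x87 extended";
#elif LDBL_MANT_DIG == 53
  return "same as double";
#else
  return "unknown";
#endif
}
```

The result depends entirely on the target and options used for the build,
so it is only a compile-time observation, not a probe of hardware support.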

Thanks, David

Re: [PATCH, rs6000] Change insn condition from TARGET_64BIT to TARGET_POWERPC64 for VSX scalar extract/insert instructions

2022-08-25 Thread David Edelsohn via Gcc-patches

On Thu, Aug 25, 2022 at 1:22 AM Kewen.Lin  wrote:
>
> on 2022/8/25 11:37, HAO CHEN GUI wrote:
> > Hi,
> >
> > On 24/8/2022 1:24 PM, Kewen.Lin wrote:
> >> Could you try to test with dg-options "-mdejagnu-cpu=power9 -mpowerpc64" 
> >> all the time, but still
> >> having that has_arch_ppc64 effective target on aix?
> >>
> >> I'd expect has_arch_ppc64 check to fail on aix 32bit, the error will not 
> >> be a problem (turning
> >> into an UNSUPPORTED then)?
> >
> > I tested it on AIX. "has_arch_ppc64" fails with dg-options 
> > "-mdejagnu-cpu=power9 -mpowerpc64" on
> > 32-bit AIX environment. It works as we expected.
>
> Nice, thanks for your time on testing.
>
> >
> > Also I found that AIX and Darwin are skipped for bfp test. So in testcase, 
> > it's no need to care
> > about them. Not sure if it's intention.
> >
> > In bfp.exp
> >
> > # Exit immediately if this isn't a PowerPC target or if the target is
> > # aix or Darwin.
> > if { (![istarget powerpc*-*-*] && ![istarget rs6000-*-*])
> >  || [istarget "powerpc*-*-aix*"]
> >  || [istarget "powerpc*-*-darwin*"]  } then {
> >   return
> > }
>
> I can't find a hint about why we wanted to disable bfp testing on aix, it 
> looks like overkill to me.
>
> Could you help to further test if all test cases in this small bucket 
> available on aix?
>
> Maybe it can give us some evidences on why it's intentional or not.
>
> Hi David & Segher,
>
> Do you have some insights on this?

AIX (and Darwin) are not Linux and not ELF.  There is no support for
BPF.  All of the tests fail, so they are skipped.

Thanks, David

Re: [PATCH v4, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-05-26 Thread David Edelsohn via Gcc-patches

On Thu, May 26, 2022 at 1:52 AM Kewen.Lin  wrote:
>
> Hi Haochen,
>
> on 2022/5/26 13:30, HAO CHEN GUI wrote:
> > Kewen,
> >   Thanks so much for your advice. Just one question about effective-target.
> >
> >   For the test cases, it needs both power10_ok and int128 support. I saw 
> > some
> > existing test cases have these two checks as well. But I wonder if 
> > power10_ok
> > already covers int128 on powerpc targets? Can we save one check then?
> >
>
> Good question, IMHO the checks are orthogonal, it's doable to disable int128
> support by hacking the compiler, the int128 effective-target check then fails
> due to missing defined __SIZEOF_INT128__, but power10_ok check isn't able to
> catch that, the test case could end up with possible unexpected result without
> the explicit int128 check.
>
> To me, the int128 check is to ensure int128 type is available and the
> power10_ok check is to ensure the power10 specific instructions are supported.

Does Power10 fully support int128 in 32 bit mode?  I would expect no,
so the additional test is required.
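
To make the point about separate checks concrete, here is a hedged sketch
of code that depends on __int128 only when the compiler advertises it
through __SIZEOF_INT128__, the macro mentioned above. The helper names
have_int128 and mul_high64 are hypothetical; the fallback is the usual
32-bit-halves multiplication and is included only so the example builds
on targets without a 128-bit type.

```c
#include <assert.h>   /* handy for asserting on results when experimenting */
#include <stdint.h>

/* Returns 1 when this compilation provides a 128-bit integer type.  */
static int
have_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: high 64 bits of the 128-bit product A * B.  */
static uint64_t
mul_high64 (uint64_t a, uint64_t b)
{
#ifdef __SIZEOF_INT128__
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
#else
  /* Portable fallback: split both operands into 32-bit halves and
     accumulate the partial products with explicit carries.  */
  uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
  uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;
  uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xffffffffu) + lo_hi;
  return hi_hi + (hi_lo >> 32) + (cross >> 32);
#endif
}
```

A 32-bit build for a Power10 CPU could plausibly take the fallback branch,
which is the situation the explicit int128 effective-target check guards
against.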

Thanks, David

>
> BR,
> Kewen

Re: [PATCH] testsuite: Add test case for pack/unpack bifs at soft-float [PR105334]

2022-04-27 Thread David Edelsohn via Gcc-patches

On Wed, Apr 27, 2022 at 8:22 AM Segher Boessenkool
 wrote:
>
> Hi!
>
> Thank you for doing this testcase.
>
> On Wed, Apr 27, 2022 at 06:29:07PM +0800, Kewen.Lin wrote:
> > As discussed in PR105334, this patch is to add the test coverage for
> > the two recent fixes r12-8091 and r12-8226 from Segher, aix is skipped
> > since it takes soft-float and long-double-128 incompatible.
> >
> > I noticed the referred test case pack02.c skips if powerpc*-*-darwin*,
> > but it's for do-run and I didn't have one machine to test that triple,
> > so I didn't add that and hope it's unnecessary.
>
> That is the best thing to do if you aren't sure, the Darwin people are
> in a much better position to decide this themselves.  Cc:ed Iain.
>
> > Tested on powerpc64-linux-gnu P8, powerpc64le-linux-gnu P9/P10 and
> > powerpc-ibm-aix7.2.0.0.
> >
> > Is ok for trunk?
>
> Okay for trunk, thanks!  Please see if David has some more input?
>
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr105334.c
> > @@ -0,0 +1,31 @@
> > +/* Skip this on aix, since it takes soft-float and long-double-128
> > +   incompatible and warns it.  */
> > +/* { dg-skip-if "aix long-double-128 soft-float" { powerpc*-*-aix* } } */
> > +/* { dg-options "-mlong-double-128 -msoft-float" } */
>
> Maybe you just want to add "-w" and not skip on AIX, if the test
> generates the correct code anyway?  Either way works for me though.

The additional libgcc functions for ibm128 weren't added to the AIX
configuration.  And it's not clear that soft-float on AIX is
interesting anyway.

Thanks, David

Re: [PATCH] ppc: testsuite: require target effectively [PR104253]

2022-04-11 Thread David Edelsohn via Gcc-patches

On Mon, Apr 11, 2022 at 10:53 AM Alexandre Oliva  wrote:
>
>
> The testcase was missing dg- before require-effective-target.
>
> While at that, I'm also pruning the excess-error warning I got when
> the test failed to be disabled because of the above.  I suppose it
> might be useful for some target variants.
>
> Tested with target powerpc64-wrs-vxworks7r2.  Ok to install?  Trunk?
> gcc-11?  gcc-10?

Okay.  This probably counts as obvious.

Thanks, David

>
>
> for gcc/testsuite/ChangeLog
>
> PR target/104253
> * gcc.target/powerpc/pr104253.c: Add missing dg- before
> require-effective-target.  Prune warning about -mfloat128
> possibly not being fully supported.
> ---
>  gcc/testsuite/gcc.target/powerpc/pr104253.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr104253.c 
> b/gcc/testsuite/gcc.target/powerpc/pr104253.c
> index 02049cc978f05..e5f9499b7c881 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr104253.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr104253.c
> @@ -6,8 +6,9 @@
>   */
>
>  /* { dg-do run } */
> -/* { require-effective-target ppc_float128_sw } */
> +/* { dg-require-effective-target ppc_float128_sw } */
>  /* { dg-options "-O2 -mvsx -mfloat128" } */
> +/* { dg-prune-output ".-mfloat128. option may not be fully supported" } */
>
>  /*
>   * PR target/104253
>
>
> --
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about

Re: [PATCH] rs6000: Skip overload instances with NULL fntype [PR104967]

2022-03-23 Thread David Edelsohn via Gcc-patches

On Wed, Mar 23, 2022 at 5:33 AM Kewen.Lin  wrote:
>
> Hi,
>
> As shown in PR104967, for some overload built-in function instance,
> if it requires a data type which isn't defined on the target, its
> fntype would be initialized as NULL.  This patch is to consider
> this possibility in function find_instance to avoid ICE.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?

Okay.

Thanks, David

>
> BR,
> Kewen
>
> PR target/104967
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-c.cc (find_instance): Skip instances with null
> function types.
>
> ---
>  gcc/config/rs6000/rs6000-c.cc | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 15251efc209..b0f9fce4b54 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -1678,6 +1678,10 @@ find_instance (bool *unsupported_builtin, ovlddata 
> **instance,
>
>ovlddata *inst = *instance;
>gcc_assert (inst != NULL);
> +  /* It is possible for an instance to require a data type that isn't
> + defined on this target, in which case inst->fntype will be NULL.  */
> +  if (!inst->fntype)
> +return error_mark_node;
>tree fntype = rs6000_builtin_info[inst->bifid].fntype;
>tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
>tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
> --
> 2.25.1

Re: [PATCH]rs6000: avoid peeking eof after __vector keyword

2022-03-21 Thread David Edelsohn via Gcc-patches

On Mon, Mar 21, 2022 at 5:13 AM Jiufu Guo  wrote:
>
> Hi!
>
> There is a rare corner case: where __vector is followed only with ";"
> and near the end of the file.
>
> Like the case in PR101168:
> using vdbl =  __vector double;
> #define BREAK 1
>
> For this case, "__vector double" is not followed by a PP_NAME, it is
> followed by CPP_SEMICOLON and then EOF.  In this case, there is no
> more tokens in the file.  Then, do not need to continue to parse the
> file.
>
> This patch pass bootstrap and regtest on ppc64 and ppc64le.

This is okay. Maybe a tweak to the comment, see below.

Thanks, David

>
>
> BR,
> Jiufu
>
>
> PR preprocessor/101168
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-c.cc (rs6000_macro_to_expand):
> Avoid empty identifier.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/powerpc/pr101168.C: New test.
>
>
> ---
>  gcc/config/rs6000/rs6000-c.cc   | 4 +++-
>  gcc/testsuite/g++.target/powerpc/pr101168.C | 6 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr101168.C
>
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 3b62b499df2..f8cc7bad812 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -282,7 +282,9 @@ rs6000_macro_to_expand (cpp_reader *pfile, const 
> cpp_token *tok)
> expand_bool_pixel = __pixel_keyword;
>   else if (ident == C_CPP_HASHNODE (__bool_keyword))
> expand_bool_pixel = __bool_keyword;
> - else
> +
> + /* If it needs to check tokens continue.  */

Maybe /* If there are more tokens to check.  */ ?

> + else if (ident)
> {
>   /* Try two tokens down, too.  */
>   do
> diff --git a/gcc/testsuite/g++.target/powerpc/pr101168.C 
> b/gcc/testsuite/g++.target/powerpc/pr101168.C
> new file mode 100644
> index 000..284e77fdc88
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr101168.C
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-maltivec" } */
> +
> +using vdbl =  __vector double;
> +#define BREAK 1
> --
> 2.25.1
>
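
The guard discussed in this message can be sketched in isolation. The
following is a simplified model, not the rs6000 or libcpp code: the token
types, peek_name, and should_look_further are all hypothetical names. It
only illustrates the idea of stopping the lookahead when the next
significant token is not an identifier, so that nothing is read beyond the
end-of-file token.

```c
#include <assert.h>   /* handy for asserting on results when experimenting */
#include <stddef.h>

/* Hypothetical, much simplified token model.  */
enum tok_kind { TOK_NAME, TOK_PADDING, TOK_SEMICOLON, TOK_EOF };

struct token
{
  enum tok_kind kind;
  const char *name;   /* only meaningful for TOK_NAME */
};

/* Returns the name of the first non-padding token at or after START,
   or NULL if that token is not a name.  The array must end with a
   TOK_EOF entry; the loop stops there and never reads past it.  */
static const char *
peek_name (const struct token *toks, size_t start)
{
  size_t i = start;
  while (toks[i].kind == TOK_PADDING)
    i++;
  return toks[i].kind == TOK_NAME ? toks[i].name : NULL;
}

/* Returns 1 if it is worth inspecting further tokens after the keyword,
   0 if there is no identifier and the lookahead should stop.  */
static int
should_look_further (const struct token *toks, size_t after_keyword)
{
  const char *ident = peek_name (toks, after_keyword);
  return ident != NULL;
}
```

In this model, a keyword followed only by ";" and end of file yields 0, so
the caller stops instead of asking for tokens that do not exist.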

Re: [PATCH] libgcc: allow building float128 libraries on FreeBSD

2022-03-03 Thread David Edelsohn via Gcc-patches

I don't have any objection, but the patch is FreeBSD-specific.  You
are sending the patch from the FreeBSD organization, but I don't know
the authority structure within the organization.   Andreas Tobler is
the FreeBSD maintainer for GCC, but I don't know his current status.

Thanks, David

On Sun, Feb 20, 2022 at 6:38 PM  wrote:
>
> From: Piotr Kubaj 
>
> While FreeBSD currently uses 64-bit long double, there should be no
> problem with adding support for float128.
>
> Signed-off-by: Piotr Kubaj 
> ---
>  libgcc/configure| 22 ++
>  libgcc/configure.ac | 11 +++
>  2 files changed, 33 insertions(+)
>
> diff --git a/libgcc/configure b/libgcc/configure
> index 4919a56f518..334d20d1fb1 100755
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -5300,6 +5300,28 @@ fi
>  { $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $libgcc_cv_powerpc_3_1_float128_hw" >&5
>  $as_echo "$libgcc_cv_powerpc_3_1_float128_hw" >&6; }
>CFLAGS="$saved_CFLAGS"
> +;;
> +powerpc*-*-freebsd*)
> +  saved_CFLAGS="$CFLAGS"
> +  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PowerPC ISA 2.06 to 
> build __float128 libraries" >&5
> +$as_echo_n "checking for PowerPC ISA 2.06 to build __float128 libraries... " 
> >&6; }
> +if ${libgcc_cv_powerpc_float128+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +vector double dadd (vector double a, vector double b) { return a + b; }
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"; then :
> +  libgcc_cv_powerpc_float128=yes
> +else
> +  libgcc_cv_powerpc_float128=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $libgcc_cv_powerpc_float128" >&5
> +$as_echo "$libgcc_cv_powerpc_float128" >&6; }
>  esac
>
>  # Collect host-machine-specific information.
> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> index 13a80b2551b..99ec5d405a4 100644
> --- a/libgcc/configure.ac
> +++ b/libgcc/configure.ac
> @@ -483,6 +483,17 @@ powerpc*-*-linux*)
>  [libgcc_cv_powerpc_3_1_float128_hw=yes],
>  [libgcc_cv_powerpc_3_1_float128_hw=no])])
>CFLAGS="$saved_CFLAGS"
> +;;
> +powerpc*-*-freebsd*)
> +  saved_CFLAGS="$CFLAGS"
> +  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
> +  AC_CACHE_CHECK([for PowerPC ISA 2.06 to build __float128 libraries],
> + [libgcc_cv_powerpc_float128],
> + [AC_COMPILE_IFELSE(
> +[AC_LANG_SOURCE([vector double dadd (vector double a, vector double b) { 
> return a + b; }])],
> +[libgcc_cv_powerpc_float128=yes],
> +[libgcc_cv_powerpc_float128=no])])
> +  CFLAGS="$saved_CFLAGS"
>  esac
>
>  # Collect host-machine-specific information.
> --
> 2.35.1
>

Re: [PATCH v3, rs6000] Enable absolute jump table for PPC AIX and Linux

2022-03-01 Thread David Edelsohn via Gcc-patches

On Tue, Mar 1, 2022 at 12:41 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch enables absolute jump tables on PPC AIX and Linux. For AIX, the 
> jump
> table is placed in data section. For Linux, it is placed in RELRO section when
> relocation is needed.
>
>Bootstrapped and tested on AIX,Linux BE and LE with no regressions. Is 
> this okay for trunk?
> Any recommendations? Thanks a lot.

Hi, Haochen

Thanks for working on this patch and for revising it.  The patch looks
okay now, but it is not appropriate for the current bug fixing stage
in GCC development -- it needs to wait for GCC Stage 1 later this
Spring.

Also the current code uses the default readonly data section for the
jump table.  The new rs6000_xcoff_function_rodata_section follows the
existing simple behavior, which is correct, but it should support
named data sections instead of placing everything in the default
".data" section, similar to the default ELF code.  I will work on
that.
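
For context, this is the kind of source that can lead to a jump table. It
is a hedged illustration rather than the test case from the patch, and the
function name dispatch is hypothetical. Whether the compiler actually
emits a table for it, and whether that table holds absolute addresses or
offsets, depends on the target, the optimization level, and the settings
discussed in this thread.

```c
#include <assert.h>   /* handy for asserting on results when experimenting */

/* Dense, contiguous case values: a typical candidate for lowering to a
   jump table indexed by OP.  */
static int
dispatch (int op, int x)
{
  switch (op)
    {
    case 0: return x + 1;
    case 1: return x - 1;
    case 2: return x * 2;
    case 3: return x / 2;
    case 4: return -x;
    case 5: return x * x;
    case 6: return x % 7;
    case 7: return x ^ 0x55;
    default: return 0;
    }
}
```

Compiling such a function with -S and inspecting the section directives
around the table is one way to see where a given configuration places it.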

Thanks, David

>
> ChangeLog
> 2022-03-01 Haochen Gui 
>
> gcc/
> * config/rs6000/aix.h (JUMP_TABLES_IN_TEXT_SECTION): Define.
> * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Likewise.
> * config/rs6000/rs6000.cc (rs6000_option_override_internal): Enable
> absolute jump tables for AIX and Linux.
> (rs6000_xcoff_function_rodata_section): Implement.
> * config/rs6000/xcoff.h (TARGET_ASM_FUNCTION_RODATA_SECTION): Define.
>
> gcc/testsuite
> * gcc.target/powerpc/absolute-jump-table-section.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
> index ad3238bf09a..cf0708aa08b 100644
> --- a/gcc/config/rs6000/aix.h
> +++ b/gcc/config/rs6000/aix.h
> @@ -251,9 +251,9 @@
>  #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \
>(!(FIRST) ? PAD_UPWARD : targetm.calls.function_arg_padding (MODE, TYPE))
>
> -/* Indicate that jump tables go in the text section.  */
> +/* Indicate that jump tables go in the data section.  */
>
> -#define JUMP_TABLES_IN_TEXT_SECTION 1
> +#define JUMP_TABLES_IN_TEXT_SECTION 0
>
>  /* Define any extra SPECS that the compiler needs to generate.  */
>  #undef  SUBTARGET_EXTRA_SPECS
> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
> index b2a7afabc73..440e0fde52b 100644
> --- a/gcc/config/rs6000/linux64.h
> +++ b/gcc/config/rs6000/linux64.h
> @@ -237,9 +237,9 @@ extern int dot_symbols;
>  #define TARGET_ALIGN_NATURAL 1
>  #endif
>
> -/* Indicate that jump tables go in the text section.  */
> +/* Indicate that jump tables go in the rodata or RELRO section.  */
>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> +#define JUMP_TABLES_IN_TEXT_SECTION 0
>
>  /* The linux ppc64 ABI isn't explicit on whether aggregates smaller
> than a doubleword should be padded upward or downward.  You could
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index bc3ef0721a4..07f78d3a05b 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4954,6 +4954,10 @@ rs6000_option_override_internal (bool global_init_p)
>  warning (0, "%qs is deprecated and not recommended in any circumstances",
>  "-mno-speculate-indirect-jumps");
>
> +  /* Enable absolute jump tables for AIX and Linux.  */
> +  if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
> +rs6000_relative_jumptables = 0;
> +
>return ret;
>  }
>
> @@ -21419,6 +21423,16 @@ rs6000_xcoff_visibility (tree decl)
>enum symbol_visibility vis = DECL_VISIBILITY (decl);
>return visibility_types[vis];
>  }
> +
> +static section *
> +rs6000_xcoff_function_rodata_section (tree decl ATTRIBUTE_UNUSED,
> + bool relocatable)
> +{
> +  if (relocatable)
> +return data_section;
> +  else
> +return readonly_data_section;
> +}
>  #endif
>
>
> diff --git a/gcc/config/rs6000/xcoff.h b/gcc/config/rs6000/xcoff.h
> index cd0f99cb9c6..0dacd86eed9 100644
> --- a/gcc/config/rs6000/xcoff.h
> +++ b/gcc/config/rs6000/xcoff.h
> @@ -98,7 +98,7 @@
>  #define TARGET_ASM_SELECT_SECTION  rs6000_xcoff_select_section
>  #define TARGET_ASM_SELECT_RTX_SECTION  rs6000_xcoff_select_rtx_section
>  #define TARGET_ASM_UNIQUE_SECTION  rs6000_xcoff_unique_section
> -#define TARGET_ASM_FUNCTION_RODATA_SECTION default_no_function_rodata_section
> +#define TARGET_ASM_FUNCTION_RODATA_SECTION 
> rs6000_xcoff_function_rodata_section
>  #define TARGET_STRIP_NAME_ENCODING  rs6000_xcoff_strip_name_encoding
>  #define TARGET_SECTION_TYPE_FLAGS  rs6000_xcoff_section_type_flags
>  #ifdef HAVE_AS_TLS
> diff --git a/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c 
> b/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c
> new file mode 100644
> index 000..688a6f42836
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile { target { *-*-aix* || *-*-linux* } } } */
> +/* {

Re: [PATCH, 11 backport] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-02-10 Thread David Edelsohn via Gcc-patches

On Thu, Feb 10, 2022 at 4:17 PM Bill Schmidt  wrote:
>
> Hi!
>
> On 2/10/22 2:50 PM, Segher Boessenkool wrote:
> > On Thu, Feb 10, 2022 at 12:22:28PM -0600, Bill Schmidt wrote:
> >> This is a backport from mainline 3f30f2d1dbb3228b8468b26239fe60c2974ce2ac.
> >> These built-ins were misimplemented as always having big-endian semantics.
> >>
> >> Because the built-in infrastructure has changed, the modifications to the
> >> source are different but achieve the same purpose.  The modifications to
> >> the test suite are identical (after fixing the issue with -mbig that David
> >> pointed out with the original patch).
> >>  /* 1 argument vector functions added in ISA 3.0 (power9). */
> >> -BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi",  CONST,  
> >> vclzlsbb_v16qi)
> >> -BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",CONST,  vclzlsbb_v8hi)
> >> -BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",CONST,  vclzlsbb_v4si)
> >> -BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi",  CONST,  
> >> vctzlsbb_v16qi)
> >> -BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",CONST,  vctzlsbb_v8hi)
> >> -BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",CONST,  vctzlsbb_v4si)
> >> +BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi",  CONST,  
> >> vctzlsbb_v16qi)
> >> +BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",CONST,  vctzlsbb_v8hi)
> >> +BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",CONST,  vctzlsbb_v4si)
> >> +BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi",  CONST,  
> >> vclzlsbb_v16qi)
> >> +BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",CONST,  vclzlsbb_v8hi)
> >> +BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",CONST,  vclzlsbb_v4si)
> > Please change the default to be equal to the builtin name, so, the BE
> > version.  We do that everywhere else as well, and it makes a lot more
> > sense (since everything in Power has BE numbering).
> >
> > The trunk version has this correct afaics?
>
> No, trunk has this, for example:
>
>   const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
> VCLZLSBB_V16QI vctzlsbb_v16qi {endian}
>
> So the backport matches what is on trunk.
>
> Throughout the new builtin infrastructure, the defaults are set for
> little-endian, and the "endian" flag changes behavior for big-endian.
>
> >
> >> --- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
> >> @@ -1,6 +1,7 @@
> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
> > (Delete the redundant target clause when modifying any testcase, please).
>
> Okay.
> >
> >>  /* { dg-require-effective-target powerpc_p9vector_ok } */
> >>  /* { dg-options "-mdejagnu-cpu=power9" } */
> >> +/* { dg-additional-options "-mbig" { target powerpc64le-*-* } } */
> > You don't need the target clause, if it already is BE by default it does
> > not do anything to add it redundantly.
> >
> > But this is wrong anyway: the name of the target triple does not say
> > whether we are BE or LE.  Instead you should use the be or le selectors.
> > But again, just add -mbig always.
>
> This was added by David Edelsohn to the trunk version of the patch, because
> -mbig actually is not supported on all subtargets.  (I found that quite
> surprising also.)  Apparently this doesn't work on AIX, for example.  But
> -mlittle works everywhere.  Go figure.

-mbig/-mlittle apply only to Linux, not to AIX or Darwin.

I changed the BE testcases to add "-mbig" for little endian default
targets because the compiler implicitly should be operating in big
endian mode for other targets and the testcases should succeed.

For the LE testcases, I changed the target selector to
"powerpc*-*-linux*" because that is the only PowerPC target that can
operate as little endian.  I could not find a generic "le" target
selector.  powerpc*-*-linux* understands "-mlittle", so I left the
dg-options clause because there is no need to separate out "-mlittle"
for that subset of PowerPC targets.

Thanks, David

>
> That's something that should be fixed, I guess, but it's orthogonal
> to this patch.
>
> Thanks!
> Bill
>
> >
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c
> >> @@ -0,0 +1,15 @@
> >> +/* { dg-do compile { target { powerpc*-*-* } } } */
> >> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> >> +/* { dg-options "-mdejagnu-cpu=power9 -mlittle" } */
> > And here you do it correctly :-)
> >
> > Okay with those fixes (all happen a few times).  Thanks!
> >
> >
> > Segher
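
To make the endianness point in this thread concrete, here is a minimal C model, not the actual built-in implementation. It assumes the register image is indexed in big-endian byte order (byte 0 first) and that on little-endian element 0 of the vector sits in the last register byte. The names count_leading_zero_lsb, count_trailing_zero_lsb and model_vec_cntlz_lsbb are hypothetical helpers invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { NBYTES = 16 };

/* Count bytes, starting at register byte 0 (big-endian order), whose
   least-significant bit is 0.  Models what vclzlsbb computes.  */
static int
count_leading_zero_lsb (const uint8_t reg[NBYTES])
{
  int n = 0;
  while (n < NBYTES && (reg[n] & 1) == 0)
    n++;
  return n;
}

/* Same count, but starting from the last register byte.  Models what
   vctzlsbb computes.  */
static int
count_trailing_zero_lsb (const uint8_t reg[NBYTES])
{
  int n = 0;
  while (n < NBYTES && (reg[NBYTES - 1 - n] & 1) == 0)
    n++;
  return n;
}

/* Hypothetical model of the user-visible "count leading" operation:
   in little-endian element order the first element is the last
   register byte, so the trailing-count instruction is the one that
   gives the expected answer.  */
static int
model_vec_cntlz_lsbb (const uint8_t reg[NBYTES], int little_endian)
{
  assert (reg != 0);
  return little_endian ? count_trailing_zero_lsb (reg)
                       : count_leading_zero_lsb (reg);
}
```

With only register bytes 2 and 10 having their low bit set, the big-endian view counts 2 leading bytes while the little-endian view counts 5, which is why the built-in has to pick a different instruction per endianness.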

Re: [PATCH] rs6000: Fix up vspltis_shifted [PR102140]

2022-02-08 Thread David Edelsohn via Gcc-patches

On Tue, Feb 8, 2022 at 12:25 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs, because
> (const_vector:V4SI [
> (const_int 0 [0]) repeated x3
> (const_int -2147483648 [0xffffffff80000000])
> ])
> is recognized as valid easy_vector_constant in between split1 pass and
> end of RA.
> The problem is that such constants need to be split, and the only
> splitter for that is:
> (define_split
>   [(set (match_operand:VM 0 "altivec_register_operand")
> (match_operand:VM 1 "easy_vector_constant_vsldoi"))]
>   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()"
> There is only a single splitting pass before RA, so after that finishes,
> if something gets matched in between that and end of RA (after that
> can_create_pseudo_p () would be no longer true), it will never be
> successfully split and we ICE at final.cc time or earlier.
>
> The i386 backend (and a few others) already use
> (cfun->curr_properties & PROP_rtl_split_insns)
> as a test for split1 pass finished, so that some insns that should be split
> during split1 and shouldn't be matched afterwards are properly guarded.
>
> So, the following patch does that for vspltis_shifted too.
>
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Okay.

Thanks, David

>
> 2022-02-08  Jakub Jelinek  
>
> PR target/102140
> * config/rs6000/rs6000.cc (vspltis_shifted): Return false also if
> split1 pass has finished already.
>
> * gcc.dg/pr102140.c: New test.
>
> --- gcc/config/rs6000/rs6000.cc.jj  2022-02-07 17:38:20.873123915 +0100
> +++ gcc/config/rs6000/rs6000.cc 2022-02-08 14:15:31.619505410 +0100
> @@ -6257,8 +6257,11 @@ vspltis_shifted (rtx op)
>  return false;
>
>/* We need to create pseudo registers to do the shift, so don't recognize
> - shift vector constants after reload.  */
> -  if (!can_create_pseudo_p ())
> + shift vector constants after reload.  Don't match it even before RA
> + after split1 is done, because there won't be further splitting pass
> + before RA to do the splitting.  */
> +  if (!can_create_pseudo_p ()
> +  || (cfun->curr_properties & PROP_rtl_split_insns))
>  return false;
>
>nunits = GET_MODE_NUNITS (mode);
> --- gcc/testsuite/gcc.dg/pr102140.c.jj  2022-02-08 14:24:25.839041166 +0100
> +++ gcc/testsuite/gcc.dg/pr102140.c 2022-02-08 14:24:03.038359745 +0100
> @@ -0,0 +1,23 @@
> +/* PR target/102140 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-Og -fipa-cp -fno-tree-ccp -fno-tree-ter -Wno-psabi" } */
> +
> +typedef int __attribute__((__vector_size__ (64))) U;
> +typedef __int128 __attribute__((__vector_size__ (64))) V;
> +
> +int a, b;
> +
> +static void
> +bar (char c, V v)
> +{
> +  v *= c;
> +  U u = a + (U) v;
> +  (union { U b; }) { u };
> +  b = 0;
> +}
> +
> +void
> +foo (void)
> +{
> +  bar (1, (V){((__int128) 9223372036854775808ULL) << 64});
> +}
>
> Jakub
>
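
The guard added by the patch above can be pictured with a small standalone C sketch. This is only an illustration under assumptions: PROP_SPLIT1_DONE and struct pass_state are hypothetical stand-ins for PROP_rtl_split_insns and the compiler state behind cfun->curr_properties and can_create_pseudo_p (); none of this is GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PROP_rtl_split_insns.  */
#define PROP_SPLIT1_DONE (1u << 0)

/* Hypothetical snapshot of the pass pipeline state.  */
struct pass_state
{
  unsigned properties;     /* Bit set of properties provided so far.  */
  bool can_create_pseudo;  /* False once register allocation is done.  */
};

/* A constant that can only be handled by a splitter which needs new
   pseudos must be rejected both after RA and in the window between the
   end of split1 and RA, because no later pass would split it.  */
static bool
may_match_split_only_constant (const struct pass_state *st)
{
  assert (st != 0);
  if (!st->can_create_pseudo || (st->properties & PROP_SPLIT1_DONE))
    return false;
  return true;
}
```

The interesting case is the middle one: pseudos can still be created, but split1 has already run, so matching must be refused.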

Re: [PATCH] testsuite: Fix up testsuite/gcc.c-torture/execute/builtins/lib/chk.c for powerpc [PR104380]

2022-02-07 Thread David Edelsohn via Gcc-patches

On Mon, Feb 7, 2022 at 8:20 AM Jakub Jelinek  wrote:
>
> On Fri, Feb 04, 2022 at 12:00:57PM -0500, David Edelsohn via Gcc-patches 
> wrote:
> > > > The following testcase FAILs when configured with
> > > > --with-long-double-format=ieee .  Only happens in the -std=c* modes,
> > > > not the GNU modes; while the glibc headers have __asm redirects of
> > > > vsnprintf and __vsnprintf_chk to __vsnprintfieee128 and
> > > > __vsnprintf_chkieee128, the vsnprintf fortification extern inline
> > > > gnu_inline always_inline wrapper calls __builtin_vsnprintf_chk and we
> > > > actually emit a call to __vsnprintf_chk (i.e. with IBM extended long
> > > > double) instead of __vsnprintf_chkieee128.
> > > >
> > > > rs6000_mangle_decl_assembler_name already had cases for *printf and
> > > > *scanf, so this just adds another case for *printf_chk.  *scanf_chk
> > > > doesn't exist.  __ prefixing isn't done because *printf_chk already
> > > > starts with __.
> > >
> > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk?
> >
> > Okay.
>
> Unfortunately, while I've tested the testcase also with -mabi=ieeelongdouble
> by hand, the full bootstrap/regtest was on GCCFarm where glibc is too old
> to test with --with-long-double-format=ieee.
> I've done full bootstrap/regtest with that option during the weekend and
> the patch regressed:
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O1
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O3 -g
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -Og -g
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -Os
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O1
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O3 -g
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -Og -g
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -Os
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O1
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O3 -g
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -Og -g
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -Os
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O1
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O3 -g
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -Og -g
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -Os
>
> The problem is that the execute/builtins/ testsuite wants to override some
> of the library functions and with the change we (correctly) call
> __*printf_chkieee128 and so lib/chk.c is no longer called but the glibc
> APIs are.
>
> The following patch fixes it.
>
> Tested on powerpc64le-linux, ok for trunk?

Okay.

Tha

Re: [PATCH] rs6000: Fix up -D_FORTIFY_SOURCE* with -mabi=ieeelongdouble [PR104380]

2022-02-04 Thread David Edelsohn via Gcc-patches

On Fri, Feb 4, 2022 at 11:58 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase FAILs when configured with
> --with-long-double-format=ieee .  Only happens in the -std=c* modes, not the
> GNU modes; while the glibc headers have __asm redirects of
> vsnprintf and __vsnprintf_chk to __vsnprintfieee128 and
> __vsnprintf_chkieee128, the vsnprintf fortification extern inline gnu_inline
> always_inline wrapper calls __builtin_vsnprintf_chk and we actually emit
> a call to __vsnprintf_chk (i.e. with IBM extended long double) instead of
> __vsnprintf_chkieee128.
>
> rs6000_mangle_decl_assembler_name already had cases for *printf and *scanf,
> so this just adds another case for *printf_chk.  *scanf_chk doesn't exist.
> __ prefixing isn't done because *printf_chk already starts with __.
>
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Okay.

Thanks, David

>
> 2022-02-04  Jakub Jelinek  
>
> PR target/104380
> * config/rs6000/rs6000.cc (rs6000_mangle_decl_assembler_name): Also
> adjust mangling of __builtin*printf_chk.
>
> * gcc.dg/pr104380.c: New test.
>
> --- gcc/config/rs6000/rs6000.cc.jj  2022-01-28 10:01:41.224837656 +0100
> +++ gcc/config/rs6000/rs6000.cc 2022-02-04 12:31:27.651715472 +0100
> @@ -28228,6 +28228,7 @@ rs6000_mangle_decl_assembler_name (tree
> {
>   size_t printf_len = strlen ("printf");
>   size_t scanf_len = strlen ("scanf");
> + size_t printf_chk_len = strlen ("printf_chk");
>
>   if (len >= printf_len
>   && strcmp (name + len - printf_len, "printf") == 0)
> @@ -28237,6 +28238,10 @@ rs6000_mangle_decl_assembler_name (tree
>&& strcmp (name + len - scanf_len, "scanf") == 0)
> newname = xasprintf ("__isoc99_%sieee128", name);
>
> + else if (len >= printf_chk_len
> +  && strcmp (name + len - printf_chk_len, "printf_chk") == 0)
> +   newname = xasprintf ("%sieee128", name);
> +
>   else if (name[len - 1] == 'l')
> {
>   bool uses_ieee128_p = false;
> --- gcc/testsuite/gcc.dg/pr104380.c.jj  2022-02-04 12:51:50.152643364 +0100
> +++ gcc/testsuite/gcc.dg/pr104380.c 2022-02-04 12:53:25.092317741 +0100
> @@ -0,0 +1,32 @@
> +/* PR target/104380 */
> +/* This test needs runtime that provides __*_chk functions.  */
> +/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
> +/* { dg-options "-O2 -std=c99" } */
> +
> +#define FORTIFY_SOURCE 2
> +#include <stdarg.h>
> +#include <stdio.h>
> +
> +static char buf[4096];
> +static char gfmt[] = "%Lg";
> +
> +static int __attribute__ ((noipa))
> +foo (char *str, const char *fmt, ...)
> +{
> +  int ret;
> +  va_list ap;
> +  va_start (ap, fmt);
> +  ret = vsnprintf (str, 4096, fmt, ap);
> +  va_end (ap);
> +  return ret;
> +}
> +
> +int
> +main ()
> +{
> +  long double dval = 128.0L;
> +  int ret = foo (buf, gfmt, dval);
> +  if (ret != 3 || __builtin_strcmp (buf, "128") != 0)
> +__builtin_abort ();
> +  return 0;
> +}
>
> Jakub
>
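
The suffix matching described above can be sketched outside the compiler as follows. This is a simplified illustration, not the real rs6000_mangle_decl_assembler_name: ends_with and ieee128_name are hypothetical helpers, the "__%sieee128" form for *printf is inferred from the description in this thread rather than copied from the source, and the real hook works on trees and handles more cases.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if NAME ends with SUFFIX.  */
static int
ends_with (const char *name, const char *suffix)
{
  size_t len = strlen (name);
  size_t slen = strlen (suffix);
  return len >= slen && strcmp (name + len - slen, suffix) == 0;
}

/* Hypothetical stand-in for the renaming step: write the IEEE 128-bit
   variant of NAME into OUT and return 1, or copy NAME unchanged and
   return 0 if no rule applies.  */
static int
ieee128_name (const char *name, char *out, size_t outsz)
{
  assert (outsz > 0);
  if (ends_with (name, "printf"))
    snprintf (out, outsz, "__%sieee128", name);
  else if (ends_with (name, "scanf"))
    snprintf (out, outsz, "__isoc99_%sieee128", name);
  else if (ends_with (name, "printf_chk"))
    /* Already starts with "__", so only the suffix is appended.  */
    snprintf (out, outsz, "%sieee128", name);
  else
    {
      snprintf (out, outsz, "%s", name);
      return 0;
    }
  return 1;
}
```

Under these assumptions, "__vsnprintf_chk" becomes "__vsnprintf_chkieee128", which is the symbol the fortified wrapper is expected to call.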

Re: [PATCH 0/3] Enable pointer_query caching throughout.

2022-02-03 Thread David Edelsohn via Gcc-patches

On Thu, Feb 3, 2022 at 6:09 PM Martin Sebor  wrote:
>
> On 2/3/22 15:56, David Edelsohn wrote:
> > This series of patches has exploded memory usage and I can no longer
> > bootstrap GCC on AIX.
> >
> > As with the Ranger problem exposed by Aldy's patch last September,
> > something is not freeing memory.
> >
> > Even on systems where GCC still bootstraps, this excessive memory usage
> > severely damages GCC compile performance.
>
> Does the change below by any chance make a difference?  (It's just
> a hunch, I haven't tested it beyond quickly building stage 1 and
> running a few tests.)

Hi, Martin

Thanks for the quick response.  Yes, I am able to restore bootstrap on
AIX (32 bit) with the change.

Thanks, David

>
> Martin
>
>
> diff --git a/gcc/pointer-query.h b/gcc/pointer-query.h
> index 4c725eeaf34..801a240c38d 100644
> --- a/gcc/pointer-query.h
> +++ b/gcc/pointer-query.h
> @@ -164,9 +164,9 @@ class pointer_query
> struct cache_type
> {
>   /* 1-based indices into cache.  */
> -vec<unsigned> indices;
> +auto_vec<unsigned> indices;
>   /* The cache itself.  */
> -vec<access_ref> access_refs;
> +auto_vec<access_ref> access_refs;
> };
>
>   public:

Re: [PATCH 0/3] Enable pointer_query caching throughout.

2022-02-03 Thread David Edelsohn via Gcc-patches

This series of patches has exploded memory usage and I can no longer
bootstrap GCC on AIX.

As with the Ranger problem exposed by Aldy's patch last September,
something is not freeing memory.

Even on systems where GCC still bootstraps, this excessive memory usage
severely damages GCC compile performance.

Thanks, David

Re: [PATCH] rs6000: Fix up #include <immintrin.h> or <x86gprintrin.h> [PR104239]

2022-01-26 Thread David Edelsohn via Gcc-patches

On Wed, Jan 26, 2022 at 3:45 PM Jakub Jelinek  wrote:
>
> Hi!
>
> r12-4717-g7d37abedf58d66 added immintrin.h and x86gprintrin.h headers
> to rs6000, these headers are on x86 standalone headers that various
> programs include directly rather than including them through
> <x86intrin.h>.
> Unfortunately, for that change the bmiintrin.h and bmi2intrin.h
> headers haven't been adjusted, so the effect is that if one includes them
> (without including also x86intrin.h first) #error will trigger.
> Furthermore, when including such headers conditionally as some real-world
> packages do, this means a regression.
>
> The following patch fixes it and matches what the x86 bmi{,2}intrin.h
> headers do.
>
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Okay.

Thanks for catching this.

- David

>
> 2022-01-26  Jakub Jelinek  
>
> PR target/104239
> * config/rs6000/bmiintrin.h: Test _X86GPRINTRIN_H_INCLUDED instead of
> _X86INTRIN_H_INCLUDED and adjust #error wording.
> * config/rs6000/bmi2intrin.h: Likewise.
>
> * gcc.target/powerpc/pr104239-1.c: New test.
> * gcc.target/powerpc/pr104239-2.c: New test.
>
> --- gcc/config/rs6000/bmiintrin.h.jj2022-01-11 23:11:21.936296534 +0100
> +++ gcc/config/rs6000/bmiintrin.h   2022-01-26 13:35:08.705945170 +0100
> @@ -29,8 +29,8 @@
> standard C or GNU C extensions, which are more portable and better
> optimized across multiple targets.  */
>
> -#if !defined _X86INTRIN_H_INCLUDED
> -# error "Never use <bmiintrin.h> directly; include <x86intrin.h> instead."
> +#if !defined _X86GPRINTRIN_H_INCLUDED
> +# error "Never use <bmiintrin.h> directly; include <x86gprintrin.h> instead."
>  #endif
>
>  #ifndef _BMIINTRIN_H_INCLUDED
> --- gcc/config/rs6000/bmi2intrin.h.jj   2022-01-11 23:11:21.936296534 +0100
> +++ gcc/config/rs6000/bmi2intrin.h  2022-01-26 13:34:53.373162122 +0100
> @@ -29,8 +29,8 @@
> standard C or GNU C extensions, which are more portable and better
> optimized across multiple targets.  */
>
> -#if !defined _X86INTRIN_H_INCLUDED
> -# error "Never use <bmi2intrin.h> directly; include <x86intrin.h> instead."
> +#if !defined _X86GPRINTRIN_H_INCLUDED
> +# error "Never use <bmi2intrin.h> directly; include <x86gprintrin.h> instead."
>  #endif
>
>  #ifndef _BMI2INTRIN_H_INCLUDED
> --- gcc/testsuite/gcc.target/powerpc/pr104239-1.c.jj2022-01-26 
> 13:42:34.103643030 +0100
> +++ gcc/testsuite/gcc.target/powerpc/pr104239-1.c   2022-01-26 
> 13:42:23.101798701 +0100
> @@ -0,0 +1,9 @@
> +/* PR target/104239 */
> +/* { dg-do compile } */
> +/* { dg-options "-DNO_WARN_X86_INTRINSICS" } */
> +
> +#if __has_include()
> +#include 
> +#endif
> +
> +int i;
> --- gcc/testsuite/gcc.target/powerpc/pr104239-2.c.jj2022-01-26 
> 13:42:42.279527345 +0100
> +++ gcc/testsuite/gcc.target/powerpc/pr104239-2.c   2022-01-26 
> 13:42:23.101798701 +0100
> @@ -0,0 +1,9 @@
> +/* PR target/104239 */
> +/* { dg-do compile } */
> +/* { dg-options "-DNO_WARN_X86_INTRINSICS" } */
> +
> +#if __has_include()
> +#include 
> +#endif
> +
> +int i;
>
> Jakub
>

Re: [PATCH] Disable -fsplit-stack support on non-glibc targets

2022-01-25 Thread David Edelsohn via Gcc-patches

This patch broke bootstrap on AIX.  It may have broken Darwin.  I have
applied the following patch.  AIX doesn't need to distinguish between
different Linux libc implementations.

Bootstrapped on powerpc-ibm-aix7.2.3.0

Thanks, David

aix: AIX is not GLIBC.

A recent patch added tests for OPTION_GLIBC that is defined in
linux.h and linux64.h.  This broke bootstrap for non-Linux rs6000
configurations.  This patch defines OPTION_GLIBC as 0.

* config/rs6000/aix.h (OPTION_GLIBC): Define as 0.

diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
index ad3238bf09a..eb7a0c09f72 100644
--- a/gcc/config/rs6000/aix.h
+++ b/gcc/config/rs6000/aix.h
@@ -23,6 +23,7 @@
 #define DEFAULT_ABI ABI_AIX
 #undef  TARGET_AIX
 #define TARGET_AIX 1
+#define OPTION_GLIBC 0

Re: Ping: [PATCH] PR 103763, Fix fold-vec-splat-floatdouble on power10.

2022-01-21 Thread David Edelsohn via Gcc-patches

On Fri, Jan 21, 2022 at 2:56 PM Michael Meissner  wrote:
>
> Ping patch
> https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587924.html
>
> | Date: Fri, 7 Jan 2022 16:05:53 -0500
> | From: Michael Meissner 
> | Subject: [PATCH] PR 103763, Fix fold-vec-splat-floatdouble on power10.
> | Message-ID: 

This patch is okay.

Thanks, David

Re: [PATCH v2, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-20 Thread David Edelsohn via Gcc-patches

On Thu, Jan 20, 2022 at 2:36 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch adds a combine pattern for "CA minus one". As CA only has two
> values (0 or 1), we could convert the following pattern
>   (sign_extend:DI (plus:SI (reg:SI 98 ca)
>                            (const_int -1 [0xffffffffffffffff])))
> to
>   (plus:DI (reg:DI 98 ca)
>            (const_int -1 [0xffffffffffffffff]))
>With this patch, one unnecessary sign extend is eliminated.
>
>Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-20 Haochen Gui 
>
> gcc/
> * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr95737.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6ecb0bd6142..1d8b212962f 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -2358,6 +2358,19 @@ (define_insn "subf3_carry_in_xx"
>"subfe %0,%0,%0"
>[(set_attr "type" "add")])
>
> +(define_insn_and_split "*extenddi_ca_minus_one"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> +        (sign_extend:DI (plus:SI (reg:SI CA_REGNO)
> +                                 (const_int -1))))]
> +  ""
> +  "#"
> +  ""
> +  [(parallel [(set (match_dup 0)
> +  (plus:DI (reg:DI CA_REGNO)
> +   (const_int -1)))
> + (clobber (reg:DI CA_REGNO))])]
> +  ""
> +)
>
>  (define_insn "@neg2"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c 
> b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> new file mode 100644
> index 000..94320f23423
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> @@ -0,0 +1,10 @@
> +/* PR target/95737 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

Why does the testcase force power8? This testcase is not specific to
Power8 or later.

> +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
> +
> +
> +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
> long b)
> +{
> +   return -(a < b);
> +}

If you're only testing for lp64, the testcase could use "long" instead
of "long long".

This is okay with those changes.

Thanks, David
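
As a quick sanity check of the arithmetic behind the pattern quoted above, the following C sketch compares the two forms for both possible carry values. It is only a model of the RTL semantics, assuming CA is 0 or 1 as the patch description states; ca_minus_one_si_extended and ca_minus_one_di are names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: compute CA - 1 in 32 bits, then sign-extend to 64.  */
static int64_t
ca_minus_one_si_extended (int ca)
{
  assert (ca == 0 || ca == 1);
  int32_t narrow = (int32_t) ca - 1;
  return (int64_t) narrow;
}

/* Replacement form: compute CA - 1 directly in 64 bits.  */
static int64_t
ca_minus_one_di (int ca)
{
  assert (ca == 0 || ca == 1);
  return (int64_t) ca - 1;
}
```

Because the 32-bit result is always 0 or -1, sign extension cannot change its value, so both helpers agree for CA = 0 and CA = 1 and the separate extsw is redundant.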

Re: [PATCH, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-19 Thread David Edelsohn via Gcc-patches

On Wed, Jan 19, 2022 at 2:12 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch adds a combine pattern for "CA minus one". As CA only has two
> values (0 or 1), we could convert the following pattern
>   (sign_extend:DI (plus:SI (reg:SI 98 ca)
>                            (const_int -1 [0xffffffffffffffff])))
> to
>   (plus:DI (reg:DI 98 ca)
>            (const_int -1 [0xffffffffffffffff]))
> With this patch, it eliminates one unnecessary sign extend. Also in 
> rs6000,
> regclass of CA register is set to NO_REGS. So CA is not in hard register set
> and it can't match register_operand. The patch changes it to any_operand.

Segher changed the class in 2014.

https://gcc.gnu.org/pipermail/gcc-patches/2014-September/399192.html

We need to ensure that it still is the correct decision in light of
these new patterns.

Thanks, David

>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-19 Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (ca_operand): Match any_operand as CA
> register is not in hard register set.
> * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr95737.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index c65dfb91f3d..cd2ae1dc8e0 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -188,7 +188,7 @@ (define_predicate "vlogical_operand"
>
>  ;; Return 1 if op is the carry register.
>  (define_predicate "ca_operand"
> -  (match_operand 0 "register_operand")
> +  (match_operand 0 "any_operand")
>  {
>if (SUBREG_P (op))
>  op = SUBREG_REG (op);
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6ecb0bd6142..f1b09aad3b5 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -2358,6 +2358,21 @@ (define_insn "subf3_carry_in_xx"
>"subfe %0,%0,%0"
>[(set_attr "type" "add")])
>
> +(define_insn_and_split "*extenddi_ca_minus_one"
> +  [(set (match_operand:DI 0 "gpc_reg_operand")
> +        (sign_extend:DI (plus:SI (match_operand:SI 1 "ca_operand")
> +                                 (const_int -1))))]
> +  ""
> +  "#"
> +  ""
> +  [(parallel [(set (match_dup 0)
> +  (plus:DI (match_dup 2)
> +   (const_int -1)))
> + (clobber (match_dup 2))])]
> +{
> +  operands[2] = copy_rtx (operands[1]);
> +  PUT_MODE (operands[2], DImode);
> +})
>
>  (define_insn "@neg2"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c 
> b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> new file mode 100644
> index 000..94320f23423
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> @@ -0,0 +1,10 @@
> +/* PR target/95737 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
> +
> +
> +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
> long b)
> +{
> +   return -(a < b);
> +}

Re: [PATCH, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-19 Thread David Edelsohn via Gcc-patches

On Wed, Jan 19, 2022 at 2:12 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch adds a combine pattern for "CA minus one". As CA only has two
> values (0 or 1), we could convert the following pattern
>   (sign_extend:DI (plus:SI (reg:SI 98 ca)
>                            (const_int -1 [0xffffffffffffffff])))
> to
>   (plus:DI (reg:DI 98 ca)
>            (const_int -1 [0xffffffffffffffff]))
> With this patch, it eliminates one unnecessary sign extend. Also in 
> rs6000,
> regclass of CA register is set to NO_REGS. So CA is not in hard register set
> and it can't match register_operand. The patch changes it to any_operand.

CA_REGNO should be in class CA_REGS, not class NO_REGS.  This seems
like a major, latent bug.

Thanks, David

>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-19 Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (ca_operand): Match any_operand as CA
> register is not in hard register set.
> * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr95737.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index c65dfb91f3d..cd2ae1dc8e0 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -188,7 +188,7 @@ (define_predicate "vlogical_operand"
>
>  ;; Return 1 if op is the carry register.
>  (define_predicate "ca_operand"
> -  (match_operand 0 "register_operand")
> +  (match_operand 0 "any_operand")
>  {
>if (SUBREG_P (op))
>  op = SUBREG_REG (op);
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6ecb0bd6142..f1b09aad3b5 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -2358,6 +2358,21 @@ (define_insn "subf3_carry_in_xx"
>"subfe %0,%0,%0"
>[(set_attr "type" "add")])
>
> +(define_insn_and_split "*extenddi_ca_minus_one"
> +  [(set (match_operand:DI 0 "gpc_reg_operand")
> +        (sign_extend:DI (plus:SI (match_operand:SI 1 "ca_operand")
> +                                 (const_int -1))))]
> +  ""
> +  "#"
> +  ""
> +  [(parallel [(set (match_dup 0)
> +  (plus:DI (match_dup 2)
> +   (const_int -1)))
> + (clobber (match_dup 2))])]
> +{
> +  operands[2] = copy_rtx (operands[1]);
> +  PUT_MODE (operands[2], DImode);
> +})
>
>  (define_insn "@neg2"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c 
> b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> new file mode 100644
> index 000..94320f23423
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> @@ -0,0 +1,10 @@
> +/* PR target/95737 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
> +
> +
> +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
> long b)
> +{
> +   return -(a < b);
> +}

Re: [PATCH] libstdc++: Implement C++20 atomic and atomic

2022-01-18 Thread David Edelsohn via Gcc-patches

This patch introduced new AIX testsuite failures.

PR libstdc++/104101

Thanks, David

Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-14 Thread David Edelsohn via Gcc-patches

On Fri, Jan 14, 2022 at 5:42 AM Kewen.Lin  wrote:
>
> on 2022/1/13 11:15 PM, David Edelsohn wrote:
> > On Thu, Jan 13, 2022 at 7:40 AM Kewen.Lin  wrote:
> >>
> >> Hi David,
> >>
> >> on 2022/1/13 11:12 AM, David Edelsohn wrote:
> >>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> 
>  Hi,
> 
>  This patch is to clean up some codes with GET_MODE_UNIT_SIZE or
>  GET_MODE_NUNITS, which can use known constant instead.
> >>>
> >>> I'll let Segher decide, but often the additional code is useful
> >>> self-documentation instead of magic constants.  Or at least the change
> >>> requires comments documenting the derivation of the constants
> >>> currently described by the code itself.
> >>>
> >>
> >> Thanks for the comments, I added some comments as suggested, also removed
> >> the whole "altivec_vreveti2" since I noticed it's useless, it's not used
> >> by any built-in functions and even unused in the commit db042e1603db50573.
> >>
> >> The updated version has been tested as before.
> >
> > As we have discussed offline, the comments need to be clarified and 
> > expanded.
> >
> > And the removal of altivec_vreveti2 should be confirmed with Carl
> > Love, who added the pattern less than a year ago. There may be another
> > patch planning to use it.
> >
>
> Thanks for the suggestions David!  The comments have been updated, and Carl
> also helped to confirm the altivec_vreveti2 pattern is not planned for any
> future work and looks reasonable to remove.
>
> Does this updated version look good to you?

The revised patch is okay.

Thanks, David
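
For readers following the "magic constant" discussion, the little-endian index adjustment in the patch quoted below boils down to the following standalone C sketch. It is an illustration only: V4SI_NUNITS and adjust_element_for_le are hypothetical names, and the flag argument stands in for BYTES_BIG_ENDIAN.

```c
#include <assert.h>

/* A V4SI vector has four 32-bit elements, so the largest index is 3.  */
enum { V4SI_NUNITS = 4 };

/* Map an element index to the big-endian numbering used by the
   hardware.  On little-endian the order is mirrored, so index I
   becomes (NUNITS - 1) - I, i.e. 3 - I for V4SI.  */
static int
adjust_element_for_le (int element, int bytes_big_endian)
{
  assert (element >= 0 && element < V4SI_NUNITS);
  return bytes_big_endian ? element : (V4SI_NUNITS - 1) - element;
}
```

Keeping the derivation in a named constant or a comment, as requested in the review, preserves the link between the literal 3 and the number of elements in the mode.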

>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vreveti2): Remove.
> * config/rs6000/vsx.md (*vsx_extract_si, 
> *vsx_extract_si_float_df,
> *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use
> known constant values to simplify code.
> ---
>  gcc/config/rs6000/altivec.md | 25 -
>  gcc/config/rs6000/vsx.md | 16 
>  2 files changed, 12 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 950b17862c4..4412175a0dc 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3950,31 +3950,6 @@
>DONE;
>  })
>
> -;; Vector reverse elements
> -(define_expand "altivec_vreveti2"
> -  [(set (match_operand:TI 0 "register_operand" "=v")
> -   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
> - UNSPEC_VREVEV))]
> -  "TARGET_ALTIVEC"
> -{
> -  int i, j, size, num_elements;
> -  rtvec v = rtvec_alloc (16);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -
> -  size = GET_MODE_UNIT_SIZE (TImode);
> -  num_elements = GET_MODE_NUNITS (TImode);
> -
> -  for (j = 0; j < num_elements; j++)
> -for (i = 0; i < size; i++)
> -  RTVEC_ELT (v, i + j * size)
> -   = GEN_INT (i + (num_elements - 1 - j) * size);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
> -operands[1], mask));
> -  DONE;
> -})
> -
>  ;; Vector reverse elements for V16QI V8HI V4SI V4SF
>  (define_expand "altivec_vreve2"
>[(set (match_operand:VEC_K 0 "register_operand" "=v")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index acd729d1687..ee748ff4ebd 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3854,8 +3854,10 @@
>rtx vec_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering, the below minuend 3 is computed by
> + GET_MODE_NUNITS (V4SImode) - 1.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4230,8 +4232,10 @@
>rtx v4si_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering, the below minuend 3 is computed by
> + GET_MODE_NUNITS (V4SImode) - 1.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4273,8 +4277,10 @@
>rtx df_tmp = operands[4];
>int value;
>
> +  /* Adjust index for LE element ordering, the below minuend 3 is computed by
> + GET_MODE_NUNITS (V4SImode) - 1.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4466,8 +4472,10 @@
>  {
>int ele = INTVAL (operands[4]);
>
> +  /* Adjust index for LE element ordering, the below minuend 3 is computed by
> +

Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-13 Thread David Edelsohn via Gcc-patches

On Thu, Jan 13, 2022 at 7:28 AM Kewen.Lin  wrote:
>
> on 2022/1/13 11:56 AM, Kewen.Lin via Gcc-patches wrote:
> > on 2022/1/13 11:44 AM, David Edelsohn wrote:
> >> On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
> >>>
> >>> Hi David,
> >>>
> >>> on 2022/1/13 11:07 AM, David Edelsohn wrote:
> >>>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> This patch is to fix register constraint v with
> >>>>> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
> >>>>> just like some other existing register constraints with
> >>>>> RS6000_CONSTRAINT_*.
> >>>>>
> >>>>> I happened to see this and hope it's not intentional and just
> >>>>> got neglected.
> >>>>>
> >>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >>>>> powerpc64-linux-gnu P8.
> >>>>>
> >>>>> Is it ok for trunk?
> >>>>
> >>>> Why do you want to make this change?
> >>>>
> >>>> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
> >>>>
> >>>> but all of the patterns that use a "v" constraint are (or should be)
> >>>> protected by TARGET_ALTIVEC, or some final condition that only is
> >>>> active for TARGET_ALTIVEC.  The other constraints are conditionally
> >>>> set because they can be used in a pattern with multiple alternatives
> >>>> where the pattern itself is active but some of the constraints
> >>>> correspond to NO_REGS when some instruction variants for VSX are not
> >>>> enabled.
> >>>>
> >>>
> >>> Good point!  Thanks for the explanation.
> >>>
> >>>> The change isn't wrong, but it doesn't correct a bug and provides no
> >>>> additional benefit nor clarity that I can see.
> >>>>
> >>>
> >>> The original intention is to make it consistent with the other existing
> >>> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
> >>> weird (like was neglected).  After you clarified above, RS6000_CONSTRAINT_v
> >>> seems useless at all in the current framework.  Do you prefer to remove
> >>> it to avoid any confusions instead?
> >>
> >> It's used in the reg_class, so there may be some heuristic in the GCC
> >> register allocator that cares about the number of registers available
> >> for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
> >> conditionally, so it seems best to leave it as is.
> >>
> >
> > I may miss something, but I didn't find it's used for the above purposes.
> > If it's best to leave it as is, the proposed patch seems to offer better
> > readability.
>
> Two more inputs for maintainers' decision:
>
> 1) the original proposed patch fixed one "bug" that is:
>
> In function rs6000_debug_reg_global, it tries to print the register class
> for the register constraint:
>
>   fprintf (stderr,
>"\n"
>"d  reg_class = %s\n"
>"f  reg_class = %s\n"
>"v  reg_class = %s\n"
>"wa reg_class = %s\n"
>...
>"\n",
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>...
>
> It uses rs6000_constraints[RS6000_CONSTRAINT_v] which is conditionally
> set here:
>
>   /* Add conditional constraints based on various options, to allow us to
>  collapse multiple insn patterns.  */
>   if (TARGET_ALTIVEC)
> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
>
> But the actual register class for register constraint is hardcoded as
> ALTIVEC_REGS rather than rs6000_constraints[RS6000_CONSTRAINT_v].

I agree that the information is inaccurate, but it is informal
debugging output.  And if Altivec is disabled, the value of the
constraint is irrelevant / garbage.

>
> 2) Bootstrapped and tested one below patch to remove all the code using
> RS6000_CONSTRAINT_v on powerpc64le-linux-gnu P10 and P9,
> powerpc64-linux-gnu P8 and P7 with no regressions.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 37f07fe5358..3652629c5d0 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -2320,7 +2320,6 @@ rs6000_debug_reg_global (void)
>"\n"
>"d  reg_class = %s\n"
>"f  reg_class = %s\n"
> -  "v  reg_class = %s\n"
>"wa reg_class = %s\n"
>"we reg_class = %s\n"
>"wr reg_class = %s\n"
> @@ -2329,7 +2328,6 @@ rs6000_debug_reg_global (void)
>"\n",
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
> -  reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
> @@ -2984,11 +2982,6 @@

Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-13 Thread David Edelsohn via Gcc-patches

On Thu, Jan 13, 2022 at 7:40 AM Kewen.Lin  wrote:
>
> Hi David,
>
> on 2022/1/13 11:12 AM, David Edelsohn wrote:
> > On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> This patch is to clean up some codes with GET_MODE_UNIT_SIZE or
> >> GET_MODE_NUNITS, which can use known constant instead.
> >
> > I'll let Segher decide, but often the additional code is useful
> > self-documentation instead of magic constants.  Or at least the change
> > requires comments documenting the derivation of the constants
> > currently described by the code itself.
> >
>
> Thanks for the comments, I added some comments as suggested, also removed
> the whole "altivec_vreveti2" since I noticed it's useless, it's not used
> by any built-in functions and even unused in the commit db042e1603db50573.
>
> The updated version has been tested as before.

As we have discussed offline, the comments need to be clarified and expanded.

And the removal of altivec_vreveti2 should be confirmed with Carl
Love, who added the pattern less than a year ago. There may be another
patch planning to use it.

Thanks, David

>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vreveti2): Remove.
> * config/rs6000/vsx.md (*vsx_extract_si, *vsx_extract_si_float_df,
> *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use
> known constant values to simplify code.
> ---
>  gcc/config/rs6000/altivec.md | 25 -
>  gcc/config/rs6000/vsx.md | 12 
>  2 files changed, 8 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index c2312cc1e0f..b7f056f8c60 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3950,31 +3950,6 @@ (define_expand "altivec_negv4sf2"
>DONE;
>  })
>
> -;; Vector reverse elements
> -(define_expand "altivec_vreveti2"
> -  [(set (match_operand:TI 0 "register_operand" "=v")
> -   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
> - UNSPEC_VREVEV))]
> -  "TARGET_ALTIVEC"
> -{
> -  int i, j, size, num_elements;
> -  rtvec v = rtvec_alloc (16);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -
> -  size = GET_MODE_UNIT_SIZE (TImode);
> -  num_elements = GET_MODE_NUNITS (TImode);
> -
> -  for (j = 0; j < num_elements; j++)
> -for (i = 0; i < size; i++)
> -  RTVEC_ELT (v, i + j * size)
> -   = GEN_INT (i + (num_elements - 1 - j) * size);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
> -operands[1], mask));
> -  DONE;
> -})
> -
>  ;; Vector reverse elements for V16QI V8HI V4SI V4SF
>  (define_expand "altivec_vreve2"
>[(set (match_operand:VEC_K 0 "register_operand" "=v")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 802db0d112b..d246410880d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3854,8 +3854,9 @@ (define_insn_and_split  "*vsx_extract_si"
>rtx vec_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4230,8 +4231,9 @@ (define_insn_and_split "*vsx_extract_si_float_df"
>rtx v4si_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4273,8 +4275,9 @@ (define_insn_and_split "*vsx_extract_si_float_"
>rtx df_tmp = operands[4];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4466,8 +4469,9 @@ (define_insn "*vsx_insert_extract_v4sf_p9"
>  {
>int ele = INTVAL (operands[4]);
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
> +ele = 3 - ele;
>
>operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
>return "xxinsertw %x0,%x2,%4";
> --
> 2.27.0
>

Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-12 Thread David Edelsohn via Gcc-patches

On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
>
> Hi David,
>
> on 2022/1/13 11:07 AM, David Edelsohn wrote:
> > On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> This patch is to fix register constraint v with
> >> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
> >> just like some other existing register constraints with
> >> RS6000_CONSTRAINT_*.
> >>
> >> I happened to see this and hope it's not intentional and just
> >> got neglected.
> >>
> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >> powerpc64-linux-gnu P8.
> >>
> >> Is it ok for trunk?
> >
> > Why do you want to make this change?
> >
> > rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
> >
> > but all of the patterns that use a "v" constraint are (or should be)
> > protected by TARGET_ALTIVEC, or some final condition that only is
> > active for TARGET_ALTIVEC.  The other constraints are conditionally
> > set because they can be used in a pattern with multiple alternatives
> > where the pattern itself is active but some of the constraints
> > correspond to NO_REGS when some instruction variants for VSX are not
> > enabled.
> >
>
> Good point!  Thanks for the explanation.
>
> > The change isn't wrong, but it doesn't correct a bug and provides no
> > additional benefit nor clarity that I can see.
> >
>
> The original intention is to make it consistent with the other existing
> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
> weird (like was neglected).  After you clarified above, RS6000_CONSTRAINT_v
> seems useless at all in the current framework.  Do you prefer to remove
> it to avoid any confusions instead?

It's used in the reg_class, so there may be some heuristic in the GCC
register allocator that cares about the number of registers available
for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
conditionally, so it seems best to leave it as is.

Thanks, David

Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-12 Thread David Edelsohn via Gcc-patches

On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>
> Hi,
>
> This patch is to clean up some codes with GET_MODE_UNIT_SIZE or
> GET_MODE_NUNITS, which can use known constant instead.

I'll let Segher decide, but often the additional code is useful
self-documentation instead of magic constants.  Or at least the change
requires comments documenting the derivation of the constants
currently described by the code itself.

Thanks, David
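
A minimal sketch, in plain C outside of GCC, of how a literal can be used while its derivation stays visible. The names `V4SI_NUNITS` and `le_adjust_index` are hypothetical stand-ins and do not exist in the rs6000 back end; `V4SI_NUNITS` plays the role of GET_MODE_NUNITS (V4SImode).

```c
/* Hypothetical stand-in for GET_MODE_NUNITS (V4SImode): a V4SI vector
   holds four 32-bit elements.  */
enum { V4SI_NUNITS = 4 };

/* Map a big-endian element index to the matching little-endian index.
   The minuend is spelled NUNITS - 1 so that the origin of the
   constant 3 stays visible to the reader.  */
static int
le_adjust_index (int element)
{
  return (V4SI_NUNITS - 1) - element;
}
```

With four elements, index 0 maps to 3 and index 3 maps to 0, which is the same result as the `3 - INTVAL (element)` form in the patch.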

>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vreveti2): Use known constant
> values to simplify code.
> * config/rs6000/vsx.md (*vsx_extract_si, *vsx_extract_si_float_df,
> *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9):
> Likewise.
> ---
>  gcc/config/rs6000/altivec.md | 11 +++
>  gcc/config/rs6000/vsx.md |  8 
>  2 files changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index c2312cc1e0f..d5c4ecfa9b7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3957,17 +3957,12 @@ (define_expand "altivec_vreveti2"
>   UNSPEC_VREVEV))]
>"TARGET_ALTIVEC"
>  {
> -  int i, j, size, num_elements;
> +  int i;
>rtvec v = rtvec_alloc (16);
>rtx mask = gen_reg_rtx (V16QImode);
>
> -  size = GET_MODE_UNIT_SIZE (TImode);
> -  num_elements = GET_MODE_NUNITS (TImode);
> -
> -  for (j = 0; j < num_elements; j++)
> -for (i = 0; i < size; i++)
> -  RTVEC_ELT (v, i + j * size)
> -   = GEN_INT (i + (num_elements - 1 - j) * size);
> +  for (i = 0; i < 16; i++)
> +RTVEC_ELT (v, i) = GEN_INT (i);
>
>emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 802db0d112b..892b99c6d6b 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3855,7 +3855,7 @@ (define_insn_and_split  "*vsx_extract_si"
>int value;
>
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4231,7 +4231,7 @@ (define_insn_and_split "*vsx_extract_si_float_df"
>int value;
>
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4274,7 +4274,7 @@ (define_insn_and_split "*vsx_extract_si_float_"
>int value;
>
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4467,7 +4467,7 @@ (define_insn "*vsx_insert_extract_v4sf_p9"
>int ele = INTVAL (operands[4]);
>
>if (!BYTES_BIG_ENDIAN)
> -ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
> +ele = 3 - ele;
>
>operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
>return "xxinsertw %x0,%x2,%4";
> --
> 2.27.0
>
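
The altivec_vreveti2 hunk above replaces a nested loop with a loop that stores `i` at position `i`. The following standalone C sketch models that loop to show why the two forms agree for TImode. It assumes GET_MODE_UNIT_SIZE (TImode) is 16 and GET_MODE_NUNITS (TImode) is 1, which is what the simplified loop in the patch implies. The helper names `build_reverse_mask` and `mask_is_identity` are hypothetical.

```c
/* Fill MASK with the byte selectors that the generic loop computes for
   a 16-byte vector of NUM_ELEMENTS elements, each SIZE bytes wide.
   SIZE * NUM_ELEMENTS must equal 16.  */
static void
build_reverse_mask (unsigned char mask[16], int size, int num_elements)
{
  for (int j = 0; j < num_elements; j++)
    for (int i = 0; i < size; i++)
      mask[i + j * size]
        = (unsigned char) (i + (num_elements - 1 - j) * size);
}

/* Return 1 if MASK selects byte i for position i, that is, if a
   permute with this mask would copy its input unchanged.  */
static int
mask_is_identity (const unsigned char mask[16])
{
  for (int i = 0; i < 16; i++)
    if (mask[i] != i)
      return 0;
  return 1;
}
```

Under those assumptions, `build_reverse_mask (m, 16, 1)` yields the identity selection, while a four-element layout such as `build_reverse_mask (m, 4, 4)` does reverse the element order.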

Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-12 Thread David Edelsohn via Gcc-patches

On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>
> Hi,
>
> This patch is to fix register constraint v with
> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
> just like some other existing register constraints with
> RS6000_CONSTRAINT_*.
>
> I happened to see this and hope it's not intentional and just
> got neglected.
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?

Why do you want to make this change?

rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;

but all of the patterns that use a "v" constraint are (or should be)
protected by TARGET_ALTIVEC, or some final condition that only is
active for TARGET_ALTIVEC.  The other constraints are conditionally
set because they can be used in a pattern with multiple alternatives
where the pattern itself is active but some of the constraints
correspond to NO_REGS when some instruction variants for VSX are not
enabled.

The change isn't wrong, but it doesn't correct a bug and provides no
additional benefit nor clarity that I can see.

Thanks, David
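
The distinction drawn above can be modeled with a small standalone C sketch. This is a reduced, hypothetical model and not the rs6000 code: the enums, the `constraints` table, and both functions are invented for illustration. It shows a constraint slot that falls back to NO_REGS when its feature is disabled, so one alternative of an always-active pattern can be switched off on its own.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of register classes and constraint slots.  */
enum reg_class { NO_REGS, ALTIVEC_REGS, VSX_REGS };
enum constraint { CONSTRAINT_v, CONSTRAINT_wa, CONSTRAINT_MAX };

static enum reg_class constraints[CONSTRAINT_MAX];

/* Every slot starts as NO_REGS and is filled in only when the
   corresponding feature is enabled.  */
static void
init_constraints (bool target_altivec, bool target_vsx)
{
  for (int i = 0; i < CONSTRAINT_MAX; i++)
    constraints[i] = NO_REGS;
  if (target_altivec)
    constraints[CONSTRAINT_v] = ALTIVEC_REGS;
  if (target_vsx)
    constraints[CONSTRAINT_wa] = VSX_REGS;
}

/* An alternative whose constraint maps to NO_REGS can never match,
   which disables that alternative while the pattern stays active.  */
static bool
alternative_usable (enum constraint c)
{
  return constraints[c] != NO_REGS;
}
```

In this model the conditional slot matters for `wa`, which may appear in a pattern that is active without VSX. A pattern that is itself gated on Altivec never consults the `v` slot when Altivec is off, so hardcoding the class there changes nothing observable.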

Re: [PATCH, rs6000] Enable absolute jump table by default

2022-01-12 Thread David Edelsohn via Gcc-patches

On Wed, Jan 12, 2022 at 8:40 PM HAO CHEN GUI  wrote:
>
> Hi David,
>
> On 12/1/2022 10:44 PM, David Edelsohn wrote:
> > On Wed, Jan 12, 2022 at 7:22 AM HAO CHEN GUI  wrote:
> >>
> >> Hi,
> >> This patch enables absolute jump table by default on rs6000. The
> >> relative jump tables are used when
> >> it's explicitly set by "rs6000_relative_jumptables",
> >> or jump tables are placed in text section but global relocation is
> >> required.
> >>
> >> Bootstrapped and tested on powerpc64-linux BE and LE with no
> >> regressions. Is this okay for trunk?
> >> Any recommendations? Thanks a lot.
> >>
> >> ChangeLog
> >> 2022-01-12 Haochen Gui 
> >>
> >> gcc/
> >> * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Define.
> >> * config/rs6000/rs6000.c (rs6000_gen_pic_addr_diff_vec): Return
> >> true when relative jump table is explicit required or jump tables
> >> have to be put in text section but global relocation is also required.
> >> * config/rs6000/rs6000.opt (rs6000_relative_jumptables): Disable.
> >>
> >> patch.diff
> >> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
> >> index d617f346f81..2e257c60f8c 100644
> >> --- a/gcc/config/rs6000/linux64.h
> >> +++ b/gcc/config/rs6000/linux64.h
> >> @@ -239,7 +239,7 @@ extern int dot_symbols;
> >>
> >>  /* Indicate that jump tables go in the text section.  */
> >>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> >> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> >> +#define JUMP_TABLES_IN_TEXT_SECTION 0
> >>
> >>  /* The linux ppc64 ABI isn't explicit on whether aggregates smaller
> >> than a doubleword should be padded upward or downward.  You could
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index 319182e94d9..9fba893a27a 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -28465,7 +28465,9 @@ rs6000_emit_xxspltidp_v2df (rtx dst, long value)
> >>  static bool
> >>  rs6000_gen_pic_addr_diff_vec (void)
> >>  {
> >> -  return rs6000_relative_jumptables;
> >> +  return rs6000_relative_jumptables
> >> +|| (JUMP_TABLES_IN_TEXT_SECTION
> >> +&& targetm.asm_out.reloc_rw_mask () == 3);
> >>  }
> >
> > This seems like contorted logic and overriding the
> > rs6000_relative_jumptables option change.  The later part of the patch
> > overrides rs6000_relative_jumptables for all rs6000 configurations,
> > and then changes this one use of rs6000_relative_jumptables to add
> > more logic to revert to the old meaning for some targets.
> >
> > What about all of the other uses of rs6000_relative_jumptables in the
> > target?  What about rs6000.md?
> >
> > I highly doubt that this patch is correct.
> >
> > Why not override rs6000_relative_jumptables for PPC64 Linux instead of
> > changing its value globally and then trying to fix it up?
> >
> > Thanks, David
>   Thanks for your comments.
>
>   In this patch, I tried to enable absolute jump table on all rs6000 targets.
> For PPC64 Linux, it supports RELRO section (e.g. .data.rel.ro.local) as
> "JUMP_TABLES_IN_TEXT_SECTION" is set to 0. So, absolute jump tables could be
> placed in RELRO section whatever global relocation is required or not. The
> absolute jump table can't be placed in text section when global relocation
> is required as text section is not writable. So for other rs6000 targets,
> absolute jump table can't be used if the target doesn't support RELRO and
> global relocation is required also.
>
>   Looking forward to your advice. Thanks again.

Why not override rs6000_relative_jumptables in
rs6000_option_override_internal() for PPC64 Linux subtarget instead of
changing the default value and then trying to fix the behavior for
other configurations in rs6000_gen_pic_addr_diff_vec()?  Or override
it in SUBSUBTARGET_OVERRIDE_OPTIONS in linux64.h, or whichever header
file(s) is appropriate for the subtarget variants.

Your initial patch also changed rs6000_gen_pic_addr_diff_vec but
didn't address the use of rs6000_relative_jumptables in the definition
of CASE_VECTOR_MODE in rs6000.h.  That cannot have been correct.  At
least without the change to the default value of
rs6000_relative_jumptables you don't need to add kludges to all of the
places where that variable is used for other subtarget variants of the
rs6000 target.

Thanks, David
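
The suggestion above can be sketched in standalone C. This is a hypothetical model and not GCC code: `struct target_options`, `enum subtarget`, and every function name here are invented, and the `relative_jumptables_set` field stands in for whatever mechanism records that the user passed the option explicitly. The point is that the default stays unchanged and a single subtarget adjusts the variable once, so every consumer can keep reading it directly.

```c
#include <stdbool.h>

/* Hypothetical subtargets; only one of them wants absolute tables.  */
enum subtarget { SUBTARGET_PPC64_LINUX, SUBTARGET_OTHER };

struct target_options
{
  bool relative_jumptables;      /* models rs6000_relative_jumptables */
  bool relative_jumptables_set;  /* true if given on the command line */
};

/* The default stays "relative", matching Init(1) in rs6000.opt.  */
static struct target_options
default_options (void)
{
  struct target_options o = { true, false };
  return o;
}

/* Subtarget override: flip the default for one configuration only,
   and never override an explicit user choice.  */
static void
option_override (struct target_options *o, enum subtarget t)
{
  if (t == SUBTARGET_PPC64_LINUX && !o->relative_jumptables_set)
    o->relative_jumptables = false;
}

/* Consumers read the variable as before; no per-use fix-up logic.  */
static bool
gen_pic_addr_diff_vec (const struct target_options *o)
{
  return o->relative_jumptables;
}
```

Because the adjustment happens in one place, other readers of the variable (such as the CASE_VECTOR_MODE definition mentioned above) would see a consistent value without additional changes.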

Re: [PATCH, rs6000] Enable absolute jump table by default

2022-01-12 Thread David Edelsohn via Gcc-patches

On Wed, Jan 12, 2022 at 7:22 AM HAO CHEN GUI  wrote:
>
> Hi,
> This patch enables absolute jump table by default on rs6000. The relative
> jump tables are used when
> it's explicitly set by "rs6000_relative_jumptables",
> or jump tables are placed in text section but global relocation is
> required.
>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk?
> Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-12 Haochen Gui 
>
> gcc/
> * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Define.
> * config/rs6000/rs6000.c (rs6000_gen_pic_addr_diff_vec): Return
> true when relative jump table is explicit required or jump tables have
> to be put in text section but global relocation is also required.
> * config/rs6000/rs6000.opt (rs6000_relative_jumptables): Disable.
>
> patch.diff
> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
> index d617f346f81..2e257c60f8c 100644
> --- a/gcc/config/rs6000/linux64.h
> +++ b/gcc/config/rs6000/linux64.h
> @@ -239,7 +239,7 @@ extern int dot_symbols;
>
>  /* Indicate that jump tables go in the text section.  */
>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> +#define JUMP_TABLES_IN_TEXT_SECTION 0
>
>  /* The linux ppc64 ABI isn't explicit on whether aggregates smaller
> than a doubleword should be padded upward or downward.  You could
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 319182e94d9..9fba893a27a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -28465,7 +28465,9 @@ rs6000_emit_xxspltidp_v2df (rtx dst, long value)
>  static bool
>  rs6000_gen_pic_addr_diff_vec (void)
>  {
> -  return rs6000_relative_jumptables;
> +  return rs6000_relative_jumptables
> +|| (JUMP_TABLES_IN_TEXT_SECTION
> +&& targetm.asm_out.reloc_rw_mask () == 3);
>  }

This seems like contorted logic and overriding the
rs6000_relative_jumptables option change.  The later part of the patch
overrides rs6000_relative_jumptables for all rs6000 configurations,
and then changes this one use of rs6000_relative_jumptables to add
more logic to revert to the old meaning for some targets.

What about all of the other uses of rs6000_relative_jumptables in the
target?  What about rs6000.md?

I highly doubt that this patch is correct.

Why not override rs6000_relative_jumptables for PPC64 Linux instead of
changing its value globally and then trying to fix it up?

Thanks, David

>
>  void
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index c2a77182a9e..75e3fa86829 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -630,7 +630,7 @@ Target Mask(MMA) Var(rs6000_isa_flags)
>  Generate (do not generate) MMA instructions.
>
>  mrelative-jumptables
> -Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save
> +Target Undocumented Var(rs6000_relative_jumptables) Init(0) Save
>
>  mrop-protect
>  Target Var(rs6000_rop_protect) Init(0)

libgfortran bootstrap failure

2022-01-11 Thread David Edelsohn via Gcc-patches

The recent patch to support Power IEEE128 causes a bootstrap failure
on AIX and possibly all non-GLIBC systems.

+#if defined(__powerpc64__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ \
+&& defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 32)
+#define POWER_IEEE128 1
+#endif

__GLIBC_PREREQ is tested on all systems.

/nasfarm/edelsohn/src/src/libgfortran/libgfortran.h:107:49: error:
missing binary operator before token "("
  107 | && defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 32)
  | ^
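
The error arises because the preprocessor parses the whole `#if` expression before evaluating it. Where __GLIBC_PREREQ is not defined, the identifier is replaced by 0 and the remaining `(2, 32)` is a syntax error, even though the preceding `defined` test is false. One common way to avoid this, sketched here and not necessarily the fix that was committed, is to nest the function-like macro call inside a separate `#if`. The helper `power_ieee128_enabled` is hypothetical and exists only so the result can be observed; the sketch also assumes that a standard header such as `<limits.h>` makes __GLIBC_PREREQ visible on glibc systems.

```c
/* Assumption: on glibc, including a standard header is enough for
   __GLIBC_PREREQ to be defined.  */
#include <limits.h>

#if defined (__powerpc64__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ \
    && defined (__GLIBC_PREREQ)
/* This line is only parsed when the macro exists, so the call below
   is always well-formed.  */
# if __GLIBC_PREREQ (2, 32)
#  define POWER_IEEE128 1
# endif
#endif

/* Hypothetical helper reporting whether the feature macro was set.  */
static int
power_ieee128_enabled (void)
{
#ifdef POWER_IEEE128
  return 1;
#else
  return 0;
#endif
}
```

On a system without glibc the inner `#if` is skipped entirely, so the header compiles and POWER_IEEE128 stays undefined.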

Re: [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2022-01-11 Thread David Edelsohn via Gcc-patches

Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible. Note that explicit instruction scheduling "barriers" are
added to prevent floating-point computations from being moved before or
after the explicit FPSCR manipulations.  (That these are required has
been reported as an issue in GCC: PR102783.)

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries for floating-point
representation, positive and negative, test all of the parameterized
rounding modes as well as the C99 rounding modes and interactions
between the two.

Exceptions are not explicitly tested.

2021-10-18  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.

Okay.

Thanks, David

Re: [PATCH] PR 102935, Fix pr101384-1.c code generation test.

2022-01-11 Thread David Edelsohn via Gcc-patches

On Tue, Jan 11, 2022 at 12:06 PM Bill Schmidt  wrote:
>
> Hi Mike,
>
> This looks fine to me.  Maintainers?

Okay.

Thanks, David

>
> Thanks,
> Bill
>
> On 1/7/22 6:33 PM, Michael Meissner wrote:
> > Fix pr101384-1.c code generation test.
> >
> > Add support for the compiler using XXSPLTIB reg,255 to load all 1's into a
> > register on power9 and above instead of using VSPLTI{B,H,W} reg,-1.
> >
> > gcc/testsuite/
> > 2022-01-07  Michael Meissner  
> >
> >   PR testsuite/102935
> >   * gcc.target/powerpc/pr101384-1.c: Update insn regexp for power9
> >   and power10.
> > ---
> >  gcc/testsuite/gcc.target/powerpc/pr101384-1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr101384-1.c b/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> > index 627d7d76721..41cf84bf8bc 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> > @@ -2,7 +2,7 @@
> >  /* { dg-do compile { target le } } */
> >  /* { dg-options "-O2 -maltivec" } */
> >  /* { dg-require-effective-target powerpc_altivec_ok } */
> > -/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M} 9 } } */
> > +/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M|\mxxspltib[^\n\r]*,255\M} 9 } } */
> >  /* { dg-final { scan-assembler-times {\mvslw\M} 3 } } */
> >  /* { dg-final { scan-assembler-times {\mvslh\M} 3 } } */
> >  /* { dg-final { scan-assembler-times {\mvslb\M} 3 } } */

Re: Ping: [PATCH] rs6000: Add split pattern to replace

2022-01-11 Thread David Edelsohn via Gcc-patches

On Tue, Jan 11, 2022 at 2:27 AM Xionghu Luo  wrote:
>
> On 2022/1/11 06:55, David Edelsohn wrote:
> >>> +(define_insn_and_split "sldoi_to_mov_"
> > It would be more consistent with the naming convention to use
> > "sldoi_to_mov" without the final "_".
>
> OK, thanks.
>
> >
> >>> +  [(set (match_operand:VM 0 "altivec_register_operand")
> >>> + (unspec:VM [(match_operand:VM 1 "easy_vector_constant")
> > Should this be "easy_vector_constant_vsldoi"?
>
>
> This doesn't work. easy_vector_constant_vsldoi return false due to
> vspltis_shifted "return 0" as:
>
> vspltis_shifted (rtx op): /* If all elements are equal, we don't need to do VSLDOI.  */
>
>
> (gdb) p op
> $7 = (rtx_def *) (const_vector:V4SI [
> (const_int 0 [0]) repeated x4
> ])
> (gdb) p easy_vector_constant_vsldoi(op, V4SImode)
> $8 = false
> p easy_vector_constant(op, V4SImode)
> $9 = true

Okay, thanks for checking.

>
> >
> >>> + (match_dup 1)
> >>> + (match_operand:VM 2 "u5bit_cint_operand")]
> > This should be match_operand:QI, right?
>
> Yes.

This patch is okay with the other changes.

Thanks, David

Re: Ping: [PATCH] rs6000: powerpc suboptimal boolean test of contiguous bits [PR102239]

2022-01-10 Thread David Edelsohn via Gcc-patches

On Mon, Jan 10, 2022 at 12:37 AM Xionghu Luo  wrote:
>
> Ping, thanks.
>
>
> On 2021/12/13 13:16, Xionghu Luo wrote:
> > Add specialized version to combine two instructions from
> >
> >  9: {r123:CC=cmp(r124:DI&0x6,0);clobber scratch;}
> >REG_DEAD r124:DI
> >  10: pc={(r123:CC==0)?L15:pc}
> >   REG_DEAD r123:CC
> >
> > to:
> >
> >  10: {pc={(r123:DI&0x6==0)?L15:pc};clobber scratch;clobber %0:CC;}
> >
> > then split2 will split it to one rotate dot instruction (to save one
> > rotate back instruction) as shifted result doesn't matter when comparing
> > to 0 in CCEQmode.
> >
> > Bootstrapped and regression tested pass on Power 8/9/10, OK for master?
> >
> > gcc/ChangeLog:
> >
> >   PR target/102239
> >   * config/rs6000/rs6000.md (*anddi3_insn_dot): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR target/102239
> >   * gcc.target/powerpc/pr102239.c: New test.
> > ---
> >  gcc/config/rs6000/rs6000-protos.h   |  1 +
> >  gcc/config/rs6000/rs6000.c  |  7 
> >  gcc/config/rs6000/rs6000.md | 38 +
> >  gcc/testsuite/gcc.target/powerpc/pr102239.c | 13 +++
> >  4 files changed, 59 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102239.c
> >
> > diff --git a/gcc/config/rs6000/rs6000-protos.h 
> > b/gcc/config/rs6000/rs6000-protos.h
> > index 14f6b313105..3644c524376 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -73,6 +73,7 @@ extern int expand_block_move (rtx[], bool);
> >  extern bool expand_block_compare (rtx[]);
> >  extern bool expand_strn_compare (rtx[], int);
> >  extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
> > +extern bool rs6000_is_valid_rotate_dot_mask (rtx mask, machine_mode mode);
> >  extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
> >  extern bool rs6000_is_valid_shift_mask (rtx, rtx, machine_mode);
> >  extern bool rs6000_is_valid_insert_mask (rtx, rtx, machine_mode);
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 5e129986516..57a38cf954a 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -11606,6 +11606,13 @@ rs6000_is_valid_mask (rtx mask, int *b, int *e, 
> > machine_mode mode)
> >return true;
> >  }
> >
> > +bool
> > +rs6000_is_valid_rotate_dot_mask (rtx mask, machine_mode mode)
> > +{
> > +  int nb, ne;
> > > +  return rs6000_is_valid_mask (mask, &nb, &ne, mode) && nb >= ne && ne > 0;
> > +}
> > +
> >  /* Return whether MASK (a CONST_INT) is a valid mask for any rlwinm, 
> > rldicl,
> > or rldicr instruction, to implement an AND with it in mode MODE.  */
> >
> > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > index 6bec2bddbde..014dc9612ea 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -3762,6 +3762,44 @@ (define_insn_and_split "*and3_2insn_dot2"
> > (set_attr "dot" "yes")
> > (set_attr "length" "8,12")])
> >
> > +(define_insn_and_split "*anddi3_insn_dot"

This pattern needs a name that better represents its purpose.  The
pattern name implies that it's operating on a combination of AND and
Record Condition bit.  Also "insn" is confusing; I think that you are
using the template from the 2insn_dot names, so this should explicitly
be 1insn. Maybe "branch_anddi3_1insn_dot", or just
"branch_anddi3_dot".

> > + [(set (pc)
> > +(if_then_else (eq (and:DI (match_operand:DI 1 "gpc_reg_operand" "%r,r")
> > +   (match_operand:DI 2 "const_int_operand" "n,n"))
> > +   (const_int 0))
> > +   (label_ref (match_operand 3 ""))
> > +   (pc)))
> > +  (clobber (match_scratch:DI 0 "=r,r"))
> > +  (clobber (reg:CC CR0_REGNO))]
> > +  "rs6000_is_valid_rotate_dot_mask (operands[2], DImode)
> > +  && TARGET_POWERPC64"
> > +  "#"
> > +  "&& reload_completed"
> > +  [(pc)]
> > +{
> > +   int nb, ne;
> > > +   if (rs6000_is_valid_mask (operands[2], &nb, &ne, DImode)
> > +   && nb >= ne
> > +   && ne > 0)
> > + {
> > + unsigned HOST_WIDE_INT val = INTVAL (operands[2]);
> > + int shift = 63 - nb;
> > + rtx tmp = gen_rtx_ASHIFT (DImode, operands[1], GEN_INT (shift));
> > + tmp = gen_rtx_AND (DImode, tmp, GEN_INT (val << shift));
> > + rtx cr0 = gen_rtx_REG (CCmode, CR0_REGNO);
> > + rs6000_emit_dot_insn (operands[0], tmp, 1, cr0);
> > + rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[3]);
> > + rtx cond = gen_rtx_EQ (CCEQmode, cr0, const0_rtx);
> > + rtx ite = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, loc_ref, pc_rtx);
> > + emit_jump_insn (gen_rtx_SET (pc_rtx, ite));
> > + DONE;
> > + }
> > +   else
> > + FAIL;
> > +}
> > +  [(set_attr "type" "shift")
> > +   (set_attr "dot" "yes")
> > +   (set_attr "length" "8,12")])
> >
> >  (define_expand "3"
> >[(set (match_operand:SDI 0 "gpc_reg_operand")
> > diff --git
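
A rough sketch, in plain C rather than RTL, of the equivalence the proposed pattern appears to rely on: testing a contiguous run of bits against zero gives the same answer after the run is shifted to the top of the register, so the shifted result never has to be rotated back. All function names here are hypothetical. Treating nb and ne as the highest and lowest set bit positions of the mask (bit 0 being the least significant) is an assumption read from `shift = 63 - nb` in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a mask with bits ne..nb set (bit 0 is the least significant).  */
static uint64_t contiguous_mask(int nb, int ne)
{
    assert(ne >= 0 && nb >= ne && nb <= 63);
    uint64_t up_to_nb = (nb == 63) ? ~UINT64_C(0)
                                   : ((UINT64_C(1) << (nb + 1)) - 1);
    uint64_t below_ne = (UINT64_C(1) << ne) - 1;
    return up_to_nb & ~below_ne;
}

/* Original form: AND with the mask, then compare with zero.  */
static bool masked_bits_clear(uint64_t x, uint64_t mask)
{
    return (x & mask) == 0;
}

/* Shifted form: move the highest mask bit to bit 63 before the AND.
   No mask bit is shifted out, so the comparison with zero is unchanged.  */
static bool masked_bits_clear_shifted(uint64_t x, int nb, int ne)
{
    uint64_t mask = contiguous_mask(nb, ne);
    int shift = 63 - nb;
    return ((x << shift) & (mask << shift)) == 0;
}
```

For the 0x6 mask in the RTL dump above, contiguous_mask (2, 1) yields 0x6 and both predicates agree for every input in this model.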

Re: [PATCH] rs6000: Remove useless code related to -mno-power10

2022-01-10 Thread David Edelsohn via Gcc-patches

On Wed, Dec 29, 2021 at 4:37 AM Kewen.Lin  wrote:
>
> Hi,
>
> Option -mpower10 was made as "WarnRemoved" since commit r11-2318,
> so -mno-power10 doesn't take effect any more.  This patch is to
> remove one line useless code which still respects it.
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (rs6000_disable_incompatible_switches): 
> Remove
> useless related to option -mno-power10.
> ---
>  gcc/config/rs6000/rs6000.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index e82a47f4c0e..66b01e589b0 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -24825,7 +24825,6 @@ rs6000_disable_incompatible_switches (void)
>  const HOST_WIDE_INT dep_flags; /* flags that depend on this option.  
> */
>  const char *const name;/* name of the switch.  */
>} flags[] = {
> -{ OPTION_MASK_POWER10, OTHER_POWER10_MASKS,"power10"   },
>  { OPTION_MASK_P9_VECTOR,   OTHER_P9_VECTOR_MASKS,  "power9-vector" },
>  { OPTION_MASK_P8_VECTOR,   OTHER_P8_VECTOR_MASKS,  "power8-vector" },
>  { OPTION_MASK_VSX, OTHER_VSX_VECTOR_MASKS, "vsx"   },

Okay.

Thanks, David

Re: Ping^1 [PATCH, rs6000] new split pattern for TI to V1TI move [PR103124]

2022-01-10 Thread David Edelsohn via Gcc-patches

On Sun, Jan 9, 2022 at 10:16 PM HAO CHEN GUI  wrote:
>
> Hi,
>
> Gentle ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587051.html
>
> Thanks
>
> On 17/12/2021 9:55 AM, HAO CHEN GUI wrote:
> > Hi,
> >This patch defines a new split pattern for TI to V1TI move. The pattern 
> > concatenates two subreg:DI of
> > a TI to a V2DI. With the pattern, the subreg pass can do register split for 
> > TI when there is a TI to V1TI
> > move. The patch optimizes one unnecessary "mr" out on P9. The new test case 
> > illustrates it.
> >
> >Bootstrapped and tested on powerpc64-linux BE and LE with no 
> > regressions. Is this okay for trunk?
> > Any recommendations? Thanks a lot.
> >
> > ChangeLog
> > 2021-12-13 Haochen Gui 
> >
> > gcc/
> >   * config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined.
> >
> > gcc/testsuite/
> >   * gcc.target/powerpc/pr103124.c: New testcase.
> >
> >
> > patch.diff
> > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> > index bf033e31c1c..52968eb4609 100644
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
> > [(set_attr "type" "vecperm")
> >  (set_attr "prefixed" "yes")])
> >
> > +;; Construct V1TI by vsx_concat_v2di
> > +(define_split
> > +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> > + (subreg:V1TI
> > +   (match_operand:TI 1 "int_reg_operand") 0 ))]
> > +  "TARGET_P9_VECTOR && !reload_completed"
> > +  [(const_int 0)]
> > +{
> > +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
> > +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
> > +  rtx tmp3 = gen_reg_rtx (V2DImode);
> > +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
> > +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
> > +  emit_move_insn (operands[0], tmp4);
> > +  DONE;
> > +})
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c 
> > b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> > new file mode 100644
> > index 000..e9072d19b8e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-require-effective-target int128 } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> > +/* { dg-final { scan-assembler-not "\mmr\M" } } */

Segher probably would prefer {\mmr\M} .

> > +
> > +vector __int128 add (long long a)
> > +{
> > +  vector __int128 b;
> > +  b = (vector __int128) {a};
> > +  return b;
> > +}

This is okay.

Thanks, David

Re: Ping: [PATCH] rs6000: Add split pattern to replace

2022-01-10 Thread David Edelsohn via Gcc-patches

On Mon, Jan 10, 2022 at 12:04 AM Xionghu Luo  wrote:
>
> Gentle ping, thanks.
>
>
> On 2021/12/29 09:27, Xionghu Luo wrote:
> > 7: r120:V4SI=const_vector
> > 8: r121:V4SI=unspec[r120:V4SI,r120:V4SI,0xc] 260
> >
> > with r121:v4SI = r120:V4SI when r120 is a vector with same element.
> >
> > Bootstrapped and regtested pass on powerpc64le-linux-gnu {P10, P9}
> > and powerpc64-linux-gnu {P8, P7}.  OK for master?
> >
> > gcc/ChangeLog:
> >
> >   * config/rs6000/altivec.md (sldoi_to_mov_): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/powerpc/sldoi_to_mov.c: New test.
> > ---
> >  gcc/config/rs6000/altivec.md| 11 +++
> >  gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c | 15 +++
> >  2 files changed, 26 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> >
> > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> > index b2909857c34..25f86dbe828 100644
> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -383,6 +383,17 @@ (define_split
> >  }
> >  })
> >
> > +(define_insn_and_split "sldoi_to_mov_"

It would be more consistent with the naming convention to use
"sldoi_to_mov" without the final "_".

> > +  [(set (match_operand:VM 0 "altivec_register_operand")
> > + (unspec:VM [(match_operand:VM 1 "easy_vector_constant")

Should this be "easy_vector_constant_vsldoi"?

> > + (match_dup 1)
> > + (match_operand:VM 2 "u5bit_cint_operand")]

This should be match_operand:QI, right?

Thanks, David

> > + UNSPEC_VSLDOI))]
> > +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0) (match_dup 1))])
> > +
> >  (define_insn "get_vrsave_internal"
> >[(set (match_operand:SI 0 "register_operand" "=r")
> >   (unspec:SI [(reg:SI VRSAVE_REGNO)] UNSPEC_GET_VRSAVE))]
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c 
> > b/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> > new file mode 100644
> > index 000..2053243c456
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > > +#include <altivec.h>
> > +vector signed int foo1 (vector signed int a) {
> > +vector signed int b = {0};
> > +return vec_sum2s(a, b);
> > +}
> > +
> > +vector signed int foo2 (vector signed int a) {
> > +vector signed int b = {0};
> > +return vec_sld(b, b, 4);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mvsldoi\M} 1 {target le} } } */
>
> --
> Thanks,
> Xionghu

Re: [power-ieee128] OPEN CONV

2022-01-08 Thread David Edelsohn via Gcc-patches

On Sat, Jan 8, 2022 at 1:59 PM Michael Meissner  wrote:
>
> On Sat, Jan 08, 2022 at 03:18:07PM +0100, Jakub Jelinek wrote:
> > On Sat, Jan 08, 2022 at 03:13:10PM +0100, Thomas Koenig wrote:
> > >
> > > On 08.01.22 15:02, Jakub Jelinek via Fortran wrote:
> > > > > Note, as for byteswapping, apparently it wasn't ever working right for
> > > > the IBM extended real(kind=16) and complex(kind=16).
> > >
> > > The lack of bug reports since the conversion feature was introduced in
> > > 2006, more than 15 years ago, tells us something, I guess...
> >
> > powerpc64le was only introduced in GCC 4.8 in 2013, so slightly less
> > than that, but still.
> > Either nobody interchanges/shares fortran unformatted data between
> > powerpc big and little endian, or if they do, they don't use real(kind=16)
> > or complex(kind=16) in there...
>
> I still wish I had had the forethought when we were setting up the LE ABI to
> change the default 128-bit format to IEEE instead of IBM.  But alas, I didn't.
> You would still need converters between the big endian IBM format and little
> endian IEEE format, but it would have avoided a lot of the problems where GCC
> assumes there is only one floating point format for each size.

Mike,

The LE ABI initial target was Power8 and IEEE128 hardware support was
added to Power9.  The ABI was a conscious decision. IEEE 128 was not a
viable requirement for the LE ABI at the time of the transition.

Thanks, David

Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8

2022-01-07 Thread David Edelsohn via Gcc-patches

On Fri, Jan 7, 2022 at 3:57 PM Paul A. Clarke  wrote:
>
> On Fri, Jan 07, 2022 at 02:40:51PM -0500, David Edelsohn via Gcc-patches 
> wrote:
> > +#ifdef __LITTLE_ENDIAN__
> > +  /* Sum across four integers with two integer results.  */
> > +  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
> > +  /* Note: vec_sum2s could be used here, but on little-endian, vector
> > + shifts are added that are not needed for this use-case.
> > + A vector shift to correctly position the 32-bit integer results
> > + (currently at [0] and [2]) to [1] and [3] would then need to be
> > + swapped back again since the desired results are two 64-bit
> > + integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
> > +#else
> >/* Sum across four integers with two integer results.  */
> >result = vec_sum2s (vsum, (__vector signed int) zero);
> >
> > If little-endian adds shifts to correct for the position and
> > big-endian does not, why not use the inline asm without the shifts for
> > both?  It seems confusing to add the inline asm only for LE instead of
> > always using it with the appropriate comment.
> >
> > It's a good and valuable optimization for LE.  Fewer variants are less
> > fragile, easier to test and easier to maintain.  If you're going to go
> > to the trouble of using inline asm for LE, use it for both.
>
> BE (only) _does_ need a shift as seen on the next two lines after the
> code snippet above:
>   /* Sum across four integers with two integer results.  */
>   result = vec_sum2s (vsum, (__vector signed int) zero);
>   /* Rotate the sums into the correct position.  */
>   result = vec_sld (result, result, 6);
>
> So, when using {vec_sum2s;vec_sld}:
> - LE gets an implicit shift in vec_sum2s which just needs to be undone
>   by the vec_sld, and those shifts don't "cancel out" and get removed
>   by GCC.
> - BE does not get any implicit shifts, but needs one that comes from
>   vec_sld.
>
> Are you saying use the asm(vsum2sws) and then conditionally call
> vec_sld on BE only?
>
> I viewed this change as a temporary bandage unless and until GCC can
> remove the unnecessary swaps.  It seems like the preferred code is
> vec_sum2s/vec_sld, not the asm, but that currently is suboptimal for LE.

Never mind.  I thought that these patches had not been reviewed.
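
To make the element positions in this exchange easier to follow, here is a hedged scalar model of what vsum2sws computes, written with big-endian element numbering. The names model_vsum2sws and saturate_s32 are invented for this sketch; it is not the header code under review.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit intermediate sum to the signed 32-bit range.  */
static int32_t saturate_s32(int64_t value)
{
    if (value > INT32_MAX)
        return INT32_MAX;
    if (value < INT32_MIN)
        return INT32_MIN;
    return (int32_t) value;
}

/* Hypothetical model of vsum2sws, big-endian element numbering:
   each 64-bit half sums its two words of `a` plus the odd word of `b`,
   stores the saturated sum in the odd word and zeroes the even word.  */
static void model_vsum2sws(int32_t result[4],
                           const int32_t a[4],
                           const int32_t b[4])
{
    assert(result != a && result != b);   /* output must not alias inputs */
    result[0] = 0;
    result[1] = saturate_s32((int64_t) a[0] + a[1] + b[1]);
    result[2] = 0;
    result[3] = saturate_s32((int64_t) a[2] + a[3] + b[3]);
}
```

In this model the two sums already sit in the low word of each 64-bit half, which matches the observation quoted above that the little-endian use case wants two 64-bit results and needs no further shift.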

Thanks, David

Re: [PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2022-01-07 Thread David Edelsohn via Gcc-patches

On Fri, Jan 7, 2022 at 3:35 PM Paul A. Clarke  wrote:
>
> On Fri, Jan 07, 2022 at 02:23:14PM -0500, David Edelsohn wrote:
> > > Power10 ISA added `vextract*` instructions which are realized in the
> > > `vec_extractm` instrinsic.
> > >
> > > Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
> > > `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
> > >
> > > 2021-10-21  Paul A. Clarke  
> > >
> > > gcc
> > > * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
> > > when _ARCH_PWR10.
> > > * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
> > > (_mm_movemask_epi8): Likewise.
> > > ---
> > > Tested on Power10 powerpc64le-linux (compiled with and without
> > > `-mcpu=power10`).
> > >
> > > OK for trunk?
> >
> > This is okay modulo
> >
> > > + return vec_extractm ((__v16qu) __A);
> >
> > Should the above be __v16qi like x86?
>
> That would match x86 better, but we don't have a function signature
> for vec_extractm which accepts a signed type.

Okay, never mind.  I thought that vec_extractm also allowed signed.
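
As a reference point for the semantics being matched, here is a minimal scalar sketch of what the x86 _mm_movemask_epi8 intrinsic returns. The function name model_movemask_epi8 is made up for illustration. Only the top bit of each byte is read, so in this model the signedness of the element type does not affect the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_movemask_epi8: bit i of the result
   is the most significant bit of byte i of the 16-byte input.  */
static unsigned model_movemask_epi8(const uint8_t v[16])
{
    unsigned mask = 0;

    assert(v != NULL);
    for (int i = 0; i < 16; i++)
        mask |= (unsigned) (v[i] >> 7) << i;
    return mask;
}
```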

Thanks, David

Re: [PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2022-01-07 Thread David Edelsohn via Gcc-patches

On Fri, Jan 7, 2022 at 3:32 PM Paul A. Clarke  wrote:
>
> On Fri, Jan 07, 2022 at 02:15:22PM -0500, David Edelsohn wrote:
> > > Power10 ISA added `xxblendv*` instructions which are realized in the
> > > > `vec_blendv` intrinsic.
> > >
> > > Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> > > `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
> > >
> > > Also, copy a test from i386 for testing `_mm_blendv_ps`.
> > > This should have come with commit 
> > > ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> > > but was inadvertently omitted.
> > >
> > > 2021-10-20  Paul A. Clarke  
> > >
> > > gcc
> > > * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
> > > when _ARCH_PWR10.
> > > (_mm_blendv_ps): Likewise.
> > > (_mm_blendv_pd): Likewise.
> > >
> > > gcc/testsuite
> > > * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
> > > adjust dg directives to suit.
> > > ---
> > > Tested on Power10 powerpc64le-linux (compiled with and without
> > > `-mcpu=power10`).
> > >
> > > OK for trunk?
> >
> > This is okay modulo
> >
> > > + return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) 
> > > __mask);
> >
> > Should the above be __v16qi like x86?
>
> That does arguably match the types involved (epi8) better.
>
> Shall I change the original implementation as well (4 lines later)?
>
> >   return (__m128i) vec_sel ((__v16qi) __A, (__v16qi) __B, __lmask);

vec_blendv supports the signed type, so it seems that the function
should use that type, unless unsigned is preferred because PowerPC
defaults to unsigned char.

I wasn't going to recommend changing the existing code because I don't
know how the signed type interacts with the other builtins.
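
For reference, a small scalar sketch of the byte blend that _mm_blendv_epi8 performs on x86. The name model_blendv_epi8 is hypothetical and this is not the smmintrin.h implementation. In this model the choice depends only on the top bit of each mask byte, so casting the operands to a signed or an unsigned byte vector selects the same bytes; the type question above is about matching the API and the available builtin signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_blendv_epi8: for each byte, take the
   byte from b when the mask byte has its top bit set, otherwise from a.  */
static void model_blendv_epi8(uint8_t out[16],
                              const uint8_t a[16],
                              const uint8_t b[16],
                              const uint8_t mask[16])
{
    assert(out != NULL && a != NULL && b != NULL && mask != NULL);
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? b[i] : a[i];
}
```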

Thanks, David

Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8

2022-01-07 Thread David Edelsohn via Gcc-patches

+#ifdef __LITTLE_ENDIAN__
+  /* Sum across four integers with two integer results.  */
+  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
+  /* Note: vec_sum2s could be used here, but on little-endian, vector
+ shifts are added that are not needed for this use-case.
+ A vector shift to correctly position the 32-bit integer results
+ (currently at [0] and [2]) to [1] and [3] would then need to be
+ swapped back again since the desired results are two 64-bit
+ integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
+#else
   /* Sum across four integers with two integer results.  */
   result = vec_sum2s (vsum, (__vector signed int) zero);

If little-endian adds shifts to correct for the position and
big-endian does not, why not use the inline asm without the shifts for
both?  It seems confusing to add the inline asm only for LE instead of
always using it with the appropriate comment.

It's a good and valuable optimization for LE.  Fewer variants are less
fragile, easier to test and easier to maintain.  If you're going to go
to the trouble of using inline asm for LE, use it for both.

Thanks, David

Re: [PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2022-01-07 Thread David Edelsohn via Gcc-patches

> Power10 ISA added `vextract*` instructions which are realized in the
> `vec_extractm` intrinsic.
>
> Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
> `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
>
> 2021-10-21  Paul A. Clarke  
>
> gcc
> * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
> when _ARCH_PWR10.
> * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
> (_mm_movemask_epi8): Likewise.
> ---
> Tested on Power10 powerpc64le-linux (compiled with and without
> `-mcpu=power10`).
>
> OK for trunk?

This is okay modulo

> + return vec_extractm ((__v16qu) __A);

Should the above be __v16qi like x86?

Thanks, David

Re: [PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2022-01-07 Thread David Edelsohn via Gcc-patches

> Power10 ISA added `xxblendv*` instructions which are realized in the
> `vec_blendv` intrinsic.
>
> Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
>
> Also, copy a test from i386 for testing `_mm_blendv_ps`.
> This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> but was inadvertently omitted.
>
> 2021-10-20  Paul A. Clarke  
>
> gcc
> * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
> when _ARCH_PWR10.
> (_mm_blendv_ps): Likewise.
> (_mm_blendv_pd): Likewise.
>
> gcc/testsuite
> * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
> adjust dg directives to suit.
> ---
> Tested on Power10 powerpc64le-linux (compiled with and without
> `-mcpu=power10`).
>
> OK for trunk?

This is okay modulo

> + return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) 
> __mask);

Should the above be __v16qi like x86?

Thanks, David

Re: [PATCH] aix: handle 64bit inodes for include directories

2021-12-30 Thread David Edelsohn via Gcc-patches

Hi, Jeff

Is the revised patch from Clement okay?

Thanks, David

On Tue, Aug 24, 2021 at 3:59 AM CHIGOT, CLEMENT  wrote:
>
> >>> So my worry here is this is really a host property -- ie, this is
> >>> behavior of where GCC runs, not the target for which GCC is generating 
> >>> code.
> >>>
> >>> That implies that the change in aix.h is wrong.  aix.h is for the
> >>> target, not the host -- you don't want to define something like
> >>> HOST_STAT_FOR_64BIT_INODES there.
> >>>
> >>> You'd want to be triggering this behavior via a host fragment, x-aix, or
> >>> better yet via an autoconf test.
> >> Indeed, would this version be better ? I'm not sure about the configure 
> >> test.
> >> But as we are retrieving the size of dev_t and ino_t just above, I'm 
> >> assuming
> >> that the one being used in stat directly. At least, that's the case on 
> >> AIX, and
> >> this test is only made for AIX.
> > It's a clear improvement.  It's still checking for the aix target though:
> >
> > +# Select the right stat being able to handle 64bit inodes, if needed.
> > +if test "$enable_largefile" != no; then
> > +  case "$target" in
> > +*-*-aix*)
> > +  if test "$ac_cv_sizeof_ino_t" == "4" -a "$ac_cv_sizeof_dev_t" ==
> > 4; then
> > +
> > +$as_echo "#define HOST_STAT_FOR_64BIT_INODES stat64x" >>confdefs.h
> > +
> > +  fi;;
> > +  esac
> > +fi
> >
> > Again, we're dealing with a host property.  You might be able to just
> > > change $target above to $host.  Hmm, that makes me wonder about Canadian
> > > crosses where host != build.  We may need to do this for both the AIX
> > > host and AIX build.
>
> Yes, my bad, I've updated the case. I don't know if there is a usual way
> to check both $build and $host. I've tried to avoid code duplication so
> tell me if it's okay or if you'd rather have a case for $build and one
> for $host.
>
> Thanks,
> Clément

Re: [PATCH] rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus

2021-12-20 Thread David Edelsohn via Gcc-patches

On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool
 wrote:
>
> On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote:
> > On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo  wrote:
> > > These four UNSPECS seems could be replaced with native RTL, and why
> > > "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))"
> > > in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT):
> > >
> > >   This bit is sticky; that is, once set to 1 it
> > >   remains set to 1 until it is set to 0 by an
> > >   mtvscr instruction.
> > >
> > > The RTL pattern set it to 0 but final ASM doesn't present it?  And why
> > > not use Clobber VSCR_REGNO instead?
> >
> > The design came from the early implementation of Altivec:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html
> >
> > If one later checks for saturation (reads VSCR), one needs a
> > corresponding SET of the value.  It's set in an architecture-specific
> > manner that isn't described to GCC, but it's set, not just clobbered
> > and in an undefined state.
>
> Well.  RTL clobber and set do exactly the same thing, except with
> clobber it is not specified *what* value is set.  All bits are set, all
> bits are defined.  There is no (direct) way in RTL to say
> "undetermined".
>
> An RTL clobber would work just fine afaics?

I don't know about the original intention from Aldy, but if one were
looking at an RTL dump and the code used the saturation bit from VSCR,
it might be confusing to see a CLOBBER instead of a SET.  The SET
documents that VSCR_REGNO is assigned a specific value; GCC doesn't
know about the semantics, but it's not some undefined bit pattern.
CLOBBER implies a trash value or a value that one will not query
later, i.e., one would want to SET the register to a specific value
before using it.

>
> > The RTL does not describe that VSCR is set to the value 0.  The
> > (const_int 0) is not the value set.  You can think of the (const_int
> > 0) as a dummy RTL argument to the VSCR UNSPEC.  UNSPEC requires at
> > least one argument and the pattern doesn't try to express the
> > argument, so it uses a dummy RTL constant.
>
> Yup.  Traditionally (pc) was used for this.  Nowadays (const_int 0) is
> not really more expensive anymore, and many people find it clearer (but
> not in this case it seems :-) ).
>
> > It's part of a PARALLEL
> > and the plus or minus already expresses the data dependency of the
> > pattern on the input operands.
>
> But they do not describe any dependency on vscr, or output to it.  This
> is the same problem we have with fpscr (most FP insns use some of its
> fields, most set some, but there is no way to cleanly express that).

It describes that VSCR_REGNO is set, an output. It doesn't describe
how it is set nor inform the compiler that the value depends on the
input operands in some complicated way unknown to the compiler, but
the compiler cannot do anything useful with the additional
information.

>
> Explicit clobbers like this help one side of the issue.  For vscr, other
> than the sat bit there is only the nj bit, and we just ignore that :-)
>
> > This patch is okay.  Thanks for updating the machine description and
> > for cleaning up the formatting.
>
> x2.  Thanks!
>
>
> Segher

Re: [PATCH] rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus

2021-12-20 Thread David Edelsohn via Gcc-patches

On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo  wrote:
>
> These four UNSPECS seems could be replaced with native RTL, and why
> "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))"
> in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT):
>
>   This bit is sticky; that is, once set to 1 it
>   remains set to 1 until it is set to 0 by an
>   mtvscr instruction.
>
> The RTL pattern set it to 0 but final ASM doesn't present it?  And why
> not use Clobber VSCR_REGNO instead?

The design came from the early implementation of Altivec:

https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html

If one later checks for saturation (reads VSCR), one needs a
corresponding SET of the value.  It's set in an architecture-specific
manner that isn't described to GCC, but it's set, not just clobbered
and in an undefined state.

The RTL does not describe that VSCR is set to the value 0.  The
(const_int 0) is not the value set.  You can think of the (const_int
0) as a dummy RTL argument to the VSCR UNSPEC.  UNSPEC requires at
least one argument and the pattern doesn't try to express the
argument, so it uses a dummy RTL constant.  It's part of a PARALLEL
and the plus or minus already expresses the data dependency of the
pattern on the input operands.

I'm unsure of the meaning of your question "final ASM doesn't present
it".  The operation on VSCR is implicit and not emitted as an
instruction.  It's in a PARALLEL, which means that the single Altivec
instruction has both effects.  Is that what you were asking?
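
A hedged scalar sketch of the "both effects" point, using a byte-wise unsigned saturating add as the example. The struct vscr_model and the function model_vaddubs are invented for illustration and are not how GCC or the hardware represents VSCR. A single call produces the saturated result and may set the sticky SAT flag, and nothing in the add ever clears that flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one VSCR bit discussed here.  */
struct vscr_model {
    bool sat;   /* sticky: set on saturation, never cleared by the add */
};

/* Hypothetical model of an unsigned saturating byte add.  One call has
   two effects: it writes the clamped sums and it may set vscr->sat.  */
static void model_vaddubs(uint8_t out[16],
                          const uint8_t a[16],
                          const uint8_t b[16],
                          struct vscr_model *vscr)
{
    assert(vscr != NULL);
    for (int i = 0; i < 16; i++) {
        unsigned sum = (unsigned) a[i] + b[i];
        if (sum > UINT8_MAX) {
            sum = UINT8_MAX;
            vscr->sat = true;
        }
        out[i] = (uint8_t) sum;
    }
}
```

A later add that does not saturate leaves the flag as it was, which is why a reader of the flag sees a definite value rather than garbage.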

> Tested pass on P10, OK for master?

This patch is okay.  Thanks for updating the machine description and
for cleaning up the formatting.

> Thanks.
>
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vaddus): Replace
> UNSPEC_VADDU with us_plus.
> (altivec_vaddss): Replace UNSPEC_VADDS with ss_plus.
> (altivec_vsubus): Replace UNSPEC_VSUBU with us_minus.
> (altivec_vsubss): Replace UNSPEC_VSUBS with ss_minus.
> (altivec_abss_): Likewise.
> ---
>  gcc/config/rs6000/altivec.md | 29 ++---
>  1 file changed, 10 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index a057218aa28..b2909857c34 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -29,8 +29,6 @@ (define_c_enum "unspec"
> UNSPEC_VMHADDSHS
> UNSPEC_VMHRADDSHS
> UNSPEC_VADDCUW
> -   UNSPEC_VADDU
> -   UNSPEC_VADDS
> UNSPEC_VAVGU
> UNSPEC_VAVGS
> UNSPEC_VMULEUB
> @@ -61,8 +59,6 @@ (define_c_enum "unspec"
> UNSPEC_VSR
> UNSPEC_VSRO
> UNSPEC_VSUBCUW
> -   UNSPEC_VSUBU
> -   UNSPEC_VSUBS
> UNSPEC_VSUM4UBS
> UNSPEC_VSUM4S
> UNSPEC_VSUM2SWS
> @@ -517,9 +513,8 @@ (define_insn "altivec_vaddcuw"
>
>  (define_insn "altivec_vaddus"
>[(set (match_operand:VI 0 "register_operand" "=v")
> -(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> -   (match_operand:VI 2 "register_operand" "v")]
> -  UNSPEC_VADDU))
> +(us_plus:VI (match_operand:VI 1 "register_operand" "v")
> +   (match_operand:VI 2 "register_operand" "v")))
> (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
>""
>"vaddus %0,%1,%2"
> @@ -527,9 +522,8 @@ (define_insn "altivec_vaddus"
>
>  (define_insn "altivec_vaddss"
>[(set (match_operand:VI 0 "register_operand" "=v")
> -(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> -(match_operand:VI 2 "register_operand" "v")]
> -  UNSPEC_VADDS))
> +(ss_plus:VI (match_operand:VI 1 "register_operand" "v")
> +   (match_operand:VI 2 "register_operand" "v")))
> (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
>"VECTOR_UNIT_ALTIVEC_P (mode)"
>"vaddss %0,%1,%2"
> @@ -563,9 +557,8 @@ (define_insn "altivec_vsubcuw"
>
>  (define_insn "altivec_vsubus"
>[(set (match_operand:VI 0 "register_operand" "=v")
> -(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> -(match_operand:VI 2 "register_operand" "v")]
> -  UNSPEC_VSUBU))
> +(us_minus:VI (match_operand:VI 1 "register_operand" "v")
> +(match_operand:VI 2 "register_operand" "v")))
> (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
>"VECTOR_UNIT_ALTIVEC_P (mode)"
>"vsubus %0,%1,%2"
> @@ -573,9 +566,8 @@ (define_insn "altivec_vsubus"
>
>  (define_insn "altivec_vsubss"
>[(set (match_operand:VI 0 "register_operand" "=v")
> -(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> -(match_operand:VI 2 "register_operand" "v")]
> -  UNSPEC_VSUBS))
> +(ss_minus:VI (match_operand:VI 1 "register_operand" "v")
> +(match_operand:VI 2 "register_operand" "v")))
> (set (reg:SI VSCR_REGNO)

Re: [PATCH, rs6000] Implement mffscrni pattern

2021-12-20 Thread David Edelsohn via Gcc-patches

On Mon, Dec 20, 2021 at 12:56 AM HAO CHEN GUI  wrote:
>
> Hi,
>   I modified the patch according to David and Segher's advice.
>
>   This patch defines a pattern for mffscrni. If the RN is a constant, it can 
> call
> gen_rs6000_mffscrni directly. The "rs6000-builtin-new.def" defines prototype 
> for builtin arguments.
> The pattern "rs6000_set_fpscr_rn" is then broken as the mode of its argument 
> is DI while its
> corresponding builtin has a const int argument. The patch also fixed it.
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. 
> Is this okay for trunk?
> Any recommendations? Thanks a lot.
>
> ChangeLog
> 2021-12-17 Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (u2bit_cint_operand): Defined.
> * config/rs6000/rs6000-call.c
> (rs6000_expand_set_fpscr_rn_builtin): Not copy argument to a reg if
> it's a constant. The pattern for constant can be recognized now.
> * config/rs6000/rs6000.md (UNSPECV_MFFSCRNI): Defined.
> (rs6000_mffscrni): Defined.
> (rs6000_set_fpscr_rn): Change the type of operand[0] from DI to SI.
> Call gen_rs6000_mffscrni when operand[0] is a const int[0,3].
>
> gcc/testsuite/
> * gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni.
> * gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to
> test mffscrn and mffscrni separately.

This revised patch is okay.

Thanks, David
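
For readers following the thread, here is a minimal sketch of the decision
the revised expander makes, as described in the ChangeLog above: a constant
argument is range-checked and kept as an immediate, while a non-constant
argument takes the register path. The enum and both function names are
hypothetical stand-ins for illustration, not GCC code, and the sketch
simplifies the non-constant case (the patch copies to a register only when
the operand predicate rejects the operand).

```c
#include <stdbool.h>

/* Hypothetical outcome of classifying the builtin's argument.  */
enum rn_action
{
  RN_ERROR,          /* constant outside 0..3: diagnose */
  RN_USE_IMMEDIATE,  /* constant in range: can be matched as an immediate */
  RN_USE_REGISTER    /* non-constant: handled through a register */
};

/* Same range test as the u2bit_cint_operand predicate in the patch.  */
static bool
fits_u2bit (long value)
{
  return value >= 0 && value <= 3;
}

/* Assumed simplification of the expander logic: constants are validated
   and left alone, everything else goes down the register path.  */
static enum rn_action
classify_rn_argument (bool is_constant, long value)
{
  if (is_constant)
    return fits_u2bit (value) ? RN_USE_IMMEDIATE : RN_ERROR;
  return RN_USE_REGISTER;
}
```

With this shape, an in-range constant is no longer forced into a register,
which is what lets the immediate pattern be recognized.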

>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index f216ffdf410..b10b4ce6065 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -219,6 +219,11 @@ (define_predicate "u1bit_cint_operand"
>(and (match_code "const_int")
> (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1")))
>
> +;; Return 1 if op is an unsigned 2-bit constant integer.
> +(define_predicate "u2bit_cint_operand"
> +  (and (match_code "const_int")
> +   (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 3")))
> +
>  ;; Return 1 if op is a unsigned 3-bit constant integer.
>  (define_predicate "u3bit_cint_operand"
>(and (match_code "const_int")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index d9736eaf21c..81261a0f24d 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -9610,13 +9610,15 @@ rs6000_expand_set_fpscr_rn_builtin (enum insn_code icode, tree exp)
>   compile time if the argument is a variable.  The least significant two
>   bits of the argument, regardless of type, are used to set the rounding
>   mode.  All other bits are ignored.  */
> -  if (CONST_INT_P (op0) && !const_0_to_3_operand(op0, VOIDmode))
> +  if (CONST_INT_P (op0))
>  {
> -  error ("Argument must be a value between 0 and 3.");
> -  return const0_rtx;
> +  if (!const_0_to_3_operand (op0, VOIDmode))
> +   {
> + error ("Argument must be a value between 0 and 3.");
> + return const0_rtx;
> +   }
>  }
> -
> -  if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
> +  else if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
>  op0 = copy_to_mode_reg (mode0, op0);
>
>pat = GEN_FCN (icode) (op0);
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6bec2bddbde..b18746af7ea 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -177,6 +177,7 @@ (define_c_enum "unspecv"
> UNSPECV_MFFS; Move from FPSCR
> UNSPECV_MFFSL   ; Move from FPSCR light instruction version
> UNSPECV_MFFSCRN ; Move from FPSCR float rounding mode
> +   UNSPECV_MFFSCRNI; Move from FPSCR float rounding mode with imm
> UNSPECV_MFFSCDRN; Move from FPSCR decimal float rounding mode
> UNSPECV_MTFSF   ; Move to FPSCR Fields 8 to 15
> UNSPECV_MTFSF_HI; Move to FPSCR Fields 0 to 7
> @@ -6315,6 +6316,14 @@ (define_insn "rs6000_mffscrn"
> "mffscrn %0,%1"
>[(set_attr "type" "fp")])
>
> +(define_insn "rs6000_mffscrni"
> +  [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> +   (unspec_volatile:DF [(match_operand:SI 1 "u2bit_cint_operand" "n")]
> +   UNSPECV_MFFSCRNI))]
> +   "TARGET_P9_MISC"
> +   "mffscrni %0,%1"
> +  [(set_attr "type" "fp")])
> +
>  (define_insn "rs6000_mffscdrn"
>[(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSCDRN))
> @@ -6324,7 +6333,7 @@ (define_insn "rs6000_mffscdrn"
>[(set_attr "type" "fp")])
>
>  (define_expand "rs6000_set_fpscr_rn"
> - [(match_operand:DI 0 "reg_or_cint_operand")]
> + [(match_operand:SI 0 "reg_or_cint_operand")]
>"TARGET_HARD_FLOAT"
>  {
>rtx tmp_df = gen_reg_rtx (DFmode);
> @@ -6333,9 +6342,15 @@ (define_expand "rs6000_set_fpscr_rn"
>   new rounding mode bits from operands[0][62:63]

Re: [PATCH, rs6000] Implement mffscrni pattern

2021-12-17 Thread David Edelsohn via Gcc-patches

On Thu, Dec 16, 2021 at 9:43 PM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch defines a pattern for mffscrni. If the RN is a constant, it can
> call gen_rs6000_mffscrni directly. The "rs6000-builtin-new.def" defines
> prototype for builtin arguments. The pattern "rs6000_set_fpscr_rn" is then
> broken as the mode of its argument is DI while its corresponding builtin has
> a const int argument. The patch also fixed it.
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.

Hi, Haochen

I have a question about the mode of the input operand in the new pattern below.

>
> ChangeLog
> 2021-12-17 Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (u2bit_cint_operand): Defined.
> * config/rs6000/rs6000-call.c
> (rs6000_expand_set_fpscr_rn_builtin): Not copy argument to a reg if
> it's a constant. The pattern for constant can be recognized now.
> * config/rs6000/rs6000.md (UNSPECV_MFFSCRNI): Defined.
> (rs6000_mffscrni): Defined.
> (rs6000_set_fpscr_rn): Change the type of operand[0] from DI to SI.
> Call gen_rs6000_mffscrni when operand[0] is a const int[0,3].
>
> gcc/testsuite/
> * gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni.
> * gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to
> test mffscrn and mffscrni separately.
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index f216ffd..b10b4ce 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -219,6 +219,11 @@ (define_predicate "u1bit_cint_operand"
>(and (match_code "const_int")
> (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1")))
>
> +;; Return 1 if op is an unsigned 2-bit constant integer.
> +(define_predicate "u2bit_cint_operand"
> +  (and (match_code "const_int")
> +   (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 3")))
> +
>  ;; Return 1 if op is a unsigned 3-bit constant integer.
>  (define_predicate "u3bit_cint_operand"
>(and (match_code "const_int")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index d9736ea..81261a0 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -9610,13 +9610,15 @@ rs6000_expand_set_fpscr_rn_builtin (enum insn_code icode, tree exp)
>   compile time if the argument is a variable.  The least significant two
>   bits of the argument, regardless of type, are used to set the rounding
>   mode.  All other bits are ignored.  */
> -  if (CONST_INT_P (op0) && !const_0_to_3_operand(op0, VOIDmode))
> +  if (CONST_INT_P (op0))
>  {
> -  error ("Argument must be a value between 0 and 3.");
> -  return const0_rtx;
> +  if (!const_0_to_3_operand (op0, VOIDmode))
> +   {
> + error ("Argument must be a value between 0 and 3.");
> + return const0_rtx;
> +   }
>  }
> -
> -  if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
> +  else if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
>  op0 = copy_to_mode_reg (mode0, op0);
>
>pat = GEN_FCN (icode) (op0);
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6bec2bd..291396c 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -177,6 +177,7 @@ (define_c_enum "unspecv"
> UNSPECV_MFFS; Move from FPSCR
> UNSPECV_MFFSL   ; Move from FPSCR light instruction version
> UNSPECV_MFFSCRN ; Move from FPSCR float rounding mode
> +   UNSPECV_MFFSCRNI; Move from FPSCR float rounding mode with imm
> UNSPECV_MFFSCDRN; Move from FPSCR decimal float rounding mode
> UNSPECV_MTFSF   ; Move to FPSCR Fields 8 to 15
> UNSPECV_MTFSF_HI; Move to FPSCR Fields 0 to 7
> @@ -6315,6 +6316,14 @@ (define_insn "rs6000_mffscrn"
> "mffscrn %0,%1"
>[(set_attr "type" "fp")])
>
> +(define_insn "rs6000_mffscrni"
> +  [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> +   (unspec_volatile:DF [(match_operand:DF 1 "u2bit_cint_operand" "n")]

Why is this input operand 1 DFmode?  This is a 2 bit integer value.
This pattern is called from rs6000_set_fpscr_rn with an SImode
operand, and it seems that this should be SImode as well.

Thanks, David

> +   UNSPECV_MFFSCRNI))]
> +   "TARGET_P9_MISC"
> +   "mffscrni %0,%1"
> +  [(set_attr "type" "fp")])
> +
>  (define_insn "rs6000_mffscdrn"
>[(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSCDRN))
> @@ -6324,7 +6333,7 @@ (define_insn "rs6000_mffscdrn"
>[(set_attr "type" "fp")])
>
>  (define_expand "rs6000_set_fpscr_rn"
> - [(match_operand:DI 0 "reg_or_cint_operand")]
> + [(match_operand:SI 0 "reg_or_cint_operand")]
>"TARGET_HARD_FLOAT"
>
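
To illustrate why the operand is naturally a small integer rather than a
floating-point value, here is a short sketch of how a 2-bit RN value selects
a rounding mode. The helper names are hypothetical; the mapping is the
standard PowerPC FPSCR RN encoding. Masking with 3 reflects the comment in
the quoted patch that only the least significant two bits of a variable
argument are used.

```c
/* Hypothetical helper: keep only the two bits that form the RN field.  */
static unsigned
rn_field (unsigned long arg)
{
  return (unsigned) (arg & 3);
}

/* Hypothetical helper: describe the rounding mode an RN value selects.  */
static const char *
rn_field_name (unsigned long arg)
{
  static const char *const names[4] = {
    "round to nearest",        /* RN = 0b00 */
    "round toward zero",       /* RN = 0b01 */
    "round toward +infinity",  /* RN = 0b10 */
    "round toward -infinity"   /* RN = 0b11 */
  };
  return names[rn_field (arg)];
}
```

Since the whole domain is the integers 0 through 3, an integer mode such as
SImode describes the operand directly.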

Re: [PATCH v4 0/6] __builtin_dynamic_object_size

2021-12-17 Thread David Edelsohn via Gcc-patches

Siddhesh,

This patch series seems to have caused testsuite regressions for
memcpy-chk, etc. in 32 bit mode (i386, x86-64 -m32 and -mx32, AIX 32
bit).

I have opened PR 103759.

Thanks, David

Re: [PATCH 6/6] rs6000: Rename arrays to remove temporary _x suffix

2021-12-14 Thread David Edelsohn via Gcc-patches

On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> While we had two sets of built-in infrastructure at once, I added _x as a
> suffix to two arrays to disambiguate the old and new versions.  Time to fix
> that also.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-06  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (altivec_resolve_overloaded_builtin): Likewise.  Also rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (rs6000_builtin_is_supported): Likewise.
> (rs6000_gimple_fold_mma_builtin): Likewise.  Also rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (rs6000_gimple_fold_builtin): Rename rs6000_builtin_info_x to
> rs6000_builtin_info.
> (cpu_expand_builtin): Likewise.
> (rs6000_expand_builtin): Likewise.
> (rs6000_init_builtins): Likewise.  Also rename rs6000_builtin_decls_x
> to rs6000_builtin_decls.
> (rs6000_builtin_decl): Rename rs6000_builtin_decls_x to
> rs6000_builtin_decls.
> * config/rs6000/rs6000-gen-builtins.c (write_decls): In generated
> code, rename rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_bif_static_init): In generated code, rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_init_bif_table): In generated code, rename
> rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_init_ovld_table): In generated code, rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (write_init_file): Likewise.
> * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function):
> Likewise.
> (rs6000_builtin_md_vectorized_function): Likewise.
> (rs6000_builtin_reciprocal): Likewise.
> (add_condition_to_bb): Likewise.
> (rs6000_atomic_assign_expand_fenv): Likewise.

Okay.

Thanks, David

Re: [PATCH 5/6] rs6000: Rename functions with "new" in their names

2021-12-14 Thread David Edelsohn via Gcc-patches

On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> While we had two sets of built-in functionality at the same time, I put "new"
> in the names of quite a few functions.  Time to undo that.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin):
> Remove forward declaration.
> (rs6000_new_builtin_type_compatible): Rename to
> rs6000_builtin_type_compatible.
> (rs6000_builtin_type_compatible): Remove.
> (altivec_resolve_overloaded_builtin): Remove.
> (altivec_build_new_resolved_builtin): Rename to
> altivec_build_resolved_builtin.
> (altivec_resolve_new_overloaded_builtin): Rename to
> altivec_resolve_overloaded_builtin.  Remove static keyword.  Adjust
> called function names.
> * config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Remove
> forward declaration.
> (rs6000_gimple_fold_new_builtin): Likewise.
> (rs6000_invalid_new_builtin): Rename to rs6000_invalid_builtin.
> (rs6000_gimple_fold_builtin): Remove.
> (rs6000_new_builtin_valid_without_lhs): Rename to
> rs6000_builtin_valid_without_lhs.
> (rs6000_new_builtin_is_supported): Rename to
> rs6000_builtin_is_supported.
> (rs6000_gimple_fold_new_mma_builtin): Rename to
> rs6000_gimple_fold_mma_builtin.
> (rs6000_gimple_fold_new_builtin): Rename to
> rs6000_gimple_fold_builtin.  Remove static keyword.  Adjust called
> function names.
> (rs6000_expand_builtin): Remove.
> (new_cpu_expand_builtin): Rename to cpu_expand_builtin.
> (new_mma_expand_builtin): Rename to mma_expand_builtin.
> (new_htm_spr_num): Rename to htm_spr_num.
> (new_htm_expand_builtin): Rename to htm_expand_builtin.  Change name
> of called function.
> (rs6000_expand_new_builtin): Rename to rs6000_expand_builtin.  Remove
> static keyword.  Adjust called function names.
> (rs6000_new_builtin_decl): Rename to rs6000_builtin_decl.  Remove
> static keyword.
> (rs6000_builtin_decl): Remove.
> * config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
> rename rs6000_new_builtin_is_supported to rs6000_builtin_is_supported.
> * config/rs6000/rs6000-internal.h (rs6000_invalid_new_builtin): Rename
> to rs6000_invalid_builtin.
> * config/rs6000/rs6000.c (rs6000_new_builtin_vectorized_function):
> Rename to rs6000_builtin_vectorized_function.
> (rs6000_new_builtin_md_vectorized_function): Rename to
> rs6000_builtin_md_vectorized_function.
> (rs6000_builtin_vectorized_function): Remove.
> (rs6000_builtin_md_vectorized_function): Remove.

Okay.

Thanks, David

Re: [PATCH 4/6] rs6000: Remove rs6000-builtin.def and associated data and functions

2021-12-14 Thread David Edelsohn via Gcc-patches

On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> The old rs6000-builtin.def file is no longer needed.  Remove it and the code
> that depends on it.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-builtin.def: Delete.
> * config/rs6000/rs6000-call.c (builtin_compatibility): Delete.
> (builtin_description): Delete.
> (builtin_hash_struct): Delete.
> (builtin_hasher): Delete.
> (builtin_hash_table): Delete.
> (builtin_hasher::hash): Delete.
> (builtin_hasher::equal): Delete.
> (rs6000_builtin_info_type): Delete.
> (rs6000_builtin_info): Delete.
> (bdesc_compat): Delete.
> (bdesc_3arg): Delete.
> (bdesc_4arg): Delete.
> (bdesc_dst): Delete.
> (bdesc_2arg): Delete.
> (bdesc_altivec_preds): Delete.
> (bdesc_abs): Delete.
> (bdesc_1arg): Delete.
> (bdesc_0arg): Delete.
> (bdesc_htm): Delete.
> (bdesc_mma): Delete.
> (rs6000_overloaded_builtin_p): Delete.
> (rs6000_overloaded_builtin_name): Delete.
> (htm_spr_num): Delete.
> (rs6000_builtin_is_supported_p): Delete.
> (rs6000_gimple_fold_mma_builtin): Delete.
> (gt-rs6000-call.h): Remove include directive.
> * config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p): Delete.
> (rs6000_builtin_is_supported_p): Delete.
> (rs6000_overloaded_builtin_name): Delete.
> * config/rs6000/rs6000.c (rs6000_builtin_decls): Delete.
> (rs6000_debug_reg_global): Remove reference to RS6000_BUILTIN_COUNT.
> * config/rs6000/rs6000.h (rs6000_builtins): Delete.
> (altivec_builtin_types): Delete.
> (rs6000_builtin_decls): Delete.
> * config/rs6000/t-rs6000 (TM_H): Don't add rs6000-builtin.def.

Okay.

Thanks, David

Re: [PATCH 3/6] rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def

2021-12-14 Thread David Edelsohn via Gcc-patches

On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> This patch just renames a file and updates the build machinery accordingly.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-builtin-new.def: Rename to...
> * config/rs6000/rs6000-builtins.def: ...this.
> * config/rs6000/rs6000-gen-builtins.c: Adjust header commentary.
> * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Rename
> rs6000-builtin-new.def to rs6000-builtins.def.
> (rs6000-builtins.c): Likewise.

Okay.

Thanks, David

Re: [PATCH 2/6] rs6000: Remove altivec_overloaded_builtins array and initialization

2021-12-14 Thread David Edelsohn via Gcc-patches

On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> This patch just removes the huge altivec_overloaded_builtins array.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Remove.
> * config/rs6000/rs6000.h (altivec_overloaded_builtins): Remove.

Okay.

Thanks, David

Re: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.

2021-12-14 Thread David Edelsohn via Gcc-patches

On Fri, Nov 5, 2021 at 3:38 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:11 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for scalars on power10.
> >
> > This patch implements XXSPLTIDP support for SF, and DF scalar constants.
> > The previous patch added support for vector constants.  This patch adds
> > the support for SFmode and DFmode scalar constants.
> >
> > I added 2 new tests to test loading up SF and DF scalar constants.
>
>
> ok
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
> >   (UNSPEC_XXSPLTIW_CONST): New unspec.
> >   (movsf_hardfloat): Add support for generating XXSPLTIDP.
> >   (mov<mode>_hardfloat32): Likewise.
> >   (mov<mode>_hardfloat64): Likewise.
> >   (xxspltidp_<mode>_internal): New insns.
> >   (xxspltiw_<mode>_internal): New insns.
> >   (splitters for SF/DFmode): Add new splitters for XXSPLTIDP.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/vec-splat-constant-df.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> > ---
>
> ok
>
>
> >  gcc/config/rs6000/rs6000.md   | 97 +++
> >  .../powerpc/vec-splat-constant-df.c   | 60 
> >  .../powerpc/vec-splat-constant-sf.c   | 60 
> >  3 files changed, 199 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> >
> > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > index 3a7bcd2426e..4122acb98cf 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -156,6 +156,8 @@ (define_c_enum "unspec"
> > UNSPEC_PEXTD
> > UNSPEC_HASHST
> > UNSPEC_HASHCHK
> > +   UNSPEC_XXSPLTIDP_CONST
> > +   UNSPEC_XXSPLTIW_CONST
> >])
> >
> >  ;;
> > @@ -7764,17 +7766,17 @@ (define_split
> >  ;;
> >  ;;   LWZ  LFSLXSSP   LXSSPX STFS   STXSSP
> >  ;;   STXSSPX  STWXXLXOR  LI FMRXSCPSGNDP
> > -;;   MR   MT  MF   NOP
> > +;;   MR   MT  MF   NOPXXSPLTIDP
> >
> >  (define_insn "movsf_hardfloat"
> >[(set (match_operand:SF 0 "nonimmediate_operand"
> >"=!r,   f, v,  wa,m, wY,
> > Z, m, wa, !r,f, wa,
> > -   !r,*c*l,  !r, *h")
> > +   !r,*c*l,  !r, *h,wa")
> >   (match_operand:SF 1 "input_operand"
> >"m, m, wY, Z, f, v,
> > wa,r, j,  j, f, wa,
> > -   r, r, *h, 0"))]
> > +   r, r, *h, 0, eP"))]
> >"(register_operand (operands[0], SFmode)
> > || register_operand (operands[1], SFmode))
> > && TARGET_HARD_FLOAT
> > @@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat"
> > mr %0,%1
> > mt%0 %1
> > mf%1 %0
> > -   nop"
> > +   nop
> > +   #"
> >[(set_attr "type"
> >   "load,   fpload,fpload, fpload,fpstore,   fpstore,
> >fpstore,store, veclogical, integer,   fpsimple,  fpsimple,
> > -  *,  mtjmpr,mfjmpr, *")
> > +  *,  mtjmpr,mfjmpr, *, vecperm")
> > (set_attr "isa"
> >   "*,  *, p9v,p8v,   *, p9v,
> >p8v,*, *,  *, *, *,
> > -  *,  *, *,  *")])
> > +  *,  *, *,  *, p10")])
> >
> >  ;;   LWZ  LFIWZX STWSTFIWX MTVSRWZMFVSRWZ
> >  ;;   FMR  MR MT%0   MF%1   NOP
> > @@ -8064,18 +8067,18 @@ (define_split
> >
> >  ;;   STFD LFD FMR LXSDSTXSD
> >  ;;   LXSD STXSD   XXLOR   XXLXOR  GPR<-0
> > -;;   LWZ  STW MR
> > +;;   LWZ  STW MR  XXSPLTIDP
> >
> >
> >  (define_insn "*mov_hardfloat32"
> >[(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> >  "=m,  d,  d,  ,   wY,
> >,   Z,  ,  ,  !r,
> > -  Y,  r,  !r")
> > +  Y,  r,  !r, wa")
> >   (match_operand:FMOVE64 1 "input_operand"
> >   "d,  m,  d,  wY, ,
> >Z,  ,   ,  ,  ,
> > -  r,  Y,  r"))]
> > +  r,  Y,  r,  eP"))]
> >"! TARGET_POWERPC64 && TARGET_HARD_FLOAT
> > && (gpc_reg_operand (operands[0], mode)
> > || gpc_reg_operand (operands[1], mode))"
> > @@ -8092,20 +8095,21 @@ (define_insn

Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants

2021-12-14 Thread David Edelsohn via Gcc-patches

On Fri, Nov 5, 2021 at 3:24 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for vectors on power10.
> >
> > This patch implements XXSPLTIDP support for all vector constants.  The
> > XXSPLTIDP instruction is given a 32-bit immediate that is converted to a
> > vector of two DFmode constants.  The immediate is in SFmode format, so only
> > constants that fit as SFmode values can be loaded with XXSPLTIDP.
> >
> > The constraint (eP) added in the previous patch for XXSPLTIW is also used
> > for XXSPLTIDP.
> >
>
> ok
>
>
> > DImode scalar constants are not handled.  This is because the majority of
> > DImode constants will be in the GPR registers.  With vector registers, you
> > have the problem that XXSPLTIDP splats the double word into both elements of
> > the vector.  However, if TImode is loaded with an integer constant, it wants
> > a full 128-bit constant.
>
> This may be worth adding as a TODO somewhere in the code.
>
> >
> > SFmode and DFmode scalar constants are not handled in this patch.  The
> > support for those constants will be in the next patch.
>
> ok
>
> >
> > I have added a temporary switch (-msplat-float-constant) to control whether
> > or not the XXSPLTIDP instruction is generated.
> >
> > I added 2 new tests to test loading up V2DI and V2DF vector constants.
>
>
>
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >   generating XXSPLTIDP.
> >   (vsx_prefixed_constant): Likewise.
> >   (easy_vector_constant): Likewise.
> >   * config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
> >   New declaration.
> >   * config/rs6000/rs6000.c (output_vec_const_move): Add support for
> >   generating XXSPLTIDP.
> >   (prefixed_xxsplti_p): Likewise.
> >   (constant_generates_xxspltidp): New function.
> >   * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
> >   regex for power10.
> >   * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
> > ---
>
>
> ok
>
> >  gcc/config/rs6000/predicates.md   |   9 ++
> >  gcc/config/rs6000/rs6000-protos.h |   1 +
> >  gcc/config/rs6000/rs6000.c| 108 ++
> >  gcc/config/rs6000/rs6000.opt  |   4 +
> >  .../powerpc/pr86731-fwrapv-longlong.c |   9 +-
> >  .../powerpc/vec-splat-constant-v2df.c |  64 +++
> >  .../powerpc/vec-splat-constant-v2di.c |  50 
> >  7 files changed, 241 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> >
> > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > index ed6252bd0c4..d748b11857c 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant"
> >
> > >if (constant_generates_xxspltiw (&vsx_const))
> > >   return true;
> > > +
> > > +  if (constant_generates_xxspltidp (&vsx_const))
> > + return true;
> >  }
> >
> >/* Otherwise consider floating point constants hard, so that the
> > @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant"
> > >if (constant_generates_xxspltiw (&vsx_const))
> > >  return true;
> > >
> > > +  if (constant_generates_xxspltidp (&vsx_const))
> > +return true;
> > +
> >return false;
> >  })
> >
> > @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant"
> >
> > > if (constant_generates_xxspltiw (&vsx_const))
> > >   return true;
> > > +
> > > +   if (constant_generates_xxspltidp (&vsx_const))
> > + return true;
> >   }
>
>
> ok
>
> >
> >if (TARGET_P9_VECTOR
> > diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> > index 99c6a671289..2d28df7442d 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
> >  vec_const_128bit_type *);
> >  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
> >  extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
> > +extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *);
> >  #endif /* RTX_CODE */
> >
> >  #ifdef TREE_CODE
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index be24f56eb31..8fde48cf2b3 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx *operands)
> >

Re: [PATCH 3/5] Add Power10 XXSPLTIW

2021-12-14 Thread David Edelsohn via Gcc-patches

On Fri, Nov 5, 2021 at 2:50 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:09 -0400, Michael Meissner wrote:
> > Generate XXSPLTIW on power10.
> >
>
> Hi,
>
>
> > This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
> > instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
> > adding support for vector constants that can be used, and adding a
> > VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
> >
> > The eP constraint was added to recognize constants that can be loaded into
> > vector registers with a single prefixed instruction.
>
> Perhaps Swap "... the eP constraint was added ..."  for "Add the eP
> constraint to ..."
>
>
> >
> > I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
> > constants.
>
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/constraints.md (eP): Update comment.
> >   * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >   generating XXSPLTIW.
> >   (vsx_prefixed_constant): New predicate.
> >   (easy_vector_constant): Add support for
> >   generating XXSPLTIW.
> >   * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
> >   declaration.
> >   (constant_generates_xxspltiw): Likewise.
> >   * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
> >   XXSPLTIW, don't do XXSPLTIB and sign extend.
>
> Perhaps just 'generate XXSPLTIW if possible'.
>
> >   (output_vec_const_move): Add support for XXSPLTIW.
> >   (prefixed_xxsplti_p): New function.
> >   (constant_generates_xxspltiw): New function.
> >   * config/rs6000/rs6000.md (prefixed attribute): Add support to
> >   mark XXSPLTI* instructions as being prefixed.
> >   * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
> >   switch.
> >   * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
> >   generating XXSPLTIW or XXSPLTIDP.
> >   (vsx_mov<mode>_32bit): Likewise.
> >   * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
> >   eP constraint.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
> >   * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
> > ---
> >  gcc/config/rs6000/constraints.md  |  6 ++
> >  gcc/config/rs6000/predicates.md   | 46 ++-
> >  gcc/config/rs6000/rs6000-protos.h |  2 +
> >  gcc/config/rs6000/rs6000.c| 81 +++
> >  gcc/config/rs6000/rs6000.md   |  5 ++
> >  gcc/config/rs6000/rs6000.opt  |  4 +
> >  gcc/config/rs6000/vsx.md  | 28 +++
> >  gcc/doc/md.texi   |  4 +
> >  .../powerpc/vec-splat-constant-v16qi.c| 27 +++
> >  .../powerpc/vec-splat-constant-v4sf.c | 67 +++
> >  .../powerpc/vec-splat-constant-v4si.c | 51 
> >  .../powerpc/vec-splat-constant-v8hi.c | 62 ++
> >  .../gcc.target/powerpc/vec-splati-runnable.c  |  4 +-
> >  13 files changed, 369 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
> >
> > diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> > index e72132b4c28..a4b05837fa6 100644
> > --- a/gcc/config/rs6000/constraints.md
> > +++ b/gcc/config/rs6000/constraints.md
> > @@ -213,6 +213,12 @@ (define_constraint "eI"
> >"A signed 34-bit integer constant if prefixed instructions are 
> > supported."
> >(match_operand 0 "cint34_operand"))
> >
> > +;; A SF/DF scalar constant or a vector constant that can be loaded into 
> > vector
> > +;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
> > +(define_constraint "eP"
> > +  "A constant that can be loaded into a VSX register with one prefixed 
> > insn."
> > +  (match_operand 0 "vsx_prefixed_constant"))
> > +
> >  ;; A TF/KF scalar constant or a vector constant that can load certain IEEE
> >  ;; 128-bit constants into vector registers using LXVKQ.
> >  (define_constraint "eQ"
> > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > index e0d1c718e9f..ed6252bd0c4 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant"
> >vec_const_128bit_type vsx_const;
> >if (TARGET_POWER10 &&

Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)

2021-12-14 Thread David Edelsohn via Gcc-patches

On Fri, Nov 5, 2021 at 2:13 PM Michael Meissner  wrote:
>
> On Fri, Nov 05, 2021 at 12:01:43PM -0500, will schmidt wrote:
> > On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote:
> > > Add new constant data structure.
> > >
> > > This patch provides the data structure and function to convert a
> > > constant (CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a
> > > constant) to an array of bytes, half-words, words, and double words that
> > > can be loaded into a 128-bit vector register.
> > >
> > > The next patches will use this data structure to generate code that
> > > loads the vector/floating point registers using the XXSPLTIDP,
> > > XXSPLTIW, and LXVKQ instructions that were added in power10.
> > >
> > > 2021-11-05  Michael Meissner  
> > >
>
> Whoops, it should be meiss...@linux.ibm.com.
>
> > comment to be explicit on the structure name being copied to/from.
> > (vec_const_128bit_type is easy to search for, vector or constant or
> > structure are not as unique)
>
> Yes, the original name was more generic (rs6000_const).  Originally it could
> potentially handle vector constants that were greater than 128-bits if we ever
> have support for larger vectors.  But I thought that extra generality hindered
> the code (since you had to check whether the size was exactly 128-bits, etc.).
> So I made the data structure tailored to the problem at hand.
>
> > > +
> > > +/* Copy an floating point constant to the vector constant structure.  */
> > > +
> >
> > s/an/a/
>
> Ok.
>
> > > +static void
> > > +constant_fp_to_128bit_vector (rtx op,
> > > + machine_mode mode,
> > > + size_t byte_num,
> > > + vec_const_128bit_type *info)
> > > +{
> > > +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> > > +  unsigned num_words = bitsize / 32;
> > > +  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
> > > +  long real_words[VECTOR_128BIT_WORDS];
> > > +
> > > +  /* Make sure we don't overflow the real_words array and that it is
> > > + filled completely.  */
> > > +  gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);
> >
> > Not clear to me on the potential to partially fill the real_words
> > array.
>
> At the moment we don't support a 16-bit floating point type in the compiler
> (the Power10 has limited 16-bit floating point support, but we don't make a
> special type for it).  If/when we add the 16-bit floating point, we will
> possibly need to revisit this.
>
> > > +
> > > +  real_to_target (real_words, rtype, mode);
> > > +
> > > +  /* Iterate over each 32-bit word in the floating point constant.  The
> > > + real_to_target function puts out words in endian fashion.  We need
> >
> > Meaning host-endian fashion, or is that meant to be big-endian ?
>
> Real_to_target puts out the 32-bit values in endian fashion.  This data
> structure wants to hold everything in big endian fashion to make checking
> things simpler.
>
> > Perhaps also rephrase or move the comment up to indicate that
> > real_to_target will have placed or has already placed the words in
> >  endian fashion.
> > As stated I was expecting to see a call to real_to_target() below the
> > comment.
>
> Yes, I probably should move the real_to_target call after the comment.
>
> > > +
> > > +  /* Possibly splat the constant to fill a vector size.  */
> >
> >
> > Suggest "Splat the constant to fill a vector size if ..."
>
> Ok.

Okay.

Thanks, David
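
To make the byte-layout discussion above concrete, here is a minimal sketch
of storing a single-precision constant into a 16-byte image in big-endian
order and then splatting it across the vector. All names are hypothetical
and this is not the code under review; it works from the host's float
representation instead of real_to_target, and assumes float is the 32-bit
IEEE 754 format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16

/* Hypothetical helper: write the 32-bit pattern of VALUE into BYTES at
   OFFSET, most significant byte first, regardless of host endianness.  */
static void
store_single_big_endian (float value, uint8_t bytes[VEC_BYTES], size_t offset)
{
  uint32_t word;

  /* Make sure the word fits inside the 128-bit image.  */
  assert (offset + 4 <= VEC_BYTES);

  memcpy (&word, &value, sizeof word);
  for (unsigned i = 0; i < 4; i++)
    bytes[offset + i] = (uint8_t) (word >> (24 - 8 * i));
}

/* Hypothetical helper: repeat the first UNIT bytes across the whole image,
   which is what splatting a smaller constant to vector size amounts to.  */
static void
splat_bytes (uint8_t bytes[VEC_BYTES], size_t unit)
{
  assert (unit > 0 && VEC_BYTES % unit == 0);

  for (size_t i = unit; i < VEC_BYTES; i++)
    bytes[i] = bytes[i % unit];
}
```

Keeping the image in one fixed byte order means later checks can compare
bytes, half-words, and words without caring about the target's endianness.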
