Re: RISC-V: Added support for CRC.

2023-08-16 Thread Paul Koning via Gcc-patches



> On Aug 16, 2023, at 3:42 PM, Philipp Tomsich  wrote:
> 
> On Wed, 16 Aug 2023 at 21:10, Alexander Monakov  wrote:
>> 
>> 
>> On Tue, 15 Aug 2023, Jeff Law wrote:
>> 
>>> Because if the compiler can optimize it automatically, then the projects 
>>> have
>>> to do literally nothing to take advantage of it.  They just compile normally
>>> and their bitwise CRC gets optimized down to either a table lookup or a 
>>> clmul
>>> variant.  That's the real goal here.
>> 
>> The only high-profile FOSS project that carries a bitwise CRC implementation
>> I'm aware of is the 'xz' compression library. There bitwise CRC is used for
>> populating the lookup table under './configure --enable-small':
>> 
>> https://github.com/tukaani-project/xz/blob/2b871f4dbffe3801d0da3f89806b5935f758d5f3/src/liblzma/check/crc64_small.c
>> 
>> It's a well-reasoned choice and your compiler would be undoing it
>> (reintroducing the table when the bitwise CRC is employed specifically
>> to avoid carrying the table).

Is that compiled with -Os?  It would seem sensible for that to be the case, and 
for the table optimization to be suppressed if that switch is used.

paul




Re: [RFC] GCC Security policy

2023-08-16 Thread Paul Koning via Gcc-patches



> On Aug 16, 2023, at 3:53 AM, Alexander Monakov  wrote:
> 
>> ...
>> Is "timing-safety" a security property?  Not the way I understand that
>> term.  It sounds like another way to say that the code meets real time
>> constraints or requirements.
> 
> I meant in the sense of not admitting timing attacks:
> https://en.wikipedia.org/wiki/Timing_attack
> 
>> No, compilers don't help with that (at least C doesn't -- Ada might be
>> better here but I don't know enough).  For sufficiently strict
>> requirements you'd have to examine both the generated machine code and
>> understand, in gruesome detail, what the timing behaviors of the executing
>> hardware are.  Good luck if it's a modern billion-transistor machine.
> 
> Yes. On the other hand, the reality in the FOSS ecosystem is that
> cryptographic libraries heavily lean on the ability to express
> a constant-time algorithm in C and get machine code that is actually
> constant-time. There's a bit of a conflict here between what we
> can promise and what people might expect of GCC, and it seems
> relevant when discussing what goes into the Security Policy.

I agree.  What should be said is that such techniques are erroneous.  The kind 
of code you're talking about inserts steps not strictly needed for the 
calculation to make it constant time (or more nearly so).  But clearly that has 
to rely on an assumption that the optimizer isn't smart enough to spot those 
unnecessary operations and delete them.  Never mind the fact that it relies on 
a notion that C statements have timing properties in the first place, which the 
standard doesn't do.

So I would argue that a serious attempt to cure timing attacks has to be coded 
in assembly language.  Even then, of course, optimizations in modern machine 
pipelines may give you trouble, but at least in that case you're writing 
explicitly for a specific ISA and are in a position to take into account its 
timing properties, to the extent they are known and defined.

paul




Re: [RFC] GCC Security policy

2023-08-15 Thread Paul Koning via Gcc-patches



> On Aug 15, 2023, at 8:37 PM, Alexander Monakov  wrote:
> 
>> ...
>> At some point the system tools need to respect the programmer or operator.
>> There is a difference between writing "Hello, World" and writing
>> performance critical or safety critical code.  That is the responsibility
>> of the programmer and the development team to choose the right software
>> engineers and right tools.  And to have the development environment and
>> checks in place to ensure that the results are meeting the requirements.
>> 
>> It is not the role of GCC or its security policy to tell people how to do
>> their job or hobby.  This isn't a safety tag required to be attached to a
>> new mattress.
> 
> Yes (though I'm afraid the analogy with the mattress is a bit lost on me).
> Those examples were meant to illustrate the point I tried to make earlier,
> not as additions proposed for the Security Policy. Specific examples
> where we can tell people in advance that compiler output needs to be
> verified, because the compiler is not engineered to preserve those
> security-relevant properties from the source code (and we would not
> accept such accidents as security bugs).

Now I'm confused.  I thought the whole point of what GCC is trying to, and 
wants to document, is that it DOES preserve security properties.  If the source 
code is standards-compliant and contains algorithms free of security holes, 
then the compiler is supposed to deliver output code that is likewise free of 
holes -- in other words, the transformation performed by GCC does not introduce 
holes in a hole-free input.

> Granted, it is a bit of a stretch since the notion of timing-safety is
> not really well-defined for C source code, but I didn't come up with
> better examples.

Is "timing-safety" a security property?  Not the way I understand that term.  
It sounds like another way to say that the code meets real time constraints or 
requirements.  No, compilers don't help with that (at least C doesn't -- Ada 
might be better here but I don't know enough).  For sufficiently strict 
requirements you'd have to examine both the generated machine code and 
understand, in gruesome detail, what the timing behaviors of the executing 
hardware are.  Good luck if it's a modern billion-transistor machine.

Again, I don't see that as a security property.  If it's considered desirable 
to say something about this, fine, but the words Siddesh crafted don't fit for 
that kind of property.

paul



Re: [RFC] GCC Security policy

2023-08-15 Thread Paul Koning via Gcc-patches



> On Aug 15, 2023, at 10:07 AM, Alexander Monakov  wrote:
> 
> 
> On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:
> 
>> Does this as the first paragraph address your concerns:
> 
> Thanks, this is nicer (see notes below). My main concern is that we shouldn't
> pretend there's some method of verifying that arbitrary source code is "safe"
> to pass to an unsandboxed compiler, nor should we push the responsibility of
> doing that on users.

Perhaps, but clearly the compiler can't do it ("Halting problem"...) so it has 
to be clear that the solution must be outside the compiler.  

paul



Re: [RFC] GCC Security policy

2023-08-11 Thread Paul Koning via Gcc-patches



> On Aug 11, 2023, at 10:36 AM, Siddhesh Poyarekar  wrote:
> 
> On 2023-08-10 14:50, Siddhesh Poyarekar wrote:
   As a result, the only case for a potential security issue in all
   these cases is when it ends up generating vulnerable output for
   valid input source code.
>>> 
>>> I think this leaves open the interpretation "every wrong code bug
>>> is potentially a security bug".  I suppose that's true in a trite sense,
>>> but not in a useful sense.  As others said earlier in the thread,
>>> whether a wrong code bug in GCC leads to a security bug in the object
>>> code is too application-dependent to be a useful classification for GCC.
>>> 
>>> I think we should explicitly say that we don't generally consider wrong
>>> code bugs to be security bugs.  Leaving it implicit is bound to lead
>>> to misunderstanding.
>> I see what you mean, but the context-dependence of a bug is something GCC 
>> will have to deal with, similar to how libraries have to deal with bugs.  
>> But I agree this probably needs some more expansion.  Let me try and come up 
>> with something more detailed for that last paragraph.
> 
> How's this:
> 
> As a result, the only case for a potential security issue in the compiler is 
> when it generates vulnerable application code for valid, trusted input source 
> code.  The output application code could be considered vulnerable if it 
> produces an actual vulnerability in the target application, specifically in 
> the following cases:

You might make it explicit that we're talking about wrong code errors here -- 
in other words, the source code is correct (conforms to the standard) and the 
algorithm expressed in the source code does not have a vulnerability, but the 
generated code has semantics that differ from those of the source code such 
that it does have a vulnerability.

> - The application dereferences an invalid memory location despite the 
> application sources being valid.
> 
> - The application reads from or writes to a valid but incorrect memory 
> location, resulting in an information integrity issue or an information leak.
> 
> - The application ends up running in an infinite loop or with severe 
> degradation in performance despite the input sources having no such issue, 
> resulting in a Denial of Service.  Note that correct but non-performant code 
> is not a security issue candidate, this only applies to incorrect code that 
> may result in performance degradation.

The last sentence somewhat contradicts the preceding one.  Perhaps "...may 
result in performance degradation severe enough to amount to a denial of 
service".

> - The application crashes due to the generated incorrect code, resulting in a 
> Denial of Service.

paul



Re: RISC-V: Added support for CRC.

2023-08-09 Thread Paul Koning via Gcc-patches



> On Aug 9, 2023, at 2:32 AM, Alexander Monakov  wrote:
> 
> 
> On Tue, 8 Aug 2023, Jeff Law wrote:
> 
>> If the compiler can identify a CRC and collapse it down to a table or clmul,
>> that's a major win and such code does exist in the real world. That was the
>> whole point behind the Fedora experiment -- to determine if these things are
>> showing up in the real world or if this is just a benchmarking exercise.
> 
> Can you share the results of the experiment and give your estimate of what
> sort of real-world improvement is expected? I already listed the popular
> FOSS projects where CRC performance is important: the Linux kernel and
> a few compression libraries. Those projects do not use a bitwise CRC loop,
> except sometimes for table generation on startup (which needs less time
> than a page fault that may be necessary to bring in a hardcoded table).
> 
> For those projects that need a better CRC, why is the chosen solution is
> to optimize it in the compiler instead of offering them a library they
> could use with any compiler?
> 
> Was there any thought given to embedded projects that use bitwise CRC
> exactly because they little space for a hardcoded table to spare?

Or those that use smaller tables -- for example, the classic VAX microcode 
approach with a 16-entry table, doing CRC 4 bits at a time.

I agree that this seems an odd thing to optimize.  CRC is a well known CPU hog 
with well established efficient solutions, and it's hard to see  why anyone who 
needs good performance would fail to understand and apply that knowledge.

paul




Re: [RFC] GCC Security policy

2023-08-08 Thread Paul Koning via Gcc-patches



> On Aug 8, 2023, at 11:55 AM, Siddhesh Poyarekar  wrote:
> 
> On 2023-08-08 11:48, David Malcolm wrote:
>> On Tue, 2023-08-08 at 09:33 -0400, Paul Koning via Gcc-patches wrote:
>>> 
>>> 
>>>> On Aug 8, 2023, at 9:01 AM, Jakub Jelinek via Gcc-patches
>>>>  wrote:
>>>> 
>>>> On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via Gcc-
>>>> patches wrote:
>>>>> There's probably external tools to do this, not sure if we should
>>>>> replicate
>>>>> things in the driver for this.
>>>>> 
>>>>> But sure, I think the driver is the proper point to address any
>>>>> of such
>>>>> issues - iff we want to address them at all.  Maybe a nice little
>>>>> google summer-of-code project ;)
>>>> 
>>>> What I'd really like to avoid is having all compiler bugs
>>>> (primarily ICEs)
>>>> considered to be security bugs (e.g. DoS category), it would be
>>>> terrible to
>>>> release every week a new compiler because of the "security" issues.
>>> 
>>> Indeed.  But my answer would be that such things are not DoS issues.
>>> DoS means that an external input, over which you have little control,
>>> is impairing service.  In the case of a compiler, if feeding it bad
>>> source code X.c causes it to crash, the answer is "well, then don't
>>> do that".
>> Agreed.
>> I'm not sure how to "wordsmith" this, but it seems like the sources and
>> options on the *host* are assumed to be trusted, and that the act of
>> *compiling* source on the host requires trusting them, just like the
>> act of executing the compiled code on the target does.  Though users
>> may be more familiar with sandboxing the target than the host.
>> We should spell this out further for libgccjit: libgccjit allows for
>> ahead-of-time and JIT compilation of sources - but it assumes that
>> those sources (and the compilation options) are trusted.
>> [Adding Andrea Corallo to the addressees]
>> For example, Emacs is using libgccjit to do ahead-of-time compilation
>> of Emacs bytecode.  I'm assuming that Emacs is assuming that its
>> bytecode is trusted, and that there isn't any attempt by Emacs to
>> sandbox the Emacs Lisp being processed.
>> However, consider a situation in which someone attempted to, say, embed
>> libgccjit inside a web browser to generate machine code from
>> JavaScript, where the JavaScript is potentially controlled by an
>> attacker.  I think we want to explicitly say that that if you're going
>> to do that, you need to put some other layer of defense in, so that
>> you're not blithely accepting the inputs to the compilation (sources
>> and options) from a potentially hostile source, where a crafted input
>> sources could potentially hit an ICE in the compiler and thus crash the
>> web browser.
> 
> +1, this is precisely the kind of thing the security policy should warn 
> against and suggest using sandboxing for.  The compiler (or libgccjit) isn't 
> really in a position to defend such uses, ICE or otherwise.

I agree somewhat.  But only somewhat, because the compiler's job is not to 
crash even if presented with bad inputs.  An ICE is a bug, which of course 
we've always accepted.  But as several have agreed, it's not a DoS bug, 
therefore not a security bug.

The scenario of the web browser is a valid one, and I would use it to 
illustrate a general point, which is redundancy in safety measures. If inputs 
come from possibly hostile sources, it's sound practice to have multiple layers 
of protection.  The consuming software should be robust so it doesn't fail when 
subjected to bad inputs.  But additional layers of protection in case there is 
a defect in the first layer are valuable, and sandboxing or the like (chroot, 
for example) can provide that additional defense.  This isn't really a GCC 
issue but rather a general principle of prudence.

paul



Re: [RFC] GCC Security policy

2023-08-08 Thread Paul Koning via Gcc-patches



> On Aug 8, 2023, at 9:01 AM, Jakub Jelinek via Gcc-patches 
>  wrote:
> 
> On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via Gcc-patches 
> wrote:
>> There's probably external tools to do this, not sure if we should replicate
>> things in the driver for this.
>> 
>> But sure, I think the driver is the proper point to address any of such
>> issues - iff we want to address them at all.  Maybe a nice little
>> google summer-of-code project ;)
> 
> What I'd really like to avoid is having all compiler bugs (primarily ICEs)
> considered to be security bugs (e.g. DoS category), it would be terrible to
> release every week a new compiler because of the "security" issues.

Indeed.  But my answer would be that such things are not DoS issues.  DoS means 
that an external input, over which you have little control, is impairing 
service.  In the case of a compiler, if feeding it bad source code X.c causes 
it to crash, the answer is "well, then don't do that".

paul




Re: [PATCH] fix pdp11_expand_epilogue (PR target/107841)

2023-07-13 Thread Paul Koning via Gcc-patches



> On Jul 13, 2023, at 12:47 PM, Mikael Pettersson  wrote:
> 
> If the stack frame only contains an alloca area, then
> pdp11_expand_epilogue fails to deallocate it, resulting
> in callee-saved registers and the return address being
> restored from the wrong stack slots.  Fixed by adding
> || cfun->calls_alloca to the condition for deallocating
> the frame.
> 
> Tested with a cross to pdp11-unknown-aout.
> 
> Ok for master? (Note: I don't have commit rights.)
> 
> gcc/
> 
>   PR target/107841
>   * config/pdp11/pdp11.c (pdp11_expand_epilogue): Also
>   deallocate alloca-only frame.
> 
> gcc/testsuite/
> 
>   PR target/107841
>   * gcc.target/pdp11/pr107841.c: New test.
> ---
> gcc/config/pdp11/pdp11.cc |  2 +-
> gcc/testsuite/gcc.target/pdp11/pr107841.c | 12 
> 2 files changed, 13 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/pdp11/pr107841.c
> 
> diff --git a/gcc/config/pdp11/pdp11.cc b/gcc/config/pdp11/pdp11.cc
> index f6dd841f184..311a1d225e0 100644
> --- a/gcc/config/pdp11/pdp11.cc
> +++ b/gcc/config/pdp11/pdp11.cc
> @@ -393,7 +393,7 @@ pdp11_expand_epilogue (void)
>   rtx x, reg, via_ac = NULL;
> 
>   /* Deallocate the local variables.  */
> -  if (fsize)
> +  if (fsize || cfun->calls_alloca)
> {
>   if (frame_pointer_needed)
>   {
> diff --git a/gcc/testsuite/gcc.target/pdp11/pr107841.c 
> b/gcc/testsuite/gcc.target/pdp11/pr107841.c
> new file mode 100644
> index 000..a363c468b0b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/pdp11/pr107841.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Verify that the stack frame is deallocated using the frame pointer.  */
> +
> +void qq (int a)
> +{
> +char *s = __builtin_alloca (128);
> +__builtin_sprintf (s, "qq %d", 3);
> +}
> +
> +/* { dg-final { scan-assembler "mov\tr5,sp" } } */
> -- 
> 2.41.0
> 


Thanks Mikael.  I committed this fix.

paul



Re: [PATCH] fix pdp11_expand_epilogue (PR target/107841)

2023-07-13 Thread Paul Koning via Gcc-patches



> On Jul 13, 2023, at 12:47 PM, Mikael Pettersson  wrote:
> 
> If the stack frame only contains an alloca area, then
> pdp11_expand_epilogue fails to deallocate it, resulting
> in callee-saved registers and the return address being
> restored from the wrong stack slots.  Fixed by adding
> || cfun->calls_alloca to the condition for deallocating
> the frame.
> 
> Tested with a cross to pdp11-unknown-aout.
> 
> Ok for master? (Note: I don't have commit rights.)

Yes, thank you!

Question for the experts: how is this handled?  Do I need to apply this change 
to my workspace and commit it, with Mikael as the change author?

paul

> 
> gcc/
> 
>   PR target/107841
>   * config/pdp11/pdp11.c (pdp11_expand_epilogue): Also
>   deallocate alloca-only frame.
> 
> gcc/testsuite/
> 
>   PR target/107841
>   * gcc.target/pdp11/pr107841.c: New test.
> ---
> gcc/config/pdp11/pdp11.cc |  2 +-
> gcc/testsuite/gcc.target/pdp11/pr107841.c | 12 
> 2 files changed, 13 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/pdp11/pr107841.c
> 
> diff --git a/gcc/config/pdp11/pdp11.cc b/gcc/config/pdp11/pdp11.cc
> index f6dd841f184..311a1d225e0 100644
> --- a/gcc/config/pdp11/pdp11.cc
> +++ b/gcc/config/pdp11/pdp11.cc
> @@ -393,7 +393,7 @@ pdp11_expand_epilogue (void)
>   rtx x, reg, via_ac = NULL;
> 
>   /* Deallocate the local variables.  */
> -  if (fsize)
> +  if (fsize || cfun->calls_alloca)
> {
>   if (frame_pointer_needed)
>   {
> diff --git a/gcc/testsuite/gcc.target/pdp11/pr107841.c 
> b/gcc/testsuite/gcc.target/pdp11/pr107841.c
> new file mode 100644
> index 000..a363c468b0b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/pdp11/pr107841.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Verify that the stack frame is deallocated using the frame pointer.  */
> +
> +void qq (int a)
> +{
> +char *s = __builtin_alloca (128);
> +__builtin_sprintf (s, "qq %d", 3);
> +}
> +
> +/* { dg-final { scan-assembler "mov\tr5,sp" } } */
> -- 
> 2.41.0
> 



Re: [committed] Convert xstormy16 to LRA

2023-05-11 Thread Paul Koning via Gcc-patches



> On May 11, 2023, at 11:05 AM, Hans-Peter Nilsson via Gcc-patches 
>  wrote:
> 
> ...
> Yes, very interesting.  Thank you for sharing this.  I've
> seen regressions with LRA for CRIS too, for
> "double-register-sized" types, which for CRIS, a 32-bit
> target, translates to 64-bit types (DFmode and DImode), and
> where LRA does a much worse job than reload; spills a lot
> more often to stack, even after trying every
> register-allocation-related hook I found (and also an LRA
> patch which helped only by a fraction, but regressed results
> on x86_64-linux, so let's quickly forget it again).

That observation makes me a bit worried.  While CRIS may not be a priority 
platform, that description makes it sound like a case that would be significant 
in any 32 bit platform, which would include priority ones like i386 and ARM.

If that's true, I wonder about dropping Reload.  While I understand it's been 
years since LRA was first introduced, wouldn't we even so want to go by the 
rule that a newer replacement mechanism doesn't replace an older one  until the 
replacement demonstrates comparable or better output compared with the older 
one?

paul




Re: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Paul Koning via Gcc-patches



> On May 2, 2023, at 9:18 AM, Roger Sayle  wrote:
> 
> 
> On 02 May 2023 13:40, Paul Koning wrote:
>>> On May 1, 2023, at 7:37 PM, Roger Sayle 
>> wrote:
>>> 
>>> ...
>>> The shiftsi.cc regression on xstormy16 is fixed by adding
>>> -fno-split-wide-types.
>>> In fact, if all the regression tests pass, I'd suggest that
>>> flag_split_wide-types = false should be the default on xstormy16 now
>>> that we've moved to LRA.  And if this works for xstormy16, it might be
>>> useful to other targets for the LRA transition; it's a difference in
>>> behaviour between reload and LRA that could potentially affect
>>> multiple targets.
>> 
>> Is there documentation for that flag?
> 
> Yes, see the section -fsplit-wide-types in
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Thanks.  So I'm wondering why that would be a problem.

The obvious question is whether it interacts badly with MD file entries that 
describe wide operations, perhaps with constraints that require things like 
odd/even register pairs.  But I would assume all that gets handled.

Along the same lines, why would a target, or a user, not do early wide 
splitting all the time?  The documentation for that option gives no clue why it 
would ever be bad.

paul




Re: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Paul Koning via Gcc-patches



> On May 1, 2023, at 7:37 PM, Roger Sayle  wrote:
> 
> ...
> The shiftsi.cc regression on xstormy16 is fixed by adding
> -fno-split-wide-types.
> In fact, if all the regression tests pass, I'd suggest that
> flag_split_wide-types = false
> should be the default on xstormy16 now that we've moved to LRA.  And if this
> works for xstormy16, it might be useful to other targets for the LRA
> transition;
> it's a difference in behaviour between reload and LRA that could potentially
> affect multiple targets.

Is there documentation for that flag?  

paul



Re: [PATCH 2/5] match.pd: Remove commented out line pragmas unless -vv is used.

2023-04-28 Thread Paul Koning via Gcc-patches
On the check for verbose==2, should that be verbose >= 2 ?

paul

> On Apr 28, 2023, at 6:38 AM, Tamar Christina via Gcc-patches 
>  wrote:
> 
> Hi All,
> 
> genmatch currently outputs commented out line directives that have no effect
> but the compiler still has to parse only to discard.
> 
> They are however handy when debugging genmatch output.  As such this moves 
> them
> behind the -vv flag.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR bootstrap/84402
>   * genmatch.cc (output_line_directive): Only emit commented directive
>   when -vv.
>   (main): Initialize verbose.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 
> 638606b2502f640e59527fc5a0b23fa3bedd0cee..6d62cdea2082d92e5ecc1102c80205115a4e3040
>  100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -209,7 +209,7 @@ output_line_directive (FILE *f, location_t location,
>   else
>   fprintf (f, "%s:%d", file, loc.line);
> }
> -  else
> +  else if (verbose == 2)
> /* Other gen programs really output line directives here, at least for
>development it's right now more convenient to have line information
>from the generated file.  Still keep the directives as comment for now
> @@ -5221,6 +5221,7 @@ main (int argc, char **argv)
> return 1;
> 
>   bool gimple = true;
> +  verbose = 0;
>   char *input = argv[argc-1];
>   for (int i = 1; i < argc - 1; ++i)
> {
> 
> 
> 
> 
> -- 
> 



Re: [committed] libgcc CRIS: Define TARGET_HAS_NO_HW_DIVIDE

2023-04-26 Thread Paul Koning via Gcc-patches



> On Apr 26, 2023, at 8:05 PM, Hans-Peter Nilsson  wrote:
> 
> Not many targets define this besides msp430, pdp1, xtensa,
> and arm compared to those that appear to unconditionally
> have a hardware division instruction (also, pdp11 and
> msp430 seem confused and should be empty instead of "1"  ...

How so, "confused"?  The documentation says it should be defined, it doesn't 
say that it should be defined as empty.  What goes wrong if it's defined as 1 
rather than empty?

The documentation is also somewhat misleading, because it says to define it if 
the hardware has no divide instruction.  The more accurate statement is that it 
should be defined if the hardware has no 64 / 32 bit divide hardware support.  
pdp11.h points this out in a comment, because most pdp11s do have divide 
instructions but those are for 32 / 16 bits.

paul

Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Paul Koning via Gcc-patches



> On Apr 23, 2023, at 12:47 PM, Segher Boessenkool  
> wrote:
> 
> This minimal patch enables LRA for all targets.  It does not clean up
> the target code, nor does it do anything to generic code: it just
> deletes all target definitions of TARGET_LRA_P.
> 
> There are three kinds of changes:
> 
> 1) Targets that already always have LRA, but that redefine the hook
> anyway.  These are gcn, pdp11, rx, sparc, vax, and xtensa.  Nothing
> really changes for these targets with this patch (but later patches
> will delete the superfluous hook implementations).

I thought that the existing coding for pdp11 makes LRA selectable (via -mlra) 
and defaults to off.  I had planned to change it to default to on but leave it 
selectable.  I suppose just having it on is ok too, although the code from LRA 
wasn't as efficient as the old last I looked (which is a while ago).

paul




Re: [PATCH] doc: Document order of define_peephole2 scanning

2023-04-18 Thread Paul Koning via Gcc-patches
I'm not sure about the meaning of part of this.  "...resumes at the last 
generated insn."  Does that mean:

1. If a match is found at some insn, the replacement defined by the matching 
define_peephole2 is performed, and then the scan resumes at the first of the 
insns produced by the replacement.

or

2. If a match is found at some insn, the replacement defined by the matching 
define_peephole2 is performed, and then the scan resumes at the insn 
immediately following the ones just matched.

"Last generated" seems to fit option 1, but I'm not sure if that's what you 
meant.  Maybe you could add some words to say more explicitly which it is.

paul

> On Apr 18, 2023, at 1:55 PM, Hans-Peter Nilsson via Gcc-patches 
>  wrote:
> 
> Generated pdf inspected.  Ok to commit?
> 
> Thoughts on fixing the IMHO wart to also expose all
> replacements to all define_peephole2?  Looks feasible
> (famous last words), but then again I haven't checked the
> history yet.
> 
> -- >8 --
> I was a bit surprised when my define_peephole2 didn't match,
> but it was because it was expected to partially match the
> generated output of a previous define_peephole2.  I had
> assumed that the algorithm exposed newly created opportunities
> to all define_peephole2's.  While things can change in that
> direction, let's start with documenting the current state.
> 
>   * doc/md.texi (define_peephole2): Document order of scanning.
> ---
> gcc/doc/md.texi | 8 
> 1 file changed, 8 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 07bf8bdebffb..0f9e32d2c648 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -9362,6 +9362,14 @@ If the preparation falls through (invokes neither 
> @code{DONE} nor
> @code{FAIL}), then the @code{define_peephole2} uses the replacement
> template.
> 
> +Insns are scanned in forward order from beginning to end for each basic
> +block, but the basic blocks are scanned in reverse order of appearance
> +in a function.  After a successful replacement, scanning for further
> +opportunities for @code{define_peephole2} matches, resumes at the last
> +generated insn.  I.e. for the example above, the first insn that can be
> +matched by another @code{define_peephole2}, is @code{(set (match_dup 3)
> +(match_dup 4))}.
> +
> @end ifset
> @ifset INTERNALS
> @node Insn Attributes
> -- 
> 2.30.2
> 



Re: Should -ffp-contract=off the default on GCC?

2023-03-21 Thread Paul Koning via Gcc-patches



> On Mar 21, 2023, at 1:59 PM, Jeff Law via Gcc-patches 
>  wrote:
> 
> 
> 
> On 3/21/23 11:00, Qing Zhao via Gcc-patches wrote:
>>> On Mar 21, 2023, at 12:56 PM, Paul Koning  wrote:
>>> 
>>> 
>>> 
 On Mar 21, 2023, at 11:01 AM, Qing Zhao via Gcc-patches 
  wrote:
 
 ...
 Most of the compiler users are not familiar with language standards, or no 
 access to language standards. Without clearly documenting such warnings 
 along with the option explicitly, the users have not way to know such 
 potential impact.
>>> 
>>> With modern highly optimized languages, not knowing the standard is going 
>>> to get you in trouble.  There was a wonderful paper from MIT a few years 
>>> ago describing all the many ways C can bite you if you don't know the rules.
>> Yes, it’s better to know the details of languages standard. -:)
>> However, I don’t think that this is a realistic expectation to the compiler 
>> users:  to know all the details of a language standard.
> Umm, they really do need to know that stuff.
> 
> If the developer fails to understand the language standard, then they're 
> likely going to write code that is ultimately undefined or doesn't behave in 
> they expect.  How is the compiler supposed to guess what the developer 
> originally intended?  How should the compiler handle the case when two 
> developers have different understandings of how a particular piece of code 
> should work?  In the end it's the language standard that defines how all this 
> stuff should work.
> 
> Failure to understand the language is a common problem and we do try to emit 
> various diagnostics to help developers avoid writing non-conformant code.  
> But ultimately if a developer fails to understand the language standard, then 
> they're going to be surprised by the behavior of their code.

Conversely, of course, the problem is that C and other languages have evolved 
to the point that you have to be a language lawyer to write valid code.  In 
other words, a substantial fraction of programmers are by definition writing 
unreliable code.  This is not a good situation, and it may be part of the 
reason why modern software has such a high rate of defects.

paul



Re: Should -ffp-contract=off the default on GCC?

2023-03-21 Thread Paul Koning via Gcc-patches



> On Mar 21, 2023, at 11:01 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> ...
> Most of the compiler users are not familiar with language standards, or no 
> access to language standards. Without clearly documenting such warnings along 
> with the option explicitly, the users have not way to know such potential 
> impact.

With modern highly optimized languages, not knowing the standard is going to 
get you in trouble.  There was a wonderful paper from MIT a few years ago 
describing all the many ways C can bite you if you don't know the rules.

paul



Re: [RFC] Introduce -finline-memset-loops

2023-01-13 Thread Paul Koning via Gcc-patches



> On Jan 13, 2023, at 8:54 PM, Alexandre Oliva via Gcc-patches 
>  wrote:
> 
> Hello, Richard,
> 
> Thank you for the feedback.
> 
> On Jan 12, 2023, Richard Biener  wrote:
> 
>> On Tue, Dec 27, 2022 at 5:12 AM Alexandre Oliva via Gcc-patches
>>  wrote:
> 
>>> This patch extends the memset expansion to start with a loop, so as to
>>> still take advantage of known alignment even with long lengths, but
>>> without necessarily adding store blocks for every power of two.
> 
>> I wonder if that isn't better handled by targets via the setmem pattern,
> 
> That was indeed where I started, but then I found myself duplicating the
> logic in try_store_by_multiple_pieces on a per-target basis.
> 
> Target-specific code is great for tight optimizations, but the main
> purpose of this feature is not an optimization.  AFAICT it actually
> slows things down in general (due to code growth, and to conservative
> assumptions about alignment), 

I thought machinery like the memcpy patterns have as one of their benefits the 
ability to find the alignment of their operands and from that optimize things.  
So I don't understand why you'd say "conservative".

paul




Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-12 Thread Paul Koning via Gcc-patches



> On Jan 12, 2023, at 9:40 AM, Segher Boessenkool  
> wrote:
> 
> On Thu, Jan 12, 2023 at 09:17:31AM -0500, Paul Koning wrote:
>>> On Jan 12, 2023, at 4:50 AM, Segher Boessenkool 
>>>  wrote:
>>> I mean general_operand accepts that sp thing you saw.  But your
>>> constraints do not?  (I don't know your target well, maybe this isn't
>>> true).  Things like this should be sorted out by reload, but you get
>>> better code (and fewer problems ;-) ) if you make code that fits
>>> earlier.  The L in LRA means "local": it "just" makes things fit, it
>>> does not emphasise optimising your code.
>> 
>> The destination is "nonimmediate_operand" which matches what the machine 
>> actually does.  It's like VAX and M68k, instruction operands in general can 
>> be registers, memory references, register indirect or memory indirect, 
>> memory at register with offset, or autoinc/autodec off any register.
>> 
>> As far as operand type is concerned, SP is certainly a valid operand for an 
>> add operation, that isn't the problem.  The problem is that it's a two 
>> operand machine: the first operand of the add instruction is both source and 
>> destination.  And LRA assigned the source register to be the destination 
>> register as required, but then after doing so it replaced the destination 
>> (an FP reference) by a different register (SP reference), violating the 
>> constraint after having satisfied it earlier.
> 
> Ah okay.  Yes, something does not verify if the instructions are valid
> before doing some replacement.  Something around ELIMINABLE_REGS it
> looks like?  Maybe the dump file says more, or maybe the dump file can
> be improved.

The Reload dump file mentions in a couple of places that it sees r5 (FP) as 
eliminable, replaced by r6 (SP).  The IRA dump file says nothing.

Yes, the ELIMINABLE_REGS macro says FP can be eliminated, which is correct.  In 
fact, if it weren't for that, it would be wrong for the register allocator to 
pick it (R5) as a general register to hold the result of that addhi3.  So the 
trouble is that the replacement is made after that register allocation, and 
given the constraint the allocator actually needs to generate an additional 
instruction, a move from SP to R5 so it can then do the two-operand add the 
constraint requires.

This feels like a bug; should I file a bug report?  Or is there something the 
target code can do to make this work?  I looked through the internals manual a 
bit, it doesn't show anything helpful.  The register elimination hooks are 
rather generic and don't give me any handle to recognize cases like this one.

paul



Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-12 Thread Paul Koning via Gcc-patches



> On Jan 12, 2023, at 4:50 AM, Segher Boessenkool  
> wrote:
> 
> On Wed, Jan 11, 2023 at 05:07:47PM -0500, Paul Koning wrote:
>>> On Jan 11, 2023, at 2:28 PM, Segher Boessenkool 
>>>  wrote:
>>> I would say your predicates are way too lenient here (general_operand),
>>> but this needs more info.  A testcase to reproduce the problem, to
>>> start with :-)
>> 
>> I'll try to trim it down.
>> 
>> What do you mean "too lenient"?  The first input operand (which is supposed 
>> to be the same as the output since the instruction set is 2-address) is 
>> "general_operand".  The destination is "nonimmediate_operand" which fits the 
>> constraints applied to it.
> 
> I mean general_operand accepts that sp thing you saw.  But your
> constraints do not?  (I don't know your target well, maybe this isn't
> true).  Things like this should be sorted out by reload, but you get
> better code (and fewer problems ;-) ) if you make code that fits
> earlier.  The L in LRA means "local": it "just" makes things fit, it
> does not emphasise optimising your code.

The destination is "nonimmediate_operand" which matches what the machine 
actually does.  It's like VAX and M68k, instruction operands in general can be 
registers, memory references, register indirect or memory indirect, memory at 
register with offset, or autoinc/autodec off any register.

As far as operand type is concerned, SP is certainly a valid operand for an add 
operation, that isn't the problem.  The problem is that it's a two operand 
machine: the first operand of the add instruction is both source and 
destination.  And LRA assigned the source register to be the destination 
register as required, but then after doing so it replaced the destination (an 
FP reference) by a different register (SP reference), violating the constraint 
after having satisfied it earlier.

Interesting to know what LRA stands for, I didn't know that.

paul



Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Paul Koning via Gcc-patches



> On Jan 11, 2023, at 2:28 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, Jan 11, 2023 at 01:42:22PM -0500, Paul Koning wrote:
>> Or, as in my case, because building with LRA as the default triggers an ICE 
>> that I don't understand.  I posted a note to the GCC list about what I saw, 
>> but have received no reaction.
> 
> ?

I just saw that, thanks!

> I would say your predicates are way too lenient here (general_operand),
> but this needs more info.  A testcase to reproduce the problem, to
> start with :-)

I'll try to trim it down.

What do you mean "too lenient"?  The first input operand (which is supposed to 
be the same as the output since the instruction set is 2-address) is 
"general_operand".  The destination is "nonimmediate_operand" which fits the 
constraints applied to it.

paul



Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Paul Koning via Gcc-patches



> On Jan 11, 2023, at 1:32 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, Jan 11, 2023 at 05:27:36PM +0100, Richard Biener wrote:
>>> Am 11.01.2023 um 16:17 schrieb Segher Boessenkool 
>>> :
 Note this is more info for port maintainers not for users and
 changes.html is for users.
>>> 
>>> And users will notice some ports will have to be removed, because those
>>> ports are not maintained / not maintained enough.  Some ports will not
>>> work with LRA, most will be easy to fix, but someone will have to do
>>> that.  If no one does so the port works sufficiently well it will have
>>> to be removed before release.
>>> 
 "In a future release" is also quite vague.
>>> 
>>> It's what we usually say in changes.html .  "In GCC 14" if you want?
>>> 
>>> I can add some stuff on how this will benefit users?
>> 
>> I guess listing the ports without LRA support might be a first step for 
>> clarification?
> 
> Every port has LRA support.
> 
> Some ports will not build later when we delete old reload, because they
> use some functions and/or data structures unique to that.

Or, as in my case, because building with LRA as the default triggers an ICE 
that I don't understand.  I posted a note to the GCC list about what I saw, but 
have received no reaction.

If anyone can help me understand how LRA can generate RTL with register choices 
that violate the constraints listed in the MD file, I would be grateful.

paul



Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-05 Thread Paul Koning via Gcc-patches
Does this mean that targets that have an option to use LRA or not should now 
default to LRA?  Some currently default to old reload.

paul


> On Jan 5, 2023, at 2:27 PM, Segher Boessenkool  
> wrote:
> 
> Hi!
> 
> Happy new year everyone.
> 
> Is this patch okay to commit?
> 
> 
> Segher
> 
> ---
> htdocs/gcc-13/changes.html | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 3876ad77543a..954469cdcfa4 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -39,6 +39,8 @@ a work-in-progress.
> Legacy debug info compression option -gz=zlib-gnu was 
> removed
>   and the option is ignored right now.
> New debug info compression option value -gz=zstd has 
> been added.
> +Support for old reload is deprecated. It will be removed in a future
> +  release. Every target will always use LRA from then on.
> 
> 
> 
> -- 
> 1.8.3.1
> 



Re: [PATCH 3/3] Use startswith in targets.

2021-04-21 Thread Paul Koning via Gcc-patches



> On Mar 19, 2021, at 5:21 AM, Martin Liska  wrote:
> 
> 
> gcc/ChangeLog:
> 
>   ...
>   * config/pdp11/pdp11.c (pdp11_output_ident): Likewise.

pdp11 is ok.  Thanks.

paul




Scam alert

2021-03-17 Thread Paul Koning via Gcc-patches
It's probably obvious to most, but... I just got a fairly plausible looking 
phishing email pretending that my gcc.gnu.org password was about to expire.  
The link it asked me to click on was a giveaway that the mail came from a 
criminal, but we know that these red flags can be overlooked.

So just a note in case others are being scammed as well.

paul



Re: [PATCH 2/4] PDP11: Use a mode with `const_double_zero' expressions

2021-01-08 Thread Paul Koning via Gcc-patches



> On Jan 7, 2021, at 8:50 PM, Maciej W. Rozycki  wrote:
> 
> ...
> 
> Provide a new iterator to provide copies of FP substitutions across the 
> FP modes supported as the substitutions now need to match the mode of 
> the operands.
> 
>   gcc/
>   * config/pdp11/pdp11.md (PDPfp): New mode iterator.
>   (fcc_cc, fcc_ccnz): Use it.  Add mode to `const_double_zero' and 
>   operands.
> ---
> gcc/config/pdp11/pdp11.md |   10 ++
> 1 file changed, 6 insertions(+), 4 deletions(-)
> 
> gcc-pdp11-const-double-zero-mode.diff
> Index: gcc/gcc/config/pdp11/pdp11.md
> ===
> --- gcc.orig/gcc/config/pdp11/pdp11.md
> +++ gcc/gcc/config/pdp11/pdp11.md
> @@ -82,6 +82,8 @@
> 
> (define_code_iterator SHF [ashift ashiftrt lshiftrt])
> 
> +(define_mode_iterator PDPfp [SF DF])
> +
> ;; Substitution to turn a CC clobber into a CC setter.  We have four of
> ;; these: for CCmode vs. CCNZmode, and for CC_REGNUM vs. FCC_REGNUM.
> (define_subst "cc_cc"
> @@ -101,19 +103,19 @@
>(set (match_dup 0) (match_dup 1))])
> 
> (define_subst "fcc_cc"
> -  [(set (match_operand 0 "") (match_operand 1 ""))
> +  [(set (match_operand:PDPfp 0 "") (match_operand:PDPfp 1 ""))
>(clobber (reg FCC_REGNUM))]
>   ""
>   [(set (reg:CC FCC_REGNUM)
> - (compare:CC (match_dup 1) (const_double_zero)))
> + (compare:CC (match_dup 1) (const_double_zero:PDPfp)))
>(set (match_dup 0) (match_dup 1))])
> 
> (define_subst "fcc_ccnz"
> -  [(set (match_operand 0 "") (match_operand 1 ""))
> +  [(set (match_operand:PDPfp 0 "") (match_operand:PDPfp 1 ""))
>(clobber (reg FCC_REGNUM))]
>   ""
>   [(set (reg:CCNZ FCC_REGNUM)
> - (compare:CCNZ (match_dup 1) (const_double_zero)))
> + (compare:CCNZ (match_dup 1) (const_double_zero:PDPfp)))
>(set (match_dup 0) (match_dup 1))])
> 
> (define_subst_attr "cc_cc" "cc_cc" "_nocc" "_cc")


Ok.  Thanks Maciej.

paul



Re: [PATCH] avr: cc0 to mode_cc conversion

2021-01-05 Thread Paul Koning via Gcc-patches



> On Jan 5, 2021, at 8:54 AM, Senthil Kumar Selvaraj via Gcc-patches 
>  wrote:
> 
> 
> Senthil Kumar Selvaraj writes:
> 
>> Georg-Johann Lay writes:
>> 
>> ...
>>> 
>>> 2) We just saw 100reds of insns being dublicated, basically the whole
>>> machine description except for the few insns that leave cc alone.
>>> Isn't is possible to use define subst for the bulk of the insns and
>>> get a neat code that's better to grasp and to maintain?
>>> After all it's just appending a clobber of reg_cc, and in the current
>>> proposal almost 50% of the backend is just redundent repetitions of
>>> previous insns.
> 
> I could not find a way to get define_subst to do define_insn_and_split -
> other targets using the same approach (pdp11, h8300) have the
> duplication as well.

I ran into the same issue, I tried as well for the obvious reason.  I'm pretty 
sure someone told me (a) that doesn't work, and (b) the reason is xyzzy.  But I 
no long remember what the reason is, or even if I was told one.

The impression I have is that define_subst isn't a macro facility, even though 
it looks a bit like one, and that may be why it can't do what you want to do 
here.

paul




Re: [PATCH] avr: cc0 to mode_cc conversion

2020-12-17 Thread Paul Koning via Gcc-patches



> On Dec 17, 2020, at 6:21 AM, Segher Boessenkool  
> wrote:
> 
> Hi!
> 
> On Thu, Dec 17, 2020 at 02:15:51PM +0530, Senthil Kumar Selvaraj via 
> Gcc-patches wrote:
>> The work on my github branch was not complete - I'd blindly followed
>> whatever the CC0 Transition wiki mentioned (the first three steps of
>> case #2), and fixed any regression fallout (for ATmega128).
>> 
>> I intend to try out a define_subst/early clobber of reg_cc based
>> approach (inspired by the cris port) and see if that can help avoid the
>> proliferation of define_insn_and_splits. Will update how that works out.
> 
> Yeah, case #2 does not necessarily give the best result, but it usually
> is the least work to do, so certainly a good choice under time pressure.

I was under the impression from what I read in the blog a year or two ago (when 
I did the pdp11 ccmode work) that "case 2" is the better answer for machines in 
which pretty much every operation touches the condition codes.  In other words, 
I understood that case 1 would for such machines not be the right answer -- it 
wasn't just that "case 2 is easier".

Did I misunderstand?  Is there a reason for machines such as pdp11, in which 
basically every operation that handles data, even a move instruction, touches 
CC, to use approach 1?

paul




Re: [PATCH] genemit: Handle `const_double_zero' rtx

2020-12-16 Thread Paul Koning via Gcc-patches



> On Dec 16, 2020, at 6:35 AM, Maciej W. Rozycki  wrote:
> 
> ...
> NB the PDP-11 pieces affected here and tripping this assertion are I 
> believe dead code, as these insns are only ever produced by splitters and 
> do not appear to be referred by their names.  

Right; I gave them names for documentation purposes, after all insns are 
allowed to have names whether or not the name is referenced or is a required 
name for RTL generation.

paul




Re: [PATCH 29/31] PDP11: Use `const_double_zero' to express double zero constant

2020-12-15 Thread Paul Koning via Gcc-patches



> On Dec 15, 2020, at 9:06 AM, Maciej W. Rozycki  wrote:
> 
> On Tue, 15 Dec 2020, Martin Liška wrote:
> 
>> If I see correctly, starting from this revision I can't compile a cross
>> compiler of x86_64-linux-gnu:
>> 
>> ../configure --target=pdp11-aout --disable-bootstrap --enable-languages=c,c++
>> --disable-multilib  --enable-obsolete

PDP11 doesn't do C++, there isn't any way to do that with aout that I have seen 
(no named section support).  There's been some experimental work on ELF for 
pdp11 (!) which should make this possible but I haven't tried that yet.

> 
> Thanks for the heads-up and the details of the configuration, and sorry 
> for the breakage.
> ...
> 
>> Can you please take a look?
> 
> I'm fairly sure this is due to the difference in TARGET_SUPPORTS_WIDE_INT
> with the VAX backend vs the PDP-11 one.  I have an idea how this should be 
> addressed and will be implementing it shortly.

What's the difference?  pdp11 does support wide int.

> NB the backend fails `-Werror' compilation:
> 
> In file included from ./tm.h:18,
> from .../gcc/backend.h:28,
> from .../gcc/varasm.c:31:
> .../gcc/varasm.c: In function 'void assemble_zeros(long unsigned int)':
> .../gcc/config/pdp11/pdp11.h:622:20: error: format '%o' expects argument of 
> type 'unsigned int', but argument 3 has type 'long unsigned int' 
> [-Werror=format=]
>  622 | fprintf (FILE, "\t.blkb\t%o\n", (SIZE) & 0x);   \
>  |^~~  ~~~
>  ||
>  |long unsigned int
> 
> etc., so that has to be disabled.  This may be with 64-bit hosts only, I'm 
> not sure, however the way varargs are handled means the issue will likely 
> cause a broken compiler to be built if the warning is ignored, at least 
> with some hosts.
> 
> Paul, can you please look into it sometime?

Will do, thanks.

paul




Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-12-11 Thread Paul Koning via Gcc-patches



> On Dec 11, 2020, at 9:54 AM, Maciej W. Rozycki  wrote:
> 
> On Wed, 9 Dec 2020, Paul Koning wrote:
> 
>>> This all sounds great.  Do you happen to know if it is cycle-accurate 
>>> with respect to individual hardware microarchitectures simulated?  That 
>>> would be required for performance evaluation of compiler-generated code.
>> 
>> No, it isn't.  I believe it just charges one time unit per instruction, 
>> with the possible exception of CIS instructions.
> 
> Fair enough, from experience most CPU emulators are instruction-accurate 
> only.  Of all the generally available emulators I came across (and looked 
> into closely enough; maybe I missed something) only ones for the Z80 were 
> cycle-accurate, and I believe the MAME project has had cycle-accurate 
> emulation, both down to the system level and both out of necessity, as 
> software they were written for was often unforgiving when it comes to any 
> discrepancy with respect to original hardware.

I know of a cycle-accurate CDC 6000 simulator, but I think that was a one man 
project never released.

> Commercially, MIPS Technologies used to have cycle-accurate MIPSsim, 
> actually used for hardware verification, and taking into account all the 
> implementation details such as the TLB and caches of individual CPU cores 
> supported.

There was also a simulator with capabilities like that for the SB-1 CPU core of 
the Sibyte SB-1250 SoC. 

> ...
>> I don't know of any cycle accurate PDP-11 emulators.  It's not even 
>> clear if it is possible to build one, given the asynchronous operation 
>> of the UNIBUS.  It certainly would be extremely difficult since even the 
>> documented timing is amazingly complex, never mind the possibility that 
>> the reality is different from what is documented.
> 
> For the purpose of compiler's performance evaluation however I don't 
> think we need to go down as far as the external bus, so however UNIBUS 
> performs should not really matter.  Even with the modern systems all the 
> pipeline descriptions and operation timings we have recorded within GCC 
> reflect perfect operating conditions such as hot caches, no TLB misses, no 
> branch mispredictions, to say nothing of disruption to all that caused by 
> hardware interrupts and context switches.

True, but I was thinking of models where the UNIBUS is used for memory.  The 
real issue is that the documented timings are full of strange numbers.  There 
isn't a timing for a given instruction, but rather a whole pile of numbers 
depending on the addressing modes, with occasional exceptions to a pattern (for 
example, some register to register operations are faster than the general 
pattern for the operation and addressing mode costs would suggest).  And it's 
hard to find a number that can be used as the "cycle time" where each time 
value is a small multiple of that basic number.  That's an issue both for a 
timing simulation and also for the GCC instruction scheduler and instruction 
cost models -- I ended up rounding things rather drastically and trimming out 
some detail in order to have the cost values be small integers and not blow up 
the size of the scheduler state machine.

> So I guess with cycle-accurate PDP-11 emulation it would be sufficient if 
> relative CPU instruction execution timings were correctly reflected, such 
> as the latency of say MOV vs DIV, as I am fairly sure they are not even 
> close to being equivalent.  But that does come at a cost; cycle-accurate 
> MIPSsim was much slower than its instruction-accurate counterpart which 
> also existed.
> 
> ...
>> More interesting would be to tweak the optimizing machinery to improve 
>> parts that either have bitrotted or never actually worked. The code 
>> generation for auto-increment etc. isn't particularly effective and I 
>> think that's a known limitation.  Ditto indirect addressing, since few 
>> other machines have that.  (VAX does, of course; it might benefit too.)  
>> And with LRA things are more limited still, again this seems to be known 
>> and is caused by the focus on modern machine architectures.
> 
> Correctness absolutely has to take precedence over performance, but that 
> does not mean the latter has to be completely ignored either.  And the 
> presence of tools may only help with that.  We may not have the resources 
> available commercially significant ports have, but that does not mean we 
> should decide upfront to abandon any kind of performance QA.  I think we 
> can still act professionally and try to do our best to make the quality of 
> code produced as good as possible within our available resources.

Definitely.  For me, one complication is that the key bits of the common code 
are things I don't really understand and there isn't much documentation, 
especially for the LRA case.  Some of what is documented apparently hasn't been 
correct in many years, and possibly was never correct.  I think some of the 
auto-increment facilities fall in that category.

   

Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-12-09 Thread Paul Koning via Gcc-patches



> On Dec 9, 2020, at 9:06 AM, Maciej W. Rozycki  wrote:
> 
> On Sat, 28 Nov 2020, Paul Koning wrote:
> 
>>> Hmm, I gather those systems are able to run some kind of BSD Unix: don't 
>>> they support the r-commands which would allow you to run DejaGNU testing 
>>> with a realistic environment PDP-11 hardware would be usually used with, 
>>> possibly on actual hardware even?  I always feel a bit uneasy about the 
>>> accuracy of any simulation (having suffered from bugs in QEMU causing 
>>> false negatives in software verification).
>> 
>> Fair enough.  But SIMH is a full system emulator with a very large 
>> amount of history and expertise involved in its creation.  It's also 
>> known to run every PDP-11 OS and most diagnostics.  Yes, it certainly 
>> runs BSD 2.x; the reason I didn't use that approach is that I don't know 
>> it well.
> 
> This all sounds great.  Do you happen to know if it is cycle-accurate 
> with respect to individual hardware microarchitectures simulated?  That 
> would be required for performance evaluation of compiler-generated code.

No, it isn't.  I believe it just charges one time unit per instruction, with 
the possible exception of CIS instructions. 

I don't know of any cycle accurate PDP-11 emulators.  It's not even clear if it 
is possible to build one, given the asynchronous operation of the UNIBUS.  It 
certainly would be extremely difficult since even the documented timing is 
amazingly complex, never mind the possibility that the reality is different 
from what is documented.

The pdp11 back end uses a very rough approximation of the documented 11/70 
timing, but GCC doesn't make it easy (or maybe not even possible) to use the 
full timing details.  It's not something I'd expect to refine a whole lot 
further.

More interesting would be to tweak the optimizing machinery to improve parts 
that either have bitrotted or never actually worked. The code generation for 
auto-increment etc. isn't particularly effective and I think that's a known 
limitation.  Ditto indirect addressing, since few other machines have that.  
(VAX does, of course; it might benefit too.)  And with LRA things are more 
limited still, again this seems to be known and is caused by the focus on 
modern machine architectures.

paul



Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-28 Thread Paul Koning via Gcc-patches



> On Nov 25, 2020, at 12:07 PM, Maciej W. Rozycki  wrote:
> 
> On Mon, 23 Nov 2020, Paul Koning wrote:
> 
>>> ...
> 
>> I've hacked together a primitive newlib based "bare metal" execution 
>> test setup that uses SIMH, but it's not a particularly clean setup.  
>> And it hasn't been posted, I hope to be able to do that at some point.
> 
> Hmm, I gather those systems are able to run some kind of BSD Unix: don't 
> they support the r-commands which would allow you to run DejaGNU testing 
> with a realistic environment PDP-11 hardware would be usually used with, 
> possibly on actual hardware even?  I always feel a bit uneasy about the 
> accuracy of any simulation (having suffered from bugs in QEMU causing 
> false negatives in software verification).

Fair enough.  But SIMH is a full system emulator with a very large amount of 
history and expertise involved in its creation.  It's also known to run every 
PDP-11 OS and most diagnostics.  Yes, it certainly runs BSD 2.x; the reason I 
didn't use that approach is that I don't know it well. 

> While I would expect old BSD libc to miss some of the C language features 
> considered standard nowadays, I think at least the C GCC frontend runtime 
> (libgcc.a) and the test suite do not overall rely on their presence, and 
> any individual test cases that do can be easily excluded.
> 
>> Thanks for the fix.
> 
> I take it as an approval and will apply the change then along with the 
> rest of the series.  Thank you for your review.

I should have been explicit.  Yes, I approve that change, thanks.

paul



Re: H8 cc0 conversion

2020-11-28 Thread Paul Koning via Gcc-patches



> On Nov 25, 2020, at 5:05 PM, Hans-Peter Nilsson  wrote:
> 
> On Tue, 24 Nov 2020, Eric Botcazou wrote:
> 
>>> I'm intested in any notes, however vague, on that matter.  I was
>>> a bit surprised to see that myself...that is, after fixing
>>> *some* related regressions, like the one in combine.  (Did I
>>> actually miss something?)
>> 
>> My recollection for the Visium port would align with what Jeff saw but, on 
>> the
>> other hand, this could have been very marginal a phenomenon in the end.
> 
> Thanks.  Though, without claims substantiated as anything more
> than a feeling, I'm going stick out my chin and say that you're
> both seasoned enough gcc hackers to be influenced by earlier
> experience, and that things have changed enough that this is no
> longer true.

FWIW, I used the Visium approach (and a pile of help from Eric), i.e., CC after 
reload, in the PDP-11 cc0 conversion.

paul



Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-23 Thread Paul Koning via Gcc-patches



> On Nov 19, 2020, at 10:38 PM, Maciej W. Rozycki  wrote:
> 
> Hi,
> 
> [Paul, there's a PDP11 piece for you further down here and then 29/31.]
> 
> ...
> 
> Then there is a fix for the PDP11 backend addressing an issue I found in 
> the handling of floating-point comparisons.  Unlike all the other changes 
> this one has not been regression-tested, not even built as I have no idea 
> how to prepare a development environment for a PDP11 target (also none of 
> my VAX pieces is old enough to support PDP11 machine code execution).

I agree this is a correct change, interesting that it was missed before.  You'd 
expect some ICE issues from that mistake.  Perhaps there were and I didn't 
realize the cause; the PDP11 test run is not yet fully clean.

I've hacked together a primitive newlib based "bare metal" execution test setup 
that uses SIMH, but it's not a particularly clean setup.  And it hasn't been 
posted, I hope to be able to do that at some point.

Thanks for the fix.

paul



Re: [PATCH][MIPS] PR target/77510 Reduce size of MIPS core automaton

2020-11-10 Thread Paul Koning via Gcc-patches
> On Nov 10, 2020, at 6:09 PM, Jeff Law via Gcc-patches 
>  wrote:
> 
>> ...
>> ChangeLog
>> 
>> gcc/
>> PR target/77510
>> * config/mips/gs464.md: Reduce reservation
>> duration to 10 cycles.
>> * config/mips/i6400.md: Likewise.
>> * config/mips/m5100.md: Likewise.
>> * config/mips/p5600.md: Likewise.
>> * config/mips/p6600.md: Likewise.
>> * config/mips/xlp.md: Likewise.
>> * config/mips/xlr.md: Likewise.
> 
> I doubt there's a measurable performance penalty for not fully modeling
> the latency of these instructions, but I'm quite hesitant to ACK this
> change without hearing from others in the MIPS space.   Alternately, if
> the affected models aren't being sold anymore, then I wouldn't worry
> about getting feedback on them.
> 
> 
> Jeff

XLR and XLP are listed as active products on their manufacturer's website 
(Broadcom).  I don't know about the others.  

paul




Re: [PATCH][MIPS] PR target/77510 Reduce size of MIPS core automaton

2020-11-10 Thread Paul Koning via Gcc-patches



> On Nov 10, 2020, at 6:42 AM, Xu Chenghua  wrote:
> 
> 
> Hi:
> 
> This patch reduce reservation of model do not more than 10 cycles. The memory 
> of genautomata down to 1GB.

Does this make any significant difference in performance of the generated code? 
 The original cycle counts are from the CPU description, so trimming them to a 
max of 10 means the model no longer reflects the real implementation.  It may 
be the difference is too small to worry about.  Can you comment on this?

paul




Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-07-22 Thread Paul Koning via Gcc-patches



> On Jul 18, 2020, at 2:50 PM, Jakub Jelinek via Gcc-patches 
>  wrote:
> 
> Hi!
> 
> The following patch adds __builtin_bit_cast builtin, similarly to
> clang or MSVC which implement std::bit_cast using such an builtin too.
> It checks the various std::bit_cast requirements, when not constexpr
> evaluated acts pretty much like VIEW_CONVERT_EXPR of the source argument
> to the destination type and the hardest part is obviously the constexpr
> evaluation.  I couldn't use the middle-end native_encode_initializer,
> because it needs to handle the C++ CONSTRUCTOR_NO_CLEARING vs. negation of
> that, value initialization of missing members if there are any etc., and
> needs to handle bitfields even if they don't have an integral representative
> (I've left out PDP11 handling of those, couldn't figure out how exactly are
> bitfields laid out there).

It seems to be spelled out in builtins.c function c_readstr.

paul




Re: [PATCH] pdp11: Fix handling of common (local and global) vars [PR94134]

2020-03-11 Thread Paul Koning via Gcc-patches
Ok, thanks.  

paul

> On Mar 11, 2020, at 1:12 PM, Jakub Jelinek  wrote:
> 
> Hi!
> 
> As mentioned in the PR, the generic code decides to put the a variable into
> lcomm_section, which is a NOSWITCH section and thus the generic code doesn't
> switch into a particular section before using
> ASM_OUTPUT{_ALIGNED{,_DECL}_}_LOCAL, on many targets that results just in
> .lcomm (or for non-local .comm) directives which don't need a switch to some
> section, other targets put switch_to_section (bss_section) at the start of
> that macro.
> pdp11 doesn't do that (and doesn't have bss_section), and so emits the
> lcomm/comm variables in whatever section is current (it has only .text/.data
> and for DEC assembler rodata).
> 
> The following patch fixes that by putting it always into data section, and
> additionally avoids emitting an empty line in the assembly for the lcomm
> vars.
> 
> Tested on the testcase, I'm afraid I have no other way to test this target.
> Ok for trunk?
> 
> 2020-03-11  Jakub Jelinek  
> 
>   PR target/94134
>   * config/pdp11/pdp11.c (pdp11_asm_output_var): Call switch_to_section
>   at the start to switch to data section.  Don't print extra newline if
>   .globl directive has not been emitted.
> 
>   * gcc.c-torture/execute/pr94134.c: New test.
> 
> --- gcc/config/pdp11/pdp11.c.jj   2020-01-12 11:54:36.382413876 +0100
> +++ gcc/config/pdp11/pdp11.c  2020-03-11 15:44:43.37397 +0100
> @@ -743,6 +743,7 @@ void
> pdp11_asm_output_var (FILE *file, const char *name, int size,
> int align, bool global)
> {
> +  switch_to_section (data_section);
>   if (align > 8)
> fprintf (file, "\t.even\n");
>   if (TARGET_DEC_ASM)
> @@ -763,8 +764,8 @@ pdp11_asm_output_var (FILE *file, const
>   {
> fprintf (file, ".globl ");
> assemble_name (file, name);
> +   fprintf (file, "\n");
>   }
> -  fprintf (file, "\n");
>   assemble_name (file, name);
>   fputs (":", file);
>   ASM_OUTPUT_SKIP (file, size);
> --- gcc/testsuite/gcc.c-torture/execute/pr94134.c.jj  2020-03-11 
> 15:46:09.540710642 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr94134.c 2020-03-11 
> 15:40:18.538840663 +0100
> @@ -0,0 +1,14 @@
> +/* PR target/94134 */
> +
> +static volatile int a = 0;
> +static volatile int b = 1;
> +
> +int
> +main ()
> +{
> +  a++;
> +  b++;
> +  if (a != 1 || b != 2)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
>