Re: [PATCH] powerpc/atomic: Use YZ constraints for DS-form instructions

2024-09-17 Thread Segher Boessenkool
Hi!

On Mon, Sep 16, 2024 at 10:05:10PM +1000, Michael Ellerman wrote:
> The 'ld' and 'std' instructions require a 4-byte aligned displacement
> because they are DS-form instructions. But the "m" asm constraint
> doesn't enforce that.
> 
> That can lead to build errors if the compiler chooses a non-aligned
> displacement, as seen with GCC 14:
> 
>   /tmp/ccuSzwiR.s: Assembler messages:
>   /tmp/ccuSzwiR.s:2579: Error: operand out of domain (39 is not a multiple of 
> 4)
>   make[5]: *** [scripts/Makefile.build:229: net/core/page_pool.o] Error 1
> 
> Dumping the generated assembler shows:
> 
>   ld 8,39(8)   # MEM[(const struct atomic64_t *)_29].counter, t
> 
> Use the YZ constraints to tell the compiler either to generate a DS-form
> displacement, or use an X-form instruction, either of which prevents the
> build error.

Great explanation text, a perfect commit!  :-)

> See commit 2d43cc701b96 ("powerpc/uaccess: Fix build errors seen with
> GCC 13/14") for more details on the constraint letters.
> 
> Fixes: 9f0cbea0d8cc ("[POWERPC] Implement atomic{, 64}_{read, write}() 
> without volatile")
> Cc: sta...@vger.kernel.org # v2.6.24+
> Reported-by: Stephen Rothwell 
> Closes: https://lore.kernel.org/all/20240913125302.0a06b...@canb.auug.org.au
> Signed-off-by: Michael Ellerman 

Reviewed-By: Segher Boessenkool 


Segher



Re: [PATCH v2 05/17] vdso: Avoid call to memset() by getrandom

2024-08-29 Thread Segher Boessenkool
On Thu, Aug 29, 2024 at 07:36:38PM +0200, Christophe Leroy wrote:
> 
> 
> Le 28/08/2024 à 19:25, Segher Boessenkool a écrit :
> >
> >>Not sure about static binaries, though: do those even use the VDSO?
> >
> >With "static binary" people usually mean "a binary not using any DSOs",
> >I think the VDSO is a DSO, also in this respect?  As always, -static
> >builds are *way* less problematic (and faster and smaller :-) )
> >
> 
> AFAIK on powerpc even static binaries use the vDSO, otherwise signals 
> don't work.

How can that work?  Non-dynamic binaries do not use ld.so (that is the
definition of a dynamic binary, even).  So they cannot link (at runtime)
to any DSO (unless that is done manually?!)

Maybe there is something at a fixed offset in the vDSO, or something
like that?  Is this documented somewhere?


Segher



Re: [PATCH v2 05/17] vdso: Avoid call to memset() by getrandom

2024-08-28 Thread Segher Boessenkool
On Wed, Aug 28, 2024 at 07:12:55PM +0200, Ard Biesheuvel wrote:
> On Wed, 28 Aug 2024 at 18:24, Segher Boessenkool
>  wrote:
> > > In my experience, this is likely to do the opposite: it causes the
> > > compiler to 'forget' the semantics of memcpy() and memset(), so that
> > > explicit trivial calls will no longer be elided and replaced with
> > > plain loads and stores (as it can no longer guarantee the equivalence)
> >
> > No, the compiler will never forget those semantics.  But if you tell it
> > your function named memset() is not the actual standard memset -- via
> > -fno-builtin-memset for example -- the compiler won't optimise things
> > involving it quite as much.  You told it so eh?
> 
> That is exactly the point I am making. So how would this help in this case?

I think we agree?  :-)

> > > This needs to be fixed for Clang as well, so throwing GCC specific
> > > flags at it will at best be a partial solution.
> >
> > clang says it is a 100% plug-in replacement for GCC, so they will have
> > to accept all GCC flags.  And in many cases they do.  Cases where they
> > don't are bugs.
> 
> Even if this were true, we support Clang versions today that do not
> support -fno-tree-loop-distribute-patterns so my point stands.

It is true.  Yes, this cause problems for their users.

> > > It is not a complete solution, unfortunately, and I guess there may be
> > > other situations (compiler/arch combinations) where this might pop up
> > > again.
> >
> > Why do mem* not work in VDSOs?  Fix that, and all these problems
> > disappear, and you do not need workrarounds :-)
> 
> Good point. We should just import memcpy and memset in the VDSO ELF metadata.

Yeah.  In many cases GCC will replace such calls by (faster and/or
smaller) inline code anyway, but when it does leave a call, there needs
to be an external function implementing it :-)

> Not sure about static binaries, though: do those even use the VDSO?

With "static binary" people usually mean "a binary not using any DSOs",
I think the VDSO is a DSO, also in this respect?  As always, -static
builds are *way* less problematic (and faster and smaller :-) )


Segher



Re: [PATCH v2 05/17] vdso: Avoid call to memset() by getrandom

2024-08-28 Thread Segher Boessenkool
On Wed, Aug 28, 2024 at 02:26:08PM +0200, Jason A. Donenfeld wrote:
> On Wed, Aug 28, 2024 at 2:24 PM Arnd Bergmann  wrote:
> >
> > On Wed, Aug 28, 2024, at 11:18, Jason A. Donenfeld wrote:
> > > On Tue, Aug 27, 2024 at 05:53:30PM -0500, Segher Boessenkool wrote:
> > >> On Tue, Aug 27, 2024 at 11:08:19AM -0700, Eric Biggers wrote:
> > >> >
> > >> > Is there a compiler flag that could be used to disable the generation 
> > >> > of calls
> > >> > to memset?
> > >>
> > >> -fno-tree-loop-distribute-patterns .  But, as always, read up on it, see
> > >> what it actually does (and how it avoids your problem, and mostly: learn
> > >> what the actual problem *was*!)
> > >
> > > This might help with various loops, but it doesn't help with the matter
> > > that this patch fixes, which is struct initialization. I just tried it
> > > with the arm64 patch to no avail.
> >
> > Maybe -ffreestanding can help here? That should cause the vdso to be built
> > with the assumption that there is no libc, so it would neither add nor
> > remove standard library calls. Not sure if that causes other problems,
> > e.g. if the calling conventions are different.
> 
> >From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90701:
> 
> | You need -ffreestanding but that is documented to emit memset and
> memcpy still.

Yeah.

'-nostdlib'
 Do not use the standard system startup files or libraries when
 linking.

This won't help a bit, the compiler will still optimise your
initialisation loop to a memset() call, it just won't link to libgcc.a
and crt*.o and its ilk (which is not where mem* are implemented in the
first place!)


Segher



Re: [PATCH v2 05/17] vdso: Avoid call to memset() by getrandom

2024-08-28 Thread Segher Boessenkool
On Wed, Aug 28, 2024 at 12:24:12PM +, Arnd Bergmann wrote:
> On Wed, Aug 28, 2024, at 11:18, Jason A. Donenfeld wrote:
> > On Tue, Aug 27, 2024 at 05:53:30PM -0500, Segher Boessenkool wrote:
> >> On Tue, Aug 27, 2024 at 11:08:19AM -0700, Eric Biggers wrote:
> >> > 
> >> > Is there a compiler flag that could be used to disable the generation of 
> >> > calls
> >> > to memset?
> >> 
> >> -fno-tree-loop-distribute-patterns .  But, as always, read up on it, see
> >> what it actually does (and how it avoids your problem, and mostly: learn
> >> what the actual problem *was*!)
> >
> > This might help with various loops, but it doesn't help with the matter
> > that this patch fixes, which is struct initialization. I just tried it
> > with the arm64 patch to no avail.
> 
> Maybe -ffreestanding can help here? That should cause the vdso to be built
> with the assumption that there is no libc, so it would neither add nor
> remove standard library calls. Not sure if that causes other problems,
> e.g. if the calling conventions are different.

"GCC requires the freestanding
environment provide 'memcpy', 'memmove', 'memset' and 'memcmp'."

This is precisely to implement things like struct initialisation.  Maybe
we should have a "-ffreeerstanding" or "-ffreefloating" or think of
something funnier still environment as well, this problem has been there
since the -ffreestanding flag has existed, but the problem is as old as
the night.

-fno-builtin might help a bit more, but just attack the problem at
its root, like I suggested?

(This isn't a new problem, originally it showed up as "GCC replaces
(part of) my memcpy() implementation by a (recursive) call to memcpy()"
and, well, that doesn't quite work!)


Segher



Re: [PATCH v2 05/17] vdso: Avoid call to memset() by getrandom

2024-08-28 Thread Segher Boessenkool
On Wed, Aug 28, 2024 at 01:18:34PM +0200, Jason A. Donenfeld wrote:
> On Tue, Aug 27, 2024 at 05:53:30PM -0500, Segher Boessenkool wrote:
> > On Tue, Aug 27, 2024 at 11:08:19AM -0700, Eric Biggers wrote:
> > > > +   for (i = 0; i < ARRAY_SIZE(params->reserved); i++)
> > > > +   params->reserved[i] = 0;

This is a loop.  With -ftree-loop-distribute-patterns, the default at
-O2, this is optimised to

memset(params->reserved, 0, ...);

(which is perfectly fine, since memset is required to be there even
for freestanding environments, this is documented!)

> > -fno-tree-loop-distribute-patterns .  But, as always, read up on it, see
> > what it actually does (and how it avoids your problem, and mostly: learn
> > what the actual problem *was*!)
> 
> This might help with various loops, but it doesn't help with the matter
> that this patch fixes, which is struct initialization. I just tried it
> with the arm64 patch to no avail.

It very much *does* help.  Try harder?  Maybe you typoed?


Segher



Re: [PATCH v2 05/17] vdso: Avoid call to memset() by getrandom

2024-08-27 Thread Segher Boessenkool
On Tue, Aug 27, 2024 at 11:08:19AM -0700, Eric Biggers wrote:
> On Thu, Aug 22, 2024 at 09:13:13AM +0200, Christophe Leroy wrote:
> > With the current implementation, __cvdso_getrandom_data() calls
> > memset(), which is unexpected in the VDSO.
> > 
> > Rewrite opaque data initialisation to avoid memset().
> > 
> > Signed-off-by: Christophe Leroy 
> > ---
> >  lib/vdso/getrandom.c | 15 ++-
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/lib/vdso/getrandom.c b/lib/vdso/getrandom.c
> > index cab153c5f9be..4a56f45141b4 100644
> > --- a/lib/vdso/getrandom.c
> > +++ b/lib/vdso/getrandom.c
> > @@ -4,6 +4,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -74,11 +75,15 @@ __cvdso_getrandom_data(const struct vdso_rng_data 
> > *rng_info, void *buffer, size_
> > u32 counter[2] = { 0 };
> >  
> > if (unlikely(opaque_len == ~0UL && !buffer && !len && !flags)) {
> > -   *(struct vgetrandom_opaque_params *)opaque_state = (struct 
> > vgetrandom_opaque_params) {
> > -   .size_of_opaque_state = sizeof(*state),
> > -   .mmap_prot = PROT_READ | PROT_WRITE,
> > -   .mmap_flags = MAP_DROPPABLE | MAP_ANONYMOUS
> > -   };
> > +   struct vgetrandom_opaque_params *params = opaque_state;
> > +   int i;
> > +
> > +   params->size_of_opaque_state = sizeof(*state);
> > +   params->mmap_prot = PROT_READ | PROT_WRITE;
> > +   params->mmap_flags = MAP_DROPPABLE | MAP_ANONYMOUS;
> > +   for (i = 0; i < ARRAY_SIZE(params->reserved); i++)
> > +   params->reserved[i] = 0;
> > +
> > return 0;
> > }
> 
> Is there a compiler flag that could be used to disable the generation of calls
> to memset?

-fno-tree-loop-distribute-patterns .  But, as always, read up on it, see
what it actually does (and how it avoids your problem, and mostly: learn
what the actual problem *was*!)

Have fun,


Segher



Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode

2024-08-24 Thread Segher Boessenkool
On Sat, Aug 24, 2024 at 09:01:33AM +, LEROY Christophe wrote:
> Le 23/08/2024 à 21:19, Segher Boessenkool a écrit :
> > The memset() code itself could chech for the storage attributes, but
> > that is probably more expensive than just assuming the happy case.
> > Maybe someone could try it out though!
> 
> But is it only memset() the problem ?
> 
> dcbz instruction is also used in:
> - memcpy()
> - csum_partial_copy_generic()
> - clear_page()
> - copy_page()
> - clear_user()
> - copy_to_user()
> - copy_from_user()

That is just a handful of functions.  Not sure about the _user things,
and the _page things for that matter, but the rest is certainly
measurable in real-life conditions.  So if we can avoid the problems
completely, and cheaply, we probably should.

I'm not so sure about the cheaply though :-/

> Are these functions also used on DMA coherent memory ?

Most won't show up high on most profiles, heh.  Which you already
can see from the problem not being attacked yet: if it was so obviously
a problem, some people would have wanted to do something about it :-)


Segher



Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode

2024-08-23 Thread Segher Boessenkool
Hi!

On Fri, Aug 23, 2024 at 03:54:59PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote:
> > What does "uncached memory" even mean here?  Literally it would be
> > I=1 memory (uncachEABLE memory), but more likely you want M=0 memory
> > here ("non-memory memory", "not well-behaved memory", MMIO often).
> 
> Regular kernel memory vmapped with pgprot_noncached().

So, I=1 (and G=1).  Caching inhibited and guarded.  But M=1 (memory
coherence required) as with any other real memory :-)

> > If memset() is expected to be used with M=0, you cannot do any serious
> > optimisations to it at all.  If memset() is expected to be used with I=1
> > it should use a separate code path for it, probably the caller should
> > make the distinction.
> 
> DMA coherent memory which uses uncached memory for platforms that
> do not provide hardware dma coherence can end up just about anywhere
> in the kernel.  We could use special routines for a few places in
> the DMA subsystem, but there might be plenty of others.

Yeah.  It will just be plenty slow, as we see here, that's what the
warning is for; but it works just fine :-)

The memset() code itself could chech for the storage attributes, but
that is probably more expensive than just assuming the happy case.
Maybe someone could try it out though!


Segher



Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode

2024-08-23 Thread Segher Boessenkool
Hi!

On Thu, Aug 22, 2024 at 06:39:33AM +, LEROY Christophe wrote:
> Le 22/08/2024 à 07:32, Christoph Hellwig a écrit :
> > On Thu, Aug 22, 2024 at 05:25:10AM +, LEROY Christophe wrote:
> >>> and this results in a call to dma_direct_allocation(), which has one
> >>> innocent looking memset():
> >>
> >>
> >> memset() can't be used on non-cached memory, memset_io() has to be used
> >> instead.
> > 
> > No, we use memset on uncached memory all the time.  Note that uncached
> > memory != __iomem memory, for which you DO have to use memset_io.
> > 
> 
> Then we have a subject here.
> 
> powerpc has a magic instruction 'dcbz' which clears a full cacheline in 
> one go. It is far more efficient than a loop to store zeros, and since 
> 2015 memset(0) has been implemented with that instruction (commit 
> 5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero"))
> 
> But that instruction generates an alignment exception when used on 
> non-cached memory (whether it is RAM or not doesn't matter).

What does "uncached memory" even mean here?  Literally it would be
I=1 memory (uncachEABLE memory), but more likely you want M=0 memory
here ("non-memory memory", "not well-behaved memory", MMIO often).

M=0 memory shouldn't ever have memset done on it, that is insane.  And
I=1 memory should not have the same optimised routines used, since
those only make things slower still.

> It is then 
> emulated by the kernel but it of course leads to a serious performance 
> degradation, hence the warning added by commit cbe654c77961 ("powerpc: 
> warn on emulation of dcbz instruction in kernel mode"). Until now it 
> helped identify and fix use of memset() on IO memory.
> 
> But if memset() is expected to be used with non-cached RAM, then I don't 
> know what to do. Any suggestion ?

If memset() is expected to be used with M=0, you cannot do any serious
optimisations to it at all.  If memset() is expected to be used with I=1
it should use a separate code path for it, probably the caller should
make the distinction.


Segher



Re: [PowerPC] [PASEMI] Issue with the identification of ATA drives after the of/irq updates 2024-05-29

2024-07-05 Thread Segher Boessenkool
On Fri, Jul 05, 2024 at 11:19:39AM +1000, Michael Ellerman wrote:
> + prom_printf("nemo: deleting interrupt-map properties\n");
> + rc = call_prom("interpret", 1, 1,
> +   " s\" /pxp@0,e000\" find-device"
> +   " s\" interrupt-map\" delete-property"
> +   " s\" interrupt-map-mask\" delete-property"
> +   " device-end");
> + prom_printf("nemo: interpret returned %d\n", rc);

It's very fragile no matter what.

If some scriupt does something bad, just add something better for it?
You can replace anything by just adding something with the same name!


Segher


Re: [PATCH] powerpc: Fixed duplicate copying in the early boot.

2024-06-18 Thread Segher Boessenkool
On Tue, Jun 18, 2024 at 10:12:54PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Mon, Jun 17, 2024 at 10:35:09AM +0800, Jinglin Wen wrote:
> >> +  cmplwi  cr0,r4,0/* runtime base addr is zero */
> >
> > Just write
> >cmpwi r4,0
> >
> > cr0 is the default, also implicit in many other instructions, please
> > don't clutter the source code.  All the extra stuff makes you miss the
> > things that do matter!
> >
> > The "l" is unnecessary, you only care about equality here after all.
> 
> In my mind it's an unsigned comparison, so I'd use cmpld, even though as
> you say all we actually care about is equality.

We want to know if it is zero or not, so in my mind "unsigned comparison"
does not apply at all, that is only for range checks.  Heh.

But it doesn't matter at all: if you think cmpld looks more natural / is
what you expect to see, then you should use cmpld, that is my point :-)


Segher


Re: [PATCH v5.10] powerpc/uaccess: Fix build errors seen with GCC 13/14

2024-06-17 Thread Segher Boessenkool
On Fri, Jun 14, 2024 at 09:27:14PM +1000, Michael Ellerman wrote:
> commit 2d43cc701b96f910f50915ac4c2a0cae5deb734c upstream.
> The 'std' instruction requires a 4-byte aligned displacement because
> it is a DS-form instruction, and as the assembler says, 18 is not a
> multiple of 4.

You learn something new every day :-)

> A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y.
> 
> The fix is to change the constraint on the memory operand to put_user(),
> from "m" which is a general memory reference to "YZ".
> 
> The "Z" constraint is documented in the GCC manual PowerPC machine
> constraints, and specifies a "memory operand accessed with indexed or
> indirect addressing". "Y" is not documented in the manual but specifies
> a "memory operand for a DS-form instruction". Using both allows the
> compiler to generate a DS-form "std" or X-form "stdx" as appropriate.

https://gcc.gnu.org/PR115289
It will be documented soon, thanks for the report!

> Although the build error is only seen with GCC 13/14, that appears
> to just be luck. The constraint has been incorrect since it was first
> added.

Yes, "m" allows any memory operand, an unaligned one is just fine.

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH] powerpc: Fixed duplicate copying in the early boot.

2024-06-17 Thread Segher Boessenkool
Hi!

On Mon, Jun 17, 2024 at 10:35:09AM +0800, Jinglin Wen wrote:
> + cmplwi  cr0,r4,0/* runtime base addr is zero */

Just write
   cmpwi r4,0

cr0 is the default, also implicit in many other instructions, please
don't clutter the source code.  All the extra stuff makes you miss the
things that do matter!

The "l" is unnecessary, you only care about equality here after all.

Should it not be cmpdi here, though?


Segher


Re: [kvm-unit-tests PATCH] build: retain intermediate .aux.o targets

2024-06-13 Thread Segher Boessenkool
On Fri, Jun 14, 2024 at 10:43:39AM +1000, Nicholas Piggin wrote:
> On Wed Jun 12, 2024 at 6:28 PM AEST, Segher Boessenkool wrote:
> > On Wed, Jun 12, 2024 at 02:42:32PM +1000, Nicholas Piggin wrote:
> > > arm, powerpc, riscv, build .aux.o targets with implicit pattern rules
> > > in dependency chains that cause them to be made as intermediate files,
> > > which get removed when make finishes. This results in unnecessary
> > > partial rebuilds. If make is run again, this time the .aux.o targets
> > > are not intermediate, possibly due to being made via different
> > > dependencies.
> > > 
> > > Adding .aux.o files to .PRECIOUS prevents them being removed and solves
> > > the rebuild problem.
> > > 
> > > s390x does not have the problem because .SECONDARY prevents dependancies
> > > from being built as intermediate. However the same change is made for
> > > s390x, for consistency.
> >
> > This is exactly what .SECONDARY is for, as its documentation says,
> > even.  Wouldn't it be better to just add a .SECONDARY to the other
> > targets as well?
> 
> Yeah we were debating that and agreed .PRECIOUS may not be the
> cleanest fix but since we already use that it's okay for a
> minimal fix.

But why add it to s390x then?  It is not a fix there at all!


Segher


Re: [kvm-unit-tests PATCH] build: retain intermediate .aux.o targets

2024-06-12 Thread Segher Boessenkool
On Wed, Jun 12, 2024 at 02:42:32PM +1000, Nicholas Piggin wrote:
> arm, powerpc, riscv, build .aux.o targets with implicit pattern rules
> in dependency chains that cause them to be made as intermediate files,
> which get removed when make finishes. This results in unnecessary
> partial rebuilds. If make is run again, this time the .aux.o targets
> are not intermediate, possibly due to being made via different
> dependencies.
> 
> Adding .aux.o files to .PRECIOUS prevents them being removed and solves
> the rebuild problem.
> 
> s390x does not have the problem because .SECONDARY prevents dependancies
> from being built as intermediate. However the same change is made for
> s390x, for consistency.

This is exactly what .SECONDARY is for, as its documentation says,
even.  Wouldn't it be better to just add a .SECONDARY to the other
targets as well?


Segher


Re: [PATCH] powerpc: vdso: fix building with wrong-endian toolchain

2024-06-07 Thread Segher Boessenkool
On Fri, Jun 07, 2024 at 10:42:44PM +1000, Michael Ellerman wrote:
> I use the korg toolchains every day, and kisskb uses them too.
> 
> What commit / defconfig are you seeing the errors with?
> 
> Is it just the 12.3.0 toolchain or all of them? I just tested 12.3.0
> here and it built OK.
> 
> I guess you're building on x86 or arm64? I build on ppc64le, I wonder if
> that makes a difference.

The core problem of course is pre-processing a linker script with the
C preprocessor (although linker scripts themselves have much more
capable facilities for this), and by doing this as-if it was a piece of
assembler code that for some strange reason you want fed through the C
preprocessor (as .S file).

What is it the C preprocessor is wanted for here?  Is there nothing
better that can be done?

> The patch is probably OK regardless, but I'd rather understand what the
> actual problem is.

Yeah.  The problem was found later in the thread (the CPP env var, or
shell var anyway, not sure what it is here) was set.  Fun and
surprising!  If you do nnot like such fun all that much, reduce the
surface of eternal surprise?  (I don't like saying "attack surface", but
that is what it is as well).


Segher


Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-16 Thread Segher Boessenkool
Hi!

On Thu, May 16, 2024 at 10:06:58PM +1000, Michael Ellerman wrote:
> Andy Polyakov  writes:
> >>> +.abiversion  2
> >>
> >> I'd prefer that was left to the compiler flags.
> >
> > Problem is that it's the compiler that is responsible for providing this
> > directive in the intermediate .s prior invoking the assembler. And there
> > is no assembler flag to pass through -Wa.
> 
> Hmm, right. But none of our existing .S files include .abiversion
> directives.
> 
> We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
> effect.

Yup.  You coulds include some header file, maybe?  Since you run the
assembler code through the C preprocessor anyway, for some weird reason :-)

> But the actual code follows ELFv2, because we wrote it that way, and I
> guess the linker doesn't look at the actual ABI version of the .o ?

It isn't a version.  It is an actual different ABI.

GNU LD allows linking together whatever, yes.

> Is .abiversion documented anywhere? I can't see it in the manual.

Yeah me neither.  https://sourceware.org/bugzilla/enter_bug.cgi ?
A commandline flag (to GAS) would seem best?


Segher


Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

2024-05-16 Thread Segher Boessenkool
On Wed, May 15, 2024 at 10:29:56AM +0200, Andy Polyakov wrote:
> >+static void cswap(fe51 p, fe51 q, unsigned int bit)
> 
> The "c" in cswap stands for "constant-time," and the problem is that 
> contemporary compilers have exhibited the ability to produce 
> non-constant-time machine code as result of compilation of the above 
> kind of technique.

This can happen with *any* comnpiler, on *any* platform.  In general,
you have to write machine code if you want to be sure what machine code
will eventually be executed.

>  The outcome is platform-specific and ironically some 
> of PPC code generators were observed to generate "most" 
> non-constant-time code. "Most" in sense that execution time variations 
> would be most easy to catch. One way to work around the problem, at 
> least for the time being, is to add 'asm volatile("" : "+r"(c))' after 
> you calculate 'c'. But there is no guarantee that the next compiler 
> version won't see through it, hence the permanent solution is to do it 
> in assembly. I can put together something...

Such tricks can help ameliorate the problem, sure.  But it is not a
solution ever.


Segher


Re: [PATCH 3/3] powerpc: Check only single values are passed to CPU/MMU feature checks

2024-05-10 Thread Segher Boessenkool
On Fri, May 10, 2024 at 04:45:37PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Thu, May 09, 2024 at 10:12:48PM +1000, Michael Ellerman wrote:
> >> cpu_has_feature()/mmu_has_feature() are only able to check a single
> >> feature at a time, but there is no enforcement of that.
> >> 
> >> In fact, as fixed in the previous commit, there was code that was
> >> passing multiple values to cpu_has_feature().
> >> 
> >> So add a check that only a single feature is passed using popcount.
> >> 
> >> Note that the test allows 0 or 1 bits to be set, because some code
> >> relies on cpu_has_feature(0) being false, the check with
> >> CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE.
> >
> > This btw is exactly
> >
> > BUILD_BUG_ON(feature & (feature - 1));
> >
> > but the popcount is more readable :-)
> 
> Yeah for those of us who don't see bits cascading in our sleep I think
> the popcount is easier to understand ;)

Absolutely :-)  This is just one of the most well-known bittricks, for
seeing if x is a power of two you write

  x && x & (x-1)

but here you do not need to test for x not being zero.  Hardly ever get
to use that simpler thing, so it jumped out at me here :-)

[ For understanding the x & (x-1) thing, it is perhaps easiest if you
first consider an x with more than one bit set: x-1 will have the same
topmost set bit. ]


Segher


Re: [PATCH 3/3] powerpc: Check only single values are passed to CPU/MMU feature checks

2024-05-09 Thread Segher Boessenkool
On Thu, May 09, 2024 at 10:12:48PM +1000, Michael Ellerman wrote:
> cpu_has_feature()/mmu_has_feature() are only able to check a single
> feature at a time, but there is no enforcement of that.
> 
> In fact, as fixed in the previous commit, there was code that was
> passing multiple values to cpu_has_feature().
> 
> So add a check that only a single feature is passed using popcount.
> 
> Note that the test allows 0 or 1 bits to be set, because some code
> relies on cpu_has_feature(0) being false, the check with
> CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE.

This btw is exactly

BUILD_BUG_ON(feature & (feature - 1));

but the popcount is more readable :-)


Segher


Re: powerpc: io-defs.h:43:1: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]

2024-04-16 Thread Segher Boessenkool
On Tue, Apr 16, 2024 at 03:02:52PM +0530, Naresh Kamboju wrote:
> In file included from arch/powerpc/include/asm/io.h:672:
> arch/powerpc/include/asm/io-defs.h:43:1: error: performing pointer
> arithmetic on a null pointer has undefined behavior
> [-Werror,-Wnull-pointer-arithmetic]

It is not UB, but just undefined: the program is meaningless.

It is not a null pointer but even a null pointer constant here.  It
matters in places, including here.

It would help if the warnings were more correct :-(


Segher


Re: Appropriate liburcu cache line size for Power

2024-03-26 Thread Segher Boessenkool
On Tue, Mar 26, 2024 at 06:19:38PM +1100, Michael Ellerman wrote:
> Mathieu Desnoyers  writes:
> The ISA doesn't specify the cache line size, other than it is smaller
> than a page.

It also says it is "aligned".  Nowhere is it said what an aligned size
is, but it seems clear it has to be a power of two.

> In practice all the 64-bit IBM server CPUs I'm aware of have used 128
> bytes.

Yup.  It is 128B on p3 already.

> It is possible to discover at runtime via AUXV headers. But that's no
> use if you want a compile-time constant.

The architecture does not require the data block size to be equal to the
instruction block size, already.  But many programs subscribe to an
overly simplified worldview, which is a big reason everything is 128B on
all modern PowerPC.

It is quite a nice tradeoff size, there has to be a huge change in the
world for this to ever change :-)


Segher


Re: Appropriate liburcu cache line size for Power

2024-03-25 Thread Segher Boessenkool
On Mon, Mar 25, 2024 at 03:34:30PM -0500, Nathan Lynch wrote:
> Mathieu Desnoyers  writes:
> For what it's worth, I found a copy of an IBM Journal of Research &
> Development article confirming that POWER5's L3 had a 256-byte line
> size:
> 
>   Each slice [of the L3] is 12-way set-associative, with 4,096
>   congruence classes of 256-byte lines managed as two 128-byte sectors
>   to match the L2 line size.
> 
> https://www.eecg.utoronto.ca/~moshovos/ACA08/readings/power5.pdf
> 
> I don't know of any reason to prefer 256 over 128 for current Power
> processors though.

The reason some old CPUs use bigger physical cache line sizes is to have
fewer cache lines, which speeds up lookup, or reduces power consumption
of lookup, or both.  This isn't trivial at all when implemented as a
parallel read and compare of all tags, which was the usual way to do
things long ago.

Nowadays usually a way predictor is used, severely limiting the number
of tags to be compared.  So we can use a 128B physical line size always
now.  Note that this was physical only, everything looked like 128B on
a P5 system as well.

P5 wasn't first like this fwiw, look at the L2 on a 604 for example :-)


Segher


Re: [PATCH 2/3] tools/erf/util/annotate: Set register_char and memory_ref_char for powerpc

2024-03-09 Thread Segher Boessenkool
All instructions with a primary opcode from 32 to 63 are memory insns,
and no others.  It's trivial to see whether it is a load or store, too
(just bit 3 of the insn).  Trying to parse disassembled code is much
harder, and you easily make some mistakes here.

On Sat, Mar 09, 2024 at 12:55:12PM +0530, Athira Rajeev wrote:
> To identify if the instruction has any memory reference,
> "memory_ref_char" field needs to be set for specific architecture.
> Example memory instruction:
> lwz r10,0(r9)
> 
> Here "(" is the memory_ref_char. Set this as part of arch->objdump

What about "lwzx r10,0,r19", semantically exactly the same instruction?
There is no paren in there.  Not all instructions using parens are
memory insns, either, not in assembler code at least.

> To get register number and access offset from the given instruction,
> arch->objdump.register_char is used. In case of powerpc, the register
> char is "r" (since reg names are r0 to r31). Hence set register_char
> as "r".

cr0..cr7
r0..r31
f0..f31
v0..v31
vs0..vs63
and many other spellings.  Plain "0..63" is also fine.

The "0" in my lwzx example is a register field as well (and it stands
for "no register", not for "r0").  Called "(RA|0)" usually (incidentally,
see the parens there as well, oh joy).

Don't you have the binary code here as well, not just a disassembled
representation of it?  It is way easier to just use that, and you'll get
much better results that way :-)


Segher


Re: [PATCH 5/5] powerpc: Remove cpu-as-y completely

2024-02-29 Thread Segher Boessenkool
On Thu, Feb 29, 2024 at 11:25:21PM +1100, Michael Ellerman wrote:
> From: Christophe Leroy 

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH 4/5] powerpc/fsl: Modernise mt/mfpmr

2024-02-29 Thread Segher Boessenkool
On Thu, Feb 29, 2024 at 11:25:20PM +1100, Michael Ellerman wrote:
> With the addition of the machine directives, these are no longer simple
> 1-2 liner macros. So modernise them to be static inlines and use named
> asm parameters.
> 
> Signed-off-by: Michael Ellerman 

You got rid of the __stringify blight as well.  Great :-)

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH 2/5] powerpc/64s: Use .machine power4 around dcbt

2024-02-29 Thread Segher Boessenkool
On Thu, Feb 29, 2024 at 11:25:18PM +1100, Michael Ellerman wrote:
> There are multiple decodings for the "dcbt" mnemonic, so the assembler
> has to pick one.
> 
> That requires passing -many to the assembler, which is not recommended.
> 
> Without -many the clang 14 / binutils 2.38 build fails with:
> 
>   arch/powerpc/kernel/exceptions-64s.S:2976: Error: junk at end of line: 
> `0b01010'
>   clang: error: assembler command failed with exit code 1 (use -v to see 
> invocation)
> 
> Fix it by adding .machine directives around the use of dcbt to specify
> which encoding is desired.
> 
> Signed-off-by: Michael Ellerman 

Looks good, thanks!

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH] tty: hvc-iucv: fix function pointer casts

2024-02-14 Thread Segher Boessenkool
On Wed, Feb 14, 2024 at 11:37:21AM +0100, Greg Kroah-Hartman wrote:
> On Wed, Feb 14, 2024 at 09:46:33AM +, David Laight wrote:
> > From: Segher Boessenkool
> > > Sent: 13 February 2024 19:13
> > > 
> > > On Tue, Feb 13, 2024 at 11:17:49AM +0100, Arnd Bergmann wrote:
> > > > clang warns about explicitly casting between incompatible function
> > > > pointers:
> > > >
> > > > drivers/tty/hvc/hvc_iucv.c:1100:23: error: cast from 'void (*)(const 
> > > > void *)' to 'void (*)(struct
> > > device *)' converts to incompatible function type 
> > > [-Werror,-Wcast-function-type-strict]
> > > >  1100 | priv->dev->release = (void (*)(struct device *)) kfree;
> > > >   |  ^
> > > 
> > > Such a cast of course is explicitly allowed by 6.3.2.3/8, only calling a
> > > function using a non-compatible type is UB.  This warning message is
> > > quite misleading.  Doubly so because of the -Werror, as always.
> > 
> > But it will get called using the wrong type.
> > And (is it) fine-ibt will reject the incorrect call.
> 
> And rightfully so, this type of casting abuse is just that, abuse.
> 
> Almost no one should be just calling kfree() on a device pointer, I'll
> look at the surrounding code as odds are something odd is going on.  But
> for now, this patch is correct.

Yes, and I said so.  My point is that the warning message pretends the
cast is bad or dangerous.  It is not, similar casts are the only way in
C to do certain things (yes, you can always avoid it completely, by
writing completely different code, like the patch does, and that
sometimes is a better idea even).

But the warning message is misleading and does more damage than it helps
avoid.


Segher


Re: [PATCH] tty: hvc-iucv: fix function pointer casts

2024-02-13 Thread Segher Boessenkool
On Tue, Feb 13, 2024 at 11:17:49AM +0100, Arnd Bergmann wrote:
> clang warns about explicitly casting between incompatible function
> pointers:
> 
> drivers/tty/hvc/hvc_iucv.c:1100:23: error: cast from 'void (*)(const void *)' 
> to 'void (*)(struct device *)' converts to incompatible function type 
> [-Werror,-Wcast-function-type-strict]
>  1100 | priv->dev->release = (void (*)(struct device *)) kfree;
>   |  ^

Such a cast of course is explicitly allowed by 6.3.2.3/8, only calling a
function using a non-compatible type is UB.  This warning message is
quite misleading.  Doubly so because of the -Werror, as always.

Your proposed new code of course is nice and simple (albeit a bit bigger
than it was before, both source and binary).  Such is life ;-)


Segher


Re: [PATCH] powerpc: Add gpr1 and fpu save/restore functions

2024-02-12 Thread Segher Boessenkool
On Mon, Feb 12, 2024 at 12:07:03PM -0600, Timothy Pearson wrote:
> > I have done it for *all* architectures some ten years ago.  Never found
> > any problem.
> 
> That makes sense, what I mean by invasive is that we'd need buy-in from the 
> other
> maintainers across all of the affected architectures.  Is that likely to 
> occur?

I don't know.  Here is my PowerPC-specific patch, it's a bit older, it
might not apply cleanly anymore, the changes needed should be obvious
though:


=== 8< ===
commit f16dfa5257eb14549ce22243fb2b465615085134
Author: Segher Boessenkool 
Date:   Sat May 3 03:48:06 2008 +0200

powerpc: Link vmlinux against libgcc.a

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b619c52..0a2fac6ffc1c 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -158,6 +158,9 @@ core-y  += arch/powerpc/kernel/ 
 core-$(CONFIG_XMON)+= arch/powerpc/xmon/
 core-$(CONFIG_KVM) += arch/powerpc/kvm/
 
+LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
+libs-y += $(LIBGCC)
+
 drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/
 
 # Default to zImage, override when needed
=== 8< ===


> > There are better options than -Os, fwiw.  Some --param's give smaller
> > *and* faster kernels.  What exactly is best is heavily arch-dependent
> > though (as well as dependent on the application code, the kernel code in
> > this case) :-(
> 
> I've been through this a few times, and -Os is the only option that makes
> things (just barely) fit unfortunately.

-O2 with appropriate inlining tuning beats -Os every day of the week,
in my experience.


Segher


Re: [PATCH] powerpc: Add gpr1 and fpu save/restore functions

2024-02-12 Thread Segher Boessenkool
On Mon, Feb 12, 2024 at 11:46:19AM -0600, Timothy Pearson wrote:
> Interesting, that make sense.
> 
> How should we proceed from the current situation?  Bringing in libgcc seems
> like a fairly invasive change,

I have done it for *all* architectures some ten years ago.  Never found
any problem.

> should we merge this to fix the current bug
> (cannot build ppc64 kernel in size-optimized mode) and start discussion on
> bringing in libgcc as the long-term fix across multiple architectures?
> 
> My goal here is to not have to carry a downstream patch in perpetuity for
> our embedded Linux firmware, which needs to be compiled in size-optimized
> mode due to hardware Flash limitations.

There are better options than -Os, fwiw.  Some --param's give smaller
*and* faster kernels.  What exactly is best is heavily arch-dependent
though (as well as dependent on the application code, the kernel code in
this case) :-(


Segher


Re: [PATCH] powerpc: Add gpr1 and fpu save/restore functions

2024-02-12 Thread Segher Boessenkool
On Mon, Feb 12, 2024 at 11:09:38AM -0600, Timothy Pearson wrote:
> There is existing code in the kernel right now to provide support functions 
> for gpr0 and altivec save/restore.  I don't know the full story here, but at 
> some point in the kernel's history it seems to have been decided to provide 
> the helper functions in lieu of linking libgcc directly.  If this is 
> incorrect, then I need to know that so I can rework the patch to enable libcc 
> and remove the existing support functions.
> 
> Is there anyone on-list that knows more of the history and decision-making 
> that went into the current state of the kernel here?

Long long time ago, linux-0.11 or something, it was discovered that some
programmiing mistakes resulted in double-length divisions (64x64->64 on
32-bit systems, say).  Most architectures have no hardware support for
that, x86 is one of those; so you need very expensive support routines
to do that (_udivdi3 or _divdi3 in that case, ...ti3 on 64-bit archs).

So it was decided to not link to libgcc to avoid this.  But that means
that all the extremely many other suppoort routines, more for some other
archs, are also not there.  While it would have been much easier to just
link to something that provides the _{u,}divdi3 symbol and then causes a
forced linking error from that!


Segher


Re: [PATCH] powerpc: Add gpr1 and fpu save/restore functions

2024-02-12 Thread Segher Boessenkool
On Mon, Feb 12, 2024 at 10:41:18AM -0600, Timothy Pearson wrote:
> Implement gpr1 and fpu save/restore functions per the ABI v2 documentation.

There is no "ABI v2".  This is the ELFv2 ABI, it is a name, it is not a
version 2 of anything (in fact, it is version 1 everywhere).

The same functions are needed and used in other ABIs, too.

But, why do this patch?  You just need

+LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)

+libs-y += $(LIBGCC)

and nothing more.  It is required for proper functioning of GCC to link
with the libgcc support library.


Segher


Re: [PATCH] powerpc/64: Set LR to a non-NULL value in task pt_regs on scv entry

2024-01-25 Thread Segher Boessenkool
Hi!

On Thu, Jan 25, 2024 at 05:12:28PM +0530, Naveen N Rao wrote:
> diff --git a/arch/powerpc/kernel/interrupt_64.S 
> b/arch/powerpc/kernel/interrupt_64.S
> index bd863702d812..5cf3758a19d3 100644
> --- a/arch/powerpc/kernel/interrupt_64.S
> +++ b/arch/powerpc/kernel/interrupt_64.S
> @@ -53,6 +53,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
>   ld  r1,PACAKSAVE(r13)
>   std r10,0(r1)
>   std r11,_NIP(r1)
> + std r11,_LINK(r1)

Please add a comment here then, saying what the store is for?


Segher


Re: [PATCH] powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large

2024-01-09 Thread Segher Boessenkool
On Tue, Jan 09, 2024 at 03:15:35PM +, Christophe Leroy wrote:
> >   CFLAGS-$(CONFIG_PPC64)+= $(call cc-option,-mcall-aixdesc)
> >   endif
> >   endif
> > -CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
> > cc-option,-mminimal-toc))
> > +CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium)
> 
> Should we still use $(call cc-option  here ?
> As we only deal with medium model now, shouldn't we make it such that it 
> fails in case the compiler doesn't support -mcmodel=medium ?

The -mcmodel= flag has been supported since 2010.  The kernel requires
a GCC from 2015 or later (GCC 5.1 is the minimum).  -mcmodel=medium is
(and always has been) the default, so it is always supported, yes.


Segher


Re: [PATCH 4/4] powerpc/Makefile: Auto detect cross compiler

2023-12-06 Thread Segher Boessenkool
On Wed, Dec 06, 2023 at 10:55:48PM +1100, Michael Ellerman wrote:
> Look for various combinations, matching:
>   powerpc(64(le)?)?(-unknown)?-linux(-gnu)?-
> 
> There are more possibilities, but the above is known to find a compiler
> on Fedora and Ubuntu (which use linux-gnu-), and also detects the
> kernel.org cross compilers (which use linux-).

$ sh ../config.sub powerpc64-linux
powerpc64-unknown-linux-gnu

I am very very lazy, so buildall uses short names everywhere :-)

Btw, can you build the kernel for a config that differs in 32/64 or
BE/LE with the default your compiler uses?  There is no reason this
shouldn't work (you don't use any system libraries), but you need to
use the correct compiler command-line flags for that always.

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH 2/4] powerpc/vdso: No need to undef powerpc for 64-bit build

2023-12-06 Thread Segher Boessenkool
On Wed, Dec 06, 2023 at 10:55:46PM +1100, Michael Ellerman wrote:
> But the 64-bit compiler doesn't define powerpc in the first place,

Yes, only __powerpc__ (and __powerpc64__) :-)


Segher


Re: [pci:controller/xilinx-xdma] BUILD REGRESSION 8d786149d78c7784144c7179e25134b6530b714b

2023-10-31 Thread Segher Boessenkool
On Tue, Oct 31, 2023 at 09:56:00AM -0500, Bjorn Helgaas wrote:
> That said, the unused functions do look legit:
> 
> grackle_set_stg() is a static function and the only call is under
> "#if 0".

It is a static inline.  It is *normal* that that is uncalled.  It is
very similar to a macro that has no invocation.  The warning is
overeager.


Segher


Re: [PATCH 0/7] arch/*: config: Remove ReiserFS from defconfig

2023-09-19 Thread Segher Boessenkool
On Tue, Sep 19, 2023 at 12:00:34AM +, Peter Lafreniere wrote:
> On Monday, September 18th, 2023 at 19:41, Segher Boessenkool 
>  wrote:
> > On Mon, Sep 18, 2023 at 05:56:09PM +, Peter Lafreniere wrote:
> > 
> > > ReiserFS has been considered deprecated for 19 months since commit
> > > eb103a51640e ("reiserfs: Deprecate reiserfs"). However, there are
> > > several architectures that still build it into their defconfig kernels.
> > > 
> > > As ReiserFS will be removed in 2025, delete all ReiserFS-related options
> > > from defconfig files before the filesystem's removal.
> > 
> > 
> > This is essentially equivalent to deleting the filesystem now. Why do
> > this? Is there such a hurry?
> 
> This is not equivalent to deleting the filesystem. The filesystem can still
> be configured into kernels, and few distros use a defconfig kernel anyway.

Most people who compile kernels use defconfigs though.  Distros are a
tiny minority if you look at builds.

Again: why do you want this?


Segher


Re: [PATCH 0/7] arch/*: config: Remove ReiserFS from defconfig

2023-09-18 Thread Segher Boessenkool
On Mon, Sep 18, 2023 at 05:56:09PM +, Peter Lafreniere wrote:
> ReiserFS has been considered deprecated for 19 months since commit
> eb103a51640e ("reiserfs: Deprecate reiserfs"). However, there are
> several architectures that still build it into their defconfig kernels.
> 
> As ReiserFS will be removed in 2025, delete all ReiserFS-related options
> from defconfig files before the filesystem's removal.

This is essentially equivalent to deleting the filesystem now.  Why do
this?  Is there such a hurry?


Segher


Re: [PATCH v1 10/21] powerpc/kexec: refactor for kernel/Kconfig.kexec

2023-06-15 Thread Segher Boessenkool
On Thu, Jun 15, 2023 at 01:34:25PM +1000, Michael Ellerman wrote:
> Eric DeVolder  writes:
> > -config KEXEC_FILE
> > -   bool "kexec file based system call"
> > -   select KEXEC_CORE
> > -   select HAVE_IMA_KEXEC if IMA
> > -   select KEXEC_ELF
> > -   depends on PPC64
> > -   depends on CRYPTO=y
> > -   depends on CRYPTO_SHA256=y
> ...
> > +
> > +config ARCH_HAS_KEXEC_FILE
> > +   def_bool PPC64 && CRYPTO && CRYPTO_SHA256
> 
> The =y's got lost here.
> 
> I think they were both meaningful, because both options are tristate. So
> this previously required them to be built-in (=y), whereas after your
> patch it will allow them to be modules.
> 
> I don't know for sure that those options need to be built-in, but that's
> what the code does now, so this patch shouldn't change it, at least
> without an explanation.

This patch shouldn't change it at all, period.  If you want to change it
(and that sounds like a good idea, if it is possible anyway), that
should be a separate patch.


Segher


Re: Passing the complex args in the GPR's

2023-06-06 Thread Segher Boessenkool
Hi!

On Tue, Jun 06, 2023 at 08:35:22PM +0530, Umesh Kalappa wrote:
> Hi Adnrew,
> Thank you for the quick response and for PPC64 too ,we do have
> mismatches in ABI b/w complex operations like
> https://godbolt.org/z/bjsYovx4c .
> 
> Any reason why GCC chose to use GPR 's here ?

What did you expect, what happened instead?  Why did you expect that,
and why then is it an error what did happen?

You used -O0.  As long as the code works, all is fine.  But unoptimised
code frequently is hard to read, please use -O2 instead?

As Andrew says, why did you use -m32 for GCC but -m64 for LLVM?  It is
hard to compare those at all!  32-bit PowerPC Linux ABI (based on 32-bit
PowerPC ELF ABI from 1995, BE version) vs. 64-bit ELFv2 ABI from 2015
(LE version).


Segher


Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail

2023-04-24 Thread Segher Boessenkool
On Mon, Apr 24, 2023 at 08:28:55AM -0700, Boqun Feng wrote:
> On Mon, Apr 24, 2023 at 10:13:51AM -0500, Segher Boessenkool wrote:
> > At what points can r13 change?  Only when some particular functions are
> > called?
> 
> r13 is the local paca:
> 
>   register struct paca_struct *local_paca asm("r13");
> 
> , which is a pointer to percpu data.

Yes, it is a global register variable.

> So if a task schedule from one CPU to anotehr CPU, the value gets
> changed.

But the compiler does not see that something else changes local_paca (or
r13 some other way, via assembler code perhaps)?  Or is there a compiler
bug?

If the latter is true:

Can you make a reproducer and open a GCC PR?  <https://gcc.gnu.org/bugs/>
for how to get started doing that.  We need *exact* code that shows the
problem, together with a compiler command line.  So that we can
reproduce the problem.  That is step 0 in figuring out what is going on,
and then maybe fixing the problem :-)


Segher


Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail

2023-04-24 Thread Segher Boessenkool
Hi!

On Mon, Apr 24, 2023 at 11:14:00PM +1000, Michael Ellerman wrote:
> Boqun Feng  writes:
> > On Sat, Apr 22, 2023 at 09:28:39PM +0200, Joel Fernandes wrote:
> >> On Sat, Apr 22, 2023 at 2:47 PM Zhouyi Zhou  wrote:
> >> > by debugging, I see the r10 is assigned with r13 on c0226eb4,
> >> > but if there is a context-switch before c0226edc, a false
> >> > positive will be reported.

> I've never understood why the compiler wants to make a copy of a
> register variable into another register!? >:#

It is usually because a) you told it to (maybe via an earlyclobber), or
b) it looked cheaper.  I don't see either here :-(

> > here I think that the compiler is using r10 as an alias to r13, since
> > for userspace program, it's safe to assume the TLS pointer doesn't
> > change. However this is not true for kernel percpu pointer.

r13 is a "fixed" register, but that means it has a fixed purpose (so not
available for allocation), it does not mean "unchanging".

> > The real intention here is to compare 40(r1) vs 3192(r13) for stack
> > guard checking, however since r13 is the percpu pointer in kernel, so
> > the value of r13 can be changed if the thread gets scheduled to a
> > different CPU after reading r13 for r10.
> 
> Yeah that's not good.

The GCC pattern here makes the four machine insns all stay together.
That is to make sure to not leak any secret value, which is impossible
to guarantee otherwise.

What tells GCC r13 can randomly change behind its back?  And, what then
makes GCC ignore that fact?

> > +   asm volatile("" : : : "r13", "memory");

Any asm without output is always volatile.

> > Needless to say, the correct fix is to make ppc stack protector aware of
> > r13 is volatile.
> 
> I suspect the compiler developers will tell us to go jump :)

Why would r13 change over the course of *this* function / this macro,
why can this not happen anywhere else?

> The problem of the compiler caching r13 has come up in the past, but I
> only remember it being "a worry" rather than causing an actual bug.

In most cases the compiler is smart enough to use r13 directly, instead
of copying it to another reg and then using that one.  But not here for
some strange reason.  That of course is a very minor generated machine
code quality bug and nothing more :-(

> We've had the DEBUG_PREEMPT checks in get_paca(), which have given us at
> least some comfort that if the compiler is caching r13, it shouldn't be
> doing it in preemptable regions.
> 
> But obviously that doesn't help at all with the stack protector check.
> 
> I don't see an easy fix.
> 
> Adding "volatile" to the definition of local_paca seems to reduce but
> not elimate the caching of r13, and the GCC docs explicitly say *not* to
> use volatile. It also triggers lots of warnings about volatile being
> discarded.

The point here is to say some code clobbers r13, not the asm volatile?

> Or something simple I haven't thought of? :)

At what points can r13 change?  Only when some particular functions are
called?


Segher


Re: [PATCH] powerpc/64: Always build with 128-bit long double

2023-04-05 Thread Segher Boessenkool
Hi!

On Wed, Apr 05, 2023 at 03:32:21PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote:
> >> The amdgpu driver builds some of its code with hard-float enabled,
> >> whereas the rest of the kernel is built with soft-float.
> >> 
> >> When building with 64-bit long double, if soft-float and hard-float
> >> objects are linked together, the build fails due to incompatible ABI
> >> tags.
> >
> >> Currently those build errors are avoided because the amdgpu driver is
> >> gated on 128-bit long double being enabled. But that's not a detail the
> >> amdgpu driver should need to be aware of, and if another driver starts
> >> using hard-float the same problem would occur.
> >
> > Well.  The kernel driver either has no business using long double (or
> > any other floating point even) at all, or it should know exactly what is
> > used: double precision, double-double, or quadruple precision.  Both of
> > the latter two are 128 bits.
> 
> In a perfect world ... :)

Well, without it knowing what exactly it calculates, does this code have
any business running in kernel space?  Is it acceptable to just do
random things in the kernel?  I don't know the kernel code that uses
long double at all (and I'm afraid to look for fear of going blind), but
all this sounds like the 64-bit IEEE double precision floating point is
not good enough for some certain calculation, but 80-bit extended double
precision as used on x86 is.  That does make it likely that both of our
128-bit formats would work, but there are lots and lots of "buts".  To
start with, what does that code require wrt fp contraction (so, floating
multiply-add)?

All of this suggests that there should not be floating point code here
*at all*, it is harder to use it in any acceptable way than to just do
things in fixed point or scaled integer or whatever.

> >> All versions of the 64-bit ABI specify that long-double is 128-bits.
> >> However some compilers, notably the kernel.org ones, are built to use
> >> 64-bit long double by default.
> >
> > Mea culpa, I suppose?  But buildall doesn't force 64 bit explicitly.
> > I wonder how this happened?  Is it maybe a problem in the powerpc64le
> > config in GCC itself?
> 
> Not blaming anyone, just one of those things that happens.

Oh I didn't say anyone is blaming me.  I want to fix the problem, that
is all :-)

> The
> toolchains the distros (Ubuntu/Fedora) build all seem to use 128, but
> possibly that's because someone told them to configure them that way at
> some point.

No, or yes, depending on how you look at it?  Default configurations all
have 128-bit long double.  But buildall uses (almost) the same
configuration on all targets, namely:

$GCC_SRC/configure \
--target=$TARGET --enable-targets=all --prefix=$PREFIX \
--enable-languages=c --without-headers --disable-bootstrap \
--disable-nls --disable-threads --disable-shared \
--disable-libmudflap --disable-libssp --disable-libgomp \
--disable-decimal-float --disable-libquadmath \
--disable-libatomic --disable-libcc1 --disable-libmpx

All of this is perfectly reasonable imnsho, but I guess the
--enable-targets=all causes the problem here?  That makes no sense, but
it is still my best guess.

> > I have a patch from summer last year (Arnd's
> > toolchains are built without it) that does
> > +   powerpc64le-*)  TARGET_GCC_CONF=--with-long-double-128
> > Unfortunately I don't remember why I did that, and I never investigated
> > what the deeper problem is :-/
> 
> Last summer (aka winter)

Oh right.  Last July :-)

> is when we first discovered this issue with the
> long double size being implicated.
> 
> See:
>   https://git.kernel.org/torvalds/c/c653c591789b3acfa4bf6ae45d5af4f330e50a91
> 
> So I guess that's what prompted your patch?

It was one day before my patch, maybe less than 12h even, so that could
be.  I don't update the kernel source automatically though (there are
50 to 100 build breaks every year, when things are in decent state I
tend to keep it for a while).  But it may have been our patches are due
to the same cause, and mine is no longer needed?  That would be nice.  I
never committed that patch (or there would be more context, sigh).

I'll dig, there is a real problem in the compiler it seems.  Thanks for
the help so far!


Segher


Re: [PATCH] powerpc/64: Always build with 128-bit long double

2023-04-04 Thread Segher Boessenkool
Hi!

On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote:
> The amdgpu driver builds some of its code with hard-float enabled,
> whereas the rest of the kernel is built with soft-float.
> 
> When building with 64-bit long double, if soft-float and hard-float
> objects are linked together, the build fails due to incompatible ABI
> tags.

> Currently those build errors are avoided because the amdgpu driver is
> gated on 128-bit long double being enabled. But that's not a detail the
> amdgpu driver should need to be aware of, and if another driver starts
> using hard-float the same problem would occur.

Well.  The kernel driver either has no business using long double (or
any other floating point even) at all, or it should know exactly what is
used: double precision, double-double, or quadruple precision.  Both of
the latter two are 128 bits.

> All versions of the 64-bit ABI specify that long-double is 128-bits.
> However some compilers, notably the kernel.org ones, are built to use
> 64-bit long double by default.

Mea culpa, I suppose?  But builddall doesn't force 64 bit explicitly.
I wonder how this happened?  Is it maybe a problem in the powerpc64le
config in GCC itself?  I have a patch from summer last year (Arnd's
toolchains are built without it) that does
+   powerpc64le-*)  TARGET_GCC_CONF=--with-long-double-128
Unfortunately I don't remember why I did that, and I never investigated
what the deeper problem is :-/

In either case, the kernel should always use specific types, not rely on
the toolchain to pick a type that may or may not work.  The correct size
floating point type alone is not enough, but it is a step in the right
direction certainly.

Reviewed-by: Segher Boessenkool 


Segher


Re: arch/powerpc/kernel/head_85xx.o: warning: objtool: .head.text+0x1a6c: unannotated intra-function call

2023-01-26 Thread Segher Boessenkool
Hi!

On Wed, Jan 25, 2023 at 12:57:35PM +0530, Naveen N. Rao wrote:
> Sathvika Vasireddy wrote:
> >--- a/arch/powerpc/kvm/booke.c
> >+++ b/arch/powerpc/kvm/booke.c
> >@@ -917,7 +917,9 @@ static void kvmppc_fill_pt_regs(struct pt_regs *regs)
> >     asm("mr %0, 1" : "=r"(r1));
> >     asm("mflr %0" : "=r"(lr));
> >     asm("mfmsr %0" : "=r"(msr));
> >+   asm(".pushsection .discard.intra_function_calls; .long 999f; 
> >.popsection; 999:");
> >     asm("bl 1f; 1: mflr %0" : "=r"(ip));
> 
> I don't think you can assume that there won't be anything in between two 
> asm statements.

It would be a false assumption.  There is nothing that stops the
compiler from moving, duplicating, or even removing these statements
(removing only if no outputs from the asm are required of course).

> Does it work if you combine both the above asm 
> statements into a single one?
> 
> Even if that works, I don't think it is good to expand the macro here.  
> That asm statement looks to be trying to grab the current nip. I don't 
> know enough about that code, and someone who knows more about KVM may be 
> able to help, but it looks like we should be able to simply set 'ip' to 
> the address of kvmppc_fill_pt_regs()?

Such things are much better as actual assembler code (like, a .s file).
You have to be certain the compiler doesn't transform this in unexpected
ways, like, copy and move it to all callers for example.  You need the
mfmsr to remain somewhat in place for example.

A big reason to not want inline asm for things like this is you need so
very many operands in a single asm that way; it becomes very hard to
write, esp. if you want it to be correct code as well.  That is a good
hint there are better way to do this ;-)


Segher


Re: [PATCH v4 02/24] powerpc/pseries: Fix alignment of PLPKS structures and buffers

2023-01-26 Thread Segher Boessenkool
On Thu, Jan 26, 2023 at 12:09:53AM +1100, Michael Ellerman wrote:
> Andrew Donnellan  writes:
> > A number of structures and buffers passed to PKS hcalls have alignment
> > requirements, which could on occasion cause problems:
> >
> > - Authorisation structures must be 16-byte aligned and must not cross a
> >   page boundary
> >
> > - Label structures must not cross page boundaries
> >
> > - Password output buffers must not cross page boundaries
> >
> > Round up the allocations of these structures/buffers to the next power of
> > 2 to make sure this happens.
> 
> It's not the *next* power of 2, it's the *nearest* power of 2, including
> the initial value if it's already a power of 2.

It's not the nearest either, the nearest power of two to 65 is 64.  You
could say "but, round up" to which I would say "round?"  :-P

"Adjust the allocation size to be the smallest power of two greater than
or equal to the given size."

"Pad to a power of two" in shorthand.  "Padded to a power of two if
necessary" if you want to emphasise it can be a no-op.


Segher


Re: [PATCH v2 07/14] powerpc/vdso: Improve linker flags

2023-01-24 Thread Segher Boessenkool
On Tue, Jan 24, 2023 at 09:14:52AM -0700, Nathan Chancellor wrote:
> On Mon, Jan 23, 2023 at 09:07:16AM -0600, Segher Boessenkool wrote:
> > And here it is even more obviously fine.  If you need obfuscation like
> > in your patch, it is better not to do this imo.
> 
> I do not think this patch really obfuscates anything? The filtering is
> pretty clear to me.

And not having such filtering is more obvious and more clear.

It doesn't matter much for just this patch of course, but it will make
the code significantly harder to read (and deal with in other ways) if
this continues.

> If this is a real objection to the patch, I suppose we could just
> localize '-Qunused-arguments' to this Makefile and be done with it but I
> do not think this change is a bad solution to the problem either.

It is a comment about the direction this patch is moving us in.  I don't
think it is a good idea at all to try to avoid all warnings, and even
more so it is a bad idea to make objectively worse source code just to
appease a trigger-happy and questionable warning.

As I said, you can often avoid warnings by writing better code, like
part of the patch did.  That is a good reaction to warnings.  Making
worse code to avoid warnings is not a good idea normally.

Just don't use -Werror by default, and don't make other people suffer
its yoke!


Segher


Re: [PATCH v2 07/14] powerpc/vdso: Improve linker flags

2023-01-23 Thread Segher Boessenkool
Hi!

On Wed, Jan 11, 2023 at 08:05:04PM -0700, Nathan Chancellor wrote:
> When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, there
> are several warnings in the PowerPC vDSO:
> 
>   clang-16: error: -Wl,-soname=linux-vdso32.so.1: 'linker' input unused 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: -Wl,--hash-style=both: 'linker' input unused 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: argument unused during compilation: '-shared' 
> [-Werror,-Wunused-command-line-argument]
> 
>   clang-16: error: argument unused during compilation: '-nostdinc' 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: argument unused during compilation: '-Wa,-maltivec' 
> [-Werror,-Wunused-command-line-argument]

There is nothing wrong with the warnings, but as usual, -Werror is very
counterproductive.

> The first group of warnings point out that linker flags were being added
> to all invocations of $(CC), even though they will only be used during
> the final vDSO link. Move those flags to ldflags-y.

Which is explicitly allowed, and won't do anything, so nothing harmful
either.  It is not a bad idea to avoid this if that is trivial to do,
of course.

> The second group of warnings are compiler or assembler flags that will
> be unused during linking. Filter them out from KBUILD_CFLAGS so that
> they are not used during linking.

And here it is even more obviously fine.  If you need obfuscation like
in your patch, it is better not to do this imo.

The warning text "linker input unused" is misleading btw.  It would be
good to warn about that, if it was true: very likely the user didn't
intend what he wrote.  But a linker input is an object file, or perhaps
a linker script.  These things are just compiler flags.


Segher


Re: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1

2023-01-20 Thread Segher Boessenkool
On Thu, Jan 19, 2023 at 09:31:21PM -0600, Rob Landley wrote:
> On 1/19/23 16:11, Michael.Karcher wrote:
> > I don't see a clear bug at this point. We are talking about the C expression
> > 
> >    __same_type((void*)0, (void*)0)? 0 : sizeof((void*)0)/sizeof(*((void*0))

(__same_type is a kernel macro, it expands to something with
__builtin_compatible_type()).

> *(void*) is type "void" which does not have a size.

It has size 1, in GCC, so that you can do arithmetic on pointers to
void.  This is a long-standing and very widely used GCC extension.

"""
6.24 Arithmetic on 'void'- and Function-Pointers


In GNU C, addition and subtraction operations are supported on pointers
to 'void' and on pointers to functions.  This is done by treating the
size of a 'void' or of a function as 1.

 A consequence of this is that 'sizeof' is also allowed on 'void' and on
function types, and returns 1.

 The option '-Wpointer-arith' requests a warning if these extensions are
used.
"""

> The problem is gcc "optimizing out" an earlier type check, the same way it
> "optimizes out" checks for signed integer math overflowing, or "optimizes 
> out" a
> comparison to pointers from two different local variables from different
> function calls trying to calculate the amount of stack used, or "optimizes 
> out"

Are you saying something in the kernel code here is invalid code?
Because your other examples are.

> using char *x = (char *)1; as a flag value and then doing "if (!(x-1)) because
> it can "never happen"...

Like here.  And no, this is not allowed by -fno-strict-aliasing.

> > I suggest to file a bug against gcc complaining about a "spurious 
> > warning", and using "-Werror -Wno-error-sizeof-pointer-div" until gcc is 
> > adapted to not emit the warning about the pointer division if the result 
> > is not used.

Yeah.  If the first operand of a conditional operator is non-zero, the
second operand is not evaluated, and if the first is zero, the third
operand is not evaluated.  It is better if we do not warn about
something we do not evaluate.  In cases like here where it is clear at
compile time which branch is taken, that shouldn't be too hard.

Can someone please file a GCC PR?  With reduced testcase preferably.


Segher


Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS

2023-01-10 Thread Segher Boessenkool
On Mon, Jan 09, 2023 at 05:51:23PM -0700, Nathan Chancellor wrote:
> So for this patch, I have
> 
>   When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
>   warns:
> 
> clang-16: error: argument unused during compilation: '-s' 
> [-Werror,-Wunused-command-line-argument]
> 
>   The compiler's '-s' flag is a linking option (it is passed along to the
>   linker directly), which means it does nothing when the linker is not
>   invoked by the compiler. The kernel builds all .o files with either '-c'
>   or '-S', which do not run the linker, so '-s' can be safely dropped from
>   ASFLAGS.
> 
> as a new commit message. Is that sufficient for everyone? If so, I'll
> adjust the s390 commit to match, as it is the same exact problem.

Almost?  -S doesn't write .o files, it writes a .s file.  To go from an
assembler file (.s, or .S if you want to run the C preprocessor on non-C
code for some strange reason, the assembler macro facilities are vastly
superior) to an object file is just -c as well.

> Alternatively, if '-s' should actually remain around, we could move it
> to ldflags-y, which is added in patch 7. However, I assume that nobody
> has noticed that it has not been doing its job for a while, so it should
> be safe to remove.

+1


Segher


Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS

2023-01-09 Thread Segher Boessenkool
On Tue, Jan 10, 2023 at 01:22:38AM +0100, Andreas Schwab wrote:
> On Jan 09 2023, Segher Boessenkool wrote:
> 
> > It is required by POSIX (for the c99 command, anyway).  It *also* is
> > required to be supported when producing object files (so when no linking
> > is done).
> >
> > It is a GCC flag, and documented just fine:
> > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s
> 
> Most assembler flags are unrelated to the flags passed to the compiler
> driver, and -s is no exception.  POSIX has nothing to say about the
> sub-commands of the compiler anyway.

But this is not an assembler flag!

quiet_cmd_vdso32as = VDSO32A $@
  cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) $(AS32FLAGS) -c -o $@ $<

where

ifdef CROSS32_COMPILE
VDSOCC := $(CROSS32_COMPILE)gcc
else
VDSOCC := $(CC)
endif


The name of the make variable AS32FLAGS is a bit misleading.  This
variable does not hold arguments to as, it holds arguments to the
compiler driver, "gcc".


Segher


Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS

2023-01-09 Thread Segher Boessenkool
On Mon, Jan 09, 2023 at 03:37:53PM -0700, Nathan Chancellor wrote:
> On Mon, Jan 09, 2023 at 04:23:37PM -0600, Segher Boessenkool wrote:
> > Hi!  Happy new year all.
> > 
> > On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote:
> > > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor  
> > > wrote:
> > > >
> > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
> > > > warns that ASFLAGS contains '-s', which is a linking phase option, so it
> > > > is unused.
> > > >
> > > >   clang-16: error: argument unused during compilation: '-s' 
> > > > [-Werror,-Wunused-command-line-argument]
> > > >
> > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris
> > > > and it is ignored for the powerpc target so just drop the flag
> > > > altogether, as it is not needed.
> > > 
> > > Do you have any more info where you found this?  I don't see -s
> > > documented as an assembler flag.
> > > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html
> > > https://sourceware.org/binutils/docs/as/Invoking.html
> > 
> > It is required by POSIX (for the c99 command, anyway).  It *also* is
> > required to be supported when producing object files (so when no linking
> > is done).
> > 
> > It is a GCC flag, and documented just fine:
> > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s
> > 
> > (Yes, that says it is for linking; but the option is allowed without
> > error of any kind always).
> > 
> > (ASFLAGS sounds like it is for assembler commands, but it really is
> > for compiler commands that just happen to get .S input files).
> > 
> > > The patch seems fine to me, but what was this ever supposed to be?
> > > FWICT it predates git history (looking at
> > > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63)
> > 
> > Yeah, good question.  This compiler flag does the moral equivalent of
> > strip -s (aka --strip-all).  Maybe this was needed at some point, or
> > the symbol or debug info was just annoying (during bringup or similar)?
> > 
> > > Reviewed-by: Nick Desaulniers 
> > Reviewed-by: Segher Boessenkool 
> 
> Thank you for the review and extra explanation! I had kind of expanded
> on this in the s390 version of this patch [1], I will go ahead and do
> the same thing for the powerpc version too.
> 
> [1]: 
> https://lore.kernel.org/llvm/20221228-drop-qunused-arguments-v1-9-658cbc8fc...@kernel.org/

The underwhelming GCC source code for this is
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=d629ca5e424ad3120be13e82b9f38b7bf8f1cdf2;hb=HEAD#l1148
which says that the -s flag is passed through unmodified to the linker
when invoking the linker.  See #l603 for the %{s} specs syntax.

This is not a flag to the assembler at all.  It is a flag to the
compiler, which passes it on to the linker :-)

> > (ASFLAGS sounds like it is for assembler commands, but it really is
> > for compiler commands that just happen to get .S input files).


Segher


Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS

2023-01-09 Thread Segher Boessenkool
Hi!  Happy new year all.

On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote:
> On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor  wrote:
> >
> > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
> > warns that ASFLAGS contains '-s', which is a linking phase option, so it
> > is unused.
> >
> >   clang-16: error: argument unused during compilation: '-s' 
> > [-Werror,-Wunused-command-line-argument]
> >
> > Looking at the GAS sources, '-s' is only useful when targeting Solaris
> > and it is ignored for the powerpc target so just drop the flag
> > altogether, as it is not needed.
> 
> Do you have any more info where you found this?  I don't see -s
> documented as an assembler flag.
> https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html
> https://sourceware.org/binutils/docs/as/Invoking.html

It is required by POSIX (for the c99 command, anyway).  It *also* is
required to be supported when producing object files (so when no linking
is done).

It is a GCC flag, and documented just fine:
https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s

(Yes, that says it is for linking; but the option is allowed without
error of any kind always).

(ASFLAGS sounds like it is for assembler commands, but it really is
for compiler commands that just happen to get .S input files).

> The patch seems fine to me, but what was this ever supposed to be?
> FWICT it predates git history (looking at
> arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63)

Yeah, good question.  This compiler flag does the moral equivalent of
strip -s (aka --strip-all).  Maybe this was needed at some point, or
the symbol or debug info was just annoying (during bringup or similar)?

> Reviewed-by: Nick Desaulniers 
Reviewed-by: Segher Boessenkool 


Segher


Re: [PATCH v1] powerpc/64: Set default CPU in Kconfig

2022-12-16 Thread Segher Boessenkool
On Fri, Dec 16, 2022 at 11:23:59PM +0100, Pali Rohár wrote:
> > It appears the E500MC64 never made it outside of FSL, so it is best not
> > to use it at all, imo.
> 
> Yes, it really makes sense to not use e500mc64 flag. Maybe gcc
> documentation could be updated to mention this fact?

Thanks.  I filed  so that we don't forget
about this.


Segher


Re: [PATCH v1] powerpc/64: Set default CPU in Kconfig

2022-12-16 Thread Segher Boessenkool
Hi!

On Thu, Dec 15, 2022 at 09:42:02PM +0100, Pali Rohár wrote:
> On Wednesday 07 December 2022 14:38:40 Christophe Leroy wrote:
> > default "power8" if POWER8_CPU
> > default "power9" if POWER9_CPU
> > default "power10" if POWER10_CPU
> > +   default "e500mc64" if E5500_CPU
> 
> Now I'm looking at this change again... and should not E5500_CPU rather
> enforce -mcpu=e5500 flag? I know that your patch moves e500mc64 flag
> from the Makefile to Kconfig, but maybe it could be changed in some
> other followup patch...
> 
> Anyway, do you know what is e500mc64 core? I was trying to find some
> information about it, but it looks like some unreleased freescale core
> which predates e5500 core.

It looks that way yes.  It was submitted at

and committed as .  It looks as if
it was based on the e500mc core, while e5500 is a new core (or
significantly different anyway).

> ISA (without extensions like altivec) seems
> to be same for e500mc64, e5500 and e6500 cores and difference is only
> pipeline definitions in gcc config files. So if my understanding is
> correct then kernel binary compiled with any of these -mcpu= flag should
> work on any of those cores. Just for mismatches core binary will not be
> optimized for speed.

It appears the E500MC64 never made it outside of FSL, so it is best not
to use it at all, imo.


Segher


Re: [PATCH v2] powerpc: Pass correct CPU reference to assembler

2022-12-16 Thread Segher Boessenkool
On Fri, Dec 16, 2022 at 05:57:46PM +, Christophe Leroy wrote:
> Le 16/12/2022 à 18:18, Segher Boessenkool a écrit :
> > On Fri, Dec 16, 2022 at 09:35:50AM +0100, Christophe Leroy wrote:
> >> Today we have CONFIG_TARGET_CPU which provides the identification of the
> >> expected CPU, it is used for GCC. Use it as well for the assembler.
> > 
> > Why do you use -Wa, at all for this?  The compiler should already pass
> > proper options always!
> 
> That's historical I guess. Comes from commit 14cf11af6cf6 ("powerpc: 
> Merge enough to start building in arch/powerpc.")

Ah.  The patch moves stuff around, I thought more of it is new than it
really is.  Sorry.

It would be good to get rid of all such things that do no good and can
easily cause problems, of course, but that does not belong to this patch
of course.

> >> +cpu-as-$(CONFIG_PPC_BOOK3S_64)+= $(call as-option,-Wa$(comma)-many)
> > 
> > What is this for?  Using -many is a huge step back, it hides many
> > problems :-(
> 
> The only thing I did is removed the -Wa,-mpower4 from the line, leaving 
> the remaining part. Initialy it was:
> 
> cpu-as-$(CONFIG_PPC_BOOK3S_64) += $(call as-option,-Wa$(comma)-mpower4) 
> $(call as-option,-Wa$(comma)-many)
> 
> It was added in 2018 by commit 960e30029863 ("powerpc/Makefile: Fix 
> PPC_BOOK3S_64 ASFLAGS"). There is a long explanation it the commit.
> 
> Should we remove it ?

The commit says it is a workaround for clang problems, so it needs
testing there.  It also needs testing everywhere else, because as I said
it hides a lot of problems, so removing it will make a lot of sloppy
code that has crept in since 2018 scream bloody murder :-(


Segher


Re: [PATCH v2] powerpc: Pass correct CPU reference to assembler

2022-12-16 Thread Segher Boessenkool
Hi!

On Fri, Dec 16, 2022 at 09:35:50AM +0100, Christophe Leroy wrote:
> The problem comes from the fact that CONFIG_PPC_E500MC is selected for
> both the e500mc (32 bits) and the e5500 (64 bits), and therefore the
> following makefile rule is wrong:
> 
>   cpu-as-$(CONFIG_PPC_E500MC)+= $(call as-option,-Wa$(comma)-me500mc)

Yes.

> Today we have CONFIG_TARGET_CPU which provides the identification of the
> expected CPU, it is used for GCC. Use it as well for the assembler.

Why do you use -Wa, at all for this?  The compiler should already pass
proper options always!

> +cpu-as-$(CONFIG_PPC_BOOK3S_64)   += $(call as-option,-Wa$(comma)-many)

What is this for?  Using -many is a huge step back, it hides many
problems :-(


Segher


Re: [PATCH] powerpc/64: Implement arch_within_stack_frames

2022-12-15 Thread Segher Boessenkool
On Thu, Dec 15, 2022 at 10:29:38AM -0600, Segher Boessenkool wrote:
> The kernel does not do any of the more
> problematic cases I think, but no promises (dynamic stack alignment,
> recursive functions

Before people get scared: I meant *nested* functions.  With a static
chain and all that :-)


Segher


Re: [PATCH] powerpc/64: Implement arch_within_stack_frames

2022-12-15 Thread Segher Boessenkool
On Thu, Dec 15, 2022 at 10:52:35AM +1000, Nicholas Piggin wrote:
> On Thu Dec 15, 2022 at 10:17 AM AEST, Segher Boessenkool wrote:
> > On Wed, Dec 14, 2022 at 09:39:25PM +1000, Nicholas Piggin wrote:
> > > What about just copying x86's implementation including using
> > > __builtin_frame_address(1/2)? Are those builtins reliable for all
> > > our targets and compiler versions?
> >
> > __builtin_frame_address(n) with n > 0 is specifically not supported, not
> > on any architecture; it does not work in all situations on *any* arch,
> > and not in *any* situation on some archs.
> 
> No, but the particular case of powerpc where we have frame pointers and
> calling from a function where we know we have at least n + 1 frames on
> the stack, it would be okay, right? The code is not really different
> than using r1 directly and dereferencing that.

We do not typically have frame pointers; those make quite a bit slower
code.  But we do not *need* frame pointers for most purposes, perhaps
that is what you were getting at?

In simple cases r1 is the address of the current frame, and accessing
the machine word there gets you the previous frame, etc. (until you hit
a zero, there is no parent frame above that).  This is the *machine*
frame, not the C frame.  Often they are the same.  But there are
complications.  As long as you only use it for debug purposes, do not
use it for program logic, and are sure to check any pointer for nil,
things will work fine.  The kernel does not do any of the more
problematic cases I think, but no promises (dynamic stack alignment,
recursive functions, variable size stack allocations, PIC code (the
model of it used for shared libraries that is, not position-independent
code in general, as PowerPC mostly is no matter what), split stack, ...)

So at least do not follow a stack chain past the terminator, and do not
expect there to be exactly one frame per C function (there can be zero,
or more than one).  There also is the complication that there is no
sure-fire way to detect if a new frame has been set up for a function
(yet), of course.


Segher


Re: [PATCH] powerpc/64: Implement arch_within_stack_frames

2022-12-14 Thread Segher Boessenkool
On Wed, Dec 14, 2022 at 09:39:25PM +1000, Nicholas Piggin wrote:
> What about just copying x86's implementation including using
> __builtin_frame_address(1/2)? Are those builtins reliable for all
> our targets and compiler versions?

__builtin_frame_address(n) with n > 0 is specifically not supported, not
on any architecture; it does not work in all situations on *any* arch,
and not in *any* situation on some archs.

This is clearly documented:

 Calling this function with a nonzero argument can have
 unpredictable effects, including crashing the calling program.  As
 a result, calls that are considered unsafe are diagnosed when the
 '-Wframe-address' option is in effect.  Such calls should only be
 made in debugging situations.

(and that warning is enabled by -Wall).


Segher


Re: Mass-building defconfigs: many fail with assembler errors

2022-12-14 Thread Segher Boessenkool
Hi!

On Wed, Dec 14, 2022 at 07:36:32PM +0100, Jan-Benedict Glaw wrote:
> So we have these remaining build issues:
> 
> linux-powerpc-cell_defconfig   bad asm 
> (arch/powerpc/boot/pseries-head.S)
> linux-powerpc-mvme5100_defconfig   bad asm 
> (arch/powerpc/kernel/epapr_hcalls.S)
> linux-powerpc-asp8347_defconfigbad asm (arch/powerpc/kernel/pmc.c)
> linux-powerpc-ppc6xx_defconfig bad asm (arch/powerpc/kernel/pmc.c)
> linux-powerpc-ppc64e_defconfig bad asm 
> (arch/powerpc/kernel/vdso/gettimeofday.S)
> linux-powerpc-corenet64_smp_defconfig  bad asm 
> (arch/powerpc/kernel/vdso/gettimeofday.S)
> 
> I do *not* have CROSS32_COMPILE=... set for my builds. Maybe that
> could cure at least the issues within the ./boot and ./kernel/vdso
> directories?

I never set that, -m32 does the trick, every powerpc compiler is
biarch :-)

> Let's try that...  But I guess that won't help for the
> other two remaining files (arch/powerpc/kernel/{epapr_hcalls.S,pmc.c).

Not likely no.  Can you show the error of those again?

> linux-powerpc-pseries_defconfig sstep (out of array bounds)
> linux-powerpc-powernv_defconfig sstep
> linux-powerpc-ppc64_defconfig   sstep
> linux-powerpc-pseries_le_defconfig  sstep
> linux-powerpc-ppc64le_defconfig sstep
> linux-powerpc-ppc64le_guest_defconfig   sstep
> linux-powerpc-ppc64_guest_defconfig sstep
> linux-powerpc-powernv_be_defconfig  sstep
> 
> My first guess on these is that it's a wrong warning. The union's
> `u8 b[2 * sizeof(double)]` seems to be large enough.

A false positive, yes.  Which is *not* wrong.  What is wrong is using
-Werror in any unknown environment.  I have a stack of patches I use for
all my kernel builds, and half of those are eradicating harmful -Werror
instances.

> linux-powerpc-akebono_defconfig ahci (BUILD_BUG_ON failed: sizeof(_s) 
> > sizeof(long))
> linux-powerpc-xes_mpc85xx_defconfig ahci
> linux-powerpc-ge_imp3a_defconfigahci
> linux-powerpc-mpc85xx_defconfig ahci
> linux-powerpc-mpc85xx_smp_defconfig ahci
> linux-powerpc-corenet32_smp_defconfig   ahci
> linux-powerpc-mpc86xx_defconfig ahci
> linux-powerpc-mpc86xx_smp_defconfig ahci
> 
> I've seen the AHCI issue on other (non-powerpc) builds as well,
> haven't looked into this so I won't guess about whether this is a real
> bug or a compiler issue.

It is a real bug afaics.


Segher


Re: [PATCH] [BACKPORT FOR 4.14] libtraceevent: Fix build with binutils 2.35

2022-12-14 Thread Segher Boessenkool
On Wed, Dec 14, 2022 at 03:33:24PM +, Christophe Leroy wrote:
> Le 13/12/2022 à 21:25, Segher Boessenkool a écrit :
> > On Tue, Dec 13, 2022 at 07:03:07PM +0100, Christophe Leroy wrote:
> >> In binutils 2.35, 'nm -D' changed to show symbol versions along with
> >> symbol names, with the usual @@ separator.
> > 
> > 2.37 instead?  And --without-symbol-versions is there to restore the
> > old behaviour.  The script is parsing the output already so this patch
> > is simpler maybe, but :-)
> 
> Do you mean that the original commit from Ben should have done it 
> differently ?

Probably.

> My patch is only a backport of original commit 39efdd94e314 
> ("libtraceevent: Fix build with binutils 2.35") due to Makefile being at 
> a different place in 4.14.

.  It's not such a great idea to spread misinformation further,
imo.  Maybe the mailing list archives will help dampen that already now.

Thanks,


Segher


Re: [PATCH] [BACKPORT FOR 4.14] libtraceevent: Fix build with binutils 2.35

2022-12-13 Thread Segher Boessenkool
On Tue, Dec 13, 2022 at 07:03:07PM +0100, Christophe Leroy wrote:
> In binutils 2.35, 'nm -D' changed to show symbol versions along with
> symbol names, with the usual @@ separator.

2.37 instead?  And --without-symbol-versions is there to restore the
old behaviour.  The script is parsing the output already so this patch
is simpler maybe, but :-)


Segher


Re: Mass-building defconfigs: many fail with assembler errors

2022-12-13 Thread Segher Boessenkool
On Tue, Dec 13, 2022 at 09:41:59AM +0100, Jan-Benedict Glaw wrote:
> On Tue, 2022-12-13 14:49:20 +1100, Michael Ellerman  
> wrote:
> > So your script should exclude all files that end in ".config".
> 
> Thanks!  Will just drop those.
> 
> > To find the names of the generated configs you can use something like:
> > 
> >  $ awk '/PHONY \+= .*config/ {print $3}' arch/powerpc/Makefile
> 
> ...and integrate these instead. Thanks a lot!

You can also pretend you are a simple user and use the targets
"make help" and "make help-boards" suggest :-)


Segher


Re: Mass-building defconfigs: many fail with assembler errors

2022-12-12 Thread Segher Boessenkool
Hi!

On Mon, Dec 12, 2022 at 10:51:17PM +0100, Jan-Benedict Glaw wrote:
> Is anybody else routinely building current Binutils + GCC, to try to
> build all the Linux defconfigs?

I do regularly build kernels for powerpc-linux, powerpc64-linux,
powerpc64le-linux; ppc6xx_defconfig and ppc64_defconfig and
ppc64le_defconfig.  Totally boring, but it frequently does not build.
Not as frequently as for the other Linux targets I build (32 total at
the moment).

> For PPC, a good number of those fail,
> and I probably don't understand PPC well enough to propose patches. Or
> did I pick wrongly targeted toolchains? Most of the time, my suspicion
> is that we're not giving the correct -m flags in
> ./arch/powerpc/boot/?  (My setup for doing test builds is fairly automated, I
> can easily throw in patches for testing.)

Many of those use a 32-bit toolchain with a 64-bit kernel, or they
require some e500 specific config but not getting it (or the other way
around).

> 64-bit.config

>   ==> Why "-m32 -mcpu=powerpc"? Binutils/GCC are for 
> --target=powerpc64-linux

Something in your config is forcing that.

> 85xx-64bit.config
> powerpc64-linux-gcc 
> -Wp,-MMD,arch/powerpc/kernel/vdso/.gettimeofday-64.o.d -nostdinc 
> -I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include 
> -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
> -I./include/uapi -I./include/generated/uapi -include 
> ./include/linux/compiler-version.h -include ./include/linux/kconfig.h 
> -D__KERNEL__ -I ./arch/powerpc -DHAVE_AS_ATHIGH=1 -fmacro-prefix-map=./= 
> -D__ASSEMBLY__ -fno-PIE -m64 -Wl,-a64 -mabi=elfv1 -Wa,-me500 -Wa,-me500mc 
> -mabi=elfv1 -mbig-endian-Wl,-soname=linux-vdso64.so.1 -D__VDSO64__ -s -c 
> -o arch/powerpc/kernel/vdso/gettimeofday-64.o 
> arch/powerpc/kernel/vdso/gettimeofday.S

e500mc is a 32-bit core.  This cannot fly.

> 85xx-hw.config
> powerpc-linux-gcc -Wp,-MMD,arch/powerpc/kernel/.epapr_hcalls.o.d 
> -nostdinc -I./arch/powerpc/include -I./arch/powerpc/include/generated  
> -I./include -I./arch/powerpc/include/uapi 
> -I./arch/powerpc/include/generated/uapi -I./include/uapi 
> -I./include/generated/uapi -include ./include/linux/compiler-version.h 
> -include ./include/linux/kconfig.h -D__KERNEL__ -I ./arch/powerpc 
> -fmacro-prefix-map=./= -D__ASSEMBLY__ -fno-PIE -m32 -Wl,-a32 -mcpu=powerpc 
> -mbig-endian-c -o arch/powerpc/kernel/epapr_hcalls.o 
> arch/powerpc/kernel/epapr_hcalls.S 
>   arch/powerpc/kernel/epapr_hcalls.S: Assembler messages:
>   arch/powerpc/kernel/epapr_hcalls.S:24: Error: unrecognized opcode: 
> `wrteei'

wrteei is a BookE instruction (not a PowerPC instruction), and something
specifically asked for just PowerPC.

> powernv_defconfig

>   cc1: all warnings being treated as errors

Self-inflicted wound.  The warnings may reveal more, or they might show
something that GCC isn't as good at as you may want; all warnings have
false positives, that is why they are warnings and not errors.

>   Compiler ICEs (during GIMPLE pass: ccp) in align.c:
> 
> powerpc-linux-gcc -Wp,-MMD,arch/powerpc/kernel/.align.o.d -nostdinc 
> -I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include 
> -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
> -I./include/uapi -I./include/generated/uapi -include 
> ./include/linux/compiler-version.h -include ./include/linux/kconfig.h 
> -include ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc 
> -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs 
> -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE 
> -Werror=implicit-function-declaration -Werror=implicit-int 
> -Werror=return-type -Wno-format-security -std=gnu11 -mbig-endian -m32 
> -msoft-float -pipe -ffixed-r2 -mmultiple -mno-readonly-in-sdata -mcpu=440 
> -mno-prefixed -mno-pcrel -mno-altivec -mno-vsx -mno-mma 
> -fno-asynchronous-unwind-tables -mno-string -Wa,-m440 -mbig-endian 
> -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 
> -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation 
> -Wno-format-overflow -Wno-address-of-packed-member -O2 
> -fno-allow-store-data-races -Wframe-larger-than=1024 -fstack-protector-strong 
> -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable 
> -Wno-dangling-pointer -fomit-frame-pointer -ftrivial-auto-var-init=zero 
> -fno-stack-clash-protection -Wdeclaration-after-statement -Wvla 
> -Wno-pointer-sign -Wcast-function-type -Wno-stringop-truncation 
> -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized 
> -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -fno-strict-overflow 
> -fno-stack-check -fconserve-stack -Werror=date-time 
> -Werror=incompatible-pointer-types -Werror=designated-init 
> -Wno-packed-not-aligned -g -mstack-protector-guard-offset=1080 -Werror
> -DKBUILD_MODFILE='"arch/powerpc/kernel/align"' -DKBUILD_BASENAME='"align"' 
> -DKBUILD_MODNAME='"align"' -D__KBUIL

Re: [PATCH v2 1/4] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig

2022-11-07 Thread Segher Boessenkool
Hi!

On Mon, Nov 07, 2022 at 02:31:59PM +1100, Rohan McLure wrote:
> Add Kconfig option for enabling clearing of registers on arrival in an
> interrupt handler. This reduces the speculation influence of registers
> on kernel internals.

Assuming you are talking about existing PowerPC CPUs from the last 30
years:

There is no data speculation.  At all.  Ever.

There is branch prediction, but that is not influenced by register
contents, either (for any current CPUs at least).  (Except when you get
a flush because of a mispredict, but if this zeroing changes anything,
we will have used wild (but user controlled) values in the old
non-zeroing situation, and that is a much bigger problem itself already,
also for security!  This can be an unlikely kernel bug, or a very
unlikely compiler bug.)

All GPRs are renamed, always.  If you zero all GPRs on interrupt entry
(which is context synchronising, importantly), this will guarantee there
can be no timing influence from the GPRs, because all of the physical
registers depend on nothing that happened before.  So that is good, at
least it can give some peace of mind.  Except that this makes 30 new
registers in just a few cycles, which *itself* can cause stalls, if the
renaming things are still busy.  Context synchronising does not
necessarily help there, the renaming machinery can do stuff *after* an
insn completes.

I don't see how this helps anything.  If it does, "reduces speculation
influence" is not a good description of what it does, afaics?


Segher


Re: [RFC PATCH 01/19] powerpc/perf: callchain validate kernel stack pointer bounds

2022-11-04 Thread Segher Boessenkool
On Mon, Oct 31, 2022 at 03:54:22PM +1000, Nicholas Piggin wrote:
> Could the user set r1 to be equal to the address matching the first
> interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page
> due to the kernel redzone, and induce the kernel to load the marker from
> there? Possibly it could cause a crash at least.

Yes, the user can set r1 to anything, it is just a general purpose
register.  This isn't a valid thing to do of course, the ABI requires
r1 to point at a valid stack at all times, but it is an obvious attack
point if we do not harden against this :-)


Segher


Re: Issues with the first PowerPC updates for the kernel 6.1

2022-10-29 Thread Segher Boessenkool
On Mon, Oct 17, 2022 at 09:53:04AM +0200, Christian Zigotzky wrote:
> > On 17. Oct 2022, at 02:43, Michael Ellerman  wrote:
> > Previously BIG_ENDIAN && GENERIC_CPU would use -mcpu=power5, now it uses
> > -mcpu=power4.
> 
> Maybe this is the issue. We will wait and not release the RC1 for testing 
> because it is a risk for our testers to test these new kernels because of 
> this issue.
> 
> It is really important do not to rewrite code, that is well worked before.
> Bugfixing and adding some new features is ok but rewriting of good code is 
> expensive and doesn’t make any sense.

It was just a bugfix, and a (partial) revert.

471d7ff8b51b says it removed ISA 2.00 support (original power4, "GP").
Support for ISA 2.01 was retained it says.  That is power4+, "GQ", but
also 970 (Apple G5).  That patch actually switched to ISA 2.02 though,
unintendedly, and code generated for ISA 2.02 will not run on systems
like 970, in principle.  It is just one uncommon instruction that is
problematical, namely popcntb, because the kernel does not use floating
point at all, so that is why we got away with it for so long (most code
that does use fp will fall flat on its face in no time).  It still is a
bug fix though!

PA6T is ISA 2.04, it's not clear how this (bugfix, and revert!) change
made code not run on PA6T anymore.  Smells a lot like something indirect
(or triply indirect), a separate bug, something that was introduced in
the last two years maybe, but I'll even bet it is something *exposed* in
that time, a bug that has been here for longer!


Segher


Re: Issues with the first PowerPC updates for the kernel 6.1

2022-10-16 Thread Segher Boessenkool
On Fri, Oct 14, 2022 at 06:11:21PM +0200, Christian Zigotzky wrote:
> make oldconfig has asked because of the CPU family. I choosed GENERIC for my 
> P.A. Semi PWRficient PA6T-1682M. Is this correct? Maybe this is the problem.
> 
> config GENERIC_CPU
> - bool "Generic (POWER4 and above)"
> + bool "Generic (POWER5 and PowerPC 970 and above)"
>   depends on PPC_BOOK3S_64 && !CPU_LITTLE_ENDIAN
>   select PPC_64S_HASH_MMU
> 
> There isn’t a POWER4 anymore and I used it via CONFIG_GENERIC_CPU=y before.

PA6T is ISA 2.04, just like POWER5+.  It should be fine.


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-07 Thread Segher Boessenkool
On Fri, Oct 07, 2022 at 10:03:38AM +1000, Nicholas Piggin wrote:
> On Fri Oct 7, 2022 at 9:23 AM AEST, Segher Boessenkool wrote:
> > > For pcrel addressing? Bootstrapping the C environment is one, the
> > > module dynamic linker is another.
> >
> > I don't know what either of those mean.
> 
> arch/powerpc/kernel/head_64.S and arch/powerpc/kernel/module_64.c
> 
> Can discuss in the pcrel patch series thread if you would like to know
> more.

So "bootstrapping the C environment" is meant to mean "initialising it",
like *rt*.o ("C runtime") does normally?

And "module dynamic linker" is "module loader" here?

Yes, those things probably need some attention for pcrel, but
"bootstrapping" and "dynamic" had me scratch my head: there is nothing
that pulls itself up by its bootstraps (like, the initialisation itself
would be done in C code), and module loading is much closer to static
loading than to dynamic loading :-)

> > Just say in a comment where you disable stuff that it is to prevent
> > possible problems, this is a WIP, that kind of thing?  Otherwise other
> > people (like me :-) ) will read it and think there must be some deeper
> > reason.  Like, changing code to work with pcrel is hard or a lot of
> > work -- it isn't :-)  As you say in 0/7 yourself btw!
> 
> I will describe limitations and issues a bit more in changelog of patches
> to enable prefix and pcrel when I submit as non-RFC candidate. It would
> probably not be too hard to get things to a workable state that could be
> merged.

Looking forward to it!

> > VMX and VSX are disabled here because the compiler *will* use those
> > registers if it feels like it (that is, if it thinks that will be
> > faster).  MMA is a very different beast: the compiler can never know if
> > it will be faster, to start with.
> 
> True, but now I don't have to find the exact clause and have my lawyer
> confirm that it definitely probably won't change in future and break
> things.

Your lawyer won't be able to tell you, but I can.  And I did already.


The reason I care about these things is that very often people look at
what the kernel does as a "best practices" example.  And then copy this
stuff as some cargo cult incantations :-/


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-07 Thread Segher Boessenkool
On Fri, Oct 07, 2022 at 04:31:28PM +1100, Michael Ellerman wrote:
> "Nicholas Piggin"  writes:
> > On Fri Oct 7, 2022 at 9:23 AM AEST, Segher Boessenkool wrote:
> >> On Fri, Oct 07, 2022 at 07:56:09AM +1000, Nicholas Piggin wrote:
> >> > On Fri Oct 7, 2022 at 5:54 AM AEST, Segher Boessenkool wrote:
> >> > > On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> ...
> >> > > > +# No AltiVec or VSX or MMA instructions when building kernel
> >> > > >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
> >> > > >  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> >> > > > +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
> >> > >
> >> > > MMA code is never generated unless the code asks for it explicitly.
> >> > > This is fundamental, not just an implementations side effect.
> >> > 
> >> > Well, now it double won't be generated :)
> >>
> >> Yeah, but there are many other things you can unnecessarily disable as
> >> well!  :-)
> >>
> >> VMX and VSX are disabled here because the compiler *will* use those
> >> registers if it feels like it (that is, if it thinks that will be
> >> faster).  MMA is a very different beast: the compiler can never know if
> >> it will be faster, to start with.
> >
> > True, but now I don't have to find the exact clause and have my lawyer
> > confirm that it definitely probably won't change in future and break
> > things.
> 
> Right. If someone asks "does the kernel ever use MMA instructions?" we
> can just point at that line and we have a definite answer. No need to
> audit the behaviour of all GCC and Clang versions ever released.

As I said, no sane compiler can use MMA ever (unless asked for it
directly of course).  But yeah, who knows what clang does!


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
On Fri, Oct 07, 2022 at 07:56:09AM +1000, Nicholas Piggin wrote:
> On Fri Oct 7, 2022 at 5:54 AM AEST, Segher Boessenkool wrote:
> > On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> > > This adds basic POWER10_CPU option, which builds with -mcpu=power10.
> >
> > > +# No prefix or pcrel
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
> >
> > Why do you disable all prefixed insns?  What goes wrong if you don't?
> 
> Potentially things like kprobes.

So mention that?  "This patch is due to an abundance of caution".

What I meant to ask is if you *saw* something going wrong, not if you
can imagine something going wrong.  I can imagine ten gazillion things
going wrong, that is not why I asked :-)

> > Same question for pcrel.  I'm sure you want to optimise it better, but
> > it's not clear to me how it fails now?
> 
> For pcrel addressing? Bootstrapping the C environment is one, the
> module dynamic linker is another.

I don't know what either of those mean.

> Some details in this series.
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-September/248521.html

I've watched that series with great interest, but I don't remember
anything like that?  Are you refering to the commentary in 7/7?
"Definitely ftrace and probes, possibly BPF and KVM have some breakage.
I haven't looked closely yet."...  This doesn't mean much does it :-)
It can be a triviality or two.  Or it could be a massive roadblock.

Just say in a comment where you disable stuff that it is to prevent
possible problems, this is a WIP, that kind of thing?  Otherwise other
people (like me :-) ) will read it and think there must be some deeper
reason.  Like, changing code to work with pcrel is hard or a lot of
work -- it isn't :-)  As you say in 0/7 yourself btw!

> > > +# No AltiVec or VSX or MMA instructions when building kernel
> > >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
> > >  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
> >
> > MMA code is never generated unless the code asks for it explicitly.
> > This is fundamental, not just an implementations side effect.
> 
> Well, now it double won't be generated :)

Yeah, but there are many other things you can unnecessarily disable as
well!  :-)

VMX and VSX are disabled here because the compiler *will* use those
registers if it feels like it (that is, if it thinks that will be
faster).  MMA is a very different beast: the compiler can never know if
it will be faster, to start with.


Segher


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 08:50:31PM +, Christophe Leroy wrote:
> Le 06/10/2022 à 22:45, Segher Boessenkool a écrit :
> > I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
> > or something like that :-)  And, on 64 bit, which is what the question
> > was about!
> 
> Ah, does the size really matters here ? I was thinking more in terms of 
> performance when I made the comment.

Other than some unused code there should not be much performance impact
at all from enabling modules when not needed, on 64 bit.  Security (in
depth) is a very different kettle of fish of course ;-)


Segher


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 06:38:00PM +, Christophe Leroy wrote:
> Le 06/10/2022 à 20:22, Segher Boessenkool a écrit :
> > Long ago I built kernels that fit together with the boot firmware and a
> > root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
> > close to that at all these days :-)
> 
> 4MB, not easy. But 8M still achievable. Well our smaller board has 32M, 
> we have thousands of it spread all over Europe and have to keep it up to 
> date 

The smallest of these systems had 256MB RAM.  This 4MB is flash ROM :-)

> > What is the overhead if you enable modules but do not use them, these
> > days?
> 
> On the 8xx it is mainly the instruction TLB miss handler:

I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
or something like that :-)  And, on 64 bit, which is what the question
was about!


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
Hi!

On Thu, Oct 06, 2022 at 06:07:32PM +, Christophe Leroy wrote:
> Le 23/09/2022 à 08:23, Nicholas Piggin a écrit :
> > I would rather complete prefixed support in the kernel and use pcrel
> > addressing. Actually even if we don't compile with pcrel or prefixed,
> > there are some instructions and we will probably get more that require
> > prefixed, possible we might want to use them in kernel. Some of it is
> > required to handle user mode instructions too. So I think removing
> > it is premature, but I guess it's up for debate.
> 
> Well ok, in fact I only had code_patching in mind.
> 
> Code patching is only for kernel text. Today code patching is used for 
> things like kprobe, ftrace, etc  which really do not seems to be 
> prepared for prefixed instructions.
> 
> If you are adding -mno-prefixed, it is worth keeping that code which 
> sometimes gives us some headacke ?

-mpcrel requires -mprefixed.  Using PC relative addressing will be a
significant performance benefit.

> Of course if there are plans to get real prefixed instruction in kernel 
> code anytime soon, lets live with it, in that case the support should 
> get completed. But otherwise I think it would be better to get rid of it 
> for now, and implement it completely when we need it in years.

The future is unstoppable, certainly the near future is :-)

> When I see the following, I'm having hard time believing it would work 
> with prefixed instructions in the kernel text:
> 
>   typedef u32 kprobe_opcode_t;
> 
>   struct kprobe {
>   ...
>   /* Saved opcode (which has been replaced with breakpoint) */
>   kprobe_opcode_t opcode;
> 
> 
>   void arch_disarm_kprobe(struct kprobe *p)
>   {
>   WARN_ON_ONCE(patch_instruction(p->addr, ppc_inst(p->opcode)));
>   }

Why would it not work?


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
Hi!

On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> This adds basic POWER10_CPU option, which builds with -mcpu=power10.

> +# No prefix or pcrel
> +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)

Why do you disable all prefixed insns?  What goes wrong if you don't?

Same question for pcrel.  I'm sure you want to optimise it better, but
it's not clear to me how it fails now?

Please say in the comment what is wrong, don't spread fear :-)

> +# No AltiVec or VSX or MMA instructions when building kernel
>  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
>  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> +KBUILD_CFLAGS += $(call cc-option,-mno-mma)

MMA code is never generated unless the code asks for it explicitly.
This is fundamental, not just an implementations side effect.


Segher


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 11:39:50AM +1100, Michael Ellerman wrote:
> Christophe Leroy  writes:
> > However, thinking out loudly, I'm wondering, could we make things any 
> > simpler when CONFIG_MODULES is not selected, or is that a too much 
> > corner case on PPC64 ?
> 
> I'd say it's mostly a corner case.
> 
> Obviously no distros ship with modules disabled. 
> 
> AFAIK even the stripped down kernels we use in CPU bringup have modules
> enabled.
> 
> So I think it's probably not worth worrying about, unless there's an
> obvious and fairly simple optimisation.

Long ago I built kernels that fit together with the boot firmware and a
root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
close to that at all these days :-)

What is the overhead if you enable modules but do not use them, these
days?


Segher


Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

2022-09-24 Thread Segher Boessenkool
On Fri, Sep 23, 2022 at 05:15:43PM -0500, Segher Boessenkool wrote:
> On Sat, Sep 24, 2022 at 02:26:52AM +1000, Nicholas Piggin wrote:
> > I still don't see what clauses guarantees asm("%0" ::"r"(foo)) to give
> > 13. It doesn't say access via inline assembly is special,
> 
> But it is.  It is for all register variables, local and global.  I agree
> this isn't documented clearly.  For local register variables this is the
> *only* thing guaranteed; for global register vars there is more (it
> changes the ABI, there are safe/restore effects, that kind of thing).

I filed <https://gcc.gnu.org/PR107027> to improve the docs.  Thanks for
bringing this to our attention!


Segher


Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

2022-09-24 Thread Segher Boessenkool
On Sat, Sep 24, 2022 at 02:00:55PM +1000, Nicholas Piggin wrote:
> On Sat Sep 24, 2022 at 8:15 AM AEST, Segher Boessenkool wrote:
> > Never it is guaranteed that all accesses through this variable will use
> > the register directly: this fundamentally cannot work on all archs, and
> > also not at -O0.  More in general it doesn't work if some basic
> > optimisations are not done, be it because of a compiler deficiency, or a
> > straight out bug, or maybe it is a conscious choice in some cases.
> 
> Right, and we know better than to rely on a spec that is not 100% air
> tight with no possibility of lawyering. This may be what the intention is,
> it may be what gcc and clang do now, and everybody involved today agrees
> with that interpretation. We still have to maintain the kernel tomorrow
> though, so explicit r13 it must be.

It has *always* been this way.  Very old GCC (say, GCC < 3.x) tried to
guarantee more, even, but that turned out to be untenable.  But this is
all in the distant past.

I have no idea if clang implements the GCC C extensions correctly.  If
they don't it is just another compiler bug and they'll just have to fix
it.

The rules *are* airtight.  But this does not mean you can assume random
other stuff, adjacent or not :-P

> > (Please use "n" instead of "i".  Doesn't matter here, but it does in
> > many other places.)
> 
> What is the difference? Just "i" allows assmebly-time constants?

"n" means "number": constant integers.  "i" means "immediate": any
constant.  The address of a global variable is "i" but not "n" (in most
ABIs, no -fPIC and such) for example.

> How about "I"? that looks like it was made for it. Gives much better
> errors.

For PowerPC, "I" is a signed 16-bit number.  "K" is unsigned 16-bit,
and there are more as well.  Just like for "n" you'll have to make
sure the number you feed in will work in the assembler, and you'll get
the same error message (but, as you say, for "I" in some cases the
compiler will give errors already).  It's otherwise only useful if you
use e.g. "IL" as constraint, and then write "addi%e2 %0,%1,%2" for
example, so the asm can generate "addis" insns.  Such things aren't very
often useful in internal asm.  The main reason any of this exists is
this is how GCC works internally; extended inline asm exposes a lot of
that.


Segher


Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

2022-09-23 Thread Segher Boessenkool
On Sat, Sep 24, 2022 at 02:26:52AM +1000, Nicholas Piggin wrote:
> I still don't see what clauses guarantees asm("%0" ::"r"(foo)) to give
> 13. It doesn't say access via inline assembly is special,

But it is.  It is for all register variables, local and global.  I agree
this isn't documented clearly.  For local register variables this is the
*only* thing guaranteed; for global register vars there is more (it
changes the ABI, there are safe/restore effects, that kind of thing).

Never it is guaranteed that all accesses through this variable will use
the register directly: this fundamentally cannot work on all archs, and
also not at -O0.  More in general it doesn't work if some basic
optimisations are not done, be it because of a compiler deficiency, or a
straight out bug, or maybe it is a conscious choice in some cases.

> I think if it was obviously guaranteed then this might be marginally
> better than explicit r13 in the asm
> 
>asm volatile(
>"stb %0,%2(%1)"
>:
>: "r" (mask),
>"r" (local_paca),
>  "i" (offsetof(struct paca_struct, irq_soft_mask))
>: "memory");

(Please use "n" instead of "i".  Doesn't matter here, but it does in
many other places.)


Segher


Re: [PATCH 1/2] powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5

2022-09-23 Thread Segher Boessenkool
On Fri, Sep 23, 2022 at 05:17:45PM +1000, Nicholas Piggin wrote:
> On Thu Sep 22, 2022 at 4:50 AM AEST, Segher Boessenkool wrote:
> > On Wed, Sep 21, 2022 at 11:41:02AM +1000, Nicholas Piggin wrote:
> > > Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5.
> > > POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added
> > > the popcntb instruction which a compiler might use.
> > > 
> > > Use -mcpu=power4.
> > > 
> > > Fixes: 471d7ff8b51b ("powerpc/64s: Remove POWER4 support")
> > > Signed-off-by: Nicholas Piggin 
> >
> > Reviewed-by: Segher Boessenkool 
> >
> > Thank you!
> >
> > Maybe superfluous, but some more context: GCC's -mcpu=power4 means
> > POWER4+, ISA 2.01, according to our documentation.  There is no
> > difference with ISA 2.00 (what plain POWER4 implements) for anything
> > GCC does.
> 
> Huh, okay. Well I guess we are past that point now, interesting that
> another ISA version was done for 4+ though, and then another for 5.
> I don't see a list of changes from 2.00 in the public version, I
> wonder what else changed other than mtmsrd.

I think searching for "POWER4+" will give you everything.  I cannot find
a public 2.00 either, yeah.  I listed everything I think changed
elsewhere in the thread.


Segher


Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

2022-09-23 Thread Segher Boessenkool
On Fri, Sep 23, 2022 at 05:08:13PM +1000, Nicholas Piggin wrote:
> On Tue Sep 20, 2022 at 4:41 PM AEST, Christophe Leroy wrote:
> > local_paca is declared as global register asm("r13"), it is therefore
> > garantied to always ever be r13.
> >
> > It is therefore not required to opencode r13 in the assembly, use
> > a reference to local_paca->irq_soft_mask instead.

> The code matches the changelog AFAIKS. But I don't know where it is
> guaranteed it will always be r13 in GCC and Clang. I still don't know
> where in the specification or documentation suggests this.

"Global Register Variables" in the GCC manual.

> There was some assertion it would always be r13, but that can't be a
> *general* rule. e.g., the following code:
> 
> struct foo {
> #ifdef BIGDISP
> int array[1024*1024];
> #endif
> char bar;
> };
> 
> register struct foo *foo asm("r13");
> 
> static void setval(char val)
> {
> asm("stb%X0 %1,%0" : "=m" (foo->bar) : "r" (val));
> }
> 
> int main(void)
> {
> setval(10);
> }

Just use r13 directly in the asm, if that is what you want!

> With -O0 this generates stb 9,0(10) for me for GCC 12, and with -O2
> -DBIGDISP it generates stb 10,0(9). So that makes me nervious. GCC
> does not have some kind of correctness guarantee here, so it must not
> have this in its regression tests etc., and who knows about clang.

GCC has all kinds of correctness guarantees, here and elsewhere, that is
90% of a compiler's job.  But you don't *tell* it what you consider
"correct" here.

You wrote "foo->bar", and this expression was translated to something
that derived from r13.  If you made the asm something like
asm("stb%X0 %1,0(%0)" : : "r" (foo), "r" (val) : "memory");
it would work fine.  It would also work fine if you wrote 13 in the
template directly.  These things follow the rules, so are guaranteed.

The most important pieces of doc here may be
   * Accesses to the variable may be optimized as usual and the register
 remains available for allocation and use in any computations,
 provided that observable values of the variable are not affected.
   * If the variable is referenced in inline assembly, the type of
 access must be provided to the compiler via constraints (*note
 Constraints::).  Accesses from basic asms are not supported.
but read the whole "Global Register Variables" chapter?

> If it is true for some particular subset of cases that we can guarantee,
> e.g., using -O2 and irq_soft_mask offset within range of stb offset and
> we can point to specification such that both GCC and Clang will follow
> it, then okay. Otherwise I still think it's more trouble than it is
> worth.

-O2 makes it much more likely some inline assembler things work as
intended, but it guarantees nothing.  Sorry.


Segher


Re: [PATCH 2/2] powerpc/64s: update cpu selection options

2022-09-21 Thread Segher Boessenkool
On Wed, Sep 21, 2022 at 11:41:03AM +1000, Nicholas Piggin wrote:
> Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so
> make that clear in the option name. The POWER5_CPU option is dropped
> because it's uncommon, and GENERIC_CPU covers it.
> 
> -mtune= before power8 is dropped because the minimum gcc version
> supports power8, and tuning is made consistent between big and little
> endian.
> 
> A 970 option is added for PowerPC 970 / G5 because they still have a
> user base, and -mtune=power8 does not generate good code for the 970.
> 
> This also updates the ISA versions document to add Power4/Power4+
> because I didn't realise Power4+ used 2.01.
> 
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Segher Boessenkool 

Cheers,


Segher


Re: [PATCH 1/2] powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5

2022-09-21 Thread Segher Boessenkool
On Wed, Sep 21, 2022 at 11:41:02AM +1000, Nicholas Piggin wrote:
> Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5.
> POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added
> the popcntb instruction which a compiler might use.
> 
> Use -mcpu=power4.
> 
> Fixes: 471d7ff8b51b ("powerpc/64s: Remove POWER4 support")
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Segher Boessenkool 

Thank you!

Maybe superfluous, but some more context: GCC's -mcpu=power4 means
POWER4+, ISA 2.01, according to our documentation.  There is no
difference with ISA 2.00 (what plain POWER4 implements) for anything
GCC does.


Segher


Re: [RFC PATCH 5/7] powerpc/64s: update generic cpu option name and compiler flags

2022-09-21 Thread Segher Boessenkool
Hi!

On Wed, Sep 21, 2022 at 11:01:18AM +1000, Nicholas Piggin wrote:
> On Wed Sep 21, 2022 at 8:16 AM AEST, Segher Boessenkool wrote:
> > On Tue, Sep 20, 2022 at 12:01:47AM +1000, Nicholas Piggin wrote:
> > > Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so
> > > make that clear in the option name.
> >
> > AFAIR the minimum now is POWER4+ (ISA 2.01), not POWER5 (ISA 2.02).
> 
> It's POWER5 now, because of commit 471d7ff8b5 ("powerpc/64s: Remove
> POWER4 support"), which is misguided about POWER4+ and also introduced
> the -mcpu=power5 bug on 970 builds :\

ISA 2.01 added just a few things (LPES[0], HDEC, some PM things, but
crucially also anything that sets MSR[PR] also sets MSR[EE] since then).

> Not sure it's worth adding POWER4+ support back but if someone has a
> POWER4+ or adds it to QEMU TCG, I will do the patch.

970 is 2.01 -- pretending it is 2.02 is a ticking time bomb: the popcntb
insn will be generated for popcount and parity intrinsics, which can be
generated by generic code!

> > > -mtune= before power8 is dropped because the minimum gcc version
> > > supports power8, and tuning is made consistent between big and little
> > > endian.
> >
> > Tuning for p8 on e.g. 970 gives quite bad results.  No idea if anyone
> > cares, but this is a serious regression if so.
> 
> It's for "generic" kernel so we set low minimum but higher tune,
> assuming that people would usually have newer, so it was already
> doing -mtune=power7.
> 
> We could make a specific 970/G5 entry though, since those still
> have users.

If that uses -mcpu=power4 (which means ISA 2.01 btw!) all is fine
already?  (Or -mcpu=970, same thing really, it just allows VMX as well).

Thanks for taking care of this Nick, much appreciated!


Segher


Re: [RFC PATCH 5/7] powerpc/64s: update generic cpu option name and compiler flags

2022-09-20 Thread Segher Boessenkool
Hi!

On Tue, Sep 20, 2022 at 12:01:47AM +1000, Nicholas Piggin wrote:
> Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so
> make that clear in the option name.

AFAIR the minimum now is POWER4+ (ISA 2.01), not POWER5 (ISA 2.02).

> -mtune= before power8 is dropped because the minimum gcc version
> supports power8, and tuning is made consistent between big and little
> endian.

Tuning for p8 on e.g. 970 gives quite bad results.  No idea if anyone
cares, but this is a serious regression if so.

> Big endian drops -mcpu=power4 in favour of power5. Effectively the
> minimum compiler version means power5 was always being selected here,
> so this should not change anything. 970 / G5 code generation does not
> seem to have been a problem with -mcpu=power5, but it's possible we
> should go back to power4 to be really safe.

Yes, -mcpu=power5 code does *not* run on 970, if you are unlucky enough
that the compiler does something smart with popcntb (the sole non-float
insn new on p5, not counting hrfid).

> +# -mcpu=power5 should generate 970 compatible kernel code

It doesn't.  Even if it did, it would need more explanation!


Segher


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-14 Thread Segher Boessenkool
On Wed, Sep 14, 2022 at 04:55:27PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 14, 2022 at 02:28:26PM +, Michael Matz wrote:
> > Don't mix DWARF debug info with DWARF-based unwinding info, the latter 
> > doesn't imply the former.  Out of interest: how does ORC get around the 
> > need for CFI annotations (or equivalents to restore registers) and what 
> 
> Objtool 'interprets' the stackops. So it follows the call-graph and is
> an interpreter for all instructions that modify the stack. Doing that it
> konws what the stackframe is at 'most' places.

To get correct backtraces on e.g. PowerPC you need to emulate many of
the integer insns.  That is why GCC enables -fasynchronous-unwind-tables
by default for us.

The same is true for s390, aarch64, and x86 (unless 32-bit w/ frame
pointer).

The problem is that you do not know how to access anything on the stack,
whether in the current frame or in a previous frame, from a random point
in the program.  GDB has many heuristics for this, and it still does not
get it right in all cases.

> > makes it fast?  I want faster unwinding for DWARF as well, when there's 
> > feature parity :-)  Maybe something can be learned for integration into 
> > dwarf-unwind.
> 
> I think we have some details here:
> 
>  https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html

It is faster because it does a whole lot less.  Is that still enough?
It's not clear (to me) what exact information it wants to provide :-(


Segher


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-14 Thread Segher Boessenkool
On Wed, Sep 14, 2022 at 11:21:00AM +0100, Josh Poimboeuf wrote:
> On Mon, Sep 12, 2022 at 06:31:14AM -0500, Segher Boessenkool wrote:
> > On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > > 2) Noreturn functions:
> > >
> > >There's no reliable way to determine which functions are designated
> > >by the compiler to be noreturn (either explictly via function
> > >attribute, or implicitly via a static function which is a wrapper
> > >around a noreturn function.)
> > 
> > Or just a function that does not return for any other reason.
> > 
> > The compiler makes no difference between functions that have the
> > attribute and functions that do not.  There are good reasons to not
> > have the attribute on functions that do in fact not return.  The
> > not-returningness of the function may be just an implementation
> > accident, something you do not want part of the API, so it *should* not
> > have that attribute; or you may want the callers to a function to not be
> > optimised according to this knowledge (you cannot *prevent* that, the
> > compiler can figure it out it other ways, but still) for any other
> > reason.
> 
> Yes, many static functions that are wrappers around noreturn functions
> have this "implicit noreturn" property.

I meant functions that are noreturn intrinsically.  The trivial example:

void f(void)
{
for (;;)
;
}

>  I agree we would need to know
> about those functions (or, as Michael suggested, their call sites) as
> well.

Many "potentially does not return" functions (there are very many such
functions!) turn into "never returns" functions, for some inputs (or
something in the environment).  If the compiler specialises a code path
that does not return, you'll not see that marked up any way.  Of course
such a path should not be taken in the kernel, normally :-)

> > >This information is needed because the
> > >code after the call to such a function is optimized out as
> > >unreachable and objtool has no way of knowing that.
> > 
> > Since June we (GCC) have -funreachable-traps.  This creates a trap insn
> > wherever control flow would otherwise go into limbo.
> 
> Ah, that's interesting, though I'm not sure if we'd be able to
> distinguish between "call doesn't return" traps and other traps or
> reasons for UD2.

The trap handler can see where the trap came from.  And then look up
that address in some tables or such.  Just like __bug_table?


Segher


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-12 Thread Segher Boessenkool
Hi!

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> 2) Noreturn functions:
>
>There's no reliable way to determine which functions are designated
>by the compiler to be noreturn (either explictly via function
>attribute, or implicitly via a static function which is a wrapper
>around a noreturn function.)

Or just a function that does not return for any other reason.

The compiler makes no difference between functions that have the
attribute and functions that do not.  There are good reasons to not
have the attribute on functions that do in fact not return.  The
not-returningness of the function may be just an implementation
accident, something you do not want part of the API, so it *should* not
have that attribute; or you may want the callers to a function to not be
optimised according to this knowledge (you cannot *prevent* that, the
compiler can figure it out it other ways, but still) for any other
reason.

>This information is needed because the
>code after the call to such a function is optimized out as
>unreachable and objtool has no way of knowing that.

Since June we (GCC) have -funreachable-traps.  This creates a trap insn
wherever control flow would otherwise go into limbo.


Segher


Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

2022-09-08 Thread Segher Boessenkool
Hi!

On Thu, Sep 08, 2022 at 05:07:24PM +0200, Arnd Bergmann wrote:
> - if the XOR code has its frame size explode like this, it's
>   probably an indication of the compiler doing something wrong,
>   not the kernel code.

On the contrary, it is most likely an indication that the kernel code
wants something unreasonable.  Like, having 20 variables live at the
same time, but still wanting nicely scheduled machine code generated.

But I suspect GCC unrolled the loops here, even?  Best way to prevent
that here is to put an option in the Makefile, for these files.  We
don't want any of this unrolled after all?  Or, alternatively, remove
all the manual unrolling from this code, let GCC do its thing, without
painting it in a corner.

>   The result is likely that the "optimized"
>   XOR implementation is slower than the default version as a
>   result, and the kernel will pick the other one at boot time.

Yes.  So it's self-healing even, of a sort :-)


Segher


Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

2022-09-08 Thread Segher Boessenkool
On Thu, Sep 08, 2022 at 06:00:24AM +, Christophe Leroy wrote:
> Looking at it more deeply, I see strange things.

I'll have to see full generated machine code to be able to see strange
things, there isn't enough information at all here yet.  Sorry.

Use private mail if it is too big or uninteresting for the list :-)

> What is that frame size ? I thought it was the number of bytes r1 is 
> decremented at the begining of the function, but it seems not, at least 
> on GCC. It seems GCC substrats 112 bytes while clang doesn't.

That is the vars size + the fixed size + the size of the parameter
save area + the size of the regs save area, rounded up to a multiple
of 16.  Fixed size is 8 on 32-bit PowerPC ELF.  Frame size used by GCC
here is just the vars size.

> So it seems that GCC and CLANG don't warn on the same thing, is that 
> expected ? GCC substrats 112 bytes, which is the minimum frame size on a 
> ppc64, but here I'm building a ppc32 kernel, min frame size is 16.

I need to see the generated code to make sense of what is happening
here.  It sounds like it is doing varargs calls or similar expensive
stack juggling.  Or just saving a boatload of registers on the stack.

> And CLANG is still using stack a lot more than GCC.

Good to hear!  Well, good for GCC, anyway ;-)


Segher


Re: [PATCH v2 16/16] objtool/powerpc: Add --mcount specific implementation

2022-09-05 Thread Segher Boessenkool
Hi!

On Mon, Sep 05, 2022 at 04:15:07PM +0530, Naveen N. Rao wrote:
> Segher Boessenkool wrote:
> >>> + if ((insn & 3) == 1) {
> >>> + *type = INSN_CALL;
> >>> + *immediate = insn & 0x3fc;
> >>> + if (*immediate & 0x200)
> >>> + *immediate -= 0x400;
> >>> + }
> >>> + break;
> >>> + }
> >
> >Does this handle AA=1 correctly at all?  That is valid both with and
> >without relocations, just like AA=0.  Same for AA=1 LK=0 btw.
> >
> >If you only handle AA=0, the code should explicitly test for that.
> 
> The code does test for AA=0 LK=1 with the if statement there?

Yes, but that is not what I said :-)

It may be fine to not *handle* AA=1 at all, but the code should at least
scream bloody murder when it encounters it anyway :-)


Segher


Re: [PATCH v2] powerpc: Fix irq_soft_mask_set() and irq_soft_mask_return() with sanitizer

2022-09-02 Thread Segher Boessenkool
On Fri, Sep 02, 2022 at 10:57:27AM -0500, Peter Bergner wrote:
> On 8/31/22 5:45 PM, Segher Boessenkool wrote:
> > Yes, this is guaranteed.
> 
> Agree with Segher here.  That said, there was a gcc bug a long time
> ago where gcc copied r13 into a temporary register and used it from there.

r13 is a fixed register on most of our ABIs (everything that is not AIX
or Darwin, even), so this can never happen.  Except if there are bugs,
of course ;-)

> That's ok (correctness wise, but not ideal) from user land standpoint,
> but we took a context switch after the reg copy and it was restarted on
> a different cpu, so differnt local_paca and r13 value.  We went boom
> because the copy wasn't pointing to the correct local_paca anymore.
> So it is very important the compiler always use r13 when accessing
> the local_paca.

Yes.  So we either whould use -ffixed-r13, or just not use unsupported
compilers.  powerpc*-linux and powerpc*-elf work fine for example :-)


Segher


Re: [PATCH v2 2/2] powerpc/math-emu: Remove -w build flag and fix warnings

2022-09-02 Thread Segher Boessenkool
Hi!

On Fri, Sep 02, 2022 at 09:11:48AM -0700, Nathan Chancellor wrote:
> On Fri, Sep 02, 2022 at 10:59:54AM -0500, Segher Boessenkool wrote:
> > Maybe add -Wno-implicit-fallthrough?  This code is a copy from outside
> > the kernel, no one has ever wanted to maintain it, if nothing else (the
> > more politically correct formulation is "we cannot as easily pick up
> > improvements from upstream if we modify stuff").
> 
> Sure, we could do something like this if you preferred:
> 
> diff --git a/arch/powerpc/math-emu/Makefile b/arch/powerpc/math-emu/Makefile
> index 26fef2e5672e..ed775747a2a5 100644
> --- a/arch/powerpc/math-emu/Makefile
> +++ b/arch/powerpc/math-emu/Makefile
> @@ -16,3 +16,7 @@ obj-$(CONFIG_SPE)   += math_efp.o
>  
>  CFLAGS_fabs.o = -fno-builtin-fabs
>  CFLAGS_math.o = -fno-builtin-fabs
> +
> +ifdef CONFIG_CC_IS_CLANG
> +ccflags-remove-y := $(CONFIG_CC_IMPLICIT_FALLTHROUGH)
> +endif

That is a GCC warning as well.  It needs some $(call cc-option ...)
thing then, though (GCC versions of more than two or so years ago are
supported as well).

> At the same time, I see other modifications to these files that appear
> to be for the kernel only so I suspect that this is already in the "we
> cannot as easily pick up improvements from upstream" category,
> regardless of that diff.

So maybe someone should really maintain this stuff, bring it up to some
reasonably modern state?  :-)


Segher


Re: [PATCH v2 2/2] powerpc/math-emu: Remove -w build flag and fix warnings

2022-09-02 Thread Segher Boessenkool
On Fri, Sep 02, 2022 at 08:37:23AM -0700, Nathan Chancellor wrote:
> On Fri, Sep 02, 2022 at 12:08:55PM +0200, Christophe Leroy wrote:
> > This should have been detected by gcc at build time, but due to
> > '-w' flag it went undetected.
> > 
> > Removing that flag leads to many warnings hence errors.

> Thanks for figuring out what was going on here! I took this patch for a
> spin with clang and it has a few more errors around
> -Wimplicit-fallthrough:

Maybe add -Wno-implicit-fallthrough?  This code is a copy from outside
the kernel, no one has ever wanted to maintain it, if nothing else (the
more politically correct formulation is "we cannot as easily pick up
improvements from upstream if we modify stuff").


Segher


Re: [PATCH] powerpc/vdso: link with -z noexecstack

2022-09-02 Thread Segher Boessenkool
Hi!

On Fri, Sep 02, 2022 at 09:57:09AM +0200, Christophe Leroy wrote:
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -128,7 +128,8 @@ extra-y   += vmlinux.lds
>  
>  obj-$(CONFIG_RELOCATABLE)+= reloc_$(BITS).o
>  
> -obj-$(CONFIG_PPC32)  += entry_32.o setup_32.o early_32.o 
> static_call.o
> +obj-$(CONFIG_PPC32)  += entry_32.o setup_32.o early_32.o
> +obj-$(CONFIG_HAVE_STATIC_CALL)   += static_call.o

Did you want to commit this part?  The commit message doesn't mention it.


Segher


Re: [PATCH] powerpc/math_emu/efp: Include module.h

2022-09-02 Thread Segher Boessenkool
On Fri, Sep 02, 2022 at 08:43:49AM +, Christophe Leroy wrote:
> Le 01/09/2022 à 21:47, Segher Boessenkool a écrit :
> > On Thu, Sep 01, 2022 at 05:41:33AM +, Christophe Leroy wrote:
> >> I think it would be worth a GCC bug report.
> > 
> > We need a stand-alone testcase for this.  When you have created one, at
> > least 98% of the time you discover the bug is in user code after all.
> > 
> > Which is a very good thing, it means the problem can be fixed simpler,
> > cheaper, and a lot faster :-)
> 
> Easy to reproduce with a .c file that has a single line:
> 
> non_existing_macro(xxx);

That was fast (and cheap and simple) :-)

> Apparently that's due to the -w option in arch/powerpc/math_emu/Makefile:
> 
>   ccflags-y = -w
> 
> Was introduced by commit d2b194ed8208 ("powerpc/math-emu: Use kernel 
> generic math-emu code")
> 
> If I understand correctly it means 'ignore all warnings'. Then it seems 
> CLANG doesn't honor that request.

'-w'
 Inhibit all warning messages.

GCC's initial commit has this already (1992).


Segher


  1   2   3   4   5   6   7   8   9   10   >