Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-16 Thread Segher Boessenkool
Hi!

On Thu, May 16, 2024 at 10:06:58PM +1000, Michael Ellerman wrote:
> Andy Polyakov  writes:
> >>> +.abiversion  2
> >>
> >> I'd prefer that was left to the compiler flags.
> >
> > Problem is that it's the compiler that is responsible for providing this
> > directive in the intermediate .s prior invoking the assembler. And there
> > is no assembler flag to pass through -Wa.
> 
> Hmm, right. But none of our existing .S files include .abiversion
> directives.
> 
> We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
> effect.

Yup.  You coulds include some header file, maybe?  Since you run the
assembler code through the C preprocessor anyway, for some weird reason :-)

> But the actual code follows ELFv2, because we wrote it that way, and I
> guess the linker doesn't look at the actual ABI version of the .o ?

It isn't a version.  It is an actual different ABI.

GNU LD allows linking together whatever, yes.

> Is .abiversion documented anywhere? I can't see it in the manual.

Yeah me neither.  https://sourceware.org/bugzilla/enter_bug.cgi ?
A commandline flag (to GAS) would seem best?


Segher


Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

2024-05-16 Thread Segher Boessenkool
On Wed, May 15, 2024 at 10:29:56AM +0200, Andy Polyakov wrote:
> >+static void cswap(fe51 p, fe51 q, unsigned int bit)
> 
> The "c" in cswap stands for "constant-time," and the problem is that 
> contemporary compilers have exhibited the ability to produce 
> non-constant-time machine code as result of compilation of the above 
> kind of technique.

This can happen with *any* comnpiler, on *any* platform.  In general,
you have to write machine code if you want to be sure what machine code
will eventually be executed.

>  The outcome is platform-specific and ironically some 
> of PPC code generators were observed to generate "most" 
> non-constant-time code. "Most" in sense that execution time variations 
> would be most easy to catch. One way to work around the problem, at 
> least for the time being, is to add 'asm volatile("" : "+r"(c))' after 
> you calculate 'c'. But there is no guarantee that the next compiler 
> version won't see through it, hence the permanent solution is to do it 
> in assembly. I can put together something...

Such tricks can help ameliorate the problem, sure.  But it is not a
solution ever.


Segher


Re: [PATCH 3/3] powerpc: Check only single values are passed to CPU/MMU feature checks

2024-05-10 Thread Segher Boessenkool
On Fri, May 10, 2024 at 04:45:37PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Thu, May 09, 2024 at 10:12:48PM +1000, Michael Ellerman wrote:
> >> cpu_has_feature()/mmu_has_feature() are only able to check a single
> >> feature at a time, but there is no enforcement of that.
> >> 
> >> In fact, as fixed in the previous commit, there was code that was
> >> passing multiple values to cpu_has_feature().
> >> 
> >> So add a check that only a single feature is passed using popcount.
> >> 
> >> Note that the test allows 0 or 1 bits to be set, because some code
> >> relies on cpu_has_feature(0) being false, the check with
> >> CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE.
> >
> > This btw is exactly
> >
> > BUILD_BUG_ON(feature & (feature - 1));
> >
> > but the popcount is more readable :-)
> 
> Yeah for those of us who don't see bits cascading in our sleep I think
> the popcount is easier to understand ;)

Absolutely :-)  This is just one of the most well-known bittricks, for
seeing if x is a power of two you write

  x && x & (x-1)

but here you do not need to test for x not being zero.  Hardly ever get
to use that simpler thing, so it jumped out at me here :-)

[ For understanding the x & (x-1) thing, it is perhaps easiest if you
first consider an x with more than one bit set: x-1 will have the same
topmost set bit. ]


Segher


Re: [PATCH 3/3] powerpc: Check only single values are passed to CPU/MMU feature checks

2024-05-09 Thread Segher Boessenkool
On Thu, May 09, 2024 at 10:12:48PM +1000, Michael Ellerman wrote:
> cpu_has_feature()/mmu_has_feature() are only able to check a single
> feature at a time, but there is no enforcement of that.
> 
> In fact, as fixed in the previous commit, there was code that was
> passing multiple values to cpu_has_feature().
> 
> So add a check that only a single feature is passed using popcount.
> 
> Note that the test allows 0 or 1 bits to be set, because some code
> relies on cpu_has_feature(0) being false, the check with
> CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE.

This btw is exactly

BUILD_BUG_ON(feature & (feature - 1));

but the popcount is more readable :-)


Segher


Re: powerpc: io-defs.h:43:1: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]

2024-04-16 Thread Segher Boessenkool
On Tue, Apr 16, 2024 at 03:02:52PM +0530, Naresh Kamboju wrote:
> In file included from arch/powerpc/include/asm/io.h:672:
> arch/powerpc/include/asm/io-defs.h:43:1: error: performing pointer
> arithmetic on a null pointer has undefined behavior
> [-Werror,-Wnull-pointer-arithmetic]

It is not UB, but just undefined: the program is meaningless.

It is not a null pointer but even a null pointer constant here.  It
matters in places, including here.

It would help if the warnings were more correct :-(


Segher


Re: Appropriate liburcu cache line size for Power

2024-03-26 Thread Segher Boessenkool
On Tue, Mar 26, 2024 at 06:19:38PM +1100, Michael Ellerman wrote:
> Mathieu Desnoyers  writes:
> The ISA doesn't specify the cache line size, other than it is smaller
> than a page.

It also says it is "aligned".  Nowhere is it said what an aligned size
is, but it seems clear it has to be a power of two.

> In practice all the 64-bit IBM server CPUs I'm aware of have used 128
> bytes.

Yup.  It is 128B on p3 already.

> It is possible to discover at runtime via AUXV headers. But that's no
> use if you want a compile-time constant.

The architecture does not require the data block size to be equal to the
instruction block size, already.  But many programs subscribe to an
overly simplified worldview, which is a big reason everything is 128B on
all modern PowerPC.

It is quite a nice tradeoff size, there has to be a huge change in the
world for this to ever change :-)


Segher


Re: Appropriate liburcu cache line size for Power

2024-03-25 Thread Segher Boessenkool
On Mon, Mar 25, 2024 at 03:34:30PM -0500, Nathan Lynch wrote:
> Mathieu Desnoyers  writes:
> For what it's worth, I found a copy of an IBM Journal of Research &
> Development article confirming that POWER5's L3 had a 256-byte line
> size:
> 
>   Each slice [of the L3] is 12-way set-associative, with 4,096
>   congruence classes of 256-byte lines managed as two 128-byte sectors
>   to match the L2 line size.
> 
> https://www.eecg.utoronto.ca/~moshovos/ACA08/readings/power5.pdf
> 
> I don't know of any reason to prefer 256 over 128 for current Power
> processors though.

The reason some old CPUs use bigger physical cache line sizes is to have
fewer cache lines, which speeds up lookup, or reduces power consumption
of lookup, or both.  This isn't trivial at all when implemented as a
parallel read and compare of all tags, which was the usual way to do
things long ago.

Nowadays usually a way predictor is used, severely limiting the number
of tags to be compared.  So we can use a 128B physical line size always
now.  Note that this was physical only, everything looked like 128B on
a P5 system as well.

P5 wasn't first like this fwiw, look at the L2 on a 604 for example :-)


Segher


Re: [PATCH 2/3] tools/erf/util/annotate: Set register_char and memory_ref_char for powerpc

2024-03-09 Thread Segher Boessenkool
All instructions with a primary opcode from 32 to 63 are memory insns,
and no others.  It's trivial to see whether it is a load or store, too
(just bit 3 of the insn).  Trying to parse disassembled code is much
harder, and you easily make some mistakes here.

On Sat, Mar 09, 2024 at 12:55:12PM +0530, Athira Rajeev wrote:
> To identify if the instruction has any memory reference,
> "memory_ref_char" field needs to be set for specific architecture.
> Example memory instruction:
> lwz r10,0(r9)
> 
> Here "(" is the memory_ref_char. Set this as part of arch->objdump

What about "lwzx r10,0,r19", semantically exactly the same instruction?
There is no paren in there.  Not all instructions using parens are
memory insns, either, not in assembler code at least.

> To get register number and access offset from the given instruction,
> arch->objdump.register_char is used. In case of powerpc, the register
> char is "r" (since reg names are r0 to r31). Hence set register_char
> as "r".

cr0..cr7
r0..r31
f0..f31
v0..v31
vs0..vs63
and many other spellings.  Plain "0..63" is also fine.

The "0" in my lwzx example is a register field as well (and it stands
for "no register", not for "r0").  Called "(RA|0)" usually (incidentally,
see the parens there as well, oh joy).

Don't you have the binary code here as well, not just a disassembled
representation of it?  It is way easier to just use that, and you'll get
much better results that way :-)


Segher


Re: [PATCH 5/5] powerpc: Remove cpu-as-y completely

2024-02-29 Thread Segher Boessenkool
On Thu, Feb 29, 2024 at 11:25:21PM +1100, Michael Ellerman wrote:
> From: Christophe Leroy 

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH 4/5] powerpc/fsl: Modernise mt/mfpmr

2024-02-29 Thread Segher Boessenkool
On Thu, Feb 29, 2024 at 11:25:20PM +1100, Michael Ellerman wrote:
> With the addition of the machine directives, these are no longer simple
> 1-2 liner macros. So modernise them to be static inlines and use named
> asm parameters.
> 
> Signed-off-by: Michael Ellerman 

You got rid of the __stringify blight as well.  Great :-)

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH 2/5] powerpc/64s: Use .machine power4 around dcbt

2024-02-29 Thread Segher Boessenkool
On Thu, Feb 29, 2024 at 11:25:18PM +1100, Michael Ellerman wrote:
> There are multiple decodings for the "dcbt" mnemonic, so the assembler
> has to pick one.
> 
> That requires passing -many to the assembler, which is not recommended.
> 
> Without -many the clang 14 / binutils 2.38 build fails with:
> 
>   arch/powerpc/kernel/exceptions-64s.S:2976: Error: junk at end of line: 
> `0b01010'
>   clang: error: assembler command failed with exit code 1 (use -v to see 
> invocation)
> 
> Fix it by adding .machine directives around the use of dcbt to specify
> which encoding is desired.
> 
> Signed-off-by: Michael Ellerman 

Looks good, thanks!

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH] tty: hvc-iucv: fix function pointer casts

2024-02-14 Thread Segher Boessenkool
On Wed, Feb 14, 2024 at 11:37:21AM +0100, Greg Kroah-Hartman wrote:
> On Wed, Feb 14, 2024 at 09:46:33AM +, David Laight wrote:
> > From: Segher Boessenkool
> > > Sent: 13 February 2024 19:13
> > > 
> > > On Tue, Feb 13, 2024 at 11:17:49AM +0100, Arnd Bergmann wrote:
> > > > clang warns about explicitly casting between incompatible function
> > > > pointers:
> > > >
> > > > drivers/tty/hvc/hvc_iucv.c:1100:23: error: cast from 'void (*)(const 
> > > > void *)' to 'void (*)(struct
> > > device *)' converts to incompatible function type 
> > > [-Werror,-Wcast-function-type-strict]
> > > >  1100 | priv->dev->release = (void (*)(struct device *)) kfree;
> > > >   |  ^
> > > 
> > > Such a cast of course is explicitly allowed by 6.3.2.3/8, only calling a
> > > function using a non-compatible type is UB.  This warning message is
> > > quite misleading.  Doubly so because of the -Werror, as always.
> > 
> > But it will get called using the wrong type.
> > And (is it) fine-ibt will reject the incorrect call.
> 
> And rightfully so, this type of casting abuse is just that, abuse.
> 
> Almost no one should be just calling kfree() on a device pointer, I'll
> look at the surrounding code as odds are something odd is going on.  But
> for now, this patch is correct.

Yes, and I said so.  My point is that the warning message pretends the
cast is bad or dangerous.  It is not, similar casts are the only way in
C to do certain things (yes, you can always avoid it completely, by
writing completely different code, like the patch does, and that
sometimes is a better idea even).

But the warning message is misleading and does more damage than it helps
avoid.


Segher


Re: [PATCH] tty: hvc-iucv: fix function pointer casts

2024-02-13 Thread Segher Boessenkool
On Tue, Feb 13, 2024 at 11:17:49AM +0100, Arnd Bergmann wrote:
> clang warns about explicitly casting between incompatible function
> pointers:
> 
> drivers/tty/hvc/hvc_iucv.c:1100:23: error: cast from 'void (*)(const void *)' 
> to 'void (*)(struct device *)' converts to incompatible function type 
> [-Werror,-Wcast-function-type-strict]
>  1100 | priv->dev->release = (void (*)(struct device *)) kfree;
>   |  ^

Such a cast of course is explicitly allowed by 6.3.2.3/8, only calling a
function using a non-compatible type is UB.  This warning message is
quite misleading.  Doubly so because of the -Werror, as always.

Your proposed new code of course is nice and simple (albeit a bit bigger
than it was before, both source and binary).  Such is life ;-)


Segher


Re: [PATCH] powerpc: Add gpr1 and fpu save/restore functions

2024-02-12 Thread Segher Boessenkool
On Mon, Feb 12, 2024 at 12:07:03PM -0600, Timothy Pearson wrote:
> > I have done it for *all* architectures some ten years ago.  Never found
> > any problem.
> 
> That makes sense, what I mean by invasive is that we'd need buy-in from the 
> other
> maintainers across all of the affected architectures.  Is that likely to 
> occur?

I don't know.  Here is my PowerPC-specific patch, it's a bit older, it
might not apply cleanly anymore, the changes needed should be obvious
though:


=== 8< ===
commit f16dfa5257eb14549ce22243fb2b465615085134
Author: Segher Boessenkool 
Date:   Sat May 3 03:48:06 2008 +0200

powerpc: Link vmlinux against libgcc.a

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b619c52..0a2fac6ffc1c 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -158,6 +158,9 @@ core-y  += arch/powerpc/kernel/ 
 core-$(CONFIG_XMON)+= arch/powerpc/xmon/
 core-$(CONFIG_KVM) += arch/powerpc/kvm/
 
+LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
+libs-y += $(LIBGCC)
+
 drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/
 
 # Default to zImage, override when needed
=== 8< ===


> > There are better options than -Os, fwiw.  Some --param's give smaller
> > *and* faster kernels.  What exactly is best is heavily arch-dependent
> > though (as well as dependent on the application code, the kernel code in
> > this case) :-(
> 
> I've been through this a few times, and -Os is the only option that makes
> things (just barely) fit unfortunately.

-O2 with appropriate inlining tuning beats -Os every day of the week,
in my experience.


Segher


Re: [PATCH] powerpc: Add gpr1 and fpu save/restore functions

2024-02-12 Thread Segher Boessenkool
On Mon, Feb 12, 2024 at 11:46:19AM -0600, Timothy Pearson wrote:
> Interesting, that make sense.
> 
> How should we proceed from the current situation?  Bringing in libgcc seems
> like a fairly invasive change,

I have done it for *all* architectures some ten years ago.  Never found
any problem.

> should we merge this to fix the current bug
> (cannot build ppc64 kernel in size-optimized mode) and start discussion on
> bringing in libgcc as the long-term fix across multiple architectures?
> 
> My goal here is to not have to carry a downstream patch in perpetuity for
> our embedded Linux firmware, which needs to be compiled in size-optimized
> mode due to hardware Flash limitations.

There are better options than -Os, fwiw.  Some --param's give smaller
*and* faster kernels.  What exactly is best is heavily arch-dependent
though (as well as dependent on the application code, the kernel code in
this case) :-(


Segher


Re: [PATCH] powerpc: Add gpr1 and fpu save/restore functions

2024-02-12 Thread Segher Boessenkool
On Mon, Feb 12, 2024 at 11:09:38AM -0600, Timothy Pearson wrote:
> There is existing code in the kernel right now to provide support functions 
> for gpr0 and altivec save/restore.  I don't know the full story here, but at 
> some point in the kernel's history it seems to have been decided to provide 
> the helper functions in lieu of linking libgcc directly.  If this is 
> incorrect, then I need to know that so I can rework the patch to enable libcc 
> and remove the existing support functions.
> 
> Is there anyone on-list that knows more of the history and decision-making 
> that went into the current state of the kernel here?

Long long time ago, linux-0.11 or something, it was discovered that some
programmiing mistakes resulted in double-length divisions (64x64->64 on
32-bit systems, say).  Most architectures have no hardware support for
that, x86 is one of those; so you need very expensive support routines
to do that (_udivdi3 or _divdi3 in that case, ...ti3 on 64-bit archs).

So it was decided to not link to libgcc to avoid this.  But that means
that all the extremely many other suppoort routines, more for some other
archs, are also not there.  While it would have been much easier to just
link to something that provides the _{u,}divdi3 symbol and then causes a
forced linking error from that!


Segher


Re: [PATCH] powerpc: Add gpr1 and fpu save/restore functions

2024-02-12 Thread Segher Boessenkool
On Mon, Feb 12, 2024 at 10:41:18AM -0600, Timothy Pearson wrote:
> Implement gpr1 and fpu save/restore functions per the ABI v2 documentation.

There is no "ABI v2".  This is the ELFv2 ABI, it is a name, it is not a
version 2 of anything (in fact, it is version 1 everywhere).

The same functions are needed and used in other ABIs, too.

But, why do this patch?  You just need

+LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)

+libs-y += $(LIBGCC)

and nothing more.  It is required for proper functioning of GCC to link
with the libgcc support library.


Segher


Re: [PATCH] powerpc/64: Set LR to a non-NULL value in task pt_regs on scv entry

2024-01-25 Thread Segher Boessenkool
Hi!

On Thu, Jan 25, 2024 at 05:12:28PM +0530, Naveen N Rao wrote:
> diff --git a/arch/powerpc/kernel/interrupt_64.S 
> b/arch/powerpc/kernel/interrupt_64.S
> index bd863702d812..5cf3758a19d3 100644
> --- a/arch/powerpc/kernel/interrupt_64.S
> +++ b/arch/powerpc/kernel/interrupt_64.S
> @@ -53,6 +53,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
>   ld  r1,PACAKSAVE(r13)
>   std r10,0(r1)
>   std r11,_NIP(r1)
> + std r11,_LINK(r1)

Please add a comment here then, saying what the store is for?


Segher


Re: [PATCH] powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large

2024-01-09 Thread Segher Boessenkool
On Tue, Jan 09, 2024 at 03:15:35PM +, Christophe Leroy wrote:
> >   CFLAGS-$(CONFIG_PPC64)+= $(call cc-option,-mcall-aixdesc)
> >   endif
> >   endif
> > -CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
> > cc-option,-mminimal-toc))
> > +CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium)
> 
> Should we still use $(call cc-option  here ?
> As we only deal with medium model now, shouldn't we make it such that it 
> fails in case the compiler doesn't support -mcmodel=medium ?

The -mcmodel= flag has been supported since 2010.  The kernel requires
a GCC from 2015 or later (GCC 5.1 is the minimum).  -mcmodel=medium is
(and always has been) the default, so it is always supported, yes.


Segher


Re: [PATCH 4/4] powerpc/Makefile: Auto detect cross compiler

2023-12-06 Thread Segher Boessenkool
On Wed, Dec 06, 2023 at 10:55:48PM +1100, Michael Ellerman wrote:
> Look for various combinations, matching:
>   powerpc(64(le)?)?(-unknown)?-linux(-gnu)?-
> 
> There are more possibilities, but the above is known to find a compiler
> on Fedora and Ubuntu (which use linux-gnu-), and also detects the
> kernel.org cross compilers (which use linux-).

$ sh ../config.sub powerpc64-linux
powerpc64-unknown-linux-gnu

I am very very lazy, so buildall uses short names everywhere :-)

Btw, can you build the kernel for a config that differs in 32/64 or
BE/LE with the default your compiler uses?  There is no reason this
shouldn't work (you don't use any system libraries), but you need to
use the correct compiler command-line flags for that always.

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH 2/4] powerpc/vdso: No need to undef powerpc for 64-bit build

2023-12-06 Thread Segher Boessenkool
On Wed, Dec 06, 2023 at 10:55:46PM +1100, Michael Ellerman wrote:
> But the 64-bit compiler doesn't define powerpc in the first place,

Yes, only __powerpc__ (and __powerpc64__) :-)


Segher


Re: [pci:controller/xilinx-xdma] BUILD REGRESSION 8d786149d78c7784144c7179e25134b6530b714b

2023-10-31 Thread Segher Boessenkool
On Tue, Oct 31, 2023 at 09:56:00AM -0500, Bjorn Helgaas wrote:
> That said, the unused functions do look legit:
> 
> grackle_set_stg() is a static function and the only call is under
> "#if 0".

It is a static inline.  It is *normal* that that is uncalled.  It is
very similar to a macro that has no invocation.  The warning is
overeager.


Segher


Re: [PATCH 0/7] arch/*: config: Remove ReiserFS from defconfig

2023-09-19 Thread Segher Boessenkool
On Tue, Sep 19, 2023 at 12:00:34AM +, Peter Lafreniere wrote:
> On Monday, September 18th, 2023 at 19:41, Segher Boessenkool 
>  wrote:
> > On Mon, Sep 18, 2023 at 05:56:09PM +, Peter Lafreniere wrote:
> > 
> > > ReiserFS has been considered deprecated for 19 months since commit
> > > eb103a51640e ("reiserfs: Deprecate reiserfs"). However, there are
> > > several architectures that still build it into their defconfig kernels.
> > > 
> > > As ReiserFS will be removed in 2025, delete all ReiserFS-related options
> > > from defconfig files before the filesystem's removal.
> > 
> > 
> > This is essentially equivalent to deleting the filesystem now. Why do
> > this? Is there such a hurry?
> 
> This is not equivalent to deleting the filesystem. The filesystem can still
> be configured into kernels, and few distros use a defconfig kernel anyway.

Most people who compile kernels use defconfigs though.  Distros are a
tiny minority if you look at builds.

Again: why do you want this?


Segher


Re: [PATCH 0/7] arch/*: config: Remove ReiserFS from defconfig

2023-09-18 Thread Segher Boessenkool
On Mon, Sep 18, 2023 at 05:56:09PM +, Peter Lafreniere wrote:
> ReiserFS has been considered deprecated for 19 months since commit
> eb103a51640e ("reiserfs: Deprecate reiserfs"). However, there are
> several architectures that still build it into their defconfig kernels.
> 
> As ReiserFS will be removed in 2025, delete all ReiserFS-related options
> from defconfig files before the filesystem's removal.

This is essentially equivalent to deleting the filesystem now.  Why do
this?  Is there such a hurry?


Segher


Re: [PATCH v1 10/21] powerpc/kexec: refactor for kernel/Kconfig.kexec

2023-06-15 Thread Segher Boessenkool
On Thu, Jun 15, 2023 at 01:34:25PM +1000, Michael Ellerman wrote:
> Eric DeVolder  writes:
> > -config KEXEC_FILE
> > -   bool "kexec file based system call"
> > -   select KEXEC_CORE
> > -   select HAVE_IMA_KEXEC if IMA
> > -   select KEXEC_ELF
> > -   depends on PPC64
> > -   depends on CRYPTO=y
> > -   depends on CRYPTO_SHA256=y
> ...
> > +
> > +config ARCH_HAS_KEXEC_FILE
> > +   def_bool PPC64 && CRYPTO && CRYPTO_SHA256
> 
> The =y's got lost here.
> 
> I think they were both meaningful, because both options are tristate. So
> this previously required them to be built-in (=y), whereas after your
> patch it will allow them to be modules.
> 
> I don't know for sure that those options need to be built-in, but that's
> what the code does now, so this patch shouldn't change it, at least
> without an explanation.

This patch shouldn't change it at all, period.  If you want to change it
(and that sounds like a good idea, if it is possible anyway), that
should be a separate patch.


Segher


Re: Passing the complex args in the GPR's

2023-06-06 Thread Segher Boessenkool
Hi!

On Tue, Jun 06, 2023 at 08:35:22PM +0530, Umesh Kalappa wrote:
> Hi Adnrew,
> Thank you for the quick response and for PPC64 too ,we do have
> mismatches in ABI b/w complex operations like
> https://godbolt.org/z/bjsYovx4c .
> 
> Any reason why GCC chose to use GPR 's here ?

What did you expect, what happened instead?  Why did you expect that,
and why then is it an error what did happen?

You used -O0.  As long as the code works, all is fine.  But unoptimised
code frequently is hard to read, please use -O2 instead?

As Andrew says, why did you use -m32 for GCC but -m64 for LLVM?  It is
hard to compare those at all!  32-bit PowerPC Linux ABI (based on 32-bit
PowerPC ELF ABI from 1995, BE version) vs. 64-bit ELFv2 ABI from 2015
(LE version).


Segher


Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail

2023-04-24 Thread Segher Boessenkool
On Mon, Apr 24, 2023 at 08:28:55AM -0700, Boqun Feng wrote:
> On Mon, Apr 24, 2023 at 10:13:51AM -0500, Segher Boessenkool wrote:
> > At what points can r13 change?  Only when some particular functions are
> > called?
> 
> r13 is the local paca:
> 
>   register struct paca_struct *local_paca asm("r13");
> 
> , which is a pointer to percpu data.

Yes, it is a global register variable.

> So if a task schedule from one CPU to anotehr CPU, the value gets
> changed.

But the compiler does not see that something else changes local_paca (or
r13 some other way, via assembler code perhaps)?  Or is there a compiler
bug?

If the latter is true:

Can you make a reproducer and open a GCC PR?  <https://gcc.gnu.org/bugs/>
for how to get started doing that.  We need *exact* code that shows the
problem, together with a compiler command line.  So that we can
reproduce the problem.  That is step 0 in figuring out what is going on,
and then maybe fixing the problem :-)


Segher


Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail

2023-04-24 Thread Segher Boessenkool
Hi!

On Mon, Apr 24, 2023 at 11:14:00PM +1000, Michael Ellerman wrote:
> Boqun Feng  writes:
> > On Sat, Apr 22, 2023 at 09:28:39PM +0200, Joel Fernandes wrote:
> >> On Sat, Apr 22, 2023 at 2:47 PM Zhouyi Zhou  wrote:
> >> > by debugging, I see the r10 is assigned with r13 on c0226eb4,
> >> > but if there is a context-switch before c0226edc, a false
> >> > positive will be reported.

> I've never understood why the compiler wants to make a copy of a
> register variable into another register!? >:#

It is usually because a) you told it to (maybe via an earlyclobber), or
b) it looked cheaper.  I don't see either here :-(

> > here I think that the compiler is using r10 as an alias to r13, since
> > for userspace program, it's safe to assume the TLS pointer doesn't
> > change. However this is not true for kernel percpu pointer.

r13 is a "fixed" register, but that means it has a fixed purpose (so not
available for allocation), it does not mean "unchanging".

> > The real intention here is to compare 40(r1) vs 3192(r13) for stack
> > guard checking, however since r13 is the percpu pointer in kernel, so
> > the value of r13 can be changed if the thread gets scheduled to a
> > different CPU after reading r13 for r10.
> 
> Yeah that's not good.

The GCC pattern here makes the four machine insns all stay together.
That is to make sure to not leak any secret value, which is impossible
to guarantee otherwise.

What tells GCC r13 can randomly change behind its back?  And, what then
makes GCC ignore that fact?

> > +   asm volatile("" : : : "r13", "memory");

Any asm without output is always volatile.

> > Needless to say, the correct fix is to make ppc stack protector aware of
> > r13 is volatile.
> 
> I suspect the compiler developers will tell us to go jump :)

Why would r13 change over the course of *this* function / this macro,
why can this not happen anywhere else?

> The problem of the compiler caching r13 has come up in the past, but I
> only remember it being "a worry" rather than causing an actual bug.

In most cases the compiler is smart enough to use r13 directly, instead
of copying it to another reg and then using that one.  But not here for
some strange reason.  That of course is a very minor generated machine
code quality bug and nothing more :-(

> We've had the DEBUG_PREEMPT checks in get_paca(), which have given us at
> least some comfort that if the compiler is caching r13, it shouldn't be
> doing it in preemptable regions.
> 
> But obviously that doesn't help at all with the stack protector check.
> 
> I don't see an easy fix.
> 
> Adding "volatile" to the definition of local_paca seems to reduce but
> not elimate the caching of r13, and the GCC docs explicitly say *not* to
> use volatile. It also triggers lots of warnings about volatile being
> discarded.

The point here is to say some code clobbers r13, not the asm volatile?

> Or something simple I haven't thought of? :)

At what points can r13 change?  Only when some particular functions are
called?


Segher


Re: [PATCH] powerpc/64: Always build with 128-bit long double

2023-04-05 Thread Segher Boessenkool
Hi!

On Wed, Apr 05, 2023 at 03:32:21PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote:
> >> The amdgpu driver builds some of its code with hard-float enabled,
> >> whereas the rest of the kernel is built with soft-float.
> >> 
> >> When building with 64-bit long double, if soft-float and hard-float
> >> objects are linked together, the build fails due to incompatible ABI
> >> tags.
> >
> >> Currently those build errors are avoided because the amdgpu driver is
> >> gated on 128-bit long double being enabled. But that's not a detail the
> >> amdgpu driver should need to be aware of, and if another driver starts
> >> using hard-float the same problem would occur.
> >
> > Well.  The kernel driver either has no business using long double (or
> > any other floating point even) at all, or it should know exactly what is
> > used: double precision, double-double, or quadruple precision.  Both of
> > the latter two are 128 bits.
> 
> In a perfect world ... :)

Well, without it knowing what exactly it calculates, does this code have
any business running in kernel space?  Is it acceptable to just do
random things in the kernel?  I don't know the kernel code that uses
long double at all (and I'm afraid to look for fear of going blind), but
all this sounds like the 64-bit IEEE double precision floating point is
not good enough for some certain calculation, but 80-bit extended double
precision as used on x86 is.  That does make it likely that both of our
128-bit formats would work, but there are lots and lots of "buts".  To
start with, what does that code require wrt fp contraction (so, floating
multiply-add)?

All of this suggests that there should not be floating point code here
*at all*, it is harder to use it in any acceptable way than to just do
things in fixed point or scaled integer or whatever.

> >> All versions of the 64-bit ABI specify that long-double is 128-bits.
> >> However some compilers, notably the kernel.org ones, are built to use
> >> 64-bit long double by default.
> >
> > Mea culpa, I suppose?  But buildall doesn't force 64 bit explicitly.
> > I wonder how this happened?  Is it maybe a problem in the powerpc64le
> > config in GCC itself?
> 
> Not blaming anyone, just one of those things that happens.

Oh I didn't say anyone is blaming me.  I want to fix the problem, that
is all :-)

> The
> toolchains the distros (Ubuntu/Fedora) build all seem to use 128, but
> possibly that's because someone told them to configure them that way at
> some point.

No, or yes, depending on how you look at it?  Default configurations all
have 128-bit long double.  But buildall uses (almost) the same
configuration on all targets, namely:

$GCC_SRC/configure \
--target=$TARGET --enable-targets=all --prefix=$PREFIX \
--enable-languages=c --without-headers --disable-bootstrap \
--disable-nls --disable-threads --disable-shared \
--disable-libmudflap --disable-libssp --disable-libgomp \
--disable-decimal-float --disable-libquadmath \
--disable-libatomic --disable-libcc1 --disable-libmpx

All of this is perfectly reasonable imnsho, but I guess the
--enable-targets=all causes the problem here?  That makes no sense, but
it is still my best guess.

> > I have a patch from summer last year (Arnd's
> > toolchains are built without it) that does
> > +   powerpc64le-*)  TARGET_GCC_CONF=--with-long-double-128
> > Unfortunately I don't remember why I did that, and I never investigated
> > what the deeper problem is :-/
> 
> Last summer (aka winter)

Oh right.  Last July :-)

> is when we first discovered this issue with the
> long double size being implicated.
> 
> See:
>   https://git.kernel.org/torvalds/c/c653c591789b3acfa4bf6ae45d5af4f330e50a91
> 
> So I guess that's what prompted your patch?

It was one day before my patch, maybe less than 12h even, so that could
be.  I don't update the kernel source automatically though (there are
50 to 100 build breaks every year, when things are in decent state I
tend to keep it for a while).  But it may have been our patches are due
to the same cause, and mine is no longer needed?  That would be nice.  I
never committed that patch (or there would be more context, sigh).

I'll dig, there is a real problem in the compiler it seems.  Thanks for
the help so far!


Segher


Re: [PATCH] powerpc/64: Always build with 128-bit long double

2023-04-04 Thread Segher Boessenkool
Hi!

On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote:
> The amdgpu driver builds some of its code with hard-float enabled,
> whereas the rest of the kernel is built with soft-float.
> 
> When building with 64-bit long double, if soft-float and hard-float
> objects are linked together, the build fails due to incompatible ABI
> tags.

> Currently those build errors are avoided because the amdgpu driver is
> gated on 128-bit long double being enabled. But that's not a detail the
> amdgpu driver should need to be aware of, and if another driver starts
> using hard-float the same problem would occur.

Well.  The kernel driver either has no business using long double (or
any other floating point even) at all, or it should know exactly what is
used: double precision, double-double, or quadruple precision.  Both of
the latter two are 128 bits.

> All versions of the 64-bit ABI specify that long-double is 128-bits.
> However some compilers, notably the kernel.org ones, are built to use
> 64-bit long double by default.

Mea culpa, I suppose?  But builddall doesn't force 64 bit explicitly.
I wonder how this happened?  Is it maybe a problem in the powerpc64le
config in GCC itself?  I have a patch from summer last year (Arnd's
toolchains are built without it) that does
+   powerpc64le-*)  TARGET_GCC_CONF=--with-long-double-128
Unfortunately I don't remember why I did that, and I never investigated
what the deeper problem is :-/

In either case, the kernel should always use specific types, not rely on
the toolchain to pick a type that may or may not work.  The correct size
floating point type alone is not enough, but it is a step in the right
direction certainly.

Reviewed-by: Segher Boessenkool 


Segher


Re: arch/powerpc/kernel/head_85xx.o: warning: objtool: .head.text+0x1a6c: unannotated intra-function call

2023-01-26 Thread Segher Boessenkool
Hi!

On Wed, Jan 25, 2023 at 12:57:35PM +0530, Naveen N. Rao wrote:
> Sathvika Vasireddy wrote:
> >--- a/arch/powerpc/kvm/booke.c
> >+++ b/arch/powerpc/kvm/booke.c
> >@@ -917,7 +917,9 @@ static void kvmppc_fill_pt_regs(struct pt_regs *regs)
> >     asm("mr %0, 1" : "=r"(r1));
> >     asm("mflr %0" : "=r"(lr));
> >     asm("mfmsr %0" : "=r"(msr));
> >+   asm(".pushsection .discard.intra_function_calls; .long 999f; 
> >.popsection; 999:");
> >     asm("bl 1f; 1: mflr %0" : "=r"(ip));
> 
> I don't think you can assume that there won't be anything in between two 
> asm statements.

It would be a false assumption.  There is nothing that stops the
compiler from moving, duplicating, or even removing these statements
(removing only if no outputs from the asm are required of course).

> Does it work if you combine both the above asm 
> statements into a single one?
> 
> Even if that works, I don't think it is good to expand the macro here.  
> That asm statement looks to be trying to grab the current nip. I don't 
> know enough about that code, and someone who knows more about KVM may be 
> able to help, but it looks like we should be able to simply set 'ip' to 
> the address of kvmppc_fill_pt_regs()?

Such things are much better as actual assembler code (like, a .s file).
You have to be certain the compiler doesn't transform this in unexpected
ways, like, copy and move it to all callers for example.  You need the
mfmsr to remain somewhat in place for example.

A big reason to not want inline asm for things like this is you need so
very many operands in a single asm that way; it becomes very hard to
write, esp. if you want it to be correct code as well.  That is a good
hint there are better way to do this ;-)


Segher


Re: [PATCH v4 02/24] powerpc/pseries: Fix alignment of PLPKS structures and buffers

2023-01-26 Thread Segher Boessenkool
On Thu, Jan 26, 2023 at 12:09:53AM +1100, Michael Ellerman wrote:
> Andrew Donnellan  writes:
> > A number of structures and buffers passed to PKS hcalls have alignment
> > requirements, which could on occasion cause problems:
> >
> > - Authorisation structures must be 16-byte aligned and must not cross a
> >   page boundary
> >
> > - Label structures must not cross page boundaries
> >
> > - Password output buffers must not cross page boundaries
> >
> > Round up the allocations of these structures/buffers to the next power of
> > 2 to make sure this happens.
> 
> It's not the *next* power of 2, it's the *nearest* power of 2, including
> the initial value if it's already a power of 2.

It's not the nearest either, the nearest power of two to 65 is 64.  You
could say "but, round up" to which I would say "round?"  :-P

"Adjust the allocation size to be the smallest power of two greater than
or equal to the given size."

"Pad to a power of two" in shorthand.  "Padded to a power of two if
necessary" if you want to emphasise it can be a no-op.


Segher


Re: [PATCH v2 07/14] powerpc/vdso: Improve linker flags

2023-01-24 Thread Segher Boessenkool
On Tue, Jan 24, 2023 at 09:14:52AM -0700, Nathan Chancellor wrote:
> On Mon, Jan 23, 2023 at 09:07:16AM -0600, Segher Boessenkool wrote:
> > And here it is even more obviously fine.  If you need obfuscation like
> > in your patch, it is better not to do this imo.
> 
> I do not think this patch really obfuscates anything? The filtering is
> pretty clear to me.

And not having such filtering is more obvious and more clear.

It doesn't matter much for just this patch of course, but it will make
the code significantly harder to read (and deal with in other ways) if
this continues.

> If this is a real objection to the patch, I suppose we could just
> localize '-Qunused-arguments' to this Makefile and be done with it but I
> do not think this change is a bad solution to the problem either.

It is a comment about the direction this patch is moving us in.  I don't
think it is a good idea at all to try to avoid all warnings, and even
more so it is a bad idea to make objectively worse source code just to
appease a trigger-happy and questionable warning.

As I said, you can often avoid warnings by writing better code, like
part of the patch did.  That is a good reaction to warnings.  Making
worse code to avoid warnings is not a good idea normally.

Just don't use -Werror by default, and don't make other people suffer
its yoke!


Segher


Re: [PATCH v2 07/14] powerpc/vdso: Improve linker flags

2023-01-23 Thread Segher Boessenkool
Hi!

On Wed, Jan 11, 2023 at 08:05:04PM -0700, Nathan Chancellor wrote:
> When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, there
> are several warnings in the PowerPC vDSO:
> 
>   clang-16: error: -Wl,-soname=linux-vdso32.so.1: 'linker' input unused 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: -Wl,--hash-style=both: 'linker' input unused 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: argument unused during compilation: '-shared' 
> [-Werror,-Wunused-command-line-argument]
> 
>   clang-16: error: argument unused during compilation: '-nostdinc' 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: argument unused during compilation: '-Wa,-maltivec' 
> [-Werror,-Wunused-command-line-argument]

There is nothing wrong with the warnings, but as usual, -Werror is very
counterproductive.

> The first group of warnings point out that linker flags were being added
> to all invocations of $(CC), even though they will only be used during
> the final vDSO link. Move those flags to ldflags-y.

Which is explicitly allowed, and won't do anything, so nothing harmful
either.  It is not a bad idea to avoid this if that is trivial to do,
of course.

> The second group of warnings are compiler or assembler flags that will
> be unused during linking. Filter them out from KBUILD_CFLAGS so that
> they are not used during linking.

And here it is even more obviously fine.  If you need obfuscation like
in your patch, it is better not to do this imo.

The warning text "linker input unused" is misleading btw.  It would be
good to warn about that, if it was true: very likely the user didn't
intend what he wrote.  But a linker input is an object file, or perhaps
a linker script.  These things are just compiler flags.


Segher


Re: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1

2023-01-20 Thread Segher Boessenkool
On Thu, Jan 19, 2023 at 09:31:21PM -0600, Rob Landley wrote:
> On 1/19/23 16:11, Michael.Karcher wrote:
> > I don't see a clear bug at this point. We are talking about the C expression
> > 
> >    __same_type((void*)0, (void*)0)? 0 : sizeof((void*)0)/sizeof(*((void*0))

(__same_type is a kernel macro, it expands to something with
__builtin_compatible_type()).

> *(void*) is type "void" which does not have a size.

It has size 1, in GCC, so that you can do arithmetic on pointers to
void.  This is a long-standing and very widely used GCC extension.

"""
6.24 Arithmetic on 'void'- and Function-Pointers


In GNU C, addition and subtraction operations are supported on pointers
to 'void' and on pointers to functions.  This is done by treating the
size of a 'void' or of a function as 1.

 A consequence of this is that 'sizeof' is also allowed on 'void' and on
function types, and returns 1.

 The option '-Wpointer-arith' requests a warning if these extensions are
used.
"""

> The problem is gcc "optimizing out" an earlier type check, the same way it
> "optimizes out" checks for signed integer math overflowing, or "optimizes 
> out" a
> comparison to pointers from two different local variables from different
> function calls trying to calculate the amount of stack used, or "optimizes 
> out"

Are you saying something in the kernel code here is invalid code?
Because your other examples are.

> using char *x = (char *)1; as a flag value and then doing "if (!(x-1)) because
> it can "never happen"...

Like here.  And no, this is not allowed by -fno-strict-aliasing.

> > I suggest to file a bug against gcc complaining about a "spurious 
> > warning", and using "-Werror -Wno-error-sizeof-pointer-div" until gcc is 
> > adapted to not emit the warning about the pointer division if the result 
> > is not used.

Yeah.  If the first operand of a conditional operator is non-zero, the
second operand is not evaluated, and if the first is zero, the third
operand is not evaluated.  It is better if we do not warn about
something we do not evaluate.  In cases like here where it is clear at
compile time which branch is taken, that shouldn't be too hard.

Can someone please file a GCC PR?  With reduced testcase preferably.


Segher


Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS

2023-01-10 Thread Segher Boessenkool
On Mon, Jan 09, 2023 at 05:51:23PM -0700, Nathan Chancellor wrote:
> So for this patch, I have
> 
>   When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
>   warns:
> 
> clang-16: error: argument unused during compilation: '-s' 
> [-Werror,-Wunused-command-line-argument]
> 
>   The compiler's '-s' flag is a linking option (it is passed along to the
>   linker directly), which means it does nothing when the linker is not
>   invoked by the compiler. The kernel builds all .o files with either '-c'
>   or '-S', which do not run the linker, so '-s' can be safely dropped from
>   ASFLAGS.
> 
> as a new commit message. Is that sufficient for everyone? If so, I'll
> adjust the s390 commit to match, as it is the same exact problem.

Almost?  -S doesn't write .o files, it writes a .s file.  To go from an
assembler file (.s, or .S if you want to run the C preprocessor on non-C
code for some strange reason, the assembler macro facilities are vastly
superior) to an object file is just -c as well.

> Alternatively, if '-s' should actually remain around, we could move it
> to ldflags-y, which is added in patch 7. However, I assume that nobody
> has noticed that it has not been doing its job for a while, so it should
> be safe to remove.

+1


Segher


Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS

2023-01-09 Thread Segher Boessenkool
On Tue, Jan 10, 2023 at 01:22:38AM +0100, Andreas Schwab wrote:
> On Jan 09 2023, Segher Boessenkool wrote:
> 
> > It is required by POSIX (for the c99 command, anyway).  It *also* is
> > required to be supported when producing object files (so when no linking
> > is done).
> >
> > It is a GCC flag, and documented just fine:
> > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s
> 
> Most assembler flags are unrelated to the flags passed to the compiler
> driver, and -s is no exception.  POSIX has nothing to say about the
> sub-commands of the compiler anyway.

But this is not an assembler flag!

quiet_cmd_vdso32as = VDSO32A $@
  cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) $(AS32FLAGS) -c -o $@ $<

where

ifdef CROSS32_COMPILE
VDSOCC := $(CROSS32_COMPILE)gcc
else
VDSOCC := $(CC)
endif


The name of the make variable AS32FLAGS is a bit misleading.  This
variable does not hold arguments to as, it holds arguments to the
compiler driver, "gcc".


Segher


Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS

2023-01-09 Thread Segher Boessenkool
On Mon, Jan 09, 2023 at 03:37:53PM -0700, Nathan Chancellor wrote:
> On Mon, Jan 09, 2023 at 04:23:37PM -0600, Segher Boessenkool wrote:
> > Hi!  Happy new year all.
> > 
> > On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote:
> > > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor  
> > > wrote:
> > > >
> > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
> > > > warns that ASFLAGS contains '-s', which is a linking phase option, so it
> > > > is unused.
> > > >
> > > >   clang-16: error: argument unused during compilation: '-s' 
> > > > [-Werror,-Wunused-command-line-argument]
> > > >
> > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris
> > > > and it is ignored for the powerpc target so just drop the flag
> > > > altogether, as it is not needed.
> > > 
> > > Do you have any more info where you found this?  I don't see -s
> > > documented as an assembler flag.
> > > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html
> > > https://sourceware.org/binutils/docs/as/Invoking.html
> > 
> > It is required by POSIX (for the c99 command, anyway).  It *also* is
> > required to be supported when producing object files (so when no linking
> > is done).
> > 
> > It is a GCC flag, and documented just fine:
> > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s
> > 
> > (Yes, that says it is for linking; but the option is allowed without
> > error of any kind always).
> > 
> > (ASFLAGS sounds like it is for assembler commands, but it really is
> > for compiler commands that just happen to get .S input files).
> > 
> > > The patch seems fine to me, but what was this ever supposed to be?
> > > FWICT it predates git history (looking at
> > > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63)
> > 
> > Yeah, good question.  This compiler flag does the moral equivalent of
> > strip -s (aka --strip-all).  Maybe this was needed at some point, or
> > the symbol or debug info was just annoying (during bringup or similar)?
> > 
> > > Reviewed-by: Nick Desaulniers 
> > Reviewed-by: Segher Boessenkool 
> 
> Thank you for the review and extra explanation! I had kind of expanded
> on this in the s390 version of this patch [1], I will go ahead and do
> the same thing for the powerpc version too.
> 
> [1]: 
> https://lore.kernel.org/llvm/20221228-drop-qunused-arguments-v1-9-658cbc8fc...@kernel.org/

The underwhelming GCC source code for this is
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=d629ca5e424ad3120be13e82b9f38b7bf8f1cdf2;hb=HEAD#l1148
which says that the -s flag is passed through unmodified to the linker
when invoking the linker.  See #l603 for the %{s} specs syntax.

This is not a flag to the assembler at all.  It is a flag to the
compiler, which passes it on to the linker :-)

> > (ASFLAGS sounds like it is for assembler commands, but it really is
> > for compiler commands that just happen to get .S input files).


Segher


Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS

2023-01-09 Thread Segher Boessenkool
Hi!  Happy new year all.

On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote:
> On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor  wrote:
> >
> > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
> > warns that ASFLAGS contains '-s', which is a linking phase option, so it
> > is unused.
> >
> >   clang-16: error: argument unused during compilation: '-s' 
> > [-Werror,-Wunused-command-line-argument]
> >
> > Looking at the GAS sources, '-s' is only useful when targeting Solaris
> > and it is ignored for the powerpc target so just drop the flag
> > altogether, as it is not needed.
> 
> Do you have any more info where you found this?  I don't see -s
> documented as an assembler flag.
> https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html
> https://sourceware.org/binutils/docs/as/Invoking.html

It is required by POSIX (for the c99 command, anyway).  It *also* is
required to be supported when producing object files (so when no linking
is done).

It is a GCC flag, and documented just fine:
https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s

(Yes, that says it is for linking; but the option is allowed without
error of any kind always).

(ASFLAGS sounds like it is for assembler commands, but it really is
for compiler commands that just happen to get .S input files).

> The patch seems fine to me, but what was this ever supposed to be?
> FWICT it predates git history (looking at
> arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63)

Yeah, good question.  This compiler flag does the moral equivalent of
strip -s (aka --strip-all).  Maybe this was needed at some point, or
the symbol or debug info was just annoying (during bringup or similar)?

> Reviewed-by: Nick Desaulniers 
Reviewed-by: Segher Boessenkool 


Segher


Re: [PATCH v1] powerpc/64: Set default CPU in Kconfig

2022-12-16 Thread Segher Boessenkool
On Fri, Dec 16, 2022 at 11:23:59PM +0100, Pali Rohár wrote:
> > It appears the E500MC64 never made it outside of FSL, so it is best not
> > to use it at all, imo.
> 
> Yes, it really makes sense to not use e500mc64 flag. Maybe gcc
> documentation could be updated to mention this fact?

Thanks.  I filed  so that we don't forget
about this.


Segher


Re: [PATCH v1] powerpc/64: Set default CPU in Kconfig

2022-12-16 Thread Segher Boessenkool
Hi!

On Thu, Dec 15, 2022 at 09:42:02PM +0100, Pali Rohár wrote:
> On Wednesday 07 December 2022 14:38:40 Christophe Leroy wrote:
> > default "power8" if POWER8_CPU
> > default "power9" if POWER9_CPU
> > default "power10" if POWER10_CPU
> > +   default "e500mc64" if E5500_CPU
> 
> Now I'm looking at this change again... and should not E5500_CPU rather
> enforce -mcpu=e5500 flag? I know that your patch moves e500mc64 flag
> from the Makefile to Kconfig, but maybe it could be changed in some
> other followup patch...
> 
> Anyway, do you know what is e500mc64 core? I was trying to find some
> information about it, but it looks like some unreleased freescale core
> which predates e5500 core.

It looks that way yes.  It was submitted at

and committed as .  It looks as if
it was based on the e500mc core, while e5500 is a new core (or
significantly different anyway).

> ISA (without extensions like altivec) seems
> to be same for e500mc64, e5500 and e6500 cores and difference is only
> pipeline definitions in gcc config files. So if my understanding is
> correct then kernel binary compiled with any of these -mcpu= flag should
> work on any of those cores. Just for mismatches core binary will not be
> optimized for speed.

It appears the E500MC64 never made it outside of FSL, so it is best not
to use it at all, imo.


Segher


Re: [PATCH v2] powerpc: Pass correct CPU reference to assembler

2022-12-16 Thread Segher Boessenkool
On Fri, Dec 16, 2022 at 05:57:46PM +, Christophe Leroy wrote:
> Le 16/12/2022 à 18:18, Segher Boessenkool a écrit :
> > On Fri, Dec 16, 2022 at 09:35:50AM +0100, Christophe Leroy wrote:
> >> Today we have CONFIG_TARGET_CPU which provides the identification of the
> >> expected CPU, it is used for GCC. Use it as well for the assembler.
> > 
> > Why do you use -Wa, at all for this?  The compiler should already pass
> > proper options always!
> 
> That's historical I guess. Comes from commit 14cf11af6cf6 ("powerpc: 
> Merge enough to start building in arch/powerpc.")

Ah.  The patch moves stuff around, I thought more of it is new than it
really is.  Sorry.

It would be good to get rid of all such things that do no good and can
easily cause problems, of course, but that does not belong to this patch
of course.

> >> +cpu-as-$(CONFIG_PPC_BOOK3S_64)+= $(call as-option,-Wa$(comma)-many)
> > 
> > What is this for?  Using -many is a huge step back, it hides many
> > problems :-(
> 
> The only thing I did is removed the -Wa,-mpower4 from the line, leaving 
> the remaining part. Initialy it was:
> 
> cpu-as-$(CONFIG_PPC_BOOK3S_64) += $(call as-option,-Wa$(comma)-mpower4) 
> $(call as-option,-Wa$(comma)-many)
> 
> It was added in 2018 by commit 960e30029863 ("powerpc/Makefile: Fix 
> PPC_BOOK3S_64 ASFLAGS"). There is a long explanation it the commit.
> 
> Should we remove it ?

The commit says it is a workaround for clang problems, so it needs
testing there.  It also needs testing everywhere else, because as I said
it hides a lot of problems, so removing it will make a lot of sloppy
code that has crept in since 2018 scream bloody murder :-(


Segher


Re: [PATCH v2] powerpc: Pass correct CPU reference to assembler

2022-12-16 Thread Segher Boessenkool
Hi!

On Fri, Dec 16, 2022 at 09:35:50AM +0100, Christophe Leroy wrote:
> The problem comes from the fact that CONFIG_PPC_E500MC is selected for
> both the e500mc (32 bits) and the e5500 (64 bits), and therefore the
> following makefile rule is wrong:
> 
>   cpu-as-$(CONFIG_PPC_E500MC)+= $(call as-option,-Wa$(comma)-me500mc)

Yes.

> Today we have CONFIG_TARGET_CPU which provides the identification of the
> expected CPU, it is used for GCC. Use it as well for the assembler.

Why do you use -Wa, at all for this?  The compiler should already pass
proper options always!

> +cpu-as-$(CONFIG_PPC_BOOK3S_64)   += $(call as-option,-Wa$(comma)-many)

What is this for?  Using -many is a huge step back, it hides many
problems :-(


Segher


Re: [PATCH] powerpc/64: Implement arch_within_stack_frames

2022-12-15 Thread Segher Boessenkool
On Thu, Dec 15, 2022 at 10:29:38AM -0600, Segher Boessenkool wrote:
> The kernel does not do any of the more
> problematic cases I think, but no promises (dynamic stack alignment,
> recursive functions

Before people get scared: I meant *nested* functions.  With a static
chain and all that :-)


Segher


Re: [PATCH] powerpc/64: Implement arch_within_stack_frames

2022-12-15 Thread Segher Boessenkool
On Thu, Dec 15, 2022 at 10:52:35AM +1000, Nicholas Piggin wrote:
> On Thu Dec 15, 2022 at 10:17 AM AEST, Segher Boessenkool wrote:
> > On Wed, Dec 14, 2022 at 09:39:25PM +1000, Nicholas Piggin wrote:
> > > What about just copying x86's implementation including using
> > > __builtin_frame_address(1/2)? Are those builtins reliable for all
> > > our targets and compiler versions?
> >
> > __builtin_frame_address(n) with n > 0 is specifically not supported, not
> > on any architecture; it does not work in all situations on *any* arch,
> > and not in *any* situation on some archs.
> 
> No, but the particular case of powerpc where we have frame pointers and
> calling from a function where we know we have at least n + 1 frames on
> the stack, it would be okay, right? The code is not really different
> than using r1 directly and dereferencing that.

We do not typically have frame pointers; those make quite a bit slower
code.  But we do not *need* frame pointers for most purposes, perhaps
that is what you were getting at?

In simple cases r1 is the address of the current frame, and accessing
the machine word there gets you the previous frame, etc. (until you hit
a zero, there is no parent frame above that).  This is the *machine*
frame, not the C frame.  Often they are the same.  But there are
complications.  As long as you only use it for debug purposes, do not
use it for program logic, and are sure to check any pointer for nil,
things will work fine.  The kernel does not do any of the more
problematic cases I think, but no promises (dynamic stack alignment,
recursive functions, variable size stack allocations, PIC code (the
model of it used for shared libraries that is, not position-independent
code in general, as PowerPC mostly is no matter what), split stack, ...)

So at least do not follow a stack chain past the terminator, and do not
expect there to be exactly one frame per C function (there can be zero,
or more than one).  There also is the complication that there is no
sure-fire way to detect if a new frame has been set up for a function
(yet), of course.


Segher


Re: [PATCH] powerpc/64: Implement arch_within_stack_frames

2022-12-14 Thread Segher Boessenkool
On Wed, Dec 14, 2022 at 09:39:25PM +1000, Nicholas Piggin wrote:
> What about just copying x86's implementation including using
> __builtin_frame_address(1/2)? Are those builtins reliable for all
> our targets and compiler versions?

__builtin_frame_address(n) with n > 0 is specifically not supported, not
on any architecture; it does not work in all situations on *any* arch,
and not in *any* situation on some archs.

This is clearly documented:

 Calling this function with a nonzero argument can have
 unpredictable effects, including crashing the calling program.  As
 a result, calls that are considered unsafe are diagnosed when the
 '-Wframe-address' option is in effect.  Such calls should only be
 made in debugging situations.

(and that warning is enabled by -Wall).


Segher


Re: Mass-building defconfigs: many fail with assembler errors

2022-12-14 Thread Segher Boessenkool
Hi!

On Wed, Dec 14, 2022 at 07:36:32PM +0100, Jan-Benedict Glaw wrote:
> So we have these remaining build issues:
> 
> linux-powerpc-cell_defconfig   bad asm 
> (arch/powerpc/boot/pseries-head.S)
> linux-powerpc-mvme5100_defconfig   bad asm 
> (arch/powerpc/kernel/epapr_hcalls.S)
> linux-powerpc-asp8347_defconfigbad asm (arch/powerpc/kernel/pmc.c)
> linux-powerpc-ppc6xx_defconfig bad asm (arch/powerpc/kernel/pmc.c)
> linux-powerpc-ppc64e_defconfig bad asm 
> (arch/powerpc/kernel/vdso/gettimeofday.S)
> linux-powerpc-corenet64_smp_defconfig  bad asm 
> (arch/powerpc/kernel/vdso/gettimeofday.S)
> 
> I do *not* have CROSS32_COMPILE=... set for my builds. Maybe that
> could cure at least the issues within the ./boot and ./kernel/vdso
> directories?

I never set that, -m32 does the trick, every powerpc compiler is
biarch :-)

> Let's try that...  But I guess that won't help for the
> other two remaining files (arch/powerpc/kernel/{epapr_hcalls.S,pmc.c).

Not likely no.  Can you show the error of those again?

> linux-powerpc-pseries_defconfig sstep (out of array bounds)
> linux-powerpc-powernv_defconfig sstep
> linux-powerpc-ppc64_defconfig   sstep
> linux-powerpc-pseries_le_defconfig  sstep
> linux-powerpc-ppc64le_defconfig sstep
> linux-powerpc-ppc64le_guest_defconfig   sstep
> linux-powerpc-ppc64_guest_defconfig sstep
> linux-powerpc-powernv_be_defconfig  sstep
> 
> My first guess on these is that it's a wrong warning. The union's
> `u8 b[2 * sizeof(double)]` seems to be large enough.

A false positive, yes.  Which is *not* wrong.  What is wrong is using
-Werror in any unknown environment.  I have a stack of patches I use for
all my kernel builds, and half of those are eradicating harmful -Werror
instances.

> linux-powerpc-akebono_defconfig ahci (BUILD_BUG_ON failed: sizeof(_s) 
> > sizeof(long))
> linux-powerpc-xes_mpc85xx_defconfig ahci
> linux-powerpc-ge_imp3a_defconfigahci
> linux-powerpc-mpc85xx_defconfig ahci
> linux-powerpc-mpc85xx_smp_defconfig ahci
> linux-powerpc-corenet32_smp_defconfig   ahci
> linux-powerpc-mpc86xx_defconfig ahci
> linux-powerpc-mpc86xx_smp_defconfig ahci
> 
> I've seen the AHCI issue on other (non-powerpc) builds as well,
> haven't looked into this so I won't guess about whether this is a real
> bug or a compiler issue.

It is a real bug afaics.


Segher


Re: [PATCH] [BACKPORT FOR 4.14] libtraceevent: Fix build with binutils 2.35

2022-12-14 Thread Segher Boessenkool
On Wed, Dec 14, 2022 at 03:33:24PM +, Christophe Leroy wrote:
> Le 13/12/2022 à 21:25, Segher Boessenkool a écrit :
> > On Tue, Dec 13, 2022 at 07:03:07PM +0100, Christophe Leroy wrote:
> >> In binutils 2.35, 'nm -D' changed to show symbol versions along with
> >> symbol names, with the usual @@ separator.
> > 
> > 2.37 instead?  And --without-symbol-versions is there to restore the
> > old behaviour.  The script is parsing the output already so this patch
> > is simpler maybe, but :-)
> 
> Do you mean that the original commit from Ben should have done it 
> differently ?

Probably.

> My patch is only a backport of original commit 39efdd94e314 
> ("libtraceevent: Fix build with binutils 2.35") due to Makefile being at 
> a different place in 4.14.

.  It's not such a great idea to spread misinformation further,
imo.  Maybe the mailing list archives will help dampen that already now.

Thanks,


Segher


Re: [PATCH] [BACKPORT FOR 4.14] libtraceevent: Fix build with binutils 2.35

2022-12-13 Thread Segher Boessenkool
On Tue, Dec 13, 2022 at 07:03:07PM +0100, Christophe Leroy wrote:
> In binutils 2.35, 'nm -D' changed to show symbol versions along with
> symbol names, with the usual @@ separator.

2.37 instead?  And --without-symbol-versions is there to restore the
old behaviour.  The script is parsing the output already so this patch
is simpler maybe, but :-)


Segher


Re: Mass-building defconfigs: many fail with assembler errors

2022-12-13 Thread Segher Boessenkool
On Tue, Dec 13, 2022 at 09:41:59AM +0100, Jan-Benedict Glaw wrote:
> On Tue, 2022-12-13 14:49:20 +1100, Michael Ellerman  
> wrote:
> > So your script should exclude all files that end in ".config".
> 
> Thanks!  Will just drop those.
> 
> > To find the names of the generated configs you can use something like:
> > 
> >  $ awk '/PHONY \+= .*config/ {print $3}' arch/powerpc/Makefile
> 
> ...and integrate these instead. Thanks a lot!

You can also pretend you are a simple user and use the targets
"make help" and "make help-boards" suggest :-)


Segher


Re: Mass-building defconfigs: many fail with assembler errors

2022-12-12 Thread Segher Boessenkool
Hi!

On Mon, Dec 12, 2022 at 10:51:17PM +0100, Jan-Benedict Glaw wrote:
> Is anybody else routinely building current Binutils + GCC, to try to
> build all the Linux defconfigs?

I do regularly build kernels for powerpc-linux, powerpc64-linux,
powerpc64le-linux; ppc6xx_defconfig and ppc64_defconfig and
ppc64le_defconfig.  Totally boring, but it frequently does not build.
Not as frequently as for the other Linux targets I build (32 total at
the moment).

> For PPC, a good number of those fail,
> and I probably don't understand PPC well enough to propose patches. Or
> did I pick wrongly targeted toolchains? Most of the time, my suspicion
> is that we're not giving the correct -m flags in
> ./arch/powerpc/boot/?  (My setup for doing test builds is fairly automated, I
> can easily throw in patches for testing.)

Many of those use a 32-bit toolchain with a 64-bit kernel, or they
require some e500 specific config but not getting it (or the other way
around).

> 64-bit.config

>   ==> Why "-m32 -mcpu=powerpc"? Binutils/GCC are for 
> --target=powerpc64-linux

Something in your config is forcing that.

> 85xx-64bit.config
> powerpc64-linux-gcc 
> -Wp,-MMD,arch/powerpc/kernel/vdso/.gettimeofday-64.o.d -nostdinc 
> -I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include 
> -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
> -I./include/uapi -I./include/generated/uapi -include 
> ./include/linux/compiler-version.h -include ./include/linux/kconfig.h 
> -D__KERNEL__ -I ./arch/powerpc -DHAVE_AS_ATHIGH=1 -fmacro-prefix-map=./= 
> -D__ASSEMBLY__ -fno-PIE -m64 -Wl,-a64 -mabi=elfv1 -Wa,-me500 -Wa,-me500mc 
> -mabi=elfv1 -mbig-endian-Wl,-soname=linux-vdso64.so.1 -D__VDSO64__ -s -c 
> -o arch/powerpc/kernel/vdso/gettimeofday-64.o 
> arch/powerpc/kernel/vdso/gettimeofday.S

e500mc is a 32-bit core.  This cannot fly.

> 85xx-hw.config
> powerpc-linux-gcc -Wp,-MMD,arch/powerpc/kernel/.epapr_hcalls.o.d 
> -nostdinc -I./arch/powerpc/include -I./arch/powerpc/include/generated  
> -I./include -I./arch/powerpc/include/uapi 
> -I./arch/powerpc/include/generated/uapi -I./include/uapi 
> -I./include/generated/uapi -include ./include/linux/compiler-version.h 
> -include ./include/linux/kconfig.h -D__KERNEL__ -I ./arch/powerpc 
> -fmacro-prefix-map=./= -D__ASSEMBLY__ -fno-PIE -m32 -Wl,-a32 -mcpu=powerpc 
> -mbig-endian-c -o arch/powerpc/kernel/epapr_hcalls.o 
> arch/powerpc/kernel/epapr_hcalls.S 
>   arch/powerpc/kernel/epapr_hcalls.S: Assembler messages:
>   arch/powerpc/kernel/epapr_hcalls.S:24: Error: unrecognized opcode: 
> `wrteei'

wrteei is a BookE instruction (not a PowerPC instruction), and something
specifically asked for just PowerPC.

> powernv_defconfig

>   cc1: all warnings being treated as errors

Self-inflicted wound.  The warnings may reveal more, or they might show
something that GCC isn't as good at as you may want; all warnings have
false positives, that is why they are warnings and not errors.

>   Compiler ICEs (during GIMPLE pass: ccp) in align.c:
> 
> powerpc-linux-gcc -Wp,-MMD,arch/powerpc/kernel/.align.o.d -nostdinc 
> -I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include 
> -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
> -I./include/uapi -I./include/generated/uapi -include 
> ./include/linux/compiler-version.h -include ./include/linux/kconfig.h 
> -include ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc 
> -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs 
> -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE 
> -Werror=implicit-function-declaration -Werror=implicit-int 
> -Werror=return-type -Wno-format-security -std=gnu11 -mbig-endian -m32 
> -msoft-float -pipe -ffixed-r2 -mmultiple -mno-readonly-in-sdata -mcpu=440 
> -mno-prefixed -mno-pcrel -mno-altivec -mno-vsx -mno-mma 
> -fno-asynchronous-unwind-tables -mno-string -Wa,-m440 -mbig-endian 
> -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 
> -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation 
> -Wno-format-overflow -Wno-address-of-packed-member -O2 
> -fno-allow-store-data-races -Wframe-larger-than=1024 -fstack-protector-strong 
> -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable 
> -Wno-dangling-pointer -fomit-frame-pointer -ftrivial-auto-var-init=zero 
> -fno-stack-clash-protection -Wdeclaration-after-statement -Wvla 
> -Wno-pointer-sign -Wcast-function-type -Wno-stringop-truncation 
> -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized 
> -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -fno-strict-overflow 
> -fno-stack-check -fconserve-stack -Werror=date-time 
> -Werror=incompatible-pointer-types -Werror=designated-init 
> -Wno-packed-not-aligned -g -mstack-protector-guard-offset=1080 -Werror
> -DKBUILD_MODFILE='"arch/powerpc/kernel/align"' -DKBUILD_BASENAME='"align"' 
> -DKBUILD_MODNAME='"align"' 

Re: [PATCH v2 1/4] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig

2022-11-07 Thread Segher Boessenkool
Hi!

On Mon, Nov 07, 2022 at 02:31:59PM +1100, Rohan McLure wrote:
> Add Kconfig option for enabling clearing of registers on arrival in an
> interrupt handler. This reduces the speculation influence of registers
> on kernel internals.

Assuming you are talking about existing PowerPC CPUs from the last 30
years:

There is no data speculation.  At all.  Ever.

There is branch prediction, but that is not influenced by register
contents, either (for any current CPUs at least).  (Except when you get
a flush because of a mispredict, but if this zeroing changes anything,
we will have used wild (but user controlled) values in the old
non-zeroing situation, and that is a much bigger problem itself already,
also for security!  This can be an unlikely kernel bug, or a very
unlikely compiler bug.)

All GPRs are renamed, always.  If you zero all GPRs on interrupt entry
(which is context synchronising, importantly), this will guarantee there
can be no timing influence from the GPRs, because all of the physical
registers depend on nothing that happened before.  So that is good, at
least it can give some peace of mind.  Except that this makes 30 new
registers in just a few cycles, which *itself* can cause stalls, if the
renaming things are still busy.  Context synchronising does not
necessarily help there, the renaming machinery can do stuff *after* an
insn completes.

I don't see how this helps anything.  If it does, "reduces speculation
influence" is not a good description of what it does, afaics?


Segher


Re: [RFC PATCH 01/19] powerpc/perf: callchain validate kernel stack pointer bounds

2022-11-04 Thread Segher Boessenkool
On Mon, Oct 31, 2022 at 03:54:22PM +1000, Nicholas Piggin wrote:
> Could the user set r1 to be equal to the address matching the first
> interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page
> due to the kernel redzone, and induce the kernel to load the marker from
> there? Possibly it could cause a crash at least.

Yes, the user can set r1 to anything, it is just a general purpose
register.  This isn't a valid thing to do of course, the ABI requires
r1 to point at a valid stack at all times, but it is an obvious attack
point if we do not harden against this :-)


Segher


Re: Issues with the first PowerPC updates for the kernel 6.1

2022-10-29 Thread Segher Boessenkool
On Mon, Oct 17, 2022 at 09:53:04AM +0200, Christian Zigotzky wrote:
> > On 17. Oct 2022, at 02:43, Michael Ellerman  wrote:
> > Previously BIG_ENDIAN && GENERIC_CPU would use -mcpu=power5, now it uses
> > -mcpu=power4.
> 
> Maybe this is the issue. We will wait and not release the RC1 for testing 
> because it is a risk for our testers to test these new kernels because of 
> this issue.
> 
> It is really important do not to rewrite code, that is well worked before.
> Bugfixing and adding some new features is ok but rewriting of good code is 
> expensive and doesn’t make any sense.

It was just a bugfix, and a (partial) revert.

471d7ff8b51b says it removed ISA 2.00 support (original power4, "GP").
Support for ISA 2.01 was retained it says.  That is power4+, "GQ", but
also 970 (Apple G5).  That patch actually switched to ISA 2.02 though,
unintendedly, and code generated for ISA 2.02 will not run on systems
like 970, in principle.  It is just one uncommon instruction that is
problematical, namely popcntb, because the kernel does not use floating
point at all, so that is why we got away with it for so long (most code
that does use fp will fall flat on its face in no time).  It still is a
bug fix though!

PA6T is ISA 2.04, it's not clear how this (bugfix, and revert!) change
made code not run on PA6T anymore.  Smells a lot like something indirect
(or triply indirect), a separate bug, something that was introduced in
the last two years maybe, but I'll even bet it is something *exposed* in
that time, a bug that has been here for longer!


Segher


Re: Issues with the first PowerPC updates for the kernel 6.1

2022-10-16 Thread Segher Boessenkool
On Fri, Oct 14, 2022 at 06:11:21PM +0200, Christian Zigotzky wrote:
> make oldconfig has asked because of the CPU family. I choosed GENERIC for my 
> P.A. Semi PWRficient PA6T-1682M. Is this correct? Maybe this is the problem.
> 
> config GENERIC_CPU
> - bool "Generic (POWER4 and above)"
> + bool "Generic (POWER5 and PowerPC 970 and above)"
>   depends on PPC_BOOK3S_64 && !CPU_LITTLE_ENDIAN
>   select PPC_64S_HASH_MMU
> 
> There isn’t a POWER4 anymore and I used it via CONFIG_GENERIC_CPU=y before.

PA6T is ISA 2.04, just like POWER5+.  It should be fine.


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-07 Thread Segher Boessenkool
On Fri, Oct 07, 2022 at 10:03:38AM +1000, Nicholas Piggin wrote:
> On Fri Oct 7, 2022 at 9:23 AM AEST, Segher Boessenkool wrote:
> > > For pcrel addressing? Bootstrapping the C environment is one, the
> > > module dynamic linker is another.
> >
> > I don't know what either of those mean.
> 
> arch/powerpc/kernel/head_64.S and arch/powerpc/kernel/module_64.c
> 
> Can discuss in the pcrel patch series thread if you would like to know
> more.

So "bootstrapping the C environment" is meant to mean "initialising it",
like *rt*.o ("C runtime") does normally?

And "module dynamic linker" is "module loader" here?

Yes, those things probably need some attention for pcrel, but
"bootstrapping" and "dynamic" had me scratch my head: there is nothing
that pulls itself up by its bootstraps (like, the initialisation itself
would be done in C code), and module loading is much closer to static
loading than to dynamic loading :-)

> > Just say in a comment where you disable stuff that it is to prevent
> > possible problems, this is a WIP, that kind of thing?  Otherwise other
> > people (like me :-) ) will read it and think there must be some deeper
> > reason.  Like, changing code to work with pcrel is hard or a lot of
> > work -- it isn't :-)  As you say in 0/7 yourself btw!
> 
> I will describe limitations and issues a bit more in changelog of patches
> to enable prefix and pcrel when I submit as non-RFC candidate. It would
> probably not be too hard to get things to a workable state that could be
> merged.

Looking forward to it!

> > VMX and VSX are disabled here because the compiler *will* use those
> > registers if it feels like it (that is, if it thinks that will be
> > faster).  MMA is a very different beast: the compiler can never know if
> > it will be faster, to start with.
> 
> True, but now I don't have to find the exact clause and have my lawyer
> confirm that it definitely probably won't change in future and break
> things.

Your lawyer won't be able to tell you, but I can.  And I did already.


The reason I care about these things is that very often people look at
what the kernel does as a "best practices" example.  And then copy this
stuff as some cargo cult incantations :-/


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-07 Thread Segher Boessenkool
On Fri, Oct 07, 2022 at 04:31:28PM +1100, Michael Ellerman wrote:
> "Nicholas Piggin"  writes:
> > On Fri Oct 7, 2022 at 9:23 AM AEST, Segher Boessenkool wrote:
> >> On Fri, Oct 07, 2022 at 07:56:09AM +1000, Nicholas Piggin wrote:
> >> > On Fri Oct 7, 2022 at 5:54 AM AEST, Segher Boessenkool wrote:
> >> > > On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> ...
> >> > > > +# No AltiVec or VSX or MMA instructions when building kernel
> >> > > >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
> >> > > >  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> >> > > > +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
> >> > >
> >> > > MMA code is never generated unless the code asks for it explicitly.
> >> > > This is fundamental, not just an implementations side effect.
> >> > 
> >> > Well, now it double won't be generated :)
> >>
> >> Yeah, but there are many other things you can unnecessarily disable as
> >> well!  :-)
> >>
> >> VMX and VSX are disabled here because the compiler *will* use those
> >> registers if it feels like it (that is, if it thinks that will be
> >> faster).  MMA is a very different beast: the compiler can never know if
> >> it will be faster, to start with.
> >
> > True, but now I don't have to find the exact clause and have my lawyer
> > confirm that it definitely probably won't change in future and break
> > things.
> 
> Right. If someone asks "does the kernel ever use MMA instructions?" we
> can just point at that line and we have a definite answer. No need to
> audit the behaviour of all GCC and Clang versions ever released.

As I said, no sane compiler can use MMA ever (unless asked for it
directly of course).  But yeah, who knows what clang does!


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
On Fri, Oct 07, 2022 at 07:56:09AM +1000, Nicholas Piggin wrote:
> On Fri Oct 7, 2022 at 5:54 AM AEST, Segher Boessenkool wrote:
> > On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> > > This adds basic POWER10_CPU option, which builds with -mcpu=power10.
> >
> > > +# No prefix or pcrel
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
> >
> > Why do you disable all prefixed insns?  What goes wrong if you don't?
> 
> Potentially things like kprobes.

So mention that?  "This patch is due to an abundance of caution".

What I meant to ask is if you *saw* something going wrong, not if you
can imagine something going wrong.  I can imagine ten gazillion things
going wrong, that is not why I asked :-)

> > Same question for pcrel.  I'm sure you want to optimise it better, but
> > it's not clear to me how it fails now?
> 
> For pcrel addressing? Bootstrapping the C environment is one, the
> module dynamic linker is another.

I don't know what either of those mean.

> Some details in this series.
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-September/248521.html

I've watched that series with great interest, but I don't remember
anything like that?  Are you refering to the commentary in 7/7?
"Definitely ftrace and probes, possibly BPF and KVM have some breakage.
I haven't looked closely yet."...  This doesn't mean much does it :-)
It can be a triviality or two.  Or it could be a massive roadblock.

Just say in a comment where you disable stuff that it is to prevent
possible problems, this is a WIP, that kind of thing?  Otherwise other
people (like me :-) ) will read it and think there must be some deeper
reason.  Like, changing code to work with pcrel is hard or a lot of
work -- it isn't :-)  As you say in 0/7 yourself btw!

> > > +# No AltiVec or VSX or MMA instructions when building kernel
> > >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
> > >  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
> >
> > MMA code is never generated unless the code asks for it explicitly.
> > This is fundamental, not just an implementations side effect.
> 
> Well, now it double won't be generated :)

Yeah, but there are many other things you can unnecessarily disable as
well!  :-)

VMX and VSX are disabled here because the compiler *will* use those
registers if it feels like it (that is, if it thinks that will be
faster).  MMA is a very different beast: the compiler can never know if
it will be faster, to start with.


Segher


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 08:50:31PM +, Christophe Leroy wrote:
> Le 06/10/2022 à 22:45, Segher Boessenkool a écrit :
> > I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
> > or something like that :-)  And, on 64 bit, which is what the question
> > was about!
> 
> Ah, does the size really matters here ? I was thinking more in terms of 
> performance when I made the comment.

Other than some unused code there should not be much performance impact
at all from enabling modules when not needed, on 64 bit.  Security (in
depth) is a very different kettle of fish of course ;-)


Segher


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 06:38:00PM +, Christophe Leroy wrote:
> Le 06/10/2022 à 20:22, Segher Boessenkool a écrit :
> > Long ago I built kernels that fit together with the boot firmware and a
> > root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
> > close to that at all these days :-)
> 
> 4MB, not easy. But 8M still achievable. Well our smaller board has 32M, 
> we have thousands of it spread all over Europe and have to keep it up to 
> date 

The smallest of these systems had 256MB RAM.  This 4MB is flash ROM :-)

> > What is the overhead if you enable modules but do not use them, these
> > days?
> 
> On the 8xx it is mainly the instruction TLB miss handler:

I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
or something like that :-)  And, on 64 bit, which is what the question
was about!


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
Hi!

On Thu, Oct 06, 2022 at 06:07:32PM +, Christophe Leroy wrote:
> Le 23/09/2022 à 08:23, Nicholas Piggin a écrit :
> > I would rather complete prefixed support in the kernel and use pcrel
> > addressing. Actually even if we don't compile with pcrel or prefixed,
> > there are some instructions and we will probably get more that require
> > prefixed, possible we might want to use them in kernel. Some of it is
> > required to handle user mode instructions too. So I think removing
> > it is premature, but I guess it's up for debate.
> 
> Well ok, in fact I only had code_patching in mind.
> 
> Code patching is only for kernel text. Today code patching is used for 
> things like kprobe, ftrace, etc  which really do not seems to be 
> prepared for prefixed instructions.
> 
> If you are adding -mno-prefixed, it is worth keeping that code which 
> sometimes gives us some headacke ?

-mpcrel requires -mprefixed.  Using PC relative addressing will be a
significant performance benefit.

> Of course if there are plans to get real prefixed instruction in kernel 
> code anytime soon, lets live with it, in that case the support should 
> get completed. But otherwise I think it would be better to get rid of it 
> for now, and implement it completely when we need it in years.

The future is unstoppable, certainly the near future is :-)

> When I see the following, I'm having hard time believing it would work 
> with prefixed instructions in the kernel text:
> 
>   typedef u32 kprobe_opcode_t;
> 
>   struct kprobe {
>   ...
>   /* Saved opcode (which has been replaced with breakpoint) */
>   kprobe_opcode_t opcode;
> 
> 
>   void arch_disarm_kprobe(struct kprobe *p)
>   {
>   WARN_ON_ONCE(patch_instruction(p->addr, ppc_inst(p->opcode)));
>   }

Why would it not work?


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
Hi!

On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> This adds basic POWER10_CPU option, which builds with -mcpu=power10.

> +# No prefix or pcrel
> +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)

Why do you disable all prefixed insns?  What goes wrong if you don't?

Same question for pcrel.  I'm sure you want to optimise it better, but
it's not clear to me how it fails now?

Please say in the comment what is wrong, don't spread fear :-)

> +# No AltiVec or VSX or MMA instructions when building kernel
>  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
>  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> +KBUILD_CFLAGS += $(call cc-option,-mno-mma)

MMA code is never generated unless the code asks for it explicitly.
This is fundamental, not just an implementations side effect.


Segher


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 11:39:50AM +1100, Michael Ellerman wrote:
> Christophe Leroy  writes:
> > However, thinking out loudly, I'm wondering, could we make things any 
> > simpler when CONFIG_MODULES is not selected, or is that a too much 
> > corner case on PPC64 ?
> 
> I'd say it's mostly a corner case.
> 
> Obviously no distros ship with modules disabled. 
> 
> AFAIK even the stripped down kernels we use in CPU bringup have modules
> enabled.
> 
> So I think it's probably not worth worrying about, unless there's an
> obvious and fairly simple optimisation.

Long ago I built kernels that fit together with the boot firmware and a
root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
close to that at all these days :-)

What is the overhead if you enable modules but do not use them, these
days?


Segher


Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

2022-09-24 Thread Segher Boessenkool
On Fri, Sep 23, 2022 at 05:15:43PM -0500, Segher Boessenkool wrote:
> On Sat, Sep 24, 2022 at 02:26:52AM +1000, Nicholas Piggin wrote:
> > I still don't see what clauses guarantees asm("%0" ::"r"(foo)) to give
> > 13. It doesn't say access via inline assembly is special,
> 
> But it is.  It is for all register variables, local and global.  I agree
> this isn't documented clearly.  For local register variables this is the
> *only* thing guaranteed; for global register vars there is more (it
> changes the ABI, there are safe/restore effects, that kind of thing).

I filed <https://gcc.gnu.org/PR107027> to improve the docs.  Thanks for
bringing this to our attention!


Segher


Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

2022-09-24 Thread Segher Boessenkool
On Sat, Sep 24, 2022 at 02:00:55PM +1000, Nicholas Piggin wrote:
> On Sat Sep 24, 2022 at 8:15 AM AEST, Segher Boessenkool wrote:
> > Never it is guaranteed that all accesses through this variable will use
> > the register directly: this fundamentally cannot work on all archs, and
> > also not at -O0.  More in general it doesn't work if some basic
> > optimisations are not done, be it because of a compiler deficiency, or a
> > straight out bug, or maybe it is a conscious choice in some cases.
> 
> Right, and we know better than to rely on a spec that is not 100% air
> tight with no possibility of lawyering. This may be what the intention is,
> it may be what gcc and clang do now, and everybody involved today agrees
> with that interpretation. We still have to maintain the kernel tomorrow
> though, so explicit r13 it must be.

It has *always* been this way.  Very old GCC (say, GCC < 3.x) tried to
guarantee more, even, but that turned out to be untenable.  But this is
all in the distant past.

I have no idea if clang implements the GCC C extensions correctly.  If
they don't it is just another compiler bug and they'll just have to fix
it.

The rules *are* airtight.  But this does not mean you can assume random
other stuff, adjacent or not :-P

> > (Please use "n" instead of "i".  Doesn't matter here, but it does in
> > many other places.)
> 
> What is the difference? Just "i" allows assmebly-time constants?

"n" means "number": constant integers.  "i" means "immediate": any
constant.  The address of a global variable is "i" but not "n" (in most
ABIs, no -fPIC and such) for example.

> How about "I"? that looks like it was made for it. Gives much better
> errors.

For PowerPC, "I" is a signed 16-bit number.  "K" is unsigned 16-bit,
and there are more as well.  Just like for "n" you'll have to make
sure the number you feed in will work in the assembler, and you'll get
the same error message (but, as you say, for "I" in some cases the
compiler will give errors already).  It's otherwise only useful if you
use e.g. "IL" as constraint, and then write "addi%e2 %0,%1,%2" for
example, so the asm can generate "addis" insns.  Such things aren't very
often useful in internal asm.  The main reason any of this exists is
this is how GCC works internally; extended inline asm exposes a lot of
that.


Segher


Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

2022-09-23 Thread Segher Boessenkool
On Sat, Sep 24, 2022 at 02:26:52AM +1000, Nicholas Piggin wrote:
> I still don't see what clauses guarantees asm("%0" ::"r"(foo)) to give
> 13. It doesn't say access via inline assembly is special,

But it is.  It is for all register variables, local and global.  I agree
this isn't documented clearly.  For local register variables this is the
*only* thing guaranteed; for global register vars there is more (it
changes the ABI, there are safe/restore effects, that kind of thing).

Never it is guaranteed that all accesses through this variable will use
the register directly: this fundamentally cannot work on all archs, and
also not at -O0.  More in general it doesn't work if some basic
optimisations are not done, be it because of a compiler deficiency, or a
straight out bug, or maybe it is a conscious choice in some cases.

> I think if it was obviously guaranteed then this might be marginally
> better than explicit r13 in the asm
> 
>asm volatile(
>"stb %0,%2(%1)"
>:
>: "r" (mask),
>"r" (local_paca),
>  "i" (offsetof(struct paca_struct, irq_soft_mask))
>: "memory");

(Please use "n" instead of "i".  Doesn't matter here, but it does in
many other places.)


Segher


Re: [PATCH 1/2] powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5

2022-09-23 Thread Segher Boessenkool
On Fri, Sep 23, 2022 at 05:17:45PM +1000, Nicholas Piggin wrote:
> On Thu Sep 22, 2022 at 4:50 AM AEST, Segher Boessenkool wrote:
> > On Wed, Sep 21, 2022 at 11:41:02AM +1000, Nicholas Piggin wrote:
> > > Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5.
> > > POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added
> > > the popcntb instruction which a compiler might use.
> > > 
> > > Use -mcpu=power4.
> > > 
> > > Fixes: 471d7ff8b51b ("powerpc/64s: Remove POWER4 support")
> > > Signed-off-by: Nicholas Piggin 
> >
> > Reviewed-by: Segher Boessenkool 
> >
> > Thank you!
> >
> > Maybe superfluous, but some more context: GCC's -mcpu=power4 means
> > POWER4+, ISA 2.01, according to our documentation.  There is no
> > difference with ISA 2.00 (what plain POWER4 implements) for anything
> > GCC does.
> 
> Huh, okay. Well I guess we are past that point now, interesting that
> another ISA version was done for 4+ though, and then another for 5.
> I don't see a list of changes from 2.00 in the public version, I
> wonder what else changed other than mtmsrd.

I think searching for "POWER4+" will give you everything.  I cannot find
a public 2.00 either, yeah.  I listed everything I think changed
elsewhere in the thread.


Segher


Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

2022-09-23 Thread Segher Boessenkool
On Fri, Sep 23, 2022 at 05:08:13PM +1000, Nicholas Piggin wrote:
> On Tue Sep 20, 2022 at 4:41 PM AEST, Christophe Leroy wrote:
> > local_paca is declared as global register asm("r13"), it is therefore
> > garantied to always ever be r13.
> >
> > It is therefore not required to opencode r13 in the assembly, use
> > a reference to local_paca->irq_soft_mask instead.

> The code matches the changelog AFAIKS. But I don't know where it is
> guaranteed it will always be r13 in GCC and Clang. I still don't know
> where in the specification or documentation suggests this.

"Global Register Variables" in the GCC manual.

> There was some assertion it would always be r13, but that can't be a
> *general* rule. e.g., the following code:
> 
> struct foo {
> #ifdef BIGDISP
> int array[1024*1024];
> #endif
> char bar;
> };
> 
> register struct foo *foo asm("r13");
> 
> static void setval(char val)
> {
> asm("stb%X0 %1,%0" : "=m" (foo->bar) : "r" (val));
> }
> 
> int main(void)
> {
> setval(10);
> }

Just use r13 directly in the asm, if that is what you want!

> With -O0 this generates stb 9,0(10) for me for GCC 12, and with -O2
> -DBIGDISP it generates stb 10,0(9). So that makes me nervious. GCC
> does not have some kind of correctness guarantee here, so it must not
> have this in its regression tests etc., and who knows about clang.

GCC has all kinds of correctness guarantees, here and elsewhere, that is
90% of a compiler's job.  But you don't *tell* it what you consider
"correct" here.

You wrote "foo->bar", and this expression was translated to something
that derived from r13.  If you made the asm something like
asm("stb%X0 %1,0(%0)" : : "r" (foo), "r" (val) : "memory");
it would work fine.  It would also work fine if you wrote 13 in the
template directly.  These things follow the rules, so are guaranteed.

The most important pieces of doc here may be
   * Accesses to the variable may be optimized as usual and the register
 remains available for allocation and use in any computations,
 provided that observable values of the variable are not affected.
   * If the variable is referenced in inline assembly, the type of
 access must be provided to the compiler via constraints (*note
 Constraints::).  Accesses from basic asms are not supported.
but read the whole "Global Register Variables" chapter?

> If it is true for some particular subset of cases that we can guarantee,
> e.g., using -O2 and irq_soft_mask offset within range of stb offset and
> we can point to specification such that both GCC and Clang will follow
> it, then okay. Otherwise I still think it's more trouble than it is
> worth.

-O2 makes it much more likely some inline assembler things work as
intended, but it guarantees nothing.  Sorry.


Segher


Re: [PATCH 2/2] powerpc/64s: update cpu selection options

2022-09-21 Thread Segher Boessenkool
On Wed, Sep 21, 2022 at 11:41:03AM +1000, Nicholas Piggin wrote:
> Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so
> make that clear in the option name. The POWER5_CPU option is dropped
> because it's uncommon, and GENERIC_CPU covers it.
> 
> -mtune= before power8 is dropped because the minimum gcc version
> supports power8, and tuning is made consistent between big and little
> endian.
> 
> A 970 option is added for PowerPC 970 / G5 because they still have a
> user base, and -mtune=power8 does not generate good code for the 970.
> 
> This also updates the ISA versions document to add Power4/Power4+
> because I didn't realise Power4+ used 2.01.
> 
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Segher Boessenkool 

Cheers,


Segher


Re: [PATCH 1/2] powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5

2022-09-21 Thread Segher Boessenkool
On Wed, Sep 21, 2022 at 11:41:02AM +1000, Nicholas Piggin wrote:
> Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5.
> POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added
> the popcntb instruction which a compiler might use.
> 
> Use -mcpu=power4.
> 
> Fixes: 471d7ff8b51b ("powerpc/64s: Remove POWER4 support")
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Segher Boessenkool 

Thank you!

Maybe superfluous, but some more context: GCC's -mcpu=power4 means
POWER4+, ISA 2.01, according to our documentation.  There is no
difference with ISA 2.00 (what plain POWER4 implements) for anything
GCC does.


Segher


Re: [RFC PATCH 5/7] powerpc/64s: update generic cpu option name and compiler flags

2022-09-21 Thread Segher Boessenkool
Hi!

On Wed, Sep 21, 2022 at 11:01:18AM +1000, Nicholas Piggin wrote:
> On Wed Sep 21, 2022 at 8:16 AM AEST, Segher Boessenkool wrote:
> > On Tue, Sep 20, 2022 at 12:01:47AM +1000, Nicholas Piggin wrote:
> > > Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so
> > > make that clear in the option name.
> >
> > AFAIR the minimum now is POWER4+ (ISA 2.01), not POWER5 (ISA 2.02).
> 
> It's POWER5 now, because of commit 471d7ff8b5 ("powerpc/64s: Remove
> POWER4 support"), which is misguided about POWER4+ and also introduced
> the -mcpu=power5 bug on 970 builds :\

ISA 2.01 added just a few things (LPES[0], HDEC, some PM things, but
crucially also anything that sets MSR[PR] also sets MSR[EE] since then).

> Not sure it's worth adding POWER4+ support back but if someone has a
> POWER4+ or adds it to QEMU TCG, I will do the patch.

970 is 2.01 -- pretending it is 2.02 is a ticking time bomb: the popcntb
insn will be generated for popcount and parity intrinsics, which can be
generated by generic code!

> > > -mtune= before power8 is dropped because the minimum gcc version
> > > supports power8, and tuning is made consistent between big and little
> > > endian.
> >
> > Tuning for p8 on e.g. 970 gives quite bad results.  No idea if anyone
> > cares, but this is a serious regression if so.
> 
> It's for "generic" kernel so we set low minimum but higher tune,
> assuming that people would usually have newer, so it was already
> doing -mtune=power7.
> 
> We could make a specific 970/G5 entry though, since those still
> have users.

If that uses -mcpu=power4 (which means ISA 2.01 btw!) all is fine
already?  (Or -mcpu=970, same thing really, it just allows VMX as well).

Thanks for taking care of this Nick, much appreciated!


Segher


Re: [RFC PATCH 5/7] powerpc/64s: update generic cpu option name and compiler flags

2022-09-20 Thread Segher Boessenkool
Hi!

On Tue, Sep 20, 2022 at 12:01:47AM +1000, Nicholas Piggin wrote:
> Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so
> make that clear in the option name.

AFAIR the minimum now is POWER4+ (ISA 2.01), not POWER5 (ISA 2.02).

> -mtune= before power8 is dropped because the minimum gcc version
> supports power8, and tuning is made consistent between big and little
> endian.

Tuning for p8 on e.g. 970 gives quite bad results.  No idea if anyone
cares, but this is a serious regression if so.

> Big endian drops -mcpu=power4 in favour of power5. Effectively the
> minimum compiler version means power5 was always being selected here,
> so this should not change anything. 970 / G5 code generation does not
> seem to have been a problem with -mcpu=power5, but it's possible we
> should go back to power4 to be really safe.

Yes, -mcpu=power5 code does *not* run on 970, if you are unlucky enough
that the compiler does something smart with popcntb (the sole non-float
insn new on p5, not counting hrfid).

> +# -mcpu=power5 should generate 970 compatible kernel code

It doesn't.  Even if it did, it would need more explanation!


Segher


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-14 Thread Segher Boessenkool
On Wed, Sep 14, 2022 at 04:55:27PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 14, 2022 at 02:28:26PM +, Michael Matz wrote:
> > Don't mix DWARF debug info with DWARF-based unwinding info, the latter 
> > doesn't imply the former.  Out of interest: how does ORC get around the 
> > need for CFI annotations (or equivalents to restore registers) and what 
> 
> Objtool 'interprets' the stackops. So it follows the call-graph and is
> an interpreter for all instructions that modify the stack. Doing that it
> konws what the stackframe is at 'most' places.

To get correct backtraces on e.g. PowerPC you need to emulate many of
the integer insns.  That is why GCC enables -fasynchronous-unwind-tables
by default for us.

The same is true for s390, aarch64, and x86 (unless 32-bit w/ frame
pointer).

The problem is that you do not know how to access anything on the stack,
whether in the current frame or in a previous frame, from a random point
in the program.  GDB has many heuristics for this, and it still does not
get it right in all cases.

> > makes it fast?  I want faster unwinding for DWARF as well, when there's 
> > feature parity :-)  Maybe something can be learned for integration into 
> > dwarf-unwind.
> 
> I think we have some details here:
> 
>  https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html

It is faster because it does a whole lot less.  Is that still enough?
It's not clear (to me) what exact information it wants to provide :-(


Segher


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-14 Thread Segher Boessenkool
On Wed, Sep 14, 2022 at 11:21:00AM +0100, Josh Poimboeuf wrote:
> On Mon, Sep 12, 2022 at 06:31:14AM -0500, Segher Boessenkool wrote:
> > On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > > 2) Noreturn functions:
> > >
> > >There's no reliable way to determine which functions are designated
> > >by the compiler to be noreturn (either explictly via function
> > >attribute, or implicitly via a static function which is a wrapper
> > >around a noreturn function.)
> > 
> > Or just a function that does not return for any other reason.
> > 
> > The compiler makes no difference between functions that have the
> > attribute and functions that do not.  There are good reasons to not
> > have the attribute on functions that do in fact not return.  The
> > not-returningness of the function may be just an implementation
> > accident, something you do not want part of the API, so it *should* not
> > have that attribute; or you may want the callers to a function to not be
> > optimised according to this knowledge (you cannot *prevent* that, the
> > compiler can figure it out it other ways, but still) for any other
> > reason.
> 
> Yes, many static functions that are wrappers around noreturn functions
> have this "implicit noreturn" property.

I meant functions that are noreturn intrinsically.  The trivial example:

void f(void)
{
for (;;)
;
}

>  I agree we would need to know
> about those functions (or, as Michael suggested, their call sites) as
> well.

Many "potentially does not return" functions (there are very many such
functions!) turn into "never returns" functions, for some inputs (or
something in the environment).  If the compiler specialises a code path
that does not return, you'll not see that marked up any way.  Of course
such a path should not be taken in the kernel, normally :-)

> > >This information is needed because the
> > >code after the call to such a function is optimized out as
> > >unreachable and objtool has no way of knowing that.
> > 
> > Since June we (GCC) have -funreachable-traps.  This creates a trap insn
> > wherever control flow would otherwise go into limbo.
> 
> Ah, that's interesting, though I'm not sure if we'd be able to
> distinguish between "call doesn't return" traps and other traps or
> reasons for UD2.

The trap handler can see where the trap came from.  And then look up
that address in some tables or such.  Just like __bug_table?


Segher


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-12 Thread Segher Boessenkool
Hi!

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> 2) Noreturn functions:
>
>There's no reliable way to determine which functions are designated
>by the compiler to be noreturn (either explictly via function
>attribute, or implicitly via a static function which is a wrapper
>around a noreturn function.)

Or just a function that does not return for any other reason.

The compiler makes no difference between functions that have the
attribute and functions that do not.  There are good reasons to not
have the attribute on functions that do in fact not return.  The
not-returningness of the function may be just an implementation
accident, something you do not want part of the API, so it *should* not
have that attribute; or you may want the callers to a function to not be
optimised according to this knowledge (you cannot *prevent* that, the
compiler can figure it out it other ways, but still) for any other
reason.

>This information is needed because the
>code after the call to such a function is optimized out as
>unreachable and objtool has no way of knowing that.

Since June we (GCC) have -funreachable-traps.  This creates a trap insn
wherever control flow would otherwise go into limbo.


Segher


Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

2022-09-08 Thread Segher Boessenkool
Hi!

On Thu, Sep 08, 2022 at 05:07:24PM +0200, Arnd Bergmann wrote:
> - if the XOR code has its frame size explode like this, it's
>   probably an indication of the compiler doing something wrong,
>   not the kernel code.

On the contrary, it is most likely an indication that the kernel code
wants something unreasonable.  Like, having 20 variables live at the
same time, but still wanting nicely scheduled machine code generated.

But I suspect GCC unrolled the loops here, even?  Best way to prevent
that here is to put an option in the Makefile, for these files.  We
don't want any of this unrolled after all?  Or, alternatively, remove
all the manual unrolling from this code, let GCC do its thing, without
painting it in a corner.

>   The result is likely that the "optimized"
>   XOR implementation is slower than the default version as a
>   result, and the kernel will pick the other one at boot time.

Yes.  So it's self-healing even, of a sort :-)


Segher


Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

2022-09-08 Thread Segher Boessenkool
On Thu, Sep 08, 2022 at 06:00:24AM +, Christophe Leroy wrote:
> Looking at it more deeply, I see strange things.

I'll have to see full generated machine code to be able to see strange
things, there isn't enough information at all here yet.  Sorry.

Use private mail if it is too big or uninteresting for the list :-)

> What is that frame size ? I thought it was the number of bytes r1 is 
> decremented at the begining of the function, but it seems not, at least 
> on GCC. It seems GCC substrats 112 bytes while clang doesn't.

That is the vars size + the fixed size + the size of the parameter
save area + the size of the regs save area, rounded up to a multiple
of 16.  Fixed size is 8 on 32-bit PowerPC ELF.  Frame size used by GCC
here is just the vars size.

> So it seems that GCC and CLANG don't warn on the same thing, is that 
> expected ? GCC substrats 112 bytes, which is the minimum frame size on a 
> ppc64, but here I'm building a ppc32 kernel, min frame size is 16.

I need to see the generated code to make sense of what is happening
here.  It sounds like it is doing varargs calls or similar expensive
stack juggling.  Or just saving a boatload of registers on the stack.

> And CLANG is still using stack a lot more than GCC.

Good to hear!  Well, good for GCC, anyway ;-)


Segher


Re: [PATCH v2 16/16] objtool/powerpc: Add --mcount specific implementation

2022-09-05 Thread Segher Boessenkool
Hi!

On Mon, Sep 05, 2022 at 04:15:07PM +0530, Naveen N. Rao wrote:
> Segher Boessenkool wrote:
> >>> + if ((insn & 3) == 1) {
> >>> + *type = INSN_CALL;
> >>> + *immediate = insn & 0x3fc;
> >>> + if (*immediate & 0x200)
> >>> + *immediate -= 0x400;
> >>> + }
> >>> + break;
> >>> + }
> >
> >Does this handle AA=1 correctly at all?  That is valid both with and
> >without relocations, just like AA=0.  Same for AA=1 LK=0 btw.
> >
> >If you only handle AA=0, the code should explicitly test for that.
> 
> The code does test for AA=0 LK=1 with the if statement there?

Yes, but that is not what I said :-)

It may be fine to not *handle* AA=1 at all, but the code should at least
scream bloody murder when it encounters it anyway :-)


Segher


Re: [PATCH v2] powerpc: Fix irq_soft_mask_set() and irq_soft_mask_return() with sanitizer

2022-09-02 Thread Segher Boessenkool
On Fri, Sep 02, 2022 at 10:57:27AM -0500, Peter Bergner wrote:
> On 8/31/22 5:45 PM, Segher Boessenkool wrote:
> > Yes, this is guaranteed.
> 
> Agree with Segher here.  That said, there was a gcc bug a long time
> ago where gcc copied r13 into a temporary register and used it from there.

r13 is a fixed register on most of our ABIs (everything that is not AIX
or Darwin, even), so this can never happen.  Except if there are bugs,
of course ;-)

> That's ok (correctness wise, but not ideal) from user land standpoint,
> but we took a context switch after the reg copy and it was restarted on
> a different cpu, so differnt local_paca and r13 value.  We went boom
> because the copy wasn't pointing to the correct local_paca anymore.
> So it is very important the compiler always use r13 when accessing
> the local_paca.

Yes.  So we either whould use -ffixed-r13, or just not use unsupported
compilers.  powerpc*-linux and powerpc*-elf work fine for example :-)


Segher


Re: [PATCH v2 2/2] powerpc/math-emu: Remove -w build flag and fix warnings

2022-09-02 Thread Segher Boessenkool
Hi!

On Fri, Sep 02, 2022 at 09:11:48AM -0700, Nathan Chancellor wrote:
> On Fri, Sep 02, 2022 at 10:59:54AM -0500, Segher Boessenkool wrote:
> > Maybe add -Wno-implicit-fallthrough?  This code is a copy from outside
> > the kernel, no one has ever wanted to maintain it, if nothing else (the
> > more politically correct formulation is "we cannot as easily pick up
> > improvements from upstream if we modify stuff").
> 
> Sure, we could do something like this if you preferred:
> 
> diff --git a/arch/powerpc/math-emu/Makefile b/arch/powerpc/math-emu/Makefile
> index 26fef2e5672e..ed775747a2a5 100644
> --- a/arch/powerpc/math-emu/Makefile
> +++ b/arch/powerpc/math-emu/Makefile
> @@ -16,3 +16,7 @@ obj-$(CONFIG_SPE)   += math_efp.o
>  
>  CFLAGS_fabs.o = -fno-builtin-fabs
>  CFLAGS_math.o = -fno-builtin-fabs
> +
> +ifdef CONFIG_CC_IS_CLANG
> +ccflags-remove-y := $(CONFIG_CC_IMPLICIT_FALLTHROUGH)
> +endif

That is a GCC warning as well.  It needs some $(call cc-option ...)
thing then, though (GCC versions of more than two or so years ago are
supported as well).

> At the same time, I see other modifications to these files that appear
> to be for the kernel only so I suspect that this is already in the "we
> cannot as easily pick up improvements from upstream" category,
> regardless of that diff.

So maybe someone should really maintain this stuff, bring it up to some
reasonably modern state?  :-)


Segher


Re: [PATCH v2 2/2] powerpc/math-emu: Remove -w build flag and fix warnings

2022-09-02 Thread Segher Boessenkool
On Fri, Sep 02, 2022 at 08:37:23AM -0700, Nathan Chancellor wrote:
> On Fri, Sep 02, 2022 at 12:08:55PM +0200, Christophe Leroy wrote:
> > This should have been detected by gcc at build time, but due to
> > '-w' flag it went undetected.
> > 
> > Removing that flag leads to many warnings hence errors.

> Thanks for figuring out what was going on here! I took this patch for a
> spin with clang and it has a few more errors around
> -Wimplicit-fallthrough:

Maybe add -Wno-implicit-fallthrough?  This code is a copy from outside
the kernel, no one has ever wanted to maintain it, if nothing else (the
more politically correct formulation is "we cannot as easily pick up
improvements from upstream if we modify stuff").


Segher


Re: [PATCH] powerpc/vdso: link with -z noexecstack

2022-09-02 Thread Segher Boessenkool
Hi!

On Fri, Sep 02, 2022 at 09:57:09AM +0200, Christophe Leroy wrote:
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -128,7 +128,8 @@ extra-y   += vmlinux.lds
>  
>  obj-$(CONFIG_RELOCATABLE)+= reloc_$(BITS).o
>  
> -obj-$(CONFIG_PPC32)  += entry_32.o setup_32.o early_32.o 
> static_call.o
> +obj-$(CONFIG_PPC32)  += entry_32.o setup_32.o early_32.o
> +obj-$(CONFIG_HAVE_STATIC_CALL)   += static_call.o

Did you want to commit this part?  The commit message doesn't mention it.


Segher


Re: [PATCH] powerpc/math_emu/efp: Include module.h

2022-09-02 Thread Segher Boessenkool
On Fri, Sep 02, 2022 at 08:43:49AM +, Christophe Leroy wrote:
> Le 01/09/2022 à 21:47, Segher Boessenkool a écrit :
> > On Thu, Sep 01, 2022 at 05:41:33AM +, Christophe Leroy wrote:
> >> I think it would be worth a GCC bug report.
> > 
> > We need a stand-alone testcase for this.  When you have created one, at
> > least 98% of the time you discover the bug is in user code after all.
> > 
> > Which is a very good thing, it means the problem can be fixed simpler,
> > cheaper, and a lot faster :-)
> 
> Easy to reproduce with a .c file that has a single line:
> 
> non_existing_macro(xxx);

That was fast (and cheap and simple) :-)

> Apparently that's due to the -w option in arch/powerpc/math_emu/Makefile:
> 
>   ccflags-y = -w
> 
> Was introduced by commit d2b194ed8208 ("powerpc/math-emu: Use kernel 
> generic math-emu code")
> 
> If I understand correctly it means 'ignore all warnings'. Then it seems 
> CLANG doesn't honor that request.

'-w'
 Inhibit all warning messages.

GCC's initial commit has this already (1992).


Segher


Re: [PATCH] powerpc/math_emu/efp: Include module.h

2022-09-01 Thread Segher Boessenkool
On Thu, Sep 01, 2022 at 05:41:33AM +, Christophe Leroy wrote:
> I think it would be worth a GCC bug report.

We need a stand-alone testcase for this.  When you have created one, at
least 98% of the time you discover the bug is in user code after all.

Which is a very good thing, it means the problem can be fixed simpler,
cheaper, and a lot faster :-)


Segher


Re: [PATCH v2] powerpc: Fix irq_soft_mask_set() and irq_soft_mask_return() with sanitizer

2022-09-01 Thread Segher Boessenkool
On Thu, Sep 01, 2022 at 09:37:42AM +0200, Gabriel Paubert wrote:
> On Thu, Sep 01, 2022 at 05:22:32AM +, Christophe Leroy wrote:
> > Le 01/09/2022 à 00:45, Segher Boessenkool a écrit :
> > > I would have used real assembler code here (in a .s file).  But there
> > > probably are reasons to do things this way, performance probably?
> > 
> > We changed it to inline asm in order to ... inline it in the caller.
> 
> And there is a single caller it seems. Typically GCC inlines function
> with a single call site, but it may be confused by asm statements.

"Confused"...  It might estimate the assembler statement to be
significantly more expensive than it really is.  The compiler has to be
somewhat conservative, to be able to generate code that can be assembled
without errors.  It counts instructions by counting newlines and semis
and that kind of thing.  Since GCC 7 there is "asm inline", to make the
compiler think for inlining purposes that the asm outputs minimum size
code.  You can use asm_inline in the kernel for this.

> > I also find that those operand names make it awull more difficult to 
> > read that traditional numbering. I really dislike that new trend.

Yup.  All the extra markup it needs doesn't benefit readability either.

> > And same with those // comments, better use meaningfull C variable names.

I wrote:

> > > Comments like "// Inputs" are just harmful.  As is the "creative"
> > > indentation here.  Both harm readability and do not help understanding
> > > in any other way either.

This is a comment trying to explain (GNU) C syntax.  I certainly agree
that variable naming is very important, but this was meant as commentary
on a worse comment offence :-)


Segher


Re: [PATCH v2] powerpc: Fix irq_soft_mask_set() and irq_soft_mask_return() with sanitizer

2022-09-01 Thread Segher Boessenkool
On Thu, Sep 01, 2022 at 07:47:10AM +, Christophe Leroy wrote:
> Le 01/09/2022 à 09:37, Gabriel Paubert a écrit :
> > Agree, but there is one thing which escapes me: why is r3 listed in the
> > outputs section (actually as a read write operand with the "+"
> > constraint modifier) but is not used after the asm which is the last
> > statement of function returning void?
> > 
> > Do I miss something?
> 
> As far as I remember, that's to tell GCC that r3 register is modified by 
> the callee. As it is an input, it couldn't be listed in the clobber list.

Inputs can be clobbered just fine, in general.  But here the operand
is tied to a register variable, and that causes the error ("'asm'
specifier for variable 'r3' conflicts with 'asm' clobber list").

Marking it in/out here is more appropriate anyway :-)


Segher


Re: [PATCH v2 15/16] objtool/powerpc: Enable objtool to be built on ppc

2022-09-01 Thread Segher Boessenkool
On Thu, Sep 01, 2022 at 09:32:46AM +, Christophe Leroy wrote:
> Le 29/08/2022 à 07:52, Sathvika Vasireddy a écrit :
> > +int arch_decode_instruction(struct objtool_file *file, const struct 
> > section *sec,
> > +   unsigned long offset, unsigned int maxlen,
> > +   unsigned int *len, enum insn_type *type,
> > +   unsigned long *immediate,
> > +   struct list_head *ops_list)
> > +{
> > +   u32 insn;
> > +
> > +   *immediate = 0;
> > +   insn = bswap_if_needed(file->elf, *(u32 *)(sec->data->d_buf + offset));
> > +   *len = 4;
> 
> Should be *len = 8 when insn >> 26 == 1.

But please use ppc_inst_prefixed().  And just use ppc_inst_len() here?

Not having convenience abstraction functions like that will give you
much more work in the future -- currently all prefix insns use primary
opcode 1, sure, and nothing else does; but this can change.


Segher


Re: [PATCH v2] powerpc: Fix irq_soft_mask_set() and irq_soft_mask_return() with sanitizer

2022-08-31 Thread Segher Boessenkool
Hi!

On Tue, Aug 30, 2022 at 09:10:02AM +, Christophe Leroy wrote:
> Le 30/08/2022 à 11:01, Nicholas Piggin a écrit :
> > On Tue Aug 30, 2022 at 3:24 PM AEST, Christophe Leroy wrote:
> >>> This is still slightly concerning to me. Is there any guarantee that the
> >>> compiler would not use a different sequence for the address here?
> >>>
> >>> Maybe explicit r13 is required.
> >>>
> >>
> >> local_paca is defined as:
> >>
> >>register struct paca_struct *local_paca asm("r13");

And this is in global scope, making it a global register variable.

> >> Why would the compiler use another register ?
> > 
> > Hopefully it doesn't. Is it guaranteed that it won't?

Yes, this is guaranteed.

For a local register variable this is guaranteed only for operands to an
extended inline asm; any other access to the variable does not have to
put it in the specified register.

But this is a global register variable.  The only thing that would make
this crash and burn is if *any* translation unit did not see this
declaration: it could then use r13 (if that was allowed by the ABI in
effect, heh).

> > I'm sure Segher will be delighted with the creative asm in __do_IRQ
> > and call_do_irq :) *Grabs popcorn*

All that %% is blinding, yes.

Inline tabs are bad taste.

Operand names instead of numbers are great for obfuscation, and nothing
else -- unless you have more than four or five operands, in which case
you have bigger problems already.

Oh, this function is a good example of proper use of local register asm,
btw.

Comments like "// Inputs" are just harmful.  As is the "creative"
indentation here.  Both harm readability and do not help understanding
in any other way either.

The thing about inline asm is the smallest details change meaning of the
whole, it is a very harsh environment, you are programming both in C and
directly assembler code as well, and things have to be valid for both,
although on the other hand there is almost no error checking.  Keeping
it small, simple, readable is paramount.

The rules for using inline asm:

0) Do no use inline asm.
1) Use extended asm, unless you know all differences with basic asm, and
   you know you want that.  And if you answer "yes I do" to the latter,
   you answered wrong to the former.
2) Do not use toplevel asm.
3) Do no use inline asm.
4) Do no use inline asm.
5) Do no use inline asm.

Inline asm is a very powerful escape hatch.  Like all emergency exits,
you should not use them if you do not need them!  :-)

But, you are talking about the function calling and the frame change I
bet :-)  Both of these are only okay because everything is back as it
was when this (single!) asm is done, and the state created is perfectly
fine (this is very dependent on exact ABI used, etc.)

I would have used real assembler code here (in a .s file).  But there
probably are reasons to do things this way, performance probably?


Segher


Re: [PATCH v2 16/16] objtool/powerpc: Add --mcount specific implementation

2022-08-31 Thread Segher Boessenkool
On Wed, Aug 31, 2022 at 12:50:07PM +, Christophe Leroy wrote:
> Le 29/08/2022 à 07:52, Sathvika Vasireddy a écrit :
> > +   opcode = insn >> 26;
> > +
> > +   switch (opcode) {
> > +   case 18: /* bl */
> 
> case 18 is more than 'bl', it includes also 'b'.
> In both cases, the calculation of *immediate is the same.

It also is "ba" and "bla", sometimes written as "b[l][a]".

> It would therefore be more correct to perform the calculation and setup 
> of *immediate outside the 'if ((insn & 3) == 1)', that would avoid 
> unnecessary churn the day we add support for branches without link.
> 
> > +   if ((insn & 3) == 1) {
> > +   *type = INSN_CALL;
> > +   *immediate = insn & 0x3fc;
> > +   if (*immediate & 0x200)
> > +   *immediate -= 0x400;
> > +   }
> > +   break;
> > +   }

Does this handle AA=1 correctly at all?  That is valid both with and
without relocations, just like AA=0.  Same for AA=1 LK=0 btw.

If you only handle AA=0, the code should explicitly test for that.


Segher


Re: [PATCH] compiler-gcc.h: Remove ancient workaround for gcc PR 58670

2022-08-25 Thread Segher Boessenkool
Hi!

On Thu, Aug 25, 2022 at 04:00:52PM +0530, Naveen N. Rao wrote:
> This is causing a build issue on ppc64le with a new patch replacing use 
> of unreachable() with __builtin_unreachable() in __WARN_FLAGS():
> https://lore.kernel.org/linuxppc-dev/20220808114908.240813-2...@linux.ibm.com/

What is the compiler version?  If this is a GCC version that is still
supported, could you please open a PR?  

> during RTL pass: combine
> In file included from /linux/kernel/locking/rtmutex_api.c:9:
> /linux/kernel/locking/rtmutex.c: In function 
> '__rt_mutex_slowlock.constprop':
> /linux/kernel/locking/rtmutex.c:1612:1: internal compiler error: in 
> purge_dead_edges, at cfgrtl.c:3369
> 1612 | }
>  | ^
> 0x142817c internal_error(char const*, ...)
>   ???:0
> 0x5c8a1b fancy_abort(char const*, int, char const*)
>   ???:0
> 0x72017f purge_all_dead_edges()
>   ???:0
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.

(For some reason your compiler does not show compiler source code file
names or line numbers.)

So it is GCC 11...  is it 11.3?  If not, please try with that.

> So, it looks like gcc still has issues with certain uses of asm goto.

Could be.  Please attach preprocessed code (or reduced code that shows
the same problem if you have that / can make that).  Thanks!


Segher


Re: [PATCH v2 11/14] powerpc/64s: Clear/restore caller gprs in syscall interrupt/return

2022-08-11 Thread Segher Boessenkool
On Thu, Aug 11, 2022 at 03:39:58PM +, Christophe Leroy wrote:
> 
> 
> Le 11/08/2022 à 17:13, Segher Boessenkool a écrit :
> > Hi!
> > 
> > On Mon, Jul 25, 2022 at 04:31:11PM +1000, Rohan McLure wrote:
> >> +  /*
> >> +   * Zero user registers to prevent influencing speculative execution
> >> +   * state of kernel code.
> >> +   */
> >> +  NULLIFY_GPRS(5, 12)
> >> +  NULLIFY_NVGPRS()
> > 
> > "Nullify" means "invalidate", which is not what this does or is for :-(
> 
> Would "Zeroise" be more appropriate ?

That is probably a good compromise, yes.  It obviously is a verb, its
meaning is clear and unamiguous, and there is precedent for it even :-)

Thanks,


Segher


Re: [PATCH v2 11/14] powerpc/64s: Clear/restore caller gprs in syscall interrupt/return

2022-08-11 Thread Segher Boessenkool
Hi!

On Mon, Jul 25, 2022 at 04:31:11PM +1000, Rohan McLure wrote:
> + /*
> +  * Zero user registers to prevent influencing speculative execution
> +  * state of kernel code.
> +  */
> + NULLIFY_GPRS(5, 12)
> + NULLIFY_NVGPRS()

"Nullify" means "invalidate", which is not what this does or is for :-(


Segher


Re: [PATCHv3, resend] powerpc: mm: radix_tlb: rearrange the if-else block

2022-08-10 Thread Segher Boessenkool
Hi!

On Wed, Aug 10, 2022 at 01:43:18PM +0200, Anders Roxell wrote:
> Clang warns:
> 
> arch/powerpc/mm/book3s64/radix_tlb.c:1191:23: error: variable 'hstart' is 
> uninitialized when used here [-Werror,-Wuninitialized]
> __tlbiel_va_range(hstart, hend, pid,
>   ^~
> arch/powerpc/mm/book3s64/radix_tlb.c:1175:23: note: initialize the variable 
> 'hstart' to silence this warning

This note often is bad advice: hiding problems instead of investigating
and solving them.  Bah.

If silencing warnings is your goal, look no further than "-w" :-)

> Rework the 'if (IS_ENABLE(CONFIG_TRANSPARENT_HUGEPAGE))' so hstart/hend
> always gets initialized, this will silence the warnings. That will also
> simplify the 'else' path. Clang is getting confused with these warnings,
> but the warnings is a false-positive.

If it is, please report that bug to clang?  It says "*is* uninitialized
when used here", there can not be false positives to statements like
that.  If the analysis was heutistical it should say "may be" or such.


Segher


Re: [PATCH v1 1/3] powerpc: Fix eh field when calling lwarx on PPC32

2022-08-02 Thread Segher Boessenkool
Hi!

On Tue, Aug 02, 2022 at 11:02:36AM +0200, Christophe Leroy wrote:
> Commit 9401f4e46cf6 ("powerpc: Use lwarx/ldarx directly instead of
> PPC_LWARX/LDARX macros") properly handled the eh field of lwarx
> in asm/bitops.h but failed to clear it for PPC32 in
> asm/simple_spinlock.h
> 
> So, do as in arch_atomic_try_cmpxchg_lock(), set it to 1 if PPC64
> but set it to 0 if PPC32. For that use IS_ENABLED(CONFIG_PPC64) which
> returns 1 when CONFIG_PPC64 is set and 0 otherwise.
> 
> Reported-by: Pali Rohár 

Reviewed-by: Segher Boessenkool 

> + unsigned int eh = IS_ENABLED(CONFIG_PPC64);
>  
>   token = LOCK_TOKEN;
>   __asm__ __volatile__(
> -"1:  lwarx   %0,0,%2,1\n\
> +"1:  lwarx   %0,0,%2,%3\n\
>   cmpwi   0,%0,0\n\
>   bne-2f\n\
>   stwcx.  %1,0,%2\n\
> @@ -59,7 +60,7 @@ static inline unsigned long 
> __arch_spin_trylock(arch_spinlock_t *lock)
>   PPC_ACQUIRE_BARRIER
>  "2:"
>   : "=" (tmp)
> - : "r" (token), "r" (>slock)
> + : "r" (token), "r" (>slock), "i" (eh)
>   : "cr0", "memory");

That should work yes.  But please note that "n" is prefered if a number
is required (like here), not some other constant, as allowed by "i".

Thanks!


Segher


Re: Regression: Linux v5.15+ does not boot on Freescale P2020

2022-07-26 Thread Segher Boessenkool
On Tue, Jul 26, 2022 at 04:01:00PM +0200, Pali Rohár wrote:
> On Tuesday 26 July 2022 08:44:05 Segher Boessenkool wrote:
> > And the architecture says
> >   Programming Note
> >   Warning: On some processors that comply with versions of the
> >   architecture that precede Version 2.00
> 
> But e500v2 is 2.03 and not older than 2.00...

Yes.  And it does not implement reserved fields in instructions (*any*
reserved fields in instructions, apparently!) correctly at all.

> >   e500v1/v2 based chips will treat any reserved field being set in an
> >   opcode as illegal.
> > 
> > while the architecture says
> > 
> >   Reserved fields in instructions are ignored by the processor.
> > 
> > Whoops :-)  We need fixes for processor implementation bugs all the
> > time of course, but this is a massive *design* bug.
> 
> I looked also in e500v2 and P2020 errata documents there is nothing
> mentioned about eh flag. But it looks like a bug.

The bug is if it does this for any reserved field (and it apparently
does it for all even).

> > Also people using an SMP kernel on older cores should see the problem,
> > no?
> 
> Probably yes.
> 
> But most people on these machines are using stable LTS kernels and do
> not upgrade too often.

Yeah.

> So you need to wait longer time to see people starting reporting such
> bugs. Need to wait at least when v4.14 and v4.19 LTS versions stops
> receiving updates. v4.19 is used in Debian 10 (oldstable) and v5.4 is
> used by current OpenWRT. Both distributions are still supported, so
> users have not migrated to new v5.15 problematic kernel yet.

That's not a reasonable timeline for kernel development of course.

We see the same thing with the compiler...  Although GCC has a much
slower release cadence (one new major version every year), it often
takes two or three or more years before we get bug reports that
something was broken.  If stuff isn't tested, we cannot really support
it at all.


Segher


Re: Regression: Linux v5.15+ does not boot on Freescale P2020

2022-07-26 Thread Segher Boessenkool
On Tue, Jul 26, 2022 at 11:02:59AM +0200, Arnd Bergmann wrote:
> On Tue, Jul 26, 2022 at 10:34 AM Pali Rohár  wrote:
> > On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote:
> > > The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu
> > > implementations actually raise an illegal insn exception on EH=1.  It
> > > appears P2020 is one of those.
> >
> > P2020 has e500 cores. e500 cores uses ISA 2.03. So this may be reason.
> > But in official Freescale/NXP documentation for e500 is documented that
> > lwarx supports also eh=1. Maybe it is not really supported.
> > https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf (page 562)

(page 6-186)

> > At least there is NOTE:
> > Some older processors may treat EH=1 as an illegal instruction.

And the architecture says
  Programming Note
  Warning: On some processors that comply with versions of the
  architecture that precede Version 2.00, executing a Load And Reserve
  instruction in which EH = 1 will cause the illegal instruction error
  handler to be invoked.

> In commit d6ccb1f55ddf ("powerpc/85xx: Make sure lwarx hint isn't set on 
> ppc32")
> this was clarified to affect (all?) e500v1/v2,

  e500v1/v2 based chips will treat any reserved field being set in an
  opcode as illegal.

while the architecture says

  Reserved fields in instructions are ignored by the processor.

Whoops :-)  We need fixes for processor implementation bugs all the
time of course, but this is a massive *design* bug.  I'm surprised this
CPU still works as well as it does!

Even the venerable PEM (last updated in 1997) shows the EH field as
reserved, always treated as 0.

> this one apparently
> fixed it before,
> but Christophe's commit effectively reverted that change.
> 
> I think only the simple_spinlock.h file actually uses EH=1

That's right afaics.

> and this is not
> included in non-SMP kernels, so presumably the only affected machines were
> the rare dual-core e500v2 ones (p2020, MPC8572, bsc9132), which would
> explain why nobody noticed for the past 9 months.

Also people using an SMP kernel on older cores should see the problem,
no?  Or is that patched out?  Or does this use case never happen :-)


Segher


Re: [PATCH] drm/amdgpu: Re-enable DCN for 64-bit powerpc

2022-07-25 Thread Segher Boessenkool
Hi!

On Mon, Jul 25, 2022 at 03:40:40PM -0700, Guenter Roeck wrote:
> On 7/25/22 13:42, Segher Boessenkool wrote:
> >>What I'm wondering is if the compiler is getting confused between 
> >>standard and long doubles when they are both the same bit length...
> >
> >The compiler emits the same code (DFmode things, double precision float)
> >in both cases, and it itself does not see any difference anymore fairly
> >early in the pipeline.  Compare to int and long on most 32-bit targets,
> >both are SImode, the compiler will not see different types anymore:
> >there *are* no types, except in the compiler frontend.
> >
> >It only happens for powerpc64le things, and not for powerpc64 builds.
> >
> >It is probably a GCC problem.  I don't see what forces the GCC build
> >here to use 64-bit long double either btw?  Compilers build via buildall
> >have all kinds of unnecessary things disabled, but not that, not
> >directly at least.
> 
> From what little documentation I can find, there appears to be
> "--with-long-double-128" and "--with-long-double-format=ieee".
> That looks like something that would need to be enabled, not disabled.

Look in the GCC toplevel configure.ac for (some of!) the actual rules
(and some more in config.gcc, and there are more at different levels).

If your target is not *-linux* you likely end up with a 64-bit long
double by default, and if it is, you do.  But it depends on various
things configure can determine about your C library.  buildall uses
--without-headers, but something makes GCC still use 128-bit long
double on powerpc64-linux, but it uses 64-bit on powerpc64le-linux.
Curious.  I suppose things work better on Linux userland when you do
not use the spartan build flags buildall uses :-)

If you set it explicitly (--with-long-double-128) it just works.

> FWIW, depending on compiler build options such as the above for kernel
> builds seems to be a little odd to me,

Yes.  It should be (and was!) possible to build the kernel with pretty
much any compiler.  Usual were *-linux* compilers of course, but *-elf
also works, and for powerpc in particular any kind of biarch or not
"just works".

> and I am not sure I'd want to
> blame gcc if the kernel wants to be built with 128-bit floating point
> as default. At the very least, that should be documented somewhere,
> and if possible the kernel should refuse to build if the compiler build
> options don't meet the requirements.

It always was the rule that the kernel did not use floating point at
all.  If that is changed it can no longer be built on targets that use
soft float for example (they need libgcc, which the kernel build is
allergic to for some reason).  It is non-trivial to handle floating
point in the kernel itself as well of course (mostly arch stuff).

The problem here was that some objects are built with soft float and
some with hard float, incompatible ABIs that ld does not want to link
together (without further coercing).


Segher


Re: Regression: Linux v5.15+ does not boot on Freescale P2020

2022-07-25 Thread Segher Boessenkool
On Mon, Jul 25, 2022 at 10:10:09PM +0200, Pali Rohár wrote:
> On Monday 25 July 2022 16:20:49 Christophe Leroy wrote:
> Now I did again clean test with same Debian 10 cross compiler.
> 
> $ git clone 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd linux
> $ git checkout v5.15
> $ make mpc85xx_smp_defconfig ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
> $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
> $ cp -a vmlinux vmlinux.v5.15
> $ git revert 9401f4e46cf6965e23738f70e149172344a01eef
> $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
> $ cp -a vmlinux vmlinux.revert
> $ powerpc-linux-gnuspe-objdump -d vmlinux.revert > vmlinux.revert.dump
> $ powerpc-linux-gnuspe-objdump -d vmlinux.v5.15 > vmlinux.v5.15.dump
> $ diff -Naurp vmlinux.v5.15.dump vmlinux.revert.dump
> 
> And there are:
> 
> -c000c304:  7d 20 f8 29 lwarx   r9,0,r31,1
> +c000c304:  7d 20 f8 28 lwarx   r9,0,r31
> 
> I guess it must be reproducible this issue as I'm using regular
> toolchain from distribution.

The kernel had

#define PPC_RAW_LWARX(t, a, b, eh)   (0x7c28 | ___PPC_RT(t) | 
___PPC_RA(a) | ___PPC_RB(b) | __PPC_EH(eh))

and

#define PPC_LWARX(t, a, b, eh) stringify_in_c(.long PPC_RAW_LWARX(t, a, b, eh))

and

#ifdef CONFIG_PPC64
#define __PPC_EH(eh)(((eh) & 0x1) << 0)
#else
#define __PPC_EH(eh)0
#endif

but Christophe's 9401f4e46cf6 changed

-"1:" PPC_LWARX(%0,0,%2,1) "\n\
+"1:lwarx   %0,0,%2,1\n\

no longer checking CONFIG_PPC64.  That appears to be the bug.

The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu
implementations actually raise an illegal insn exception on EH=1.  It
appears P2020 is one of those.


Segher


Re: [PATCH] drm/amdgpu: Re-enable DCN for 64-bit powerpc

2022-07-25 Thread Segher Boessenkool
On Mon, Jul 25, 2022 at 02:34:08PM -0500, Timothy Pearson wrote:
> >> Further digging shows that the build failures only occur with compilers
> >> that default to 64-bit long double.
> > 
> > Where the heck do we have 'long double' things anywhere in the kernel?
> > 
> > I tried to grep for it, and failed miserably. I found some constants
> > that would qualify, but they were in the v4l colorspaces-details.rst
> > doc file.
> > 
> > Strange.
> 
> We don't, at least not that I can see.  The affected code uses standard 
> doubles.
> 
> What I'm wondering is if the compiler is getting confused between standard 
> and long doubles when they are both the same bit length...

The compiler emits the same code (DFmode things, double precision float)
in both cases, and it itself does not see any difference anymore fairly
early in the pipeline.  Compare to int and long on most 32-bit targets,
both are SImode, the compiler will not see different types anymore:
there *are* no types, except in the compiler frontend.

It only happens for powerpc64le things, and not for powerpc64 builds.

It is probably a GCC problem.  I don't see what forces the GCC build
here to use 64-bit long double either btw?  Compilers build via buildall
have all kinds of unnecessary things disabled, but not that, not
directly at least.


Segher


Re: [PATCH] powerpc: Remove the static variable initialisations to 0

2022-07-25 Thread Segher Boessenkool
On Mon, Jul 25, 2022 at 01:27:52PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Sat, Jul 23, 2022 at 03:34:05PM +0200, Michal Suchánek wrote:
> >> Hello,
> >> 
> >> On Sat, Jul 23, 2022 at 05:24:36PM +0800, Jason Wang wrote:
> >> > Initialise global and static variable to 0 is always unnecessary.
> >> > Remove the unnecessary initialisations.
> >> 
> >> Isn't this change also unnecessary?
> >> 
> >> Initializing to 0 does not affect correctness, or even any kind of
> >> semantics in any way.
> >
> > It did make a difference when the kernel was still compiled with
> > -fcommon (which used to be the GCC default on most configurations, it is
> > traditional on Unix).  No explicit initialiser puts an object in .bss if
> > you use -fcommon.  This matters a bit for data layout.
> 
> The kernel has built with -fno-common since ~2002.

2001, yes (255649c18287).  And before that it was important to
initialise everything with static storage duration explicitly in the
source code.  It was part of the collective memory, I wondered if this
patch originated that way?

> I think the belief is that an explicit initialiser of 0 forces the
> variable into .data, but AFAICS that is not true with any compiler we
> support.

Exactly, you get identical code either way, if you use -fno-common.
People will still see this difference if they use a compiler before
GCC 10 for compiling most other things though.


Segher


  1   2   3   4   5   6   7   8   9   10   >