Re: [PATCH 2/2] powerpc: add ALTIVEC support to lib/ when PPC_FPU not set

2021-04-19 Thread Segher Boessenkool
On Mon, Apr 19, 2021 at 03:38:02PM +0200, Christophe Leroy wrote:
> Le 19/04/2021 à 15:32, Segher Boessenkool a écrit :
> >On Sun, Apr 18, 2021 at 01:17:26PM -0700, Randy Dunlap wrote:
> >>Add ldstfp.o to the Makefile for CONFIG_ALTIVEC and add
> >>externs for get_vr() and put_vr() in lib/sstep.c to fix the
> >>build errors.
> >
> >>  obj-$(CONFIG_PPC_FPU) += ldstfp.o
> >>+obj-$(CONFIG_ALTIVEC)  += ldstfp.o
> >
> >It is probably a good idea to split ldstfp.S into two, one for each of
> >the two configuration options?
> >
> 
> Or we can build it all the time and #ifdef the FPU part.
> 
> Because it contains FPU, ALTIVEC and VSX stuff.

So it becomes an empty object file if none of the options are selected?
Good idea :-)


Segher


Re: [PATCH 2/2] powerpc: add ALTIVEC support to lib/ when PPC_FPU not set

2021-04-19 Thread Segher Boessenkool
Hi!

On Sun, Apr 18, 2021 at 01:17:26PM -0700, Randy Dunlap wrote:
> Add ldstfp.o to the Makefile for CONFIG_ALTIVEC and add
> externs for get_vr() and put_vr() in lib/sstep.c to fix the
> build errors.

>  obj-$(CONFIG_PPC_FPU)+= ldstfp.o
> +obj-$(CONFIG_ALTIVEC)+= ldstfp.o

It is probably a good idea to split ldstfp.S into two, one for each of
the two configuration options?


Segher


Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr

2021-04-18 Thread Segher Boessenkool
On Sun, Apr 18, 2021 at 06:24:29PM +0200, Christophe Leroy wrote:
> Le 17/04/2021 à 22:17, Randy Dunlap a écrit :
> >Should the code + Kconfigs/Makefiles handle that kind of
> >kernel config or should ALTIVEC always mean PPC_FPU as well?
> 
> As far as I understand, Altivec is completely independant of FPU in Theory. 

And, as far as the hardware is concerned, in practice as well.

> So it should be possible to use Altivec without using FPU.

Yup.

> However, until recently, it was not possible to de-activate FPU support on 
> book3s/32. I made it possible in order to reduce unneccessary processing on 
> processors like the 832x that has no FPU.

The processor has to implement FP to be compliant to any version of
PowerPC, as far as I know?  So that is all done by emulation, including
all the registers?  Wow painful.

> As far as I can see in cputable.h/.c, 832x is the only book3s/32 without 
> FPU, and it doesn't have ALTIVEC either.

602 doesn't have double-precision hardware, also no 64-bit FP registers.
But that CPU was never any widely used :-)

> So we can in the future ensure that Altivec can be used without FPU 
> support, but for the time being I think it is OK to force selection of FPU 
> when selecting ALTIVEC in order to avoid build failures.

It is useful to allow MSR[VEC,FP]=1,0 but yeah there are no CPUs that
have VMX (aka AltiVec) but that do not have FP.  I don't see how making
that artificial dependency buys anything, but maybe it does?

> >I have patches to fix the build errors with the config as
> >reported but I don't know if that's the right thing to do...

Neither do we, we cannot see those patches :-)


Segher


Re: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible

2021-04-14 Thread Segher Boessenkool
On Wed, Apr 14, 2021 at 03:32:04PM +, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 14 April 2021 16:19
> ...
> > > Could the kernel use GCC builtin atomic functions instead ?
> > >
> > > https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> > 
> > Certainly that should work fine for the simpler cases that the atomic
> > operations are meant to provide.  But esp. for not-so-simple cases the
> > kernel may require some behaviour provided by the existing assembler
> > implementation, and not by the atomic builtins.
> > 
> > I'm not saying this cannot work, just that some serious testing will be
> > needed.  If it works it should be the best of all worlds, so then it is
> > a really good idea yes :-)
> 
> I suspect they just add an extra layer of abstraction that makes it
> even more difficult to verify and could easily get broken by a compiler
> update (etc).

I would say it uses an existing facility, instead of creating a kernel-
specific one.

> The other issue is that the code needs to be correct with compiled
> with (for example) -O0.
> That could very easily break anything except the asm implementation
> if additional memory accesses and/or increased code size cause grief.

The compiler generates correct code.  New versions of the compiler or
old, -O0 or not, under any phase of the moon.

Of course sometimes the compiler is broken, but there are pre-existing
ways of dealing with that, and there is no reason at all to think this
would break more often than random other code.


Segher


Re: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible

2021-04-14 Thread Segher Boessenkool
On Wed, Apr 14, 2021 at 02:42:51PM +0200, Christophe Leroy wrote:
> Le 14/04/2021 à 14:24, Segher Boessenkool a écrit :
> >On Wed, Apr 14, 2021 at 12:01:21PM +1000, Nicholas Piggin wrote:
> >>Would be nice if we could let the compiler deal with it all...
> >>
> >>static inline unsigned long lr(unsigned long *mem)
> >>{
> >> unsigned long val;
> >>
> >> /*
> >>  * This doesn't clobber memory but want to avoid memory 
> >>  operations
> >>  * moving ahead of it
> >>  */
> >> asm volatile("ldarx %0, %y1" : "=r"(val) : "Z"(*mem) : 
> >> "memory");
> >>
> >> return val;
> >>}
> >
> >(etc.)
> >
> >That can not work reliably: the compiler can put random instructions
> >between the larx and stcx. this way, and you then do not have guaranteed
> >forward progress anymore.  It can put the two in different routines
> >(after inlining and other interprocedural optimisations), duplicate
> >them, make a different number of copies of them, etc.
> >
> >Nothing of that is okay if you want to guarantee forward progress on all
> >implementations, and also not if you want to have good performance
> >everywhere (or anywhere even).  Unfortunately you have to write all
> >larx/stcx. loops as one block of assembler, so that you know exactly
> >what instructions will end up in your binary.
> >
> >If you don't, it will fail mysteriously after random recompilations, or
> >have performance degradations, etc.  You don't want to go there :-)
> >
> 
> Could the kernel use GCC builtin atomic functions instead ?
> 
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

Certainly that should work fine for the simpler cases that the atomic
operations are meant to provide.  But esp. for not-so-simple cases the
kernel may require some behaviour provided by the existing assembler
implementation, and not by the atomic builtins.

I'm not saying this cannot work, just that some serious testing will be
needed.  If it works it should be the best of all worlds, so then it is
a really good idea yes :-)


Segher


Re: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible

2021-04-14 Thread Segher Boessenkool
On Wed, Apr 14, 2021 at 12:01:21PM +1000, Nicholas Piggin wrote:
> Would be nice if we could let the compiler deal with it all...
> 
> static inline unsigned long lr(unsigned long *mem)
> {
> unsigned long val;
> 
> /*
>  * This doesn't clobber memory but want to avoid memory operations
>  * moving ahead of it
>  */
> asm volatile("ldarx %0, %y1" : "=r"(val) : "Z"(*mem) : "memory");
> 
> return val;
> }

(etc.)

That can not work reliably: the compiler can put random instructions
between the larx and stcx. this way, and you then do not have guaranteed
forward progress anymore.  It can put the two in different routines
(after inlining and other interprocedural optimisations), duplicate
them, make a different number of copies of them, etc.

Nothing of that is okay if you want to guarantee forward progress on all
implementations, and also not if you want to have good performance
everywhere (or anywhere even).  Unfortunately you have to write all
larx/stcx. loops as one block of assembler, so that you know exactly
what instructions will end up in your binary.

If you don't, it will fail mysteriously after random recompilations, or
have performance degradations, etc.  You don't want to go there :-)


Segher


Re: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible

2021-04-13 Thread Segher Boessenkool
On Tue, Apr 13, 2021 at 06:33:19PM +0200, Christophe Leroy wrote:
> Le 12/04/2021 à 23:54, Segher Boessenkool a écrit :
> >On Thu, Apr 08, 2021 at 03:33:44PM +, Christophe Leroy wrote:
> >>For clear bits, on 32 bits 'rlwinm' can be used instead or 'andc' for
> >>when all bits to be cleared are consecutive.
> >
> >Also on 64-bits, as long as both the top and bottom bits are in the low
> >32-bit half (for 32 bit mode, it can wrap as well).
> 
> Yes. But here we are talking about clearing a few bits, all other ones must 
> remain unchanged. An rlwinm on PPC64 will always clear the upper part, 
> which is unlikely what we want.

No, it does not.  It takes the low 32 bits of the source reg, duplicated
to the top half as well, then rotated, then ANDed with the mask (which
can wrap around).  This isn't very often very useful, but :-)

(One useful operation is splatting 32 bits to both halves of a 64-bit
register, which is just rlwinm d,s,0,1,0).

If you only look at the low 32 bits, it does exactly the same as on
32-bit implementations.

> >>For the time being only
> >>handle the single bit case, which we detect by checking whether the
> >>mask is a power of two.
> >
> >You could look at rs6000_is_valid_mask in GCC:
> >   
> > <https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/rs6000/rs6000.c;h=48b8efd732b251c059628096314848305deb0c0b;hb=HEAD#l11148>
> >used by rs6000_is_valid_and_mask immediately after it.  You probably
> >want to allow only rlwinm in your case, and please note this checks if
> >something is a valid mask, not the inverse of a valid mask (as you
> >want here).
> 
> This check looks more complex than what I need. It is used for both rlw... 
> and rld..., and it calculates the operants.  The only thing I need is to 
> validate the mask.

It has to do exactly the same thing for rlwinm as for all 64-bit
variants (rldicl, rldicr, rldic).

One side effect of calculation the bit positions with exact_log2 is that
that returns negative if the argument is not a power of two.

Here is a simpler way, that handles all cases:  input in "u32 val":

if (!val)
return nonono;
if (val & 1)
val = ~val; // make the mask non-wrapping
val += val & -val;  // adding the low set bit should result in
// at most one bit set
if (!(val & (val - 1)))
return okidoki_all_good;

> I found a way: By anding the mask with the complement of itself rotated by 
> left bits to 1, we identify the transitions from 0 to 1. If the result is a 
> power of 2, it means there's only one transition so the mask is as expected.

That does not handle all cases (it misses all bits set at least).  Which
isn't all that interesting of course, but is a valid mask (but won't
clear any bits, so not too interesting for your specific case :-) )


Segher


Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-04-12 Thread Segher Boessenkool
On Fri, Apr 09, 2021 at 02:36:16PM +1000, Alexey Kardashevskiy wrote:
> On 08/04/2021 19:04, Michael Ellerman wrote:
> +#define QUERY_DDW_PGSIZE_4K  0x01
> +#define QUERY_DDW_PGSIZE_64K 0x02
> +#define QUERY_DDW_PGSIZE_16M 0x04
> +#define QUERY_DDW_PGSIZE_32M 0x08
> +#define QUERY_DDW_PGSIZE_64M 0x10
> +#define QUERY_DDW_PGSIZE_128M0x20
> +#define QUERY_DDW_PGSIZE_256M0x40
> +#define QUERY_DDW_PGSIZE_16G 0x80
> >>>
> >>>I'm not sure the #defines really gain us much vs just putting the
> >>>literal values in the array below?
> >>
> >>Then someone says "u magic values" :) I do not mind either way. 
> >>Thanks,
> >
> >Yeah that's true. But #defining them doesn't make them less magic, if
> >you only use them in one place :)
> 
> Defining them with "QUERY_DDW" in the names kinda tells where they are 
> from. Can also grep QEMU using these to see how the other side handles 
> it. Dunno.

And *not* defining anything reduces the mental load a lot.  You can add
a comment at the single spot you use them, explaining what this is, in a
much better way!

Comments are *good*.


Segher


Re: [PATCH v1 2/2] powerpc/atomics: Use immediate operand when possible

2021-04-12 Thread Segher Boessenkool
Hi!

On Thu, Apr 08, 2021 at 03:33:45PM +, Christophe Leroy wrote:
> +#define ATOMIC_OP(op, asm_op, dot, sign) \
>  static __inline__ void atomic_##op(int a, atomic_t *v)   
> \
>  {\
>   int t;  \
>   \
>   __asm__ __volatile__(   \
>  "1:  lwarx   %0,0,%3 # atomic_" #op "\n" \
> - #asm_op " %0,%2,%0\n"   \
> + #asm_op "%I2" dot " %0,%0,%2\n" \
>  "stwcx.  %0,0,%3 \n" \
>  "bne-1b\n"   \
> - : "=" (t), "+m" (v->counter)  \
> - : "r" (a), "r" (>counter)\
> + : "=" (t), "+m" (v->counter)  \
> + : "r"#sign (a), "r" (>counter)   \
>   : "cc");\
>  }\

You need "b" (instead of "r") only for "addi".  You can use "addic"
instead, which clobbers XER[CA], but *all* inline asm does, so that is
not a downside here (it is also not slower on any CPU that matters).

> @@ -238,14 +238,14 @@ static __inline__ int atomic_fetch_add_unless(atomic_t 
> *v, int a, int u)
>  "1:  lwarx   %0,0,%1 # atomic_fetch_add_unless\n\
>   cmpw0,%0,%3 \n\
>   beq 2f \n\
> - add %0,%2,%0 \n"
> + add%I2  %0,%0,%2 \n"
>  "stwcx.  %0,0,%1 \n\
>   bne-    1b \n"
>   PPC_ATOMIC_EXIT_BARRIER
> -"subf%0,%2,%0 \n\
> +"sub%I2  %0,%0,%2 \n\
>  2:"
> - : "=" (t)
> - : "r" (>counter), "r" (a), "r" (u)
> + : "=" (t)
> + : "r" (>counter), "rI" (a), "r" (u)
>   : "cc", "memory");

Same here.

Nice patches!

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible

2021-04-12 Thread Segher Boessenkool
Hi!

On Thu, Apr 08, 2021 at 03:33:44PM +, Christophe Leroy wrote:
> For clear bits, on 32 bits 'rlwinm' can be used instead or 'andc' for
> when all bits to be cleared are consecutive.

Also on 64-bits, as long as both the top and bottom bits are in the low
32-bit half (for 32 bit mode, it can wrap as well).

> For the time being only
> handle the single bit case, which we detect by checking whether the
> mask is a power of two.

You could look at rs6000_is_valid_mask in GCC:
  

used by rs6000_is_valid_and_mask immediately after it.  You probably
want to allow only rlwinm in your case, and please note this checks if
something is a valid mask, not the inverse of a valid mask (as you
want here).

So yes this is pretty involved :-)

Your patch looks good btw.  But please use "n", not "i", as constraint?


Segher


Re: static_branch/jump_label vs branch merging

2021-04-10 Thread Segher Boessenkool
On Fri, Apr 09, 2021 at 12:33:29PM -0700, Nick Desaulniers wrote:
> Since asm goto is implicitly volatile qualified, it sounds like
> removing the implicit volatile qualifier from asm goto might help?
> Then if there were side effects but you forgot to inform the compiler
> that there were via an explicit volatile qualifier, and it performed
> the suggested merge, oh well.

"asm goto" without outputs is always volatile, just like any other asm
without outputs (if it wasn't, the compiler would always delete every
asm without outputs!)


Segher


Re: [PATCH v3] powerpc/traps: Enhance readability for trap types

2021-04-10 Thread Segher Boessenkool
On Sat, Apr 10, 2021 at 11:42:41AM +0200, Christophe Leroy wrote:
> Le 10/04/2021 à 02:04, Michael Ellerman a écrit :
> >I think these can all be avoided by defining most of the values
> >regardless of what platform we're building for. Only the values that
> >overlap need to be kept behind an ifdef.
> 
> Even if values overlap we can keep multiple definitions for the same value.

That works, but it can lead to puzzling bugs.  Of course we all *like*
working on more challenging bugs, but :-)


Segher


Re: [PATCH v3] powerpc/traps: Enhance readability for trap types

2021-04-09 Thread Segher Boessenkool
On Fri, Apr 09, 2021 at 06:14:19PM +0200, Christophe Leroy wrote:
> >+#define INTERRUPT_SYSTEM_RESET0x100
> 
> INT_SRESET

SRESET exists on many PowerPC, it means "soft reset".  Not the same
thing at all.

I think "INT" is not a great prefix fwiw, there are many things you can
abbr to "INT".

> >+#define INTERRUPT_DATA_SEGMENT0x380
> 
> INT_DSEG

exceptions-64s.S calls this "DSLB" (I remember "DSSI" though -- but neither
is a very official name).  It probably is a good idea to look at that
existing code, not make up even more new names :-)

> >+#define INTERRUPT_DOORBELL0xa00
> 
> INT_DBELL

That saves three characters and makes it very not understandable.


Segher


Re: static_branch/jump_label vs branch merging

2021-04-09 Thread Segher Boessenkool
On Thu, Apr 08, 2021 at 06:52:18PM +0200, Peter Zijlstra wrote:
> Is there *any* way in which we can have the compiler recognise that the
> asm_goto only depends on its arguments and have it merge the branches
> itself?
> 
> I do realize that asm-goto being volatile this is a fairly huge ask, but
> I figured I should at least raise the issue, if only to raise awareness.

"volatile" should not be an impediment to this at all, volatile means
there is an unspecified side effect, nothing more, nothing less.  But
yes this currently does not work with GCC:

void f(int x)
{   if (x)
asm volatile("ojee %0" :: "r"(x));
else
asm volatile("ojee %0" :: "r"(x));
}

or even

static inline void h(int x)
{
asm volatile("ojee %0" :: "r"(x));
}

void f1(int x)
{   if (x)
h(x);
else
h(x);
}

which both emit silly machine code.


Segher


Re: [PATCH 2/2] powerpc: make 'boot_text_mapped' static

2021-04-09 Thread Segher Boessenkool
Hi!

On Thu, Apr 08, 2021 at 07:04:35AM +0200, Christophe Leroy wrote:
> Le 08/04/2021 à 03:18, Yu Kuai a écrit :
> >-int boot_text_mapped __force_data = 0;
> >+static int boot_text_mapped __force_data;
> 
> Are you sure the initialisation to 0 can be removed ? Usually 
> initialisation to 0 is not needed because not initialised variables go in 
> the BSS section which is zeroed at startup. But here the variable is 
> flagged with __force_data so it is not going in the BSS section.

Any non-automatic (i.e. function-scope, not static) variable is
initialised to 0.  See e.g. C11 6.7.9/10 (this has been like that since
times immemorial, C90 anyway).


Segher


Re: [PATCH] powerpc/32: Remove powerpc specific definition of 'ptrdiff_t'

2021-04-05 Thread Segher Boessenkool
On Mon, Apr 05, 2021 at 09:57:27AM +, Christophe Leroy wrote:
> For unknown reason, old commit d27dfd388715 ("Import pre2.0.8")
> changed 'ptrdiff_t' from 'int' to 'long'.
> 
> GCC expects it as 'int' really,

It isn't actually defined in the ABI as far as I can see (neither the
old document or the new one), but GCC has defined it as "int" since
forever (which was in 1995), for anything using the SYSV ABI (which
includes powerpc-linux).

>  defines it as 'int', and
> defines 'size_t' and 'ssize_t' exactly as powerpc do, so
> remove the powerpc specific definitions and fallback on
> generic ones.
> 
> Signed-off-by: Christophe Leroy 

Thanks!

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH v2] powerpc/traps: Enhance readability for trap types

2021-04-01 Thread Segher Boessenkool
On Thu, Apr 01, 2021 at 06:01:29PM +1000, Nicholas Piggin wrote:
> Excerpts from Michael Ellerman's message of April 1, 2021 12:39 pm:
> > Segher Boessenkool  writes:
> >> On Wed, Mar 31, 2021 at 08:58:17PM +1100, Michael Ellerman wrote:
> >>> So perhaps:
> >>> 
> >>>   EXC_SYSTEM_RESET
> >>>   EXC_MACHINE_CHECK
> >>>   EXC_DATA_STORAGE
> >>>   EXC_DATA_SEGMENT
> >>>   EXC_INST_STORAGE
> >>>   EXC_INST_SEGMENT
> >>>   EXC_EXTERNAL_INTERRUPT
> >>>   EXC_ALIGNMENT
> >>>   EXC_PROGRAM_CHECK
> >>>   EXC_FP_UNAVAILABLE
> >>>   EXC_DECREMENTER
> >>>   EXC_HV_DECREMENTER
> >>>   EXC_SYSTEM_CALL
> >>>   EXC_HV_DATA_STORAGE
> >>>   EXC_PERF_MONITOR
> >>
> >> These are interrupt (vectors), not exceptions.  It doesn't matter all
> >> that much, but confusing things more isn't useful either!  There can be
> >> multiple exceptions that all can trigger the same interrupt.
> > 
> > Yeah I know, but I think that ship has already sailed as far as the
> > naming we have in the kernel.
> 
> It has, but there are also several other ships also sailing in different 
> directions. It could be worse though, at least they are not sideways in 
> the Suez.

:-)

> > We have over 250 uses of "exc", and several files called "exception"
> > something.
> > 
> > Using "interrupt" can also be confusing because Linux uses that to mean
> > "external interrupt".
> > 
> > But I dunno, maybe INT or VEC is clearer? .. or TRAP :)
> 
> We actually already have defines that follow Segher's suggestion, it's 
> just that they're hidden away in a KVM header.
> 
> #define BOOK3S_INTERRUPT_SYSTEM_RESET   0x100
> #define BOOK3S_INTERRUPT_MACHINE_CHECK  0x200
> #define BOOK3S_INTERRUPT_DATA_STORAGE   0x300
> #define BOOK3S_INTERRUPT_DATA_SEGMENT   0x380
> #define BOOK3S_INTERRUPT_INST_STORAGE   0x400
> #define BOOK3S_INTERRUPT_INST_SEGMENT   0x480
> #define BOOK3S_INTERRUPT_EXTERNAL   0x500
> #define BOOK3S_INTERRUPT_EXTERNAL_HV0x502
> #define BOOK3S_INTERRUPT_ALIGNMENT  0x600
> 
> It would take just a small amount of work to move these to general 
> powerpc header, add #ifdefs for Book E/S where the numbers differ,
> and remove the BOOK3S_ prefix.
> 
> I don't mind INTERRUPT_ but INT_ would be okay too. VEC_ actually
> doesn't match what Book E does (which is some weirdness to map some
> of them to match Book S but not all, arguably we should clean that
> up too and just use vector numbers consistently, but the INTERRUPT_
> prefix would still be valid if we did that).

VEC also is pretty incorrect: there is code at those addresses, not
vectors pointing to code (as similar things on some other architectures
have).  Everyone understands what it means of course, except it is
confusing with a thing we *do* have on Power called VEC (the MSR bit) :-P

(And TRAP is just one cause of 700...)


Segher


Re: [PATCH v2] powerpc/traps: Enhance readability for trap types

2021-04-01 Thread Segher Boessenkool
On Thu, Apr 01, 2021 at 10:55:58AM +0800, Xiongwei Song wrote:
> Segher Boessenkool  于2021年4月1日周四 上午6:15写道:
> 
> > On Wed, Mar 31, 2021 at 08:58:17PM +1100, Michael Ellerman wrote:
> > > So perhaps:
> > >
> > >   EXC_SYSTEM_RESET
> > >   EXC_MACHINE_CHECK
> > >   EXC_DATA_STORAGE
> > >   EXC_DATA_SEGMENT
> > >   EXC_INST_STORAGE
> > >   EXC_INST_SEGMENT
> > >   EXC_EXTERNAL_INTERRUPT
> > >   EXC_ALIGNMENT
> > >   EXC_PROGRAM_CHECK
> > >   EXC_FP_UNAVAILABLE
> > >   EXC_DECREMENTER
> > >   EXC_HV_DECREMENTER
> > >   EXC_SYSTEM_CALL
> > >   EXC_HV_DATA_STORAGE
> > >   EXC_PERF_MONITOR
> >
> > These are interrupt (vectors), not exceptions.  It doesn't matter all
> > that much, but confusing things more isn't useful either!  There can be
> > multiple exceptions that all can trigger the same interrupt.
> >
> >  When looking at the reference manual of e500 and e600 from NXP
>  official, they call them as interrupts.While looking at the "The
> Programming Environments"
>  that is also from NXP, they call them exceptions. Looks like there is
>  no explicit distinction between interrupts and exceptions.

The architecture documents have always called it interrupts.  The PEM
says it calls them exceptions instead, but they are called interrupts in
the architecture (and the PEM says that, too).

> Here is the "The Programming Environments" link:
> https://www.nxp.com.cn/docs/en/user-guide/MPCFPE_AD_R1.pdf

That document is 24 years old.  The architecture is still published,
new versions regularly.

> As far as I know, the values of interrupts or exceptions above are defined
> explicitly in reference manual or the programming environments.

They are defined in the architecture.

> Could
> you please provide more details about multiple exceptions with the same
> interrupts?

The simplest example is 700, program interrupt.  There are many causes
for it, including all the exceptions in FPSCR: VX, ZX, OX, UX, XX, and
VX is actually divided into nine separate cases itself.  There also are
the various causes of privileged instruction type program interrupts,
and  the trap type program interrupt, but the FEX ones are most obvious
here.


Segher


Re: [PATCH v2] powerpc/traps: Enhance readability for trap types

2021-03-31 Thread Segher Boessenkool
On Wed, Mar 31, 2021 at 08:58:17PM +1100, Michael Ellerman wrote:
> So perhaps:
> 
>   EXC_SYSTEM_RESET
>   EXC_MACHINE_CHECK
>   EXC_DATA_STORAGE
>   EXC_DATA_SEGMENT
>   EXC_INST_STORAGE
>   EXC_INST_SEGMENT
>   EXC_EXTERNAL_INTERRUPT
>   EXC_ALIGNMENT
>   EXC_PROGRAM_CHECK
>   EXC_FP_UNAVAILABLE
>   EXC_DECREMENTER
>   EXC_HV_DECREMENTER
>   EXC_SYSTEM_CALL
>   EXC_HV_DATA_STORAGE
>   EXC_PERF_MONITOR

These are interrupt (vectors), not exceptions.  It doesn't matter all
that much, but confusing things more isn't useful either!  There can be
multiple exceptions that all can trigger the same interrupt.


Segher


Re: [PATCH mm] kfence: fix printk format for ptrdiff_t

2021-03-19 Thread Segher Boessenkool
On Thu, Mar 18, 2021 at 10:38:43AM +0100, Christophe Leroy wrote:
> Yes it seems to be wrong. It was changed by commit d27dfd3887 ("Import 
> pre2.0.8"), so that's long time ago. Before that it was an 'int' for ppc32.
> 
> gcc provides ptrdiff_t in stddef.h via __PTRDIFF_TYPE__
> gcc defined __PTRDIFF_TYPE__ as 'int' at build time.

(On 32-bit PowerPC Linux.)

> Should we fix it in arch/powerpc/include/uapi/asm/posix_types.h ?

I think so, yes.

> Anyway 
> 'long' and 'int' makes no functionnal difference on 32 bits so there should 
> be no impact for users if any.

Except that long and int are different types, which causes errors like
what you have here.  There may be similar fallout from changing it back.


Segher


Re: [PATCH mm] kfence: fix printk format for ptrdiff_t

2021-03-16 Thread Segher Boessenkool
On Tue, Mar 16, 2021 at 09:32:32AM +0100, Christophe Leroy wrote:
> +segher

I cannot see through the wood of #defines here, sorry.

> Still a problem.
> 
> I don't understand, gcc bug ?

Rule #1: If you do not understand what is happening, it is not a
compiler bug.  I'm not saying that it isn't, just that it is much more
likely something else.

> The offending argument is 'const ptrdiff_t object_index'
> 
> We have:
> 
> arch/powerpc/include/uapi/asm/posix_types.h:typedef long   
> __kernel_ptrdiff_t;

So this is a 64-bit build.

> include/linux/types.h:typedef __kernel_ptrdiff_t  ptrdiff_t;
> 
> And get:
> 
>   CC  mm/kfence/report.o
> In file included from ./include/linux/printk.h:7,
>  from ./include/linux/kernel.h:16,
>  from mm/kfence/report.c:10:
> mm/kfence/report.c: In function 'kfence_report_error':
> ./include/linux/kern_levels.h:5:18: warning: format '%td' expects argument 
> of type 'ptrdiff_t', but argument 6 has type 'long int' [-Wformat=]

This is declared as
const ptrdiff_t object_index = meta ? meta - kfence_metadata : -1;
so maybe something with that goes wrong?  What happens if you delete the
(useless) "const" here?


Segher


Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure

2021-03-15 Thread Segher Boessenkool
On Mon, Mar 15, 2021 at 04:38:52PM +, David Laight wrote:
> From: Rasmus Villemoes
> > Sent: 15 March 2021 16:24
> > On 12/03/2021 03.29, Segher Boessenkool wrote:
> > > On Tue, Mar 09, 2021 at 06:19:30AM +, Christophe Leroy wrote:
> > >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> > >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> > >> generates a call to _restgpr_31_x.
> > >
> > >> I don't know if there is a way to tell GCC not to emit that call, 
> > >> because at the end we get more
> > instructions than needed.
> > >
> > > The function is required by the ABI, you need to have it.
> > >
> > > You get *fewer* insns statically, and that is what -Os is about: reduce
> > > the size of the binaries.
> > 
> > Is there any reason to not just always build the vdso with -O2? It's one
> > page/one VMA either way, and the vdso is about making certain system
> > calls cheaper, so if unconditional -O2 could save a few cycles compared
> > to -Os, why not? (And if, as it seems, there's only one user within the
> > DSO of _restgpr_31_x, yes, the overall size of the .text segment
> > probably increases slightly).
> 
> Sometimes -Os generates such horrid code you really never want to use it.
> A classic is on x86 where it replaces 'load register with byte constant'
> with 'push byte' 'pop register'.
> The code is actually smaller but the execution time is horrid.
> 
> There are also cases where -O2 actually generates smaller code.

Yes, as with all heuristics it doesn't always work out.  But usually -Os
is smaller.

> Although you may need to disable loop unrolling (often dubious at best)
> and either force or disable some function inlining.

The cases where GCC does loop unrolling at -O2 always help quite a lot.
Or, do you have a counter-example?  We'd love to see one.

And yup, inlining is hard.  GCC's heuristics there are very good
nowadays, but any single decision has big effects.  Doing the important
spots manually (always_inline or noinline) has good payoff.


Segher


Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure

2021-03-15 Thread Segher Boessenkool
On Mon, Mar 15, 2021 at 05:23:44PM +0100, Rasmus Villemoes wrote:
> On 12/03/2021 03.29, Segher Boessenkool wrote:
> > On Tue, Mar 09, 2021 at 06:19:30AM +, Christophe Leroy wrote:
> >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> >> generates a call to _restgpr_31_x.
> > 
> >> I don't know if there is a way to tell GCC not to emit that call, because 
> >> at the end we get more instructions than needed.
> > 
> > The function is required by the ABI, you need to have it.
> > 
> > You get *fewer* insns statically, and that is what -Os is about: reduce
> > the size of the binaries.
> 
> Is there any reason to not just always build the vdso with -O2? It's one
> page/one VMA either way, and the vdso is about making certain system
> calls cheaper, so if unconditional -O2 could save a few cycles compared
> to -Os, why not? (And if, as it seems, there's only one user within the
> DSO of _restgpr_31_x, yes, the overall size of the .text segment
> probably increases slightly).

You can use exactly the same reasoning for using -O2 instead of -Os
anywhere else.

-Os doesn't mean "smaller code, but only where that is reasonable".  It
means "smaller code".


Segher


Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure

2021-03-11 Thread Segher Boessenkool
Hi!

On Tue, Mar 09, 2021 at 06:19:30AM +, Christophe Leroy wrote:
> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> generates a call to _restgpr_31_x.

> I don't know if there is a way to tell GCC not to emit that call, because at 
> the end we get more instructions than needed.

The function is required by the ABI, you need to have it.

You get *fewer* insns statically, and that is what -Os is about: reduce
the size of the binaries.

(The only reason you get such problems is because Linux does not have
all of libgcc.  You can have that *and* have some symbols cause link
errors, it isn't rocket science).


Segher


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-10 Thread Segher Boessenkool
Hi!

On Wed, Mar 10, 2021 at 11:32:20AM +, Mark Rutland wrote:
> On Tue, Mar 09, 2021 at 04:05:32PM -0600, Segher Boessenkool wrote:
> > On Tue, Mar 09, 2021 at 04:05:23PM +, Mark Rutland wrote:
> > > On Thu, Mar 04, 2021 at 03:54:48PM -0600, Segher Boessenkool wrote:
> > > > On Thu, Mar 04, 2021 at 02:57:30PM +, Mark Rutland wrote:
> > > > > It looks like GCC is happy to give us the function-entry-time FP if 
> > > > > we use
> > > > > __builtin_frame_address(1),
> > > > 
> > > > From the GCC manual:
> > > >  Calling this function with a nonzero argument can have
> > > >  unpredictable effects, including crashing the calling program.  As
> > > >  a result, calls that are considered unsafe are diagnosed when the
> > > >  '-Wframe-address' option is in effect.  Such calls should only be
> > > >  made in debugging situations.
> > > > 
> > > > It *does* warn (the warning is in -Wall btw), on both powerpc and
> > > > aarch64.  Furthermore, using this builtin causes lousy code (it forces
> > > > the use of a frame pointer, which we normally try very hard to optimise
> > > > away, for good reason).
> > > > 
> > > > And, that warning is not an idle warning.  Non-zero arguments to
> > > > __builtin_frame_address can crash the program.  It won't on simpler
> > > > functions, but there is no real definition of what a simpler function
> > > > *is*.  It is meant for debugging, not for production use (this is also
> > > > why no one has bothered to make it faster).
> > > >
> > > > On Power it should work, but on pretty much any other arch it won't.
> > > 
> > > I understand this is true generally, and cannot be relied upon in
> > > portable code. However as you hint here for Power, I believe that on
> > > arm64 __builtin_frame_address(1) shouldn't crash the program due to the
> > > way frame records work on arm64, but I'll go check with some local
> > > compiler folk. I agree that __builtin_frame_address(2) and beyond
> > > certainly can, e.g.  by NULL dereference and similar.
> > 
> > I still do not know the aarch64 ABI well enough.  If only I had time!
> > 
> > > For context, why do you think this would work on power specifically? I
> > > wonder if our rationale is similar.
> > 
> > On most 64-bit Power ABIs all stack frames are connected together as a
> > linked list (which is updated atomically, importantly).  This makes it
> > possible to always find all previous stack frames.
> 
> We have something similar on arm64, where the kernel depends on being
> built with a frame pointer following the AAPCS frame pointer rules.

The huge difference is on Power this is about the stack itself: you do
not need a frame pointer at all for it (there is no specific register
named as frame pointer, even).

> Every stack frame contains a "frame record" *somewhere* within that
> stack frame, and the frame records are chained together as a linked
> list. The frame pointer points at the most recent frame record (and this
> is what __builtin_frame_address(0) returns).

> > See gcc.gnu.org/PR60109 for example.
> 
> Sure; I see that being true generally (and Ramana noted that on 32-bit
> arm a frame pointer wasn't mandated), but I think in this case we have a
> stronger target (and configuration) specific guarantee.

It sounds like it, yes.  You need to have a frame pointer in the ABI,
with pretty strong rules, and have everything follow those rules.

> > Is the frame pointer required?!
> 
> The arm64 Linux port mandates frame pointers for kernel code. It is
> generally possible to build code without frame pointers (e.g. userspace),
> but doing that for kernel code would be a bug.

I see.  And it even is less expensive to do this than on most machines,
because of register pair load/store instructions :-)

> > > > The real way forward is to bite the bullet and to no longer pretend you
> > > > can do a full backtrace from just the stack contents.  You cannot.
> > > 
> > > I think what you mean here is that there's no reliable way to handle the
> > > current/leaf function, right? If so I do agree.
> > 
> > No, I meant what I said.
> > 
> > There is the separate issue that you do not know where the return
> > address (etc.) is stored in a function that has not yet done a call
> > itself, sure.  You cannot assume anything the ABI does not tell you you
> > can depend on.
> 
> This is in the frame record per the AAPCS.

But you do not know where in the function it will store that.  It often
can be optimised by the compiler to only store the LR and FP on paths
where a call will happen later, and there is no way (without DWARF info
or similar) to know whether that has happened yet or not.

This is a well-known problem of course.  For the current function you
cannot know in general if there is an activation frame yet or not.


Segher


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-09 Thread Segher Boessenkool
Hi!

On Tue, Mar 09, 2021 at 04:05:23PM +, Mark Rutland wrote:
> On Thu, Mar 04, 2021 at 03:54:48PM -0600, Segher Boessenkool wrote:
> > On Thu, Mar 04, 2021 at 02:57:30PM +, Mark Rutland wrote:
> > > It looks like GCC is happy to give us the function-entry-time FP if we use
> > > __builtin_frame_address(1),
> > 
> > From the GCC manual:
> >  Calling this function with a nonzero argument can have
> >  unpredictable effects, including crashing the calling program.  As
> >  a result, calls that are considered unsafe are diagnosed when the
> >  '-Wframe-address' option is in effect.  Such calls should only be
> >  made in debugging situations.
> > 
> > It *does* warn (the warning is in -Wall btw), on both powerpc and
> > aarch64.  Furthermore, using this builtin causes lousy code (it forces
> > the use of a frame pointer, which we normally try very hard to optimise
> > away, for good reason).
> > 
> > And, that warning is not an idle warning.  Non-zero arguments to
> > __builtin_frame_address can crash the program.  It won't on simpler
> > functions, but there is no real definition of what a simpler function
> > *is*.  It is meant for debugging, not for production use (this is also
> > why no one has bothered to make it faster).
> >
> > On Power it should work, but on pretty much any other arch it won't.
> 
> I understand this is true generally, and cannot be relied upon in
> portable code. However as you hint here for Power, I believe that on
> arm64 __builtin_frame_address(1) shouldn't crash the program due to the
> way frame records work on arm64, but I'll go check with some local
> compiler folk. I agree that __builtin_frame_address(2) and beyond
> certainly can, e.g.  by NULL dereference and similar.

I still do not know the aarch64 ABI well enough.  If only I had time!

> For context, why do you think this would work on power specifically? I
> wonder if our rationale is similar.

On most 64-bit Power ABIs all stack frames are connected together as a
linked list (which is updated atomically, importantly).  This makes it
possible to always find all previous stack frames.

> Are you aware of anything in particular that breaks using
> __builtin_frame_address(1) in non-portable code, or is this just a
> general sentiment of this not being a supported use-case?

It is not supported, and trying to do it anyway can crash: it can use
random stack contents as pointer!  Not really "random" of course, but
where it thinks to find a pointer into the previous frame, which is not
something it can rely on (unless the ABI guarantees it somehow).

See gcc.gnu.org/PR60109 for example.

> > > Unless we can get some strong guarantees from compiler folk such that we
> > > can guarantee a specific function acts boundary for unwinding (and
> > > doesn't itself get split, etc), the only reliable way I can think to
> > > solve this requires an assembly trampoline. Whatever we do is liable to
> > > need some invasive rework.
> > 
> > You cannot get such a guarantee, other than not letting the compiler
> > see into the routine at all, like with assembler code (not inline asm,
> > real assembler code).
> 
> If we cannot reliably ensure this then I'm happy to go write an assembly
> trampoline to snapshot the state at a function call boundary (where our
> procedure call standard mandates the state of the LR, FP, and frame
> records pointed to by the FP).

Is the frame pointer required?!

> This'll require reworking a reasonable
> amount of code cross-architecture, so I'll need to get some more
> concrete justification (e.g. examples of things that can go wrong in
> practice).

Say you have a function that does dynamic stack allocation, then there
is usually no way to find the previous stack frame (without function-
specific knowledge).  So __builtin_frame_address cannot work (it knows
nothing about frames further up).

Dynamic stack allocation (alloca, or variable length automatic arrays)
is just the most common and most convenient example; it is not the only
case you have problems here.

> > The real way forward is to bite the bullet and to no longer pretend you
> > can do a full backtrace from just the stack contents.  You cannot.
> 
> I think what you mean here is that there's no reliable way to handle the
> current/leaf function, right? If so I do agree.

No, I meant what I said.

There is the separate issue that you do not know where the return
address (etc.) is stored in a function that has not yet done a call
itself, sure.  You cannot assume anything the ABI does not tell you you
can depend on.

> Beyond that I believe that arm64's frame records should be sufficient.

Do you have a simple linked list connecting all frames?  The aarch64 GCC
port does not define anything special here (DYNAMIC_CHAIN_ADDRESS), so
the default will be used: every frame pointer has to point to the
previous one, no exceptions whatsoever.


Segher


Re: [PATCH] Replace __toc_start + 0x8000 with .TOC.

2021-03-06 Thread Segher Boessenkool
Hi!

On Sat, Mar 06, 2021 at 09:14:33PM -0800, Fangrui Song wrote:
> TOC relocations are like GOT relocations on other architectures.
> However, unlike other architectures, GNU ld's ppc64 port defines .TOC.
> relative to the .got output section instead of the linker synthesized
> .got input section. LLD defines .TOC. as the .got input section plus
> 0x8000. When CONFIG_PPC_OF_BOOT_TRAMPOLINE=y,
> arch/powerpc/kernel/prom_init.o is built, and LLD computed .TOC. can be
> different from __toc_start defined by the linker script.
> 
> Simplify kernel_toc_addr with asm label .TOC. so that we can get rid of
> __toc_start.
> 
> With this change, powernv_defconfig with CONFIG_PPC_OF_BOOT_TRAMPOLINE=y
> is bootable with LLD. There is still an untriaged issue with Alexey's
> configuration.

Do you have any explanation why this *does* work, while the original
doesn't?  Some explanation that says *what* is wrong.  To me it doesn't
look like the kernel script is.


Segher


Re: [PATCH v2 1/7] cmdline: Add generic function to build command line.

2021-03-05 Thread Segher Boessenkool
On Fri, Mar 05, 2021 at 01:49:03PM +0100, Christophe Leroy wrote:
> Le 05/03/2021 à 12:58, Michael Ellerman a écrit :
> >prom_init runs as an OF client, with the MMU off (except on some Apple
> >machines), and we don't own the MMU. So there's really nothing we can do :)
> >
> >Though now that I look at it, I don't think we should be doing this
> >level of commandline handling in prom_init. It should just grab the
> >value from firmware and pass it to the kernel proper, and then all the
> >prepend/append/force etc. logic should happen there.
> 
> But then, how do you handle the command line parameters that are needed by 
> prom_init ?
> 
> For instance, prom_init_mem() use 'prom_memory_limit', which comes from the 
> "mem=" option in the command line.

*Reading* it is easy, much easier than modifying it.


Segher


Re: [PATCH v2 1/7] cmdline: Add generic function to build command line.

2021-03-05 Thread Segher Boessenkool
On Fri, Mar 05, 2021 at 10:58:02PM +1100, Michael Ellerman wrote:
> Will Deacon  writes:
> > That's very similar to us; we're not relocated, although we are at least
> > in control of the MMU (which is using a temporary set of page-tables).
> 
> prom_init runs as an OF client, with the MMU off (except on some Apple
> machines), and we don't own the MMU. So there's really nothing we can do :)

You *could* take over all memory mapping.  This is complex, and I
estimate the change you get this to work correctly on all supported
systems to be between -400% and 0%.

And not very long later Linux jettisons OF completely anyway.

> Though now that I look at it, I don't think we should be doing this
> level of commandline handling in prom_init. It should just grab the
> value from firmware and pass it to the kernel proper, and then all the
> prepend/append/force etc. logic should happen there.

That sounds much simpler, yes :-)


Segher


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-05 Thread Segher Boessenkool
On Fri, Mar 05, 2021 at 07:38:25AM +0100, Christophe Leroy wrote:
> Le 04/03/2021 à 20:24, Segher Boessenkool a écrit :
> https://github.com/linuxppc/linux/commit/a9a3ed1eff36
> 
> >
> >That is much heavier than needed (an mb()).  You can just put an empty
> >inline asm after a call before a return, and that call cannot be
> >optimised to a sibling call: (the end of a function is an implicit
> >return:)

> In the commit mentionned at the top, it is said:
> 
> The next attempt to prevent compilers from tail-call optimizing
> the last function call cpu_startup_entry(), ... , was to add an empty 
> asm("").
> 
> This current solution was short and sweet, and reportedly, is supported
> by both compilers but we didn't get very far this time: future (LTO?)
> optimization passes could potentially eliminate this,

This is simply not true.  A volatile inline asm (like this is, all
asm without outputs are) is always run on the reel machine exactly like
on the abstract machine.  LTO can not eliminate it, not more than any
other optimisation can.  The compiler makes no assumption about the
constents of the template of an asm, empty or not.

If you are really scared the compiler violates the rules of GCC inline
asm and thinks it knows what "" means, you can write
  asm(";#");
(that is a comment on all supported archs).

> which leads us
> to the third attempt: having an actual memory barrier there which the
> compiler cannot ignore or move around etc.

Why would it not be allowed to delete this, and delete some other asm?

And the compiler *can* move around asm like this.  But the point is,
it has to stay in order with other side effects, so there cannot be a
sibling call here, the call has to remain: any call contains a sequence
point, so side effects cannot be reordered over it, so the call (being
before the asm) cannot be transformed to a tail call.


Segher


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Segher Boessenkool
Hi!

On Thu, Mar 04, 2021 at 02:57:30PM +, Mark Rutland wrote:
> It looks like GCC is happy to give us the function-entry-time FP if we use
> __builtin_frame_address(1),

>From the GCC manual:
 Calling this function with a nonzero argument can have
 unpredictable effects, including crashing the calling program.  As
 a result, calls that are considered unsafe are diagnosed when the
 '-Wframe-address' option is in effect.  Such calls should only be
 made in debugging situations.

It *does* warn (the warning is in -Wall btw), on both powerpc and
aarch64.  Furthermore, using this builtin causes lousy code (it forces
the use of a frame pointer, which we normally try very hard to optimise
away, for good reason).

And, that warning is not an idle warning.  Non-zero arguments to
__builtin_frame_address can crash the program.  It won't on simpler
functions, but there is no real definition of what a simpler function
*is*.  It is meant for debugging, not for production use (this is also
why no one has bothered to make it faster).

On Power it should work, but on pretty much any other arch it won't.

> Unless we can get some strong guarantees from compiler folk such that we
> can guarantee a specific function acts boundary for unwinding (and
> doesn't itself get split, etc), the only reliable way I can think to
> solve this requires an assembly trampoline. Whatever we do is liable to
> need some invasive rework.

You cannot get such a guarantee, other than not letting the compiler
see into the routine at all, like with assembler code (not inline asm,
real assembler code).

The real way forward is to bite the bullet and to no longer pretend you
can do a full backtrace from just the stack contents.  You cannot.


Segher


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Segher Boessenkool
On Thu, Mar 04, 2021 at 09:54:44AM -0800, Nick Desaulniers wrote:
> On Thu, Mar 4, 2021 at 9:42 AM Marco Elver  wrote:
> include/linux/compiler.h:246:
> prevent_tail_call_optimization
> 
> commit a9a3ed1eff36 ("x86: Fix early boot crash on gcc-10, third try")

That is much heavier than needed (an mb()).  You can just put an empty
inline asm after a call before a return, and that call cannot be
optimised to a sibling call: (the end of a function is an implicit
return:)

Instead of:

void g(void);
void f(int x)
if (x)
g();
}

Do:

void g(void);
void f(int x)
if (x)
g();
asm("");
}

This costs no extra instructions, and certainly not something as heavy
as an mb()!  It works without the "if" as well, of course, but with it
it is a more interesting example of a tail call.


Segher


Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-02 Thread Segher Boessenkool
On Tue, Mar 02, 2021 at 10:40:03PM +1100, Michael Ellerman wrote:
> >> -- Change the unwinder, if it's possible for ppc32.
> >
> > I don't think it is possible.
> 
> I think this actually is the solution.
> 
> It seems the good architectures have all added support for
> arch_stack_walk(), and we have not.

I have no idea what arch_stack_walk does, but some background info:

PowerPC functions that do save the LR (== the return address), and/or
that set up a new stack frame, do not do this at the start of the
function necessarily (it is a lot faster to postpone this, even if you
always have to do it).  So, in a leaf function it isn't always known if
this has been done (in all callers further up it is always done, of
course).  If you have DWARF unwind info all is fine of course, but you
do not have that in the kernel.

> So I think it's probably on us to update to that new API. Or at least
> update our save_stack_trace() to fabricate an entry using the NIP, as it
> seems that's what callers expect.

This sounds very expensive?  If it is only a debug feature that won't
be used in production that does not matter, but it worries me.


Segher



Re: [PATCH v1 01/15] powerpc/uaccess: Remove __get_user_allowed() and unsafe_op_wrap()

2021-03-01 Thread Segher Boessenkool
On Tue, Mar 02, 2021 at 09:02:54AM +1100, Daniel Axtens wrote:
> Checkpatch does have one check that is relevant:
> 
> CHECK: Macro argument reuse 'p' - possible side-effects?
> #36: FILE: arch/powerpc/include/asm/uaccess.h:482:
> +#define unsafe_get_user(x, p, e) do {
> \
> + if (unlikely(__get_user_nocheck((x), (p), sizeof(*(p)), false)))\
> + goto e; \
> +} while (0)

sizeof (of something other than a VLA) does not evaluate its operand.
The checkpatch warning is incorrect (well, it does say "possible" --
it just didn't find a possible problem here).

You can write
  bla = sizeof *p++;
and p is *not* incremented.


Segher


Re: {standard input}:577: Error: unsupported relocation against base

2021-02-17 Thread Segher Boessenkool
Hi!

On Wed, Feb 17, 2021 at 04:43:45PM +1100, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Tue, Feb 16, 2021 at 08:36:02PM +1100, Michael Ellerman wrote:
> >> Feng Tang  writes:
> >> >   {standard input}:577: Error: unsupported relocation against base
> >> >   {standard input}:580: Error: unsupported relocation against base
> >> >   {standard input}:583: Error: unsupported relocation against base
> >
> >> > The reason is macro 'mfdcr' requirs an instant number as parameter,
> >> > which is not met by show_plbopb_regs().
> >> 
> >> It doesn't require a constant, it checks if the argument is constant:
> >> 
> >> #define mfdcr(rn)  \
> >>({unsigned int rval;\
> >>if (__builtin_constant_p(rn) && rn < 1024)  \
> >>asm volatile("mfdcr %0," __stringify(rn)\
> >>  : "=r" (rval));   \
> >>else if (likely(cpu_has_feature(CPU_FTR_INDEXED_DCR)))  \
> >>rval = mfdcrx(rn);  \
> >>else\
> >>rval = __mfdcr(rn); \
> >>rval;})
> >
> > It requires a constant number with known (at compile time) value, while
> > __builtin_constant_p checks for any constant.  The address of some
> > defined symbol is a constant as well normally, for example.
> >
> > It's better to write that asm as
> > asm volatile("mfdcr %0,%1" : "=r" (rval) : "n"(rn));
> > btw (the "n" constraint means "constant integer with known value" (it
> > stands for "numeric"), while the "i" constraint means just "constant
> > integer").
> 
> Actually that fixes it.

Huh interesting.  I was going to suggest to use __is_constexpr instead,
but that should return true for slightly fewer expressions (but probably
still okay in this case).

> diff --git a/arch/powerpc/include/asm/dcr-native.h 
> b/arch/powerpc/include/asm/dcr-native.h
> index 7141ccea8c94..d143308b0f95 100644
> --- a/arch/powerpc/include/asm/dcr-native.h
> +++ b/arch/powerpc/include/asm/dcr-native.h
> @@ -53,8 +53,8 @@ static inline void mtdcrx(unsigned int reg, unsigned int 
> val)
>  #define mfdcr(rn)  \
> ({unsigned int rval;\
> if (__builtin_constant_p(rn) && rn < 1024)  \
> -   asm volatile("mfdcr %0," __stringify(rn)\
> - : "=r" (rval));   \
> +   asm volatile("mfdcr %0, %1" : "=r" (rval)   \
> +: "n" (rn));   \
> else if (likely(cpu_has_feature(CPU_FTR_INDEXED_DCR)))  \
> rval = mfdcrx(rn);  \
> else\
> 
> 
> I guess because we give the compiler time to work out the constant,
> rather than stringifying it immediately.

Yeah, this is not preprocessor magic, the compiler actually does its
thing here.  It still won't work if the compiler cannot reduce the
expression to a number (but it does see it is constant), but that is the
best you can do here probably.


Segher


Re: {standard input}:577: Error: unsupported relocation against base

2021-02-16 Thread Segher Boessenkool
Hi!

On Tue, Feb 16, 2021 at 08:36:02PM +1100, Michael Ellerman wrote:
> Feng Tang  writes:
> >   {standard input}:577: Error: unsupported relocation against base
> >   {standard input}:580: Error: unsupported relocation against base
> >   {standard input}:583: Error: unsupported relocation against base

> > The reason is macro 'mfdcr' requirs an instant number as parameter,
> > which is not met by show_plbopb_regs().
> 
> It doesn't require a constant, it checks if the argument is constant:
> 
> #define mfdcr(rn) \
>   ({unsigned int rval;\
>   if (__builtin_constant_p(rn) && rn < 1024)  \
>   asm volatile("mfdcr %0," __stringify(rn)\
> : "=r" (rval));   \
>   else if (likely(cpu_has_feature(CPU_FTR_INDEXED_DCR)))  \
>   rval = mfdcrx(rn);  \
>   else\
>   rval = __mfdcr(rn); \
>   rval;})

It requires a constant number with known (at compile time) value, while
__builtin_constant_p checks for any constant.  The address of some
defined symbol is a constant as well normally, for example.

It's better to write that asm as
asm volatile("mfdcr %0,%1" : "=r" (rval) : "n"(rn));
btw (the "n" constraint means "constant integer with known value" (it
stands for "numeric"), while the "i" constraint means just "constant
integer").

> But the error you're seeing implies the compiler is choosing the first
> leg of the if, even when rn == "base + x", which is surprising.
> 
> We've had cases in the past of __builtin_constant_p() returning false
> for things that a human can see are constant at build time, but I've
> never seen the reverse.

And it doesn't here :-)

But, you need some way to figure out an arg is a constant known number
here.  We don't have a builtin for that I think.  Maybe some trick can
be done?  Maybe simply test "rn >= 0" as well, does that work?


Segher


Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()

2021-02-11 Thread Segher Boessenkool
On Thu, Feb 11, 2021 at 03:09:43PM +0100, Christophe Leroy wrote:
> Le 11/02/2021 à 12:49, Segher Boessenkool a écrit :
> >On Thu, Feb 11, 2021 at 07:41:52AM +, Christophe Leroy wrote:
> >>powerpc BUG_ON() is based on using twnei or tdnei instruction,
> >>which obliges gcc to format the condition into a 0 or 1 value
> >>in a register.
> >
> >Huh?  Why is that?
> >
> >Will it work better if this used __builtin_trap?  Or does the kernel only
> >detect very specific forms of trap instructions?
> 
> We already made a try with __builtin_trap() 1,5 year ago, see 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20510ce03cc9463f1c9e743c1d93b939de501b53.1566219503.git.christophe.le...@c-s.fr/
> 
> The main problems encountered are:
> - It is only possible to use it for BUG_ON, not for WARN_ON because GCC 
> considers it as noreturn. Is there any workaround ?

A trap is noreturn by definition:

 -- Built-in Function: void __builtin_trap (void)
 This function causes the program to exit abnormally.

> - The kernel (With CONFIG_DEBUG_BUGVERBOSE) needs to be able to identify 
> the source file and line corresponding to the trap. How can that be done 
> with __builtin_trap() ?

The DWARF debug info should be sufficient.  Perhaps you can post-process
some way?

You can create a trap that falls through yourself (by having a trap-on
condition with a condition that is always true, but make the compiler
not see that).  This isn't efficient though.

Could you file a feature request (in bugzilla)?  It is probably useful
for generic code as well, but we could implement this for powerpc only
if needed.


Segher


Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()

2021-02-11 Thread Segher Boessenkool
On Thu, Feb 11, 2021 at 01:26:12PM +0100, Christophe Leroy wrote:
> >What PowerPC cpus implement branch folding?  I know none.
> 
> Extract from powerpc mpc8323 reference manual:

> — Zero-cycle branch capability (branch folding)

Yeah, this is not what is traditionally called branch folding (which
stores the instruction being branched to in some cache, originally the
instruction cache itself; somewhat similar (but different) to the BTIC
on 750).  Overloaded terminology :-)

6xx/7xx CPUs had the branch execution unit in the frontend, and it would
not issue a branch at all if it could be resolved then already.  Power4
and later predict all branches, and most are not issued at all (those
that do need to be executed, like bdnz, are).  At completion time it is
checked if the prediction was correct (and corrective action is taken if
not).


Segher


Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()

2021-02-11 Thread Segher Boessenkool
On Thu, Feb 11, 2021 at 08:04:55PM +1000, Nicholas Piggin wrote:
> Excerpts from Christophe Leroy's message of February 11, 2021 5:41 pm:
> > As modern powerpc implement branch folding, that's even more efficient.

Ah, it seems you mean what Arm calls branch folding.  Sure, power4
already did that, and this has not changed.

> I think POWER will speculate conditional traps as non faulting always
> so it should be just as good if not better than the branch.

Right, these are not branch instructions, so are not branch predicted;
all trap instructions are assumed to fall through, like all other
non-branch instructions.


Segher


Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()

2021-02-11 Thread Segher Boessenkool
On Thu, Feb 11, 2021 at 07:41:52AM +, Christophe Leroy wrote:
> powerpc BUG_ON() is based on using twnei or tdnei instruction,
> which obliges gcc to format the condition into a 0 or 1 value
> in a register.

Huh?  Why is that?

Will it work better if this used __builtin_trap?  Or does the kernel only
detect very specific forms of trap instructions?

> By using a generic implementation, gcc will generate a branch
> to the unconditional trap generated by BUG().

That is many more instructions than ideal.

> As modern powerpc implement branch folding, that's even more efficient.

What PowerPC cpus implement branch folding?  I know none.

Some example code generated via __builtin_trap:

void trap(void) { __builtin_trap(); }
void trap_if_0(int x) { if (x == 0) __builtin_trap(); }
void trap_if_not_0(int x) { if (x != 0) __builtin_trap(); }

-m64:

trap:
trap
trap_if_0:
tdeqi 3,0
blr
trap_if_not_0:
tdnei 3,0
blr

-m32:

trap:
trap
trap_if_0:
tweqi 3,0
blr
trap_if_not_0:
twnei 3,0
blr


Segher


Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()

2021-02-11 Thread Segher Boessenkool
On Thu, Feb 11, 2021 at 08:04:55PM +1000, Nicholas Piggin wrote:
> It would be nice if we could have a __builtin_trap_if that gcc would use 
> conditional traps with, (and which never assumes following code is 
> unreachable even for constant true, so we can use it with WARN and put 
> explicit unreachable for BUG).

It automatically does that with just __builtin_trap, see my other mail :-)


Segher


Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-09 Thread Segher Boessenkool
On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
> What if you did this?

> +static inline struct task_struct *get_current(void)
> +{
> + register struct task_struct *task asm ("r2");
> +
> + return task;
> +}

Local register asm variables are *only* guaranteed to live in that
register as operands to an asm.  See
  
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
("The only supported use" etc.)

You can do something like

static inline struct task_struct *get_current(void)
{
register struct task_struct *task asm ("r2");

asm("" : "+r"(task));

return task;
}

which makes sure that "task" actually is in r2 at the point of that asm.


Segher


Re: [PATCH v4 11/23] powerpc/syscall: Rename syscall_64.c into syscall.c

2021-02-02 Thread Segher Boessenkool
On Tue, Feb 02, 2021 at 04:38:31PM +1000, Nicholas Piggin wrote:
> > So to avoid confusion, I'll rename it. But I think "interrupt" is maybe not 
> > the right name. An 
> > interrupt most of the time refers to IRQ.
> 
> That depends what you mean by interrupt and IRQ.

In the PowerPC architecture, an exception is an abnormal condition, and
that can often cause an interrupt.  What Christophe colloquially calls
an "IRQ" here is called an external exception c.q. external interrupt.

> But if you say irq it's more likely to mean
> a device interrupt, and "interrupt" usually refres to the asynch
> ones.

Power talks about "instruction-caused interrupts", for one aspect of the
difference here; and "precise" / "imprecise" interrupts for another.

> So it's really fine to use the proper arch-specific names for things
> in arch code. I'm trying to slowly change names from exception to
> interrupt.

Thanks :-)

> > For me system call is not an interrupt in the way it 
> > doesn't unexpectedly interrupt a program flow. In powerpc manuals it is 
> > generally called exceptions, 
> > no I'm more inclined to call it exception.c
> 
> Actually that's backwards. Powerpc manuals (at least the one I look at) 
> calls them all interrupts including system calls, and also the system
> call interrupt is actually the only one that doesn't appear to be
> associated with an exception.

Yeah.  You could easily make such an exception, which is set when you
execute a system call instruction, and cleared when the interrupt is
taken, of course; but the architecture doesn't.

> And on the other hand you can deal with exceptions in some cases without
> taking an interrupt at all. For example if you had MSR[EE]=0 you could
> change the decrementer or execute msgclr or change HMER SPR etc to clear
> various exceptions without ever taking the interrupt.

A well-known example is the exception bits in the FPSCR, which do not
cause an interrupt unless the corresponding enable bits are also set.


Segher


Re: [PATCH v2] powerpc: Handle .text.{hot,unlikely}.* in linker script

2021-01-16 Thread Segher Boessenkool
Hi!

Very late of course, and the patch is fine, but:

On Mon, Jan 04, 2021 at 01:59:53PM -0700, Nathan Chancellor wrote:
> Commit eff8728fe698 ("vmlinux.lds.h: Add PGO and AutoFDO input
> sections") added ".text.unlikely.*" and ".text.hot.*" due to an LLVM
> change [1].
> 
> After another LLVM change [2], these sections are seen in some PowerPC
> builds, where there is a orphan section warning then build failure:
> 
> $ make -skj"$(nproc)" \
>ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 O=out \
>distclean powernv_defconfig zImage.epapr
> ld.lld: warning: kernel/built-in.a(panic.o):(.text.unlikely.) is being placed 
> in '.text.unlikely.'

Is the section really called ".text.unlikely.", i.e. the name ending in
a dot?  How very unusual, is there some bug elsewhere?


Segher


Re: [PATCH] powerpc:Don't print raw EIP/LR hex values in dump_stack() and show_regs()

2020-12-22 Thread Segher Boessenkool
On Tue, Dec 22, 2020 at 09:45:03PM +0800, Xiaoming Ni wrote:
> On 2020/12/22 1:12, Segher Boessenkool wrote:
> >On Mon, Dec 21, 2020 at 04:42:23PM +, David Laight wrote:
> >>From: Segher Boessenkool
> >>>Sent: 21 December 2020 16:32
> >>>
> >>>On Mon, Dec 21, 2020 at 04:17:21PM +0100, Christophe Leroy wrote:
> >>>>Le 21/12/2020 à 04:27, Xiaoming Ni a écrit :
> >>>>>Since the commit 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR
> >>>>>infrastructure"), the powerpc system is ready to support KASLR.
> >>>>>To reduces the risk of invalidating address randomization, don't print 
> >>>>>the
> >>>>>EIP/LR hex values in dump_stack() and show_regs().
> >>>
> >>>>I think your change is not enough to hide EIP address, see below a dump
> >>>>with you patch, you get "Faulting instruction address: 0xc03a0c14"
> >>>
> >>>As far as I can see the patch does nothing to the GPR printout.  Often
> >>>GPRs contain code addresses.  As one example, the LR is moved via a GPR
> >>>(often GPR0, but not always) for storing on the stack.
> >>>
> >>>So this needs more work.
> >>
> >>If the dump_stack() is from an oops you need the real EIP value
> >>on order to stand any chance of making headway.
> >
> >Or at least the function name + offset, yes.
> >
> When the system is healthy, only symbols and offsets are printed,
> Output address and symbol + offset when the system is dying
> Does this meet both debugging and security requirements?

If you have the vmlinux, sym+off is enough to find what instruction
caused the crash.

It does of course not give all the information you get in a crash dump
with all the registers, so it does hinder debugging a bit.  This is a
tradeoff.

Most debugging will need xmon or similar (or printf-style debugging)
anyway; and otoh the register dump will render KASLR largely
ineffective.

> For example:
> 
> +static void __show_regs_ip_lr(const char *flag, unsigned long addr)
> +{
> + if (system_going_down()) { /* panic oops reboot */
> + pr_cont("%s["REG"] %pS", flag, addr, (void *)addr);
> + } else {
> + pr_cont("%s%pS", flag, (void *)addr);
> + }
> +}

*If* you are certain the system goes down immediately, and you are also
certain this information will not help defeat ASLR after a reboot, you
could just print whatever, sure.

Otherwise, you only want to show some very few registers.  Or, make sure
no attackers can ever see these dumps (which is hard, many systems trust
all (local) users with it!)  Which means we first will need some very
different patches, before any of this can be much useful :-(


Segher


Re: [PATCH] powerpc:Don't print raw EIP/LR hex values in dump_stack() and show_regs()

2020-12-21 Thread Segher Boessenkool
On Mon, Dec 21, 2020 at 04:42:23PM +, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 21 December 2020 16:32
> > 
> > On Mon, Dec 21, 2020 at 04:17:21PM +0100, Christophe Leroy wrote:
> > > Le 21/12/2020 à 04:27, Xiaoming Ni a écrit :
> > > >Since the commit 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR
> > > >infrastructure"), the powerpc system is ready to support KASLR.
> > > >To reduces the risk of invalidating address randomization, don't print 
> > > >the
> > > >EIP/LR hex values in dump_stack() and show_regs().
> > 
> > > I think your change is not enough to hide EIP address, see below a dump
> > > with you patch, you get "Faulting instruction address: 0xc03a0c14"
> > 
> > As far as I can see the patch does nothing to the GPR printout.  Often
> > GPRs contain code addresses.  As one example, the LR is moved via a GPR
> > (often GPR0, but not always) for storing on the stack.
> > 
> > So this needs more work.
> 
> If the dump_stack() is from an oops you need the real EIP value
> on order to stand any chance of making headway.

Or at least the function name + offset, yes.

> Otherwise you might just as well just print 'borked - tough luck'.

Yes.  ASLR is a house of cards.  But that isn't constructive wrt this
patch :-)


Segher


Re: [PATCH] powerpc:Don't print raw EIP/LR hex values in dump_stack() and show_regs()

2020-12-21 Thread Segher Boessenkool
On Mon, Dec 21, 2020 at 04:17:21PM +0100, Christophe Leroy wrote:
> Le 21/12/2020 à 04:27, Xiaoming Ni a écrit :
> >Since the commit 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR
> >infrastructure"), the powerpc system is ready to support KASLR.
> >To reduces the risk of invalidating address randomization, don't print the
> >EIP/LR hex values in dump_stack() and show_regs().

> I think your change is not enough to hide EIP address, see below a dump 
> with you patch, you get "Faulting instruction address: 0xc03a0c14"

As far as I can see the patch does nothing to the GPR printout.  Often
GPRs contain code addresses.  As one example, the LR is moved via a GPR
(often GPR0, but not always) for storing on the stack.

So this needs more work.


Segher


Re: [PATCH v2 3/4] Kbuild: make DWARF version a choice

2020-12-01 Thread Segher Boessenkool
On Tue, Dec 01, 2020 at 12:38:16PM +0900, Masahiro Yamada wrote:
> > We can bump -Wa,-gdwarf-2 to -Wa,-gdwarf-3 since GNU actually emits
> > DWARF v3 DW_AT_ranges (see
> > https://sourceware.org/bugzilla/show_bug.cgi?id=26850 )
> > This can avoid the `warning: DWARF2 only supports one section per
> > compilation unit` warning for Clang.

That warning should be "there can be only one section with executable
code per translation unit", or similar.

> I am not a DWARF spec expert.

Neither am I.

> Please teach me.
> 
> In my understanding, "DWARF2 only supports one section ..."
> is warned only when building .S files with LLVM_IAS=1

.S files are simply run through the C preprocessor first, and then given
to the assembler.  The only difference there should be wrt debug info is
you could have some macros that expand to assembler debug statements.

> If this is due to the limitation of DWARF v2, why is it OK to
> build .c files with LLVM_IAS?

The compiler can of course make sure not to use certain constructs in
its generated assembler code, while the assembler will have to swallow
whatever the user wrote.


Segher


Re: [PATCH v2 3/4] Kbuild: make DWARF version a choice

2020-11-24 Thread Segher Boessenkool
On Tue, Nov 24, 2020 at 11:56:02AM -0500, Arvind Sankar wrote:
> On Mon, Nov 23, 2020 at 06:33:57PM -0600, Segher Boessenkool wrote:
> > On Mon, Nov 23, 2020 at 06:22:10PM -0500, Arvind Sankar wrote:
> > > Btw, is -gsplit-dwarf at all useful for assembler files?
> > 
> > If you invoke the assembler via the compiler, with that flag it still
> > creates separate .o and .dwo files (via objcopy invocations as usual).
> > Whether that is useful depends on if you have any debug info that can
> > be split :-)
> 
> Right, the latter was what I was really asking :) We don't currently
> pass -gsplit-dwarf for assembler and I was wondering if that mattered.

If there is any debug info in the .s files, it will all end up in the .o
file, not a .dwo file.  That may matter aesthetically, and it can cost a
few bytes of disk space, but it doesn't matter functionally (GDB will
search in both places).


Segher


Re: [PATCH v2 3/4] Kbuild: make DWARF version a choice

2020-11-23 Thread Segher Boessenkool
On Mon, Nov 23, 2020 at 06:22:10PM -0500, Arvind Sankar wrote:
> Btw, is -gsplit-dwarf at all useful for assembler files?

If you invoke the assembler via the compiler, with that flag it still
creates separate .o and .dwo files (via objcopy invocations as usual).
Whether that is useful depends on if you have any debug info that can
be split :-)


Segher


Re: violating function pointer signature

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 05:42:34PM +, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 19 November 2020 16:35
> > I just meant "valid C language code as defined by the standards".  Many
> > people want all UB to just go away, while that is *impossible* to do for
> > many compilers: for example where different architectures or different
> > ABIs have contradictory requirements.
> 
> Some of the UB in the C language are (probably) there because
> certain (now obscure) hardware behaved that way.

Yes.

> For instance integer arithmetic may saturate on overflow
> (or do even stranger things if the sign is a separate bit).

And some still does!

> I'm not quite sure it was ever possible to write a C compiler
> for a cpu that processed numbers in ASCII (up to 10 digits),
> binary arithmetic was almost impossible.

A machine that really stores decimal numbers?  Not BCD or the like?
Yeah wow, that will be hard.

> There are also the CPU that only have 'word' addressing - so
> that 'pointers to characters' take extra instructions.

Such machines are still made, and are programmed in C as well.

> ISTM that a few years ago the gcc developers started looking
> at some of these 'UB' and decided they could make use of
> them to make some code faster (and break other code).

When UB would happen in some situation, the compiler can simply assume
that situation does not happen.  This makes it possible to do a lot of
optimisations (many to do with loops) that cannot be done otherwise
(including those to do with signed overflow).  And many of those
optimisations are worthwhile.

> One of the problems with UB is that whereas you might expect
> UB arithmetic to generate an unexpected result and/or signal
> it is completely open-ended and could fire an ICBM at the coder.

Yes, UB is undefined behaviour.  Unspecified is something else (and C
has that as well, also implementation-defined, etc.)

In some cases GCC (and any other modern compiler) can make UB be IB
instead, with some flag for example, like -fno-strict-* does.  In other
cases it isn't so easy at all.  In cases like you have here (where the
validity of what you want to do depends on the ABI in effect) things are
not easy :-/


Segher


Re: violating function pointer signature

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 09:59:51AM -0500, Steven Rostedt wrote:
> On Thu, 19 Nov 2020 08:37:35 -0600
> Segher Boessenkool  wrote:
> > > Note that we have a fairly extensive tradition of defining away UB with
> > > language extentions, -fno-strict-overflow, -fno-strict-aliasing,  
> > 
> > These are options to make a large swath of not correct C programs
> > compile (and often work) anyway.  This is useful because there are so
> > many such programs, because a) people did not lint; and/or b) the
> > problem never was obvious with some other (or older) compiler; and/or
> > c) people do not care about writing portable C and prefer writing in
> > their own non-C dialect.
> 
> Note, this is not about your average C program. This is about the Linux
> kernel, which already does a lot of tricks in C.

Yes, I know.  And some of that can be supported just fine (usually
because the compiler explicitly supports it); some of that will not
cause problems in practice (e.g. the scope it could cause problems in
is very limited); and some of that is just waiting to break down.  The
question is what category your situation here is in.  The middle one I
think.

> There's a lot of code in
> assembly that gets called from C (and vise versa).

That is just fine, that is what ABIs are for :-)

> We modify code on the
> fly (which tracepoints use two methods of that - with asm-goto/jump-labels
> and static functions).

And that is a lot trickier.  It can be made to work, but there are many
dark nooks and crannies problems can hide in.

> As for your point c), I'm not sure what you mean about portable C

I just meant "valid C language code as defined by the standards".  Many
people want all UB to just go away, while that is *impossible* to do for
many compilers: for example where different architectures or different
ABIs have contradictory requirements.

> (stuck to
> a single compiler, or stuck to a single architecture?). Linux obviously
> supports multiple architectures (more than any other OS), but it is pretty
> stuck to gcc as a compiler (with LLVM just starting to work too).
> 
> We are fine with being stuck to a compiler if it gives us what we want.

Right.  Just know that a compiler can make defined behaviour for *some*
things that are UB in standard C (possibly at a runtime performance
cost), but most things are not feasible.

Alexei's SPARC example (thanks!) shows that even "obvious" things are
not so obviously (or at all) implemented the way you expect on all
systems you care about.


Segher


Re: violating function pointer signature

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 09:36:48AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 18, 2020 at 01:11:27PM -0600, Segher Boessenkool wrote:
> > Calling this via a different declared function type is undefined
> > behaviour, but that is independent of how the function is *defined*.
> > Your program can make ducks appear from your nose even if that function
> > is never called, if you do that.  Just don't do UB, not even once!
> 
> Ah, see, here I think we disagree. UB is a flaw of the spec, but the
> real world often has very sane behaviour there (sometimes also very
> much not).

That attitude summons ducks.

> In this particular instance the behaviour is UB because the C spec
> doesn't want to pin down the calling convention, which is something I
> can understand.

How do you know?  Were you at the meetings where this was decided?

The most frequent reason something is made UB is when there are multiple
existing implementations with irreconcilable differences.

> But once you combine the C spec with the ABI(s) at hand,
> there really isn't two ways about it. This has to work, under the
> premise that the ABI defines a caller cleanup calling convention.

This is not clear at all (and what "caller cleanup calling convention"
would mean isn't either).  A function call at the C level does not
necessarily correspond at all with a function call at the ABI level, to
begin with.

> So in the view that the compiler is a glorified assembler,

But it isn't.

> Note that we have a fairly extensive tradition of defining away UB with
> language extentions, -fno-strict-overflow, -fno-strict-aliasing,

These are options to make a large swath of not correct C programs
compile (and often work) anyway.  This is useful because there are so
many such programs, because a) people did not lint; and/or b) the
problem never was obvious with some other (or older) compiler; and/or
c) people do not care about writing portable C and prefer writing in
their own non-C dialect.

> -fno-delete-null-pointer-checks etc..

This was added as a security hardening feature.  It of course also is
useful for other things -- most flags are.  It was not added to make yet
another dialect.


Segher


Re: violating function pointer signature

2020-11-18 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 02:33:43PM -0500, Steven Rostedt wrote:
> On Wed, 18 Nov 2020 13:11:27 -0600
> Segher Boessenkool  wrote:
> 
> > Calling this via a different declared function type is undefined
> > behaviour, but that is independent of how the function is *defined*.
> > Your program can make ducks appear from your nose even if that function
> > is never called, if you do that.  Just don't do UB, not even once!
> 
> But that's the whole point of this conversation. We are going to call this
> from functions that are going to have some random set of parameters.



> And you see the above, the macro does:
> 
>   ((void(*)(void *, proto))(it_func))(__data, args);

Yup.

> With it_func being the func from the struct tracepoint_func, which is a
> void pointer, it is typecast to the function that is defined by the
> tracepoint. args is defined as the arguments that match the proto.

If you have at most four or so args, what you wnat to do will work on
all systems the kernel currently supports, as far as I can tell.  It
is not valid C, and none of the compilers have an extension for this
either.  But it will likely work.

> The problem we are solving is on the removal case, if the memory is tight,
> it is possible that the new array can not be allocated. But we must still
> remove the called function. The idea in this case is to replace the
> function saved with a stub. The above loop will call the stub and not the
> removed function until another update happens.
> 
> This thread is about how safe is it to call:
> 
> void tp_stub_func(void) { return ; }
> 
> instead of the function that was removed?

Exactly as safe as calling a stub defined in asm.  The undefined
behaviour happens if your program has such a call, it doesn't matter
how the called function is defined, it doesn't have to be C.

> Thus, we are indeed calling that stub function from a call site that is not
> using the same parameters.
> 
> The question is, will this break?

It is unlikely to break if you use just a few arguments, all of simple
scalar types.  Just hope you will never encounter a crazy ABI :-)


Segher


Re: violating function pointer signature

2020-11-18 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 01:58:23PM -0500, Steven Rostedt wrote:
> I wonder if we should define on all architectures a void void_stub(void),
> in assembly, that just does a return, an not worry about gcc messing up the
> creation of the stub function.
> 
> On x86_64:
> 
> GLOBAL(void_stub)
>   retq
> 
> 
> And so on.

I don't see how you imagine a compiler to mess this up?

void void_stub(void) { }

will do fine?

.globl  void_stub
.type   void_stub, @function
void_stub:
.LFB0:
.cfi_startproc
ret
.cfi_endproc
.LFE0:
.size   void_stub, .-void_stub


Calling this via a different declared function type is undefined
behaviour, but that is independent of how the function is *defined*.
Your program can make ducks appear from your nose even if that function
is never called, if you do that.  Just don't do UB, not even once!


Segher


Re: violating function pointer signature

2020-11-18 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 07:31:50PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
> >> I could change the stub from (void) to () if that would be better.
> >
> > Don't?  In a function definition they mean exactly the same thing (and
> > the kernel uses (void) everywhere else, which many people find clearer).
> 
> And I think () functions expected a caller-provided parameter save
> area on powerpc64le, while (void) functions do not.

Like I said (but you cut off, didn't realise it matters I guess):

> > In a function declaration that is not part of a definition it means no
> > information about the arguments is specified, a quite different thing.

Since the caller does not know if the callee will need a save area, it
has to assume it does.  Similar is true for many ABIs.

> It does not
> matter for an empty function, but GCC prefers to use the parameter
> save area instead of setting up a stack frame if it is present.  So
> you get stack corruption if you call a () function as a (void)
> function.  (The other way round is fine.)

If you have no prototype for a function, you have to assume worst case,
yes.  Calling things "a () function" can mean two things (a declaration
that is or isn't a definition, two very different things), so it helps
to be explicit about it.

Just use (void) and do not worry :-)


Segher


Re: violating function pointer signature

2020-11-18 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
> I could change the stub from (void) to () if that would be better.

Don't?  In a function definition they mean exactly the same thing (and
the kernel uses (void) everywhere else, which many people find clearer).

In a function declaration that is not part of a definition it means no
information about the arguments is specified, a quite different thing.

This is an obsolescent feature, too.  Many many years from now it could
perhaps mean the same as (void), just like in C++, but not yet.


Segher


Re: Error: invalid switch -me200

2020-11-16 Thread Segher Boessenkool
On Mon, Nov 16, 2020 at 02:27:12PM -0600, Scott Wood wrote:
> On Fri, 2020-11-13 at 18:50 -0600, Segher Boessenkool wrote:
> > All the others work fine (and are needed afaics), it is only -me200 that
> > doesn't exist (in mainline binutils).  Perhaps -me5500 will work for it
> > instead.
> 
> According to Wikipedia e200 is from mpc55xx (for which I don't see any
> platform support having ever been added).  e5500 is completely different (64-
> bit version of e500mc).

Ah yes, confusing processor numbers :-(  That explains, sorry.


Segher


Re: Error: invalid switch -me200

2020-11-13 Thread Segher Boessenkool
On Fri, Nov 13, 2020 at 04:37:38PM -0800, Fāng-ruì Sòng wrote:
> On Fri, Nov 13, 2020 at 4:23 PM Segher Boessenkool
>  wrote:
> > On Fri, Nov 13, 2020 at 12:14:18PM -0800, Nick Desaulniers wrote:
> > > > > > Error: invalid switch -me200
> > > > > > Error: unrecognized option -me200
> > > > >
> > > > > 251 cpu-as-$(CONFIG_E200)   += -Wa,-me200
> > > > >
> > > > > Are those all broken configs, or is Kconfig messed up such that
> > > > > randconfig can select these when it should not?
> > > >
> > > > Hmmm, looks like this flag does not exist in mainline binutils? There is
> > > > a thread in 2010 about this that Segher commented on:
> > > >
> > > > https://lore.kernel.org/linuxppc-dev/9859e645-954d-4d07-8003-ffcd2391a...@kernel.crashing.org/
> > > >
> > > > Guess this config should be eliminated?
> >
> > The help text for this config options says that e200 is used in 55xx,
> > and there *is* an -me5500 GAS flag (which probably does this same
> > thing, too).  But is any of this tested, or useful, or wanted?
> >
> > Maybe Christophe knows, cc:ed.
> 
> CC Alan Modra, a binutils global maintainer.
> 
> Alan, can the few -Wa,-m* options deleted from arch/powerpc/Makefile ?

All the others work fine (and are needed afaics), it is only -me200 that
doesn't exist (in mainline binutils).  Perhaps -me5500 will work for it
instead.


Segher


Re: Error: invalid switch -me200

2020-11-13 Thread Segher Boessenkool
On Fri, Nov 13, 2020 at 12:14:18PM -0800, Nick Desaulniers wrote:
> > > > Error: invalid switch -me200
> > > > Error: unrecognized option -me200
> > >
> > > 251 cpu-as-$(CONFIG_E200)   += -Wa,-me200
> > >
> > > Are those all broken configs, or is Kconfig messed up such that
> > > randconfig can select these when it should not?
> >
> > Hmmm, looks like this flag does not exist in mainline binutils? There is
> > a thread in 2010 about this that Segher commented on:
> >
> > https://lore.kernel.org/linuxppc-dev/9859e645-954d-4d07-8003-ffcd2391a...@kernel.crashing.org/
> >
> > Guess this config should be eliminated?

The help text for this config options says that e200 is used in 55xx,
and there *is* an -me5500 GAS flag (which probably does this same
thing, too).  But is any of this tested, or useful, or wanted?

Maybe Christophe knows, cc:ed.


Segher


Re: [PATCH v5 04/17] x86/acrn: Introduce hypercall interfaces

2020-11-03 Thread Segher Boessenkool
On Tue, Nov 03, 2020 at 05:44:35PM +0100, Borislav Petkov wrote:
> On Mon, Nov 02, 2020 at 05:18:09PM -0600, Segher Boessenkool wrote:
> > That is invalid actually: local register asm as input to an inline asm
> > should use *that* register!
> > 
> > This is all correct until LRA ("reload").  Not that "movl %xmm0,$eax"
> > works, but at least it screams its head off, as it should.
> 
> Screams how?

$ cat xmm1.s
movl %xmm0,%eax

$ x86_64-linux-as xmm1.s -o xmm1.o
xmm1.s: Assembler messages:
xmm1.s:1: Error: unsupported instruction `mov'

(This isn't an existing insn IIUC.)

> It builds fine without a single peep with -Wall here.
> 
> Btw, that's a MOVD - not a MOVL. MOVD can do xmm -> gpr moves. And
> singlestepping it with gdb does, well, something, which is clearly
> wrong but nothing complains:
> 
> => 0x5131 :movd   %xmm0,%eax
> 
> and %xmm0 has:
> 
> (gdb) p $xmm0
> $2 = {v4_float = {0.9901, 0, 0, 0}, v2_double = {5.2627153433055495e-315, 
> 0}, v16_int8 = {-92, 112, 125, 63, 
> ^^
> 
> so that is correct.

The original code had movl.  And movl is needed for GPRs.

> and that same value goes into %r8d:
> 
> mov%eax,%r8d

Which violates what is required by register asm :-(

> > Yes.  But GCC doing what you should have said instead of doing what you
> > said, is not good.
> 
> Oh well, should I open a low prio bug, would that help?

Sure, thanks!

> I probably should test with the latest gcc first, though...

Yeah...  FWIW, I tested with
x86_64-linux-gcc (GCC) 11.0.0 20201015 (experimental)
so I doubt current ToT will have it fixed, but who knows.

Thanks,


Segher


Re: [PATCH v5 04/17] x86/acrn: Introduce hypercall interfaces

2020-11-02 Thread Segher Boessenkool
On Mon, Nov 02, 2020 at 11:54:39PM +0100, Borislav Petkov wrote:
> On Mon, Nov 02, 2020 at 02:01:13PM -0600, Segher Boessenkool wrote:
> > It just asks the general_operand function, which (for registers) accepts
> > the hard registers that are accessible.  This does include the float and
> > vector etc. registers, normally.
> > 
> > But you usually have a pseudo-register there (which is always allowed
> > here), unless you used some register asm variable.
> 
> You mean like this:
> 
> ---
> int main(void)
> {
>   register float foo asm ("xmm0") = 0.99f;
> 
> asm volatile("movl %0, %%r8d\n\t"
>   "vmcall\n\t"
>   :: "g" (foo));
> 
> return 0;
> }
> ---
> 
> That works ok AFAICT:
> 
> ---
> 
>  :
>0:   55  push   %rbp
>1:   48 89 e5mov%rsp,%rbp
>4:   f3 0f 10 05 00 00 00movss  0x0(%rip),%xmm0# c 
>b:   00 
>c:   66 0f 7e c0 movd   %xmm0,%eax
>   10:   41 89 c0mov%eax,%r8d
>   13:   0f 01 c1vmcall 
>   16:   b8 00 00 00 00  mov$0x0,%eax
>   1b:   5d  pop%rbp
>   1c:   c3  retq
> 
> ---

That is invalid actually: local register asm as input to an inline asm
should use *that* register!

This is all correct until LRA ("reload").  Not that "movl %xmm0,$eax"
works, but at least it screams its head off, as it should.  And then LRA
puts it in %eax, a different register than asked for.

> gcc smartypants shuffles it through a GPR before sticking it into %r8.
> 
> It works too If I use a float immediate:
> 
> ---
> int main(void)
> {
> asm volatile("movl %0, %%r8d\n\t"
>   "vmcall\n\t"
>   :: "g" (0.99f));
>   
> return 0; 
> }
> ---
> 
> ---
>  :
>0:   55  push   %rbp
>1:   48 89 e5mov%rsp,%rbp
>4:   41 b8 a4 70 7d 3f   mov$0x3f7d70a4,%r8d
>a:   0f 01 c1vmcall 
>d:   b8 00 00 00 00  mov$0x0,%eax
>   12:   5d  pop%rbp
>   13:   c3  retq
> ---

(this one is correct code)

> or maybe I'm missing some freaky way to specify the input operand so
> that I can make it take a float register. But even if I could, it would
> error out when generating the asm, I presume, due to invalid insn or
> whatnot.

Yes.  But GCC doing what you should have said instead of doing what you
said, is not good.

> > And pseudos usually are allocated a simple integer register during
> > register allocation, in an asm that is.
> 
> Interesting.
> 
> > > Might even make people copying from bad examples
> > > to go look at the docs first...
> > 
> > Optimism is cool :-)
> 
> In my experience so far, documenting stuff better might not always bring
> the expected results but sometimes it does move people in the most
> unexpected way and miracles happen.
> 
> :-)))

Now if only we had time to document what we wrote!  We *do* write docs
to go with new code (maybe not enough always), but no one spends a lot
of time on documenting the existing compiler, or with a larger view than
just a single aspect of the compiler.  Alas.


Segher


Re: [PATCH v5 04/17] x86/acrn: Introduce hypercall interfaces

2020-11-02 Thread Segher Boessenkool
On Mon, Nov 02, 2020 at 07:34:30PM +0100, Borislav Petkov wrote:
> On Mon, Nov 02, 2020 at 12:10:00PM -0600, Segher Boessenkool wrote:
> > (It does not allow *all* memory and *all* constants, btw...  And the
> > condition for registers is not really "general register", whatever that
> > means...
> 
> I think that means general purpose registers. I.e., it won't use, say
> FPU, XMM or whatever special regs.
> 
> What does the asm() parsing code in gcc do for "g"? There should be
> some logic which constraints what register is valid...

It just asks the general_operand function, which (for registers) accepts
the hard registers that are accessible.  This does include the float and
vector etc. registers, normally.

But you usually have a pseudo-register there (which is always allowed
here), unless you used some register asm variable.  And pseudos usually
are allocated a simple integer register during register allocation, in
an asm that is.

> > I hope no one ever told you our documentation does not have white
> > lies!)
> 
> I have convinced myself of this a couple of times already so I either go
> ask our gcc friends or go look straight at gcc source. It is useful to
> know folks which hack on it so that I can ask them stupid questions and
> not go off into the weeds, trying to figure out what the documentation
> says.
> 
> But hey, if that gets the documentation improved, that's a win-win
> situation right there.

Yes :-)

> Might even make people copying from bad examples
> to go look at the docs first...

Optimism is cool :-)


Segher


Re: [PATCH v5 04/17] x86/acrn: Introduce hypercall interfaces

2020-11-02 Thread Segher Boessenkool
Hi!

On Mon, Nov 02, 2020 at 06:19:50PM +0100, Borislav Petkov wrote:
> On Mon, Nov 02, 2020 at 10:09:01AM -0600, Segher Boessenkool wrote:
> > I think that will work for x86_64.  But it won't matter much, most of
> > the time you give an immediate number.
> 
> Yeah, my question is more along the lines of "is this constraint somehow
> special, i.e., 'ir' (and not 'm') or can it be whatever, i.e., 'g'"?

movl works for moving anything g->r.  This is not true for most insns,
and not for most architectures at all (usually loading from memory has a
separate mnemonic; moving an immediate number often as well (and it does
not usually allow *every* immediate anyway, not even all numbers!)

> > It is a tiny bit neater of course (if anyone still remembers what "g"
> > is, you cannot use it much these days).
> 
> Oh I always remember what it is because it is right there in the docs:
> 
> "6.47.3.1 Simple Constraints
> 
> ...
> 
> ‘g’   Any register, memory or immediate integer operand is allowed,
>   except for registers that are not general registers."

Most people do not read the documentation, they just copy from (bad)
examples (and arguably, any example you do not really understand is a
bad example).

(It does not allow *all* memory and *all* constants, btw...  And the
condition for registers is not really "general register", whatever that
means...  I hope no one ever told you our documentation does not have
white lies!)


Segher


Re: [PATCH v5 04/17] x86/acrn: Introduce hypercall interfaces

2020-11-02 Thread Segher Boessenkool
On Mon, Nov 02, 2020 at 03:56:57PM +0100, Borislav Petkov wrote:
> On Mon, Oct 19, 2020 at 02:17:50PM +0800, shuo.a@intel.com wrote:
> > +static inline long acrn_hypercall0(unsigned long hcall_id)
> > +{
> > +   long result;
> > +
> > +   asm volatile("movl %1, %%r8d\n\t"
> > +"vmcall\n\t"
> > +: "=a" (result)
> > +: "ir" (hcall_id)
>   ^^
> 
> Not "irm"? Or simply "g" in that case?

I think that will work for x86_64.  But it won't matter much, most of
the time you give an immediate number.  It is a tiny bit neater of
course (if anyone still remembers what "g" is, you cannot use it much
these days).


Segher


Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE

2020-10-29 Thread Segher Boessenkool
On Wed, Oct 28, 2020 at 10:57:45PM -0400, Arvind Sankar wrote:
> On Wed, Oct 28, 2020 at 04:20:01PM -0700, Alexei Starovoitov wrote:
> > All compilers have bugs. Kernel has bugs. What can go wrong?

Heh.

> +linux-toolchains. GCC updated the documentation in 7.x to discourage
> people from using the optimize attribute.
> 
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=893100c3fa9b3049ce84dcc0c1a839ddc7a21387

https://patchwork.ozlabs.org/project/gcc/patch/20151213081911.GA320@x4/
has all the discussion around that GCC patch.


Segher


Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"

2020-10-24 Thread Segher Boessenkool
On Fri, Oct 23, 2020 at 09:28:59PM +, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 23 October 2020 19:27
> > On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> > > On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
> > > On arm64 when callee expects a 32bit argument, the caller is *not* 
> > > responsible
> > > for clearing the upper half of 64bit register used to pass the value - it 
> > > only
> > > needs to store the actual value into the lower half.  The callee must 
> > > consider
> > > the contents of the upper half of that register as undefined.  See 
> > > AAPCS64 (e.g.
> > > https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules
> > > ); AFAICS, the relevant bit is
> > >   "Unlike in the 32-bit AAPCS, named integral values must be narrowed by
> > > the callee rather than the caller."
> > 
> > Or the formal rule:
> > 
> > C.9 If the argument is an Integral or Pointer Type, the size of the
> > argument is less than or equal to 8 bytes and the NGRN is less
> > than 8, the argument is copied to the least significant bits in
> > x[NGRN]. The NGRN is incremented by one. The argument has now
> > been allocated.
> 
> So, in essence, if the value is in a 64bit register the calling
> code is independent of the actual type of the formal parameter.
> Clearly a value might need explicit widening.

No, this says that if you pass a 32-bit integer in a 64-bit register,
then the top 32 bits of that register hold an undefined value.

> I've found a copy of the 64 bit arm instruction set.
> Unfortunately it is alpha sorted and repetitive so shows none
> of the symmetry and makes things difficult to find.

All of this is ABI, not ISA.  Look at the AAPCS64 pointed to above.

> But, contrary to what someone suggested most register writes
> (eg from arithmetic) seem to zero/extend the high bits.

Everything that writes a "w" does, yes.  But that has nothing to do with
the parameter passing rules, that is ABI.  It just means that very often
a 32-bit integer will be passed zero-extended in a 64-bit register, but
that is just luck (or not, it makes finding bugs harder ;-) )


Segher


Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"

2020-10-23 Thread Segher Boessenkool
On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
> 
> > Now, I am not a compiler expert, but as I already cited, at least on
> > x86-64 clang expects that the high bits were cleared by the caller - in
> > contrast to gcc. I suspect it's the same on arm64, but again, I am no
> > compiler expert.
> > 
> > If what I said and cites for x86-64 is correct, if the function expects
> > an "unsigned int", it will happily use 64bit operations without further
> > checks where valid when assuming high bits are zero. That's why even
> > converting everything to "unsigned int" as proposed by me won't work on
> > clang - it assumes high bits are zero (as indicated by Nick).
> > 
> > As I am neither a compiler experts (did I mention that already? ;) ) nor
> > an arm64 experts, I can't tell if this is a compiler BUG or not.
> 
> On arm64 when callee expects a 32bit argument, the caller is *not* responsible
> for clearing the upper half of 64bit register used to pass the value - it only
> needs to store the actual value into the lower half.  The callee must consider
> the contents of the upper half of that register as undefined.  See AAPCS64 
> (e.g.
> https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules
> ); AFAICS, the relevant bit is
>   "Unlike in the 32-bit AAPCS, named integral values must be narrowed by
> the callee rather than the caller."

Or the formal rule:

C.9 If the argument is an Integral or Pointer Type, the size of the
argument is less than or equal to 8 bytes and the NGRN is less
than 8, the argument is copied to the least significant bits in
x[NGRN]. The NGRN is incremented by one. The argument has now
been allocated.


Segher


Re: [PATCH] powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()

2020-10-22 Thread Segher Boessenkool
On Thu, Oct 22, 2020 at 02:05:46PM +, Christophe Leroy wrote:
> fls() and fls64() are using __builtin_ctz() and _builtin_ctzll().
> On powerpc, those builtins trivially use ctlzw and ctlzd power
> instructions.
> 
> Allthough those instructions provide the expected result with
> input argument 0, __builtin_ctz() and __builtin_ctzll() are
> documented as undefined for value 0.

> When the input of fls(x) is a constant, just check x for nullity and
> return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction
> directly.

That looks good :-)

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH v2 3/3] powerpc: Fix update form addressing in inline assembly

2020-10-20 Thread Segher Boessenkool
Hi!

On Tue, Oct 20, 2020 at 07:40:09AM +, Christophe Leroy wrote:
> In several places, inline assembly uses the "%Un" modifier
> to enable the use of instruction with update form addressing,
> but the associated "<>" constraint is missing.
> 
> As mentioned in previous patch, this fails with gcc 4.9, so
> "<>" can't be used directly.
> 
> Use UPD_CONSTR macro everywhere %Un modifier is used.
> 
> Signed-off-by: Christophe Leroy 

Oh well, it will be easy enough to remove this wart later, so

Reviewed-by: Segher Boessenkool 

> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -525,7 +525,7 @@ static inline void __set_pte_at(struct mm_struct *mm, 
> unsigned long addr,
>   stw%U0%X0 %2,%0\n\
>   eieio\n\
>   stw%U1%X1 %L2,%1"
> - : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> + : "=m"UPD_CONSTR (*ptep), "=m"UPD_CONSTR (*((unsigned char *)ptep+4))
>   : "r" (pte) : "memory");

Here it would pre-increment ptep+4.  That can never be something useful
afaics?  The order the two operands are (either or not) pre-modified in
the asm is not specified (GCC does not parse the asm template, by
design), so I fail to see how this could ever work.

> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -200,7 +200,7 @@ static inline void __set_pte_at(struct mm_struct *mm, 
> unsigned long addr,
>   stw%U0%X0 %2,%0\n\
>   eieio\n\
>   stw%U1%X1 %L2,%1"
> - : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> + : "=m"UPD_CONSTR (*ptep), "=m"UPD_CONSTR (*((unsigned char 
> *)ptep+4))
>   : "r" (pte) : "memory");

Same here.

The rest looks fine.


Segher


Re: [PATCH 3/3] powerpc: Fix pre-update addressing in inline assembly

2020-10-20 Thread Segher Boessenkool
On Tue, Oct 20, 2020 at 09:44:33AM +0200, Christophe Leroy wrote:
> Le 19/10/2020 à 22:24, Segher Boessenkool a écrit :
> >>but the associated "<>" constraint is missing.
> >
> >But that is just fine.  Pointless, sure, but not a bug.
> 
> Most of those are from prehistoric code. So at some point in time it was 
> effective. Then one day GCC changed it's way and they became pointless. So, 
> not a software bug, but still a regression at some point.
> 
> >>Use UPD_CONSTR macro everywhere %Un modifier is used.
> >
> >Eww.  My poor stomach.
> 
> There are not that many :)

Heh, your pain threshold is much higher than mine I guess :-)

> >Have you verified that update form is *correct* in all these, and that
> >we even *want* this there?
> 
> I can't see anything that would militate against it, do you ?
> 
> I guess if the elders have put %Us there, it was wanted.

On old CPUs, update form load/stores actually executed faster than a
"normal" memory access and an addi (or plain add).  But on more recent
stuff it mostly saves code size.  Which is nice of course, and can speed
up your code a bit, in theory at least.

It is quite hard to trigger the compiler to generate update form insns
in asm, sigh.  So testing will probably not show anything either way.
Oh well :-)


Segher


Re: [PATCH 3/3] powerpc: Fix pre-update addressing in inline assembly

2020-10-19 Thread Segher Boessenkool
On Mon, Oct 19, 2020 at 12:12:48PM +, Christophe Leroy wrote:
> In several places, inline assembly uses the "%Un" modifier
> to enable the use of instruction with pre-update addressing,

Calling this "pre-update" is misleading: the register is not updated
before the address is generated (or the memory access done!), and the
addressing is exactly the same as the "non-u" insn would use.  It is
called an "update form" instruction, because (at the same time as doing
the memory access, logically anyway) it writes back the address used to
the base register.

> but the associated "<>" constraint is missing.

But that is just fine.  Pointless, sure, but not a bug.

> Use UPD_CONSTR macro everywhere %Un modifier is used.

Eww.  My poor stomach.

Have you verified that update form is *correct* in all these, and that
we even *want* this there?


Segher


Re: [PATCH 2/3] powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at

2020-10-19 Thread Segher Boessenkool
On Mon, Oct 19, 2020 at 12:12:47PM +, Christophe Leroy wrote:
> From: Mathieu Desnoyers 
> 
> The placeholder for instruction selection should use the second
> argument's operand, which is %1, not %0. This could generate incorrect
> assembly code if the instruction selection for argument %0 ever differs
> from argument %1.

"Instruction selection" isn't correct here...  "if the memory addressing
of operand 0 is a different form from that of operand 1", perhaps?

The patch looks fine of course :-)

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH 1/3] powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9

2020-10-19 Thread Segher Boessenkool
On Mon, Oct 19, 2020 at 12:12:46PM +, Christophe Leroy wrote:
> GCC 4.9 sometimes fails to build with "m<>" constraint in
> inline assembly.

> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -223,7 +223,7 @@ do {  
> \
>   "1: " op "%U1%X1 %0,%1  # put_user\n"   \
>   EX_TABLE(1b, %l2)   \
>   :   \
> - : "r" (x), "m<>" (*addr)\
> + : "r" (x), "m"UPD_CONSTR (*addr)\
>   :   \
>   : label)
>  
> @@ -294,7 +294,7 @@ extern long __get_user_bad(void);
>   ".previous\n"   \
>   EX_TABLE(1b, 3b)\
>   : "=r" (err), "=r" (x)  \
> - : "m<>" (*addr), "i" (-EFAULT), "0" (err))
> + : "m"UPD_CONSTR (*addr), "i" (-EFAULT), "0" (err))

Wow, ugly!  But these are the only two places that use this, so

Acked-by: Segher Boessenkool  

I just hope that we get rid of 4.9 before we would use this a lot more ;-)


Segher


Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision

2020-10-19 Thread Segher Boessenkool
On Mon, Oct 19, 2020 at 10:54:40AM +0200, Christophe Leroy wrote:
> Le 19/10/2020 à 10:32, Segher Boessenkool a écrit :
> >The kernel should just use __always_inline if that is what it *wants*;
> >that is true here most likely.  GCC could perhaps improve its heuristics
> >so that it no longer thinks these functions are often too big for
> >inlining (they *are* pretty big, but not after basic optimisations with
> >constant integer arguments).
> 
> Yes I guess __always_inline is to be added on functions like this defined 
> in headers for exactly that, and that's the purpose of this patch.
> 
> However I find it odd that get_order() is outlined by GCC even in some 
> object files that don't use it at all, for instance in fs/pipe.o

It is (arguably) too big too always inline if you do not consider that
__builtin_constant_p will remove half of the function one way or
another.  Not sure if that is what happens here, but now we have a PR
(thanks!) and we will find out.


Segher


Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision

2020-10-19 Thread Segher Boessenkool
On Mon, Oct 19, 2020 at 07:50:41AM +0200, Christophe Leroy wrote:
> Le 19/10/2020 à 06:55, Joel Stanley a écrit :
> >>In the old days, marking a function 'static inline' was forcing
> >>GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable
> >>CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline
> >>a function.
> >>
> >>It looks like GCC 10 is taking poor decisions on this.

> >1952 bytes smaller with your patch applied. Did you raise this with
> >anyone from GCC?
> 
> Yes I did, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> 
> For the time being, it's at a standstill.

The kernel should just use __always_inline if that is what it *wants*;
that is true here most likely.  GCC could perhaps improve its heuristics
so that it no longer thinks these functions are often too big for
inlining (they *are* pretty big, but not after basic optimisations with
constant integer arguments).


Segher


Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10

2020-10-15 Thread Segher Boessenkool
Hi!

On Thu, Oct 15, 2020 at 10:52:20AM +, Christophe Leroy wrote:
> With gcc9 I get:



> With gcc10 I get:



> gcc10 defines multiple versions of csum_partial() which are just
> an unconditionnal branch to __csum_partial().

It doesn't inline it, yes.

Could you open a GCC PR for this please?

> To enforce inlining of that branch to __csum_partial(),
> mark csum_partial() as __always_inline.

That should be fine of course, but it would be good if we could fix this
in the compiler :-)

Reviewed-by: Segher Boessenkool  


Segher


Re: Additional debug info to aid cacheline analysis

2020-10-11 Thread Segher Boessenkool
Hi!

On Sat, Oct 10, 2020 at 10:58:36PM +0200, Mark Wielaard wrote:
> On Thu, Oct 08, 2020 at 02:23:00PM -0700, Andi Kleen wrote:
> > So I guess could disable it for 5.0+ only. 
> 
> Yes, that would work. I don't know what the lowest supported GCC
> version is, but technically it was definitely fixed in 4.10.0, 4.8.4
> and 4.9.2.

Fwiw, GCC 4.10 was renamed to GCC 5 before it was released (it was the
first release with the new version number scheme).  Only old development
versions (that no one should use) identify as 4.10.

> And various distros would probably have backported the
> fix. But checking for 5.0+ would certainly give you a good version.

Yes, esp. since some versions of 4.9 and 4.8 are still buggy.  No one
should use any version for which a newer bug-fix release has long been
available, but do you want to deal with bugs from people who do not?


Segher


Re: linux tooling mailing list

2020-10-02 Thread Segher Boessenkool
On Fri, Oct 02, 2020 at 02:01:13PM +0200, Sedat Dilek wrote:
> On Thu, Oct 1, 2020 at 12:26 AM David Miller  wrote:
> > From: Nick Desaulniers 
> > Date: Wed, 30 Sep 2020 12:29:38 -0700
> > > linux-toolcha...@vger.kernel.org
> >
> > Created.
> 
> I am subscribed, too.
> 
> Will there be a(n)...?
> 
> * archive (for example marc.info)

A lore archive would be good?

> * patchwork url

We do not have any repositories associated with this list, and there
won't be many patches anyway, so patchwork will only be useful as a kind
of mail archive.  I can ask to set one up though, if people want that?


Segher


Re: [PATCH 1/6] powerpc/time: Rename mftbl() to mftb()

2020-10-01 Thread Segher Boessenkool
On Thu, Oct 01, 2020 at 12:42:39PM +, Christophe Leroy wrote:
> On PPC64, we have mftb().
> On PPC32, we have mftbl() and an #define mftb() mftbl().
> 
> mftb() and mftbl() are equivalent, their purpose is to read the
> content of SPRN_TRBL, as returned by 'mftb' simplified instruction.
> 
> binutils seems to define 'mftbl' instruction as an equivalent
> of 'mftb'.
> 
> However in both 32 bits and 64 bits documentation, only 'mftb' is
> defined, and when performing a disassembly with objdump, the displayed
> instruction is 'mftb'
> 
> No need to have two ways to do the same thing with different
> names, rename mftbl() to have only mftb().

There are mttbl and mttbu insns (and no mttb insn); they write a 32-bit
half for the time base.  There is an mftb, and an mftbu.  mftbu reads
the upper half, while mftb reads the *whole* register.  SPR 269 is the
TBU register, while SPR 268 is called both TB and TBL.  Yes, it is
confusing :-)

The "mftb" name is much clearer than "mftbl" (on 64-bit), because it
reads the whole 64-bit register.  On 32-bit mftbl is clearer (but not
defined in the architecture, not officially an insn or even an extended
mnemonic).


Segher


Re: [RFC PATCH next-20200930] treewide: Convert macro and uses of __section(foo) to __section("foo")

2020-10-01 Thread Segher Boessenkool
Hi!

On Thu, Oct 01, 2020 at 12:15:39PM +0200, Miguel Ojeda wrote:
> > So it looks like the best option is to exclude these
> > 2 files from conversion.
> 
> Agreed. Nevertheless, is there any reason arch/powerpc/* should not be
> compiling cleanly with compiler.h? (CC'ing the rest of the PowerPC
> reviewers and ML).

You need to #include compiler_types.h to get this #define?

(The twice-defined thing is a warning, not an error.  It should be fixed
of course, but it is less important; although it may be pointing to a
deeper problem.)


Segher


Re: [PATCH v4 04/17] x86/acrn: Introduce hypercall interfaces

2020-09-30 Thread Segher Boessenkool
On Wed, Sep 30, 2020 at 07:38:15PM -0400, Arvind Sankar wrote:
> On Wed, Sep 30, 2020 at 06:25:59PM -0500, Segher Boessenkool wrote:
> > On Wed, Sep 30, 2020 at 12:14:03PM -0700, Nick Desaulniers wrote:
> > > Do we need register local storage here?
> > 
> > Depends what you want.  It looks like you do:
> > 
> > > static inline long bar(unsigned long hcall_id)
> > > {
> > >   long result;
> > >   asm volatile("movl %1, %%r8d\n\t"
> > >   "vmcall\n\t"
> > > : "=a" (result)
> > > : "ir" (hcall_id)
> > > : );
> > >   return result;
> > > }
> > 
> > "result" as output from the asm is in %rax, and the compiler will
> > shuffle that to wherever it needs it as the function return value.  That
> > part will work fine.
> > 
> > But how you are accessing %r8d is not correct, that needs to be a local
> > register asm (or r8 be made a fixed reg, probably not what you want ;-) )
> 
> Doesn't it just need an "r8" clobber to allow using r8d?

Yes, x86 asm is hard to read, what can I say :-)  Sorry about that.


Segher


Re: [PATCH v4 04/17] x86/acrn: Introduce hypercall interfaces

2020-09-30 Thread Segher Boessenkool
On Wed, Sep 30, 2020 at 03:59:15PM -0400, Arvind Sankar wrote:
> On Wed, Sep 30, 2020 at 12:14:03PM -0700, Nick Desaulniers wrote:
> > On Wed, Sep 30, 2020 at 10:13 AM Peter Zijlstra  
> > wrote:
> > >
> > > On Wed, Sep 30, 2020 at 11:10:36AM -0500, Segher Boessenkool wrote:
> > >
> > > > Since this variable is a local register asm, on entry to the asm the
> > > > compiler guarantees that the value lives in the assigned register (the
> > > > "r8" hardware register in this case).  This all works completely fine.
> > > > This is the only guaranteed behaviour for local register asm (well,
> > > > together with analogous behaviour for outputs).
> 
> How strict is the guarantee? This is an inline function -- could the
> compiler decide to reorder some other code in between the r8 assignment
> and the asm statement when it gets inlined?

Nope.  It will be in r8 on entry to the asm.  A guarantee is a
guarantee; it is not a "yeah maybe, we'll see".

> > Do we need register local storage here?
> > 
> > static inline long bar(unsigned long hcall_id)
> > {
> >   long result;
> >   asm volatile("movl %1, %%r8d\n\t"
> >   "vmcall\n\t"
> > : "=a" (result)
> > : "ir" (hcall_id)
> > : );
> >   return result;
> > }
> 
> This seems more robust, though you probably need an r8 clobber in there?

Oh, x86 has the operand order inverted, so this should work in fact.


Segher


Re: [PATCH v4 04/17] x86/acrn: Introduce hypercall interfaces

2020-09-30 Thread Segher Boessenkool
On Wed, Sep 30, 2020 at 09:42:40PM +0200, Peter Zijlstra wrote:
> > Looks like yes. You can even check different GCC versions via the
> > dropdown in the top right.
> 
> That only tells me it compiles it, not if that (IMO) weird construct is
> actually guaranteed to work as expected.
> 
> I'd almost dive into the GCC archives to read the back-story to this
> 'feature', it just seems to weird to me. A well, for another day that.

It was documented in 1996 (), the
feature is older than that though.

In 2004 (in ) the documentation for
this was made more explicit (it has been rewritten since, but it still
says the same thing).


Segher


Re: [PATCH v4 04/17] x86/acrn: Introduce hypercall interfaces

2020-09-30 Thread Segher Boessenkool
On Wed, Sep 30, 2020 at 12:14:03PM -0700, Nick Desaulniers wrote:
> Do we need register local storage here?

Depends what you want.  It looks like you do:

> static inline long bar(unsigned long hcall_id)
> {
>   long result;
>   asm volatile("movl %1, %%r8d\n\t"
>   "vmcall\n\t"
> : "=a" (result)
> : "ir" (hcall_id)
> : );
>   return result;
> }

"result" as output from the asm is in %rax, and the compiler will
shuffle that to wherever it needs it as the function return value.  That
part will work fine.

But how you are accessing %r8d is not correct, that needs to be a local
register asm (or r8 be made a fixed reg, probably not what you want ;-) )


Segher


Re: [PATCH v4 04/17] x86/acrn: Introduce hypercall interfaces

2020-09-30 Thread Segher Boessenkool
Hi!

On Wed, Sep 30, 2020 at 01:16:12PM +0200, Peter Zijlstra wrote:
> On Sun, Sep 27, 2020 at 08:38:03AM -0700, Dave Hansen wrote:
> > On 9/27/20 3:51 AM, Greg Kroah-Hartman wrote:
> > >> +static inline long acrn_hypercall0(unsigned long hcall_id)
> > >> +{
> > >> +register long r8 asm("r8");
> > >> +long result;
> > >> +
> > >> +/* Nothing can come between the r8 assignment and the asm: */
> > >> +r8 = hcall_id;
> > >> +asm volatile("vmcall\n\t"
> > >> + : "=a" (result)
> > >> + : "r" (r8)
> > >> + : );
> > > What keeps an interrupt from happening between the r8 assignment and the
> > > asm: ?
> > 
> > It's probably better phrased something like: "No other C code can come
> > between this r8 assignment and the inline asm".  An interrupt would
> > actually be fine in there because interrupts save and restore all
> > register state, including r8.
> > 
> > The problem (mentioned in the changelog) is that gcc does not let you
> > place data directly into r8.  But, it does allow you to declare a
> > register variable that you can assign to use r8.  There might be a
> > problem if a function calls was in between and clobber the register,
> > thus the "nothing can come between" comment.
> > 
> > The comment is really intended to scare away anyone from adding printk()'s.
> > 
> > More information about these register variables is here:
> > 
> > > https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
> > 
> > Any better ideas for comments would be greatly appreciated.  It has 4 or
> > 5 copies so I wanted it to be succinct.
> 
> This is disguisting.. Segher, does this actually work? Nick, does clang
> also support this?

The C variable "r8" is just a variable like any other; it can live in
memory, or in any register, and different in all places, too.  It can be
moved around too; where "the assignment to it" happens is a
philosophical question more than anything (the assignment there  can be
optimised away completely, for example; it is just a C variable, there
is no magic).

Since this variable is a local register asm, on entry to the asm the
compiler guarantees that the value lives in the assigned register (the
"r8" hardware register in this case).  This all works completely fine.
This is the only guaranteed behaviour for local register asm (well,
together with analogous behaviour for outputs).

If you want to *always* have it live in the hardware reg "r8", you have
to use a global register asm, and almost certainly do that for all
translation units, and use -ffixed-r8 as well.  This of course is
extremely costly.


Segher


Re: [PATCH v2 4/7] powerpc: Remove PowerPC 601

2020-09-29 Thread Segher Boessenkool
On Tue, Sep 29, 2020 at 06:09:21AM +, Christophe Leroy wrote:
> Powerpc 601 is 25 years old.

So is 603, but that one is still used!  :-)

> It is not selected by any defconfig.
> 
> It requires a lot of special handling as it deviates from the
> standard 6xx.
> 
> Retire it.

That is fine with me of course.  If I get a vote at all for this!

Thanks,


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Wed, Sep 09, 2020 at 02:33:36PM -0700, Linus Torvalds wrote:
> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
>  wrote:
> >
> > It will not work like this in GCC, no.  The LLVM people know about that.
> > I do not know why they insist on pushing this, being incompatible and
> > everything.
> 
> Umm. Since they'd be the ones supporting this, *gcc* would be the
> incompatible one, not clang.

This breaks the basic requirements of asm goto.

> So I'd phrase it differently. If gcc is planning on doing some
> different model for asm goto with outputs, that would be the
> incompatible case.

If we will do asm goto with outputs, the asm will still be a jump
instruction!  (It is not in LLVM!)

We probably *can* make asm goto have outputs (jump instructions can have
outputs just fine!  Just output reloads on jump instructions are hard,
because not always they are *possible*; but for asm goto it should be
fine).

Doing as LLVM does, and making the asm a "trapping" instruction, makes
it not a jump insn, and opens up whole new cans of worms (including
inferior code quality).  Since it has very different semantics, and we
might want to keep the semantics of asm goto as well anyway, this should
be called something different ("asm break" or "asm __anything" for
example).

It would be nice if they talked to us about it, too.  LLVM claims it
implements the GCC inline asm extension.  It already only is compatible
for the simplest of cases, but this would be much worse still :-(

> and honestly, (b) is actually inferior for the error cases, even if to
> a compiler person it might feel like the "RightThing(tm)" to do.
> Because when an exception happens, the outputs simply won't be
> initialized.

Sure, that is fine, and quite possible useful, but it is not the same as
asm goto.  asm goto is not some exception handling construct: it is a
jump instruction.

> Anyway, for either of those cases, the kernel won't care either way.
> We'll have to support the non-goto case for many years even if
> everybody were to magically implement it today, so it's not like this
> is a "you have to do it" thing.

Yes.

I'm just annoyed because of all the extra work created by people not
communicating.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 12:26:53PM +, David Laight wrote:
> Actually this is pretty sound:
>   __label__ label;
>   register int eax asm ("eax");
>   // Ensure eax can't be reloaded from anywhere
>   // In particular it can't be reloaded after the asm goto line
>   asm volatile ("" : "=r" (eax));

This asm is fine.  It says it writes the "eax" variable, which lives in
the eax register *in that asm* (so *not* guaranteed after it!).

>   // Provided gcc doesn't save eax here...
>   asm volatile goto ("x" ::: "eax" : label);

So this is incorrect.

>   // ... and reload the saved value here.
>   // The input value here will be that modified by the 'asm goto'.
>   // Since this modifies eax it can't be moved before the 'asm goto'.
>   asm volatile ("" : "+r" (eax));
>   // So here eax must contain the value set by the "x" instructions.

No, the register eax will contain the value of the eax variable.  In the
asm; it might well be there before or after the asm as well, but none of
that is guaranteed.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 03:31:53PM +, David Laight wrote:
> > >   asm volatile ("" : "+r" (eax));
> > >   // So here eax must contain the value set by the "x" instructions.
> > 
> > No, the register eax will contain the value of the eax variable.  In the
> > asm; it might well be there before or after the asm as well, but none of
> > that is guaranteed.
> 
> Perhaps not 'guaranteed', but very unlikely to be wrong.
> It doesn't give gcc much scope for not generating the desired code.

Wanna bet?  :-)

Correct is correct.  Anything else is not.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 09:26:28AM +, David Laight wrote:
> From: Christophe Leroy
> > Sent: 10 September 2020 09:14
> > 
> > Le 10/09/2020 à 10:04, David Laight a écrit :
> > > From: Linus Torvalds
> > >> Sent: 09 September 2020 22:34
> > >> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
> > >>  wrote:
> > >>>
> > >>> It will not work like this in GCC, no.  The LLVM people know about that.
> > >>> I do not know why they insist on pushing this, being incompatible and
> > >>> everything.
> > >>
> > >> Umm. Since they'd be the ones supporting this, *gcc* would be the
> > >> incompatible one, not clang.
> > >
> > > I had an 'interesting' idea.
> > >
> > > Can you use a local asm register variable as an input and output to
> > > an 'asm volatile goto' statement?
> > >
> > > Well you can - but is it guaranteed to work :-)
> > >
> > 
> > With gcc at least it should work according to
> > https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
> > 
> > They even explicitely tell: "The only supported use for this feature is
> > to specify registers for input and output operands when calling Extended
> > asm "
> 
> A quick test isn't good
> 
> int bar(char *z)
> {
> __label__ label;
> register int eax asm ("eax") = 6;
> asm volatile goto (" mov $1, %%eax" ::: "eax" : label);
> 
> label:
> return eax;
> }

It is neither input nor output operand here!  Only *then* is a local
register asm guaranteed to be in the given reg: as input or output to an
inline asm.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-09 Thread Segher Boessenkool
On Wed, Sep 09, 2020 at 10:31:34AM -0700, Linus Torvalds wrote:
> And apparently there are people working on this on the gcc side too,
> so it won't just be clang-specific. Nor kernel-specific in that Nick
> tells me some other projects are looking at using that asm goto with
> outputs too.

It will not work like this in GCC, no.  The LLVM people know about that.
I do not know why they insist on pushing this, being incompatible and
everything.


Segher


Re: [PATCH v2] x86/asm: Replace __force_order with memory clobber

2020-09-02 Thread Segher Boessenkool
On Wed, Sep 02, 2020 at 11:33:46AM -0400, Arvind Sankar wrote:
> The CRn accessor functions use __force_order as a dummy operand to
> prevent the compiler from reordering the inline asm.
> 
> The fact that the asm is volatile should be enough to prevent this
> already, however older versions of GCC had a bug that could sometimes
> result in reordering. This was fixed in 8.1, 7.3 and 6.5. Versions prior
> to these, including 5.x and 4.9.x, may reorder volatile asm.

Reordering them amongst themselves.  Yes, that is bad.  Reordering them
with "random" code is Just Fine.

Volatile asm should be executed on the real machine exactly as often as
on the C abstract machine, and in the same order.  That is all.

> + * The compiler should not reorder volatile asm,

So, this comment needs work.  And perhaps the rest of the patch as well?


Segher


Re: [PATCH 2/2] powerpc/vdso32: link vdso64 with linker

2020-09-02 Thread Segher Boessenkool
On Wed, Sep 02, 2020 at 05:43:03PM +0200, Christophe Leroy wrote:
> >Try with a newer ld?  If it still happens with current versions, please
> >open a bug report?  https://sourceware.org/bugzilla
> 
> Yes it works with 2.30 and 2.34

Ah okay, I missed this part.

> But minimum for building kernel is supposed to be 2.23

Sure.  Tthat could be upgraded to 2.24 -- you should use a binutils at
least as new as your GCC, and that requires 4.9 now -- but that
probably doesn't help you here).


Segher


Re: [PATCH 2/2] powerpc/vdso32: link vdso64 with linker

2020-09-02 Thread Segher Boessenkool
Hi!

On Wed, Sep 02, 2020 at 06:46:45AM +, Christophe Leroy wrote:
> ld crashes:
> 
>   LD  arch/powerpc/kernel/vdso32/vdso32.so.dbg
> /bin/sh: line 1: 23780 Segmentation fault  (core dumped) 
> ppc-linux-ld -EB -m elf32ppc -shared -soname linux-vdso32.so.1 
> --eh-frame-hdr --orphan-handling=warn -T 
> arch/powerpc/kernel/vdso32/vdso32.lds 
> arch/powerpc/kernel/vdso32/sigtramp.o 
> arch/powerpc/kernel/vdso32/gettimeofday.o 
> arch/powerpc/kernel/vdso32/datapage.o 
> arch/powerpc/kernel/vdso32/cacheflush.o 
> arch/powerpc/kernel/vdso32/note.o arch/powerpc/kernel/vdso32/getcpu.o -o 
> arch/powerpc/kernel/vdso32/vdso32.so.dbg
> make[4]: *** [arch/powerpc/kernel/vdso32/vdso32.so.dbg] Error 139
> 
> 
> [root@localhost linux-powerpc]# ppc-linux-ld --version
> GNU ld (GNU Binutils) 2.26.20160125

[ Don't build as root :-P ]

Try with a newer ld?  If it still happens with current versions, please
open a bug report?  https://sourceware.org/bugzilla


Segher


Re: Re:Re: [PATCH] powerpc: Fix a bug in __div64_32 if divisor is zero

2020-08-24 Thread Segher Boessenkool
On Mon, Aug 24, 2020 at 07:54:07PM +0800, Guohua Zhong wrote:
> >> Yet, I have noticed that there is no checking of 'base' in these functions.
> >> But I am not sure how to check is better.As we know that the result is 
> >> undefined when divisor is zero. It maybe good to print error and dump 
> >> stack.
> >>  Let the process to know that the divisor is zero by sending SIGFPE. 
> 
> > That is now what the PowerPC integer divide insns do: they just leave
> > the result undefined (and they can set the overflow flag then, but no
> > one uses that).
> 
> OK ,So just keep the patch as below. If this patch looks good for you, please
> help to review. I will send the new patch later.
> 
> Thanks for your reply.
> 
> diff --git a/arch/powerpc/boot/div64.S b/arch/powerpc/boot/div64.S
> index 4354928ed62e..1d3561cf16fa 100644
> --- a/arch/powerpc/boot/div64.S
> +++ b/arch/powerpc/boot/div64.S
> @@ -13,8 +13,10 @@
> 
> .globl __div64_32
> .globl __div64_32
>  __div64_32:
> + cmplwi  r4,0# check if divisor r4 is zero
> lwz r5,0(r3)# get the dividend into r5/r6
> lwz r6,4(r3)
> + beq 5f  # jump to label 5 if r4(divisor) is zero

Just "beqlr".

This instruction scheduling hurts all CPUs that aren't 8xx, fwiw (but
likely only in the case where r4 *is* zero, so who cares :-) )

So...  What is the *goal* of this patch?  It looks like the routine
would not get into a loop if r4 is 0, just return the wrong result?
But, it *always* will, there *is* no right result?

No caller should call it with zero as divisor ever, so in that sense,
checking for it in the division routine is just pure wasted work.


Segher


Re: Re:Re: [PATCH] powerpc: Fix a bug in __div64_32 if divisor is zero

2020-08-22 Thread Segher Boessenkool
On Sun, Aug 23, 2020 at 12:54:33AM +0800, Guohua Zhong wrote:
> Yet, I have noticed that there is no checking of 'base' in these functions.
> But I am not sure how to check is better.As we know that the result is 
> undefined when divisor is zero. It maybe good to print error and dump stack.
>  Let the process to know that the divisor is zero by sending SIGFPE. 

That is now what the PowerPC integer divide insns do: they just leave
the result undefined (and they can set the overflow flag then, but no
one uses that).


Segher


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Segher Boessenkool
On Sat, Aug 22, 2020 at 11:51:56AM +0200, Sedat Dilek wrote:
> On Sat, Aug 22, 2020 at 11:23 AM Sedat Dilek  wrote:
> >
> > On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool
> >  wrote:
> > >
> > > Hi Arvind,
> > >
> > > On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote:
> > > > Cc Segher.
> > > >
> > > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile
> > > > asm's (reported on ARM). The fix was backported to gcc-6.
> > >
> > > I know ;-)
> > >
> > > > Do you know if
> > > > there is any reason the problem couldn't occur on x86 on older gcc
> > > > without the fix?
> > >
> > > No, I see no particular reason, at least GCC 5 seems vulnerable.  (The
> > > GCC 5 release branch was closed at the time this bug report was made,
> > > already).  There is no reason I see why it would work on x86 but fail
> > > elsewhere, either.

> There exist gcc-4.8 and gcc-4.9 for Debian/jessie where EOL was June
> 30, 2020 (see [1] and [2]).
> 
> In the latest available version "4.9.2-10+deb8u1" I see no PR82602 was
> backported (see [3] and [4]).

[ There is GCC 4.9.4, no one should use an older 4.9. ]

I mentioned 5 for a reason: the whole function this patch is to did not
exist before then!  That does not mean the bug existed or did not exist
before GCC 5, but it does for example mean that a backport to 4.9 or
older isn't trivial at all.

> I am asking myself who is using such ancient compilers?

Some distros have a GCC 4.8 as system compiler.  We allow building GCC
itself with a compiler that far back, for various reasons as well (and
this is very sharp already, the last mainline GCC 4.8 release is from
June 2015, not all that long ago at all).

But, one reason this works is because people actually test it.  Does
anyone actually test the kernel with old compilers?  It isn't hard to
build a new compiler (because we make sure building a newer compiler
works with older compilers, etc. :-) ), and as you say, most distros
have newer compilers available nowadays.


Segher


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Segher Boessenkool
Hi Arvind,

On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote:
> Cc Segher.
> 
> Segher, we were looking at gcc PR82602, where IRA could reorder volatile
> asm's (reported on ARM). The fix was backported to gcc-6.

I know ;-)

> Do you know if
> there is any reason the problem couldn't occur on x86 on older gcc
> without the fix?

No, I see no particular reason, at least GCC 5 seems vulnerable.  (The
GCC 5 release branch was closed at the time this bug report was made,
already).  There is no reason I see why it would work on x86 but fail
elsewhere, either.


Segher


Re: [PATCH] sfc_ef100: Fix build failure on powerpc

2020-08-13 Thread Segher Boessenkool
On Thu, Aug 13, 2020 at 02:39:10PM +, Christophe Leroy wrote:
> ppc6xx_defconfig fails building sfc.ko module, complaining
> about the lack of _umoddi3 symbol.
> 
> This is due to the following test
> 
>   if (EFX_MIN_DMAQ_SIZE % reader->value) {
> 
> Because reader->value is u64.
> 
> As EFX_MIN_DMAQ_SIZE value is 512, reader->value is obviously small
> enough for an u32 calculation, so cast it as (u32) for the test, to
> avoid the need for _umoddi3.

That isn't the same e.g. if reader->value is 2**32 + small.  Which
probably cannot happen, but :-)


Segher


  1   2   3   4   5   6   7   8   9   >