Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 03:31:53PM +, David Laight wrote:
> > >   asm volatile ("" : "+r" (eax));
> > >   // So here eax must contain the value set by the "x" instructions.
> > 
> > No, the register eax will contain the value of the eax variable.  In the
> > asm; it might well be there before or after the asm as well, but none of
> > that is guaranteed.
> 
> Perhaps not 'guaranteed', but very unlikely to be wrong.
> It doesn't give gcc much scope for not generating the desired code.

Wanna bet?  :-)

Correct is correct.  Anything else is not.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Wed, Sep 09, 2020 at 02:33:36PM -0700, Linus Torvalds wrote:
> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
>  wrote:
> >
> > It will not work like this in GCC, no.  The LLVM people know about that.
> > I do not know why they insist on pushing this, being incompatible and
> > everything.
> 
> Umm. Since they'd be the ones supporting this, *gcc* would be the
> incompatible one, not clang.

This breaks the basic requirements of asm goto.

> So I'd phrase it differently. If gcc is planning on doing some
> different model for asm goto with outputs, that would be the
> incompatible case.

If we will do asm goto with outputs, the asm will still be a jump
instruction!  (It is not in LLVM!)

We probably *can* make asm goto have outputs (jump instructions can have
outputs just fine!  Just output reloads on jump instructions are hard,
because not always they are *possible*; but for asm goto it should be
fine).

Doing as LLVM does, and making the asm a "trapping" instruction, makes
it not a jump insn, and opens up whole new cans of worms (including
inferior code quality).  Since it has very different semantics, and we
might want to keep the semantics of asm goto as well anyway, this should
be called something different ("asm break" or "asm __anything" for
example).

It would be nice if they talked to us about it, too.  LLVM claims it
implements the GCC inline asm extension.  It already only is compatible
for the simplest of cases, but this would be much worse still :-(

> and honestly, (b) is actually inferior for the error cases, even if to
> a compiler person it might feel like the "RightThing(tm)" to do.
> Because when an exception happens, the outputs simply won't be
> initialized.

Sure, that is fine, and quite possible useful, but it is not the same as
asm goto.  asm goto is not some exception handling construct: it is a
jump instruction.

> Anyway, for either of those cases, the kernel won't care either way.
> We'll have to support the non-goto case for many years even if
> everybody were to magically implement it today, so it's not like this
> is a "you have to do it" thing.

Yes.

I'm just annoyed because of all the extra work created by people not
communicating.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 12:26:53PM +, David Laight wrote:
> Actually this is pretty sound:
>   __label__ label;
>   register int eax asm ("eax");
>   // Ensure eax can't be reloaded from anywhere
>   // In particular it can't be reloaded after the asm goto line
>   asm volatile ("" : "=r" (eax));

This asm is fine.  It says it writes the "eax" variable, which lives in
the eax register *in that asm* (so *not* guaranteed after it!).

>   // Provided gcc doesn't save eax here...
>   asm volatile goto ("x" ::: "eax" : label);

So this is incorrect.

>   // ... and reload the saved value here.
>   // The input value here will be that modified by the 'asm goto'.
>   // Since this modifies eax it can't be moved before the 'asm goto'.
>   asm volatile ("" : "+r" (eax));
>   // So here eax must contain the value set by the "x" instructions.

No, the register eax will contain the value of the eax variable.  In the
asm; it might well be there before or after the asm as well, but none of
that is guaranteed.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 09:26:28AM +, David Laight wrote:
> From: Christophe Leroy
> > Sent: 10 September 2020 09:14
> > 
> > Le 10/09/2020 à 10:04, David Laight a écrit :
> > > From: Linus Torvalds
> > >> Sent: 09 September 2020 22:34
> > >> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
> > >>  wrote:
> > >>>
> > >>> It will not work like this in GCC, no.  The LLVM people know about that.
> > >>> I do not know why they insist on pushing this, being incompatible and
> > >>> everything.
> > >>
> > >> Umm. Since they'd be the ones supporting this, *gcc* would be the
> > >> incompatible one, not clang.
> > >
> > > I had an 'interesting' idea.
> > >
> > > Can you use a local asm register variable as an input and output to
> > > an 'asm volatile goto' statement?
> > >
> > > Well you can - but is it guaranteed to work :-)
> > >
> > 
> > With gcc at least it should work according to
> > https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
> > 
> > They even explicitely tell: "The only supported use for this feature is
> > to specify registers for input and output operands when calling Extended
> > asm "
> 
> A quick test isn't good
> 
> int bar(char *z)
> {
> __label__ label;
> register int eax asm ("eax") = 6;
> asm volatile goto (" mov $1, %%eax" ::: "eax" : label);
> 
> label:
> return eax;
> }

It is neither input nor output operand here!  Only *then* is a local
register asm guaranteed to be in the given reg: as input or output to an
inline asm.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-09 Thread Segher Boessenkool
On Wed, Sep 09, 2020 at 10:31:34AM -0700, Linus Torvalds wrote:
> And apparently there are people working on this on the gcc side too,
> so it won't just be clang-specific. Nor kernel-specific in that Nick
> tells me some other projects are looking at using that asm goto with
> outputs too.

It will not work like this in GCC, no.  The LLVM people know about that.
I do not know why they insist on pushing this, being incompatible and
everything.


Segher


Re: [PATCH 2/2] powerpc/vdso32: link vdso64 with linker

2020-09-02 Thread Segher Boessenkool
On Wed, Sep 02, 2020 at 05:43:03PM +0200, Christophe Leroy wrote:
> >Try with a newer ld?  If it still happens with current versions, please
> >open a bug report?  https://sourceware.org/bugzilla
> 
> Yes it works with 2.30 and 2.34

Ah okay, I missed this part.

> But minimum for building kernel is supposed to be 2.23

Sure.  Tthat could be upgraded to 2.24 -- you should use a binutils at
least as new as your GCC, and that requires 4.9 now -- but that
probably doesn't help you here).


Segher


Re: [PATCH 2/2] powerpc/vdso32: link vdso64 with linker

2020-09-02 Thread Segher Boessenkool
Hi!

On Wed, Sep 02, 2020 at 06:46:45AM +, Christophe Leroy wrote:
> ld crashes:
> 
>   LD  arch/powerpc/kernel/vdso32/vdso32.so.dbg
> /bin/sh: line 1: 23780 Segmentation fault  (core dumped) 
> ppc-linux-ld -EB -m elf32ppc -shared -soname linux-vdso32.so.1 
> --eh-frame-hdr --orphan-handling=warn -T 
> arch/powerpc/kernel/vdso32/vdso32.lds 
> arch/powerpc/kernel/vdso32/sigtramp.o 
> arch/powerpc/kernel/vdso32/gettimeofday.o 
> arch/powerpc/kernel/vdso32/datapage.o 
> arch/powerpc/kernel/vdso32/cacheflush.o 
> arch/powerpc/kernel/vdso32/note.o arch/powerpc/kernel/vdso32/getcpu.o -o 
> arch/powerpc/kernel/vdso32/vdso32.so.dbg
> make[4]: *** [arch/powerpc/kernel/vdso32/vdso32.so.dbg] Error 139
> 
> 
> [root@localhost linux-powerpc]# ppc-linux-ld --version
> GNU ld (GNU Binutils) 2.26.20160125

[ Don't build as root :-P ]

Try with a newer ld?  If it still happens with current versions, please
open a bug report?  https://sourceware.org/bugzilla


Segher


Re: Re:Re: [PATCH] powerpc: Fix a bug in __div64_32 if divisor is zero

2020-08-24 Thread Segher Boessenkool
On Mon, Aug 24, 2020 at 07:54:07PM +0800, Guohua Zhong wrote:
> >> Yet, I have noticed that there is no checking of 'base' in these functions.
> >> But I am not sure how to check is better.As we know that the result is 
> >> undefined when divisor is zero. It maybe good to print error and dump 
> >> stack.
> >>  Let the process to know that the divisor is zero by sending SIGFPE. 
> 
> > That is now what the PowerPC integer divide insns do: they just leave
> > the result undefined (and they can set the overflow flag then, but no
> > one uses that).
> 
> OK ,So just keep the patch as below. If this patch looks good for you, please
> help to review. I will send the new patch later.
> 
> Thanks for your reply.
> 
> diff --git a/arch/powerpc/boot/div64.S b/arch/powerpc/boot/div64.S
> index 4354928ed62e..1d3561cf16fa 100644
> --- a/arch/powerpc/boot/div64.S
> +++ b/arch/powerpc/boot/div64.S
> @@ -13,8 +13,10 @@
> 
> .globl __div64_32
> .globl __div64_32
>  __div64_32:
> + cmplwi  r4,0# check if divisor r4 is zero
> lwz r5,0(r3)# get the dividend into r5/r6
> lwz r6,4(r3)
> + beq 5f  # jump to label 5 if r4(divisor) is zero

Just "beqlr".

This instruction scheduling hurts all CPUs that aren't 8xx, fwiw (but
likely only in the case where r4 *is* zero, so who cares :-) )

So...  What is the *goal* of this patch?  It looks like the routine
would not get into a loop if r4 is 0, just return the wrong result?
But, it *always* will, there *is* no right result?

No caller should call it with zero as divisor ever, so in that sense,
checking for it in the division routine is just pure wasted work.


Segher


Re: Re:Re: [PATCH] powerpc: Fix a bug in __div64_32 if divisor is zero

2020-08-22 Thread Segher Boessenkool
On Sun, Aug 23, 2020 at 12:54:33AM +0800, Guohua Zhong wrote:
> Yet, I have noticed that there is no checking of 'base' in these functions.
> But I am not sure how to check is better.As we know that the result is 
> undefined when divisor is zero. It maybe good to print error and dump stack.
>  Let the process to know that the divisor is zero by sending SIGFPE. 

That is now what the PowerPC integer divide insns do: they just leave
the result undefined (and they can set the overflow flag then, but no
one uses that).


Segher


Re: [PATCH] sfc_ef100: Fix build failure on powerpc

2020-08-13 Thread Segher Boessenkool
On Thu, Aug 13, 2020 at 02:39:10PM +, Christophe Leroy wrote:
> ppc6xx_defconfig fails building sfc.ko module, complaining
> about the lack of _umoddi3 symbol.
> 
> This is due to the following test
> 
>   if (EFX_MIN_DMAQ_SIZE % reader->value) {
> 
> Because reader->value is u64.
> 
> As EFX_MIN_DMAQ_SIZE value is 512, reader->value is obviously small
> enough for an u32 calculation, so cast it as (u32) for the test, to
> avoid the need for _umoddi3.

That isn't the same e.g. if reader->value is 2**32 + small.  Which
probably cannot happen, but :-)


Segher


Re: [PATCH v3 2/2] powerpc/uaccess: Add pre-update addressing to __get_user_asm() and __put_user_asm()

2020-08-12 Thread Segher Boessenkool
On Wed, Aug 12, 2020 at 12:25:17PM +, Christophe Leroy wrote:
> Enable pre-update addressing mode in __get_user_asm() and __put_user_asm()
> 
> Signed-off-by: Christophe Leroy 
> ---
> v3: new, splited out from patch 1.

It still looks fine to me, you can keep my Reviewed-by: :-)


Segher


Re: [PATCH v2] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-08-12 Thread Segher Boessenkool
On Wed, Aug 12, 2020 at 02:32:51PM +0200, Christophe Leroy wrote:
> Anyway, it seems that GCC doesn't make much use of the "m<>" and the 
> pre-update form.

GCC does not use update form outside of inner loops much.  Did you
expect anything else?

> Most of the benefit of flexible addressing seems to be 
> achieved with patch 1 ie without the "m<>" constraint and update form.

Yes.


Segher


Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-06 Thread Segher Boessenkool
Hi!

On Thu, Aug 06, 2020 at 12:03:33PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
> >> Christophe Leroy  writes:
> >> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
> >> > frame whenever it has anything to same.

^^^

> >> > fbb60:   94 21 ff e0 stwur1,-32(r1)
> >
> > This is the *only* place where you can use a negative offset from r1:
> > in the stwu to extend the stack (set up a new stack frame, or make the
> > current one bigger).
> 
> (You're talking about 32-bit code here right?)

The "SYSV" ELF binding, yeah, which is used for 32-bit on Linux (give or
take, ho hum).

The ABIs that have a red zone are much nicer here (but less simple) :-)

> >> At the same time it's much safer for us to just save/restore r2, and
> >> probably in the noise performance wise.
> >
> > If you want a function to be able to work with ABI-compliant code safely
> > (in all cases), you'll have to make it itself ABI-compliant as well,
> > yes :-)
> 
> True. Except this is the VDSO which has previously been a bit wild west
> as far as ABI goes :)

It could get away with many things because it was guaranteed to be a
leaf function.  Some of those things even violate the ABIs, but you can
get away with it easily, much reduced scope.  Now if this is generated
code, violating the rules will catch up with you sooner rather than
later ;-)


Segher


Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-05 Thread Segher Boessenkool
Hi!

On Wed, Aug 05, 2020 at 06:51:44PM +0200, Christophe Leroy wrote:
> Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> >On Wed, Aug 05, 2020 at 07:09:23AM +, Christophe Leroy wrote:
> >>+/*
> >>+ * The macros sets two stack frames, one for the caller and one for the 
> >>callee
> >>+ * because there are no requirement for the caller to set a stack frame 
> >>when
> >>+ * calling VDSO so it may have omitted to set one, especially on PPC64
> >>+ */
> >
> >If the caller follows the ABI, there always is a stack frame.  So what
> >is going on?
> 
> Looks like it is not the case. See discussion at 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.le...@c-s.fr/
> 
> Seems like GCC uses the redzone and doesn't set a stack frame. I guess 
> it doesn't know that the inline assembly contains a function call so it 
> doesn't set the frame.

Yes, that is the problem.  See
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Extended-Asm.html#AssemblerTemplate
where this is (briefly) discussed:
  "Accessing data from C programs without using input/output operands
  (such as by using global symbols directly from the assembler
  template) may not work as expected. Similarly, calling functions
  directly from an assembler template requires a detailed understanding
  of the target assembler and ABI."

I don't know of a good way to tell GCC some function needs a frame (that
is, one that doesn't result in extra code other than to set up the
frame).  I'll think about it.


Segher


Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-05 Thread Segher Boessenkool
Hi!

On Wed, Aug 05, 2020 at 04:40:16PM +, Christophe Leroy wrote:
> >It cannot optimise it because it does not know shift < 32.  The code
> >below is incorrect for shift equal to 32, fwiw.
> 
> Is there a way to tell it ?

Sure, for example the &31 should work (but it doesn't, with the GCC
version you used -- which version is that?)

> >What does the compiler do for just
> >
> >static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> > return ns >> (shift & 31);
> >}
> >
> 
> Worse:

I cannot make heads or tails of all that branch spaghetti, sorry.

>  73c: 55 8c 06 fe clrlwi  r12,r12,27
>  740: 7f c8 f0 14 addcr30,r8,r30
>  744: 7c c6 4a 14 add r6,r6,r9
>  748: 7c c6 e1 14 adder6,r6,r28
>  74c: 34 6c ff e0 addic.  r3,r12,-32
>  750: 41 80 00 70 blt 7c0 <__c_kernel_clock_gettime+0x114>

This branch is always true.  Hrm.


Segher


Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-05 Thread Segher Boessenkool
Hi!

On Wed, Aug 05, 2020 at 07:09:23AM +, Christophe Leroy wrote:
> Provide vdso_shift_ns(), as the generic x >> s gives the following
> bad result:
> 
>   18: 35 25 ff e0 addic.  r9,r5,-32
>   1c: 41 80 00 10 blt 2c 
>   20: 7c 64 4c 30 srw r4,r3,r9
>   24: 38 60 00 00 li  r3,0
> ...
>   2c: 54 69 08 3c rlwinm  r9,r3,1,0,30
>   30: 21 45 00 1f subfic  r10,r5,31
>   34: 7c 84 2c 30 srw r4,r4,r5
>   38: 7d 29 50 30 slw r9,r9,r10
>   3c: 7c 63 2c 30 srw r3,r3,r5
>   40: 7d 24 23 78 or  r4,r9,r4
> 
> In our case the shift is always <= 32. In addition,  the upper 32 bits
> of the result are likely nul. Lets GCC know it, it also optimises the
> following calculations.
> 
> With the patch, we get:
>0: 21 25 00 20 subfic  r9,r5,32
>4: 7c 69 48 30 slw r9,r3,r9
>8: 7c 84 2c 30 srw r4,r4,r5
>c: 7d 24 23 78 or  r4,r9,r4
>   10: 7c 63 2c 30 srw r3,r3,r5

See below.  Such code is valid on PowerPC for all shift < 64, and a
future version of GCC will do that (it is on various TODO lists, it is
bound to happen *some* day ;-), but it won't help you yet of course).


> +/*
> + * The macros sets two stack frames, one for the caller and one for the 
> callee
> + * because there are no requirement for the caller to set a stack frame when
> + * calling VDSO so it may have omitted to set one, especially on PPC64
> + */

If the caller follows the ABI, there always is a stack frame.  So what
is going on?


> +/*
> + * powerpc specific delta calculation.
> + *
> + * This variant removes the masking of the subtraction because the
> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
> + * which would result in a pointless operation. The compiler cannot
> + * optimize it away as the mask comes from the vdso data and is not compile
> + * time constant.
> + */

It cannot optimise it because it does not know shift < 32.  The code
below is incorrect for shift equal to 32, fwiw.

> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, 
> u32 mult)
> +{
> + return (cycles - last) * mult;
> +}
> +#define vdso_calc_delta vdso_calc_delta
> +
> +#ifndef __powerpc64__
> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> +{
> + u32 hi = ns >> 32;
> + u32 lo = ns;
> +
> + lo >>= shift;
> + lo |= hi << (32 - shift);
> + hi >>= shift;


> + if (likely(hi == 0))
> + return lo;

Removing these two lines shouldn't change generated object code?  Or not
make it worse, at least.

> + return ((u64)hi << 32) | lo;
> +}


What does the compiler do for just

static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
return ns >> (shift & 31);
}

?


Segher


Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-05 Thread Segher Boessenkool
Hi!

On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
> Christophe Leroy  writes:
> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
> > frame whenever it has anything to same.
> 
> Yeah OK that would explain it.
> 
> > Here is what I have in libc.so:
> >
> > 000fbb60 <__clock_gettime>:
> > fbb60:  94 21 ff e0 stwur1,-32(r1)

This is the *only* place where you can use a negative offset from r1:
in the stwu to extend the stack (set up a new stack frame, or make the
current one bigger).

> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
> > b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > index a0712a6e80d9..0b6fa245d54e 100644
> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > @@ -10,6 +10,7 @@
> > .cfi_startproc
> > PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
> > mflrr0
> > +   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
> > .cfi_register lr, r0
> 
> The cfi_register should come directly after the mflr I think.

That is the idiomatic way to write it, and most obviously correct.  But
as long as the value in LR at function entry is available in multiple
places (like, in LR and in R0 here), it is fine to use either for
unwinding.  Sometimes you can use this to optimise the unwind tables a
bit -- not really worth it in hand-written code, it's more important to
make it legible ;-)

> >> There's also no code to load/restore the TOC pointer on BE, which I
> >> think we'll need to handle.
> >
> > I see no code in the generated vdso64.so doing anything with r2, but if 
> > you think that's needed, just let's do it:
> 
> Hmm, true.
> 
> The compiler will use the toc for globals (and possibly also for large
> constants?)

And anything else it bloody well wants to, yeah :-)

> AFAIK there's no way to disable use of the toc, or make it a build error
> if it's needed.

Yes.

> At the same time it's much safer for us to just save/restore r2, and
> probably in the noise performance wise.

If you want a function to be able to work with ABI-compliant code safely
(in all cases), you'll have to make it itself ABI-compliant as well,
yes :-)

> So yeah we should probably do as below.

[ snip ]

Looks good yes.


Segher


Re: [PATCH v2 15/16] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-08-03 Thread Segher Boessenkool
On Mon, Aug 03, 2020 at 03:57:11PM +1000, Michael Ellerman wrote:
> > I would assume the function should still be generated since those checks
> > are relevant, just the return value is bogus.
> 
> Yeah, just sometimes missing warnings boil down to the compiler eliding
> whole sections of code, if it can convince itself they're unreachable.

Including when the compiler considers they must be unreachable because
they would perform undefined behaviour, like, act on uninitialised
values.  Such code is removed by cddce ("control dependence dead code
elimination", enabled by -ftree-dce at -O2 or above).

> AFAICS there's nothing weird going on here that should confuse GCC, it's
> about as straight forward as it gets.

Yes.  Please open a bug?

> Actually I can reproduce it with:
> 
> $ cat > test.c < int foo(void *p)
> {
> unsigned align;
> 
> if (!p)
> return align;
> 
> return 0;
> }
> EOF
> 
> $ gcc -Wall -Wno-maybe-uninitialized -c test.c
> $
> 
> No warning.

The whole if() is deleted pretty early.

===
static int foo(void *p)
{
unsigned align;

if (!p)
return align;

return 42;
}
int bork(void) { return foo(0); }
===

doesn't warn either, although that always uses the uninitialised var
(actually, that code is *removed*, and it always does that "return 42").

> But I guess that's behaving as documented. The compiler can't prove that
> foo() will be called with p == NULL, so it doesn't warn.

-Wmaybe-uninitialized doesn't warn for this either, btw.


Segher


Re: [PATCH] powerpc: fix function annotations to avoid section mismatch warnings with gcc-10

2020-07-29 Thread Segher Boessenkool
On Wed, Jul 29, 2020 at 03:44:56PM -0400, Vladis Dronov wrote:
> > > Certain warnings are emitted for powerpc code when building with a gcc-10
> > > toolset:
> > > 
> > > WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch 
> > > in
> > > reference from the function remove_pmd_table() to the function
> > > .meminit.text:split_kernel_mapping()
> > > The function remove_pmd_table() references
> > > the function __meminit split_kernel_mapping().
> > > This is often because remove_pmd_table lacks a __meminit
> > > annotation or the annotation of split_kernel_mapping is wrong.
> > > 
> > > Add the appropriate __init and __meminit annotations to make modpost not
> > > complain. In all the cases there are just a single callsite from another
> > > __init or __meminit function:
> > > 
> > > __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table()
> > > __init prom_init() -> setup_secure_guest()
> > > __init xive_spapr_init() -> xive_spapr_disabled()
> > 
> > So what changed?  These functions were inlined with older compilers, but
> > not anymore?
> 
> Yes, exactly. Gcc-10 does not inline them anymore. If this is because of my
> build system, this can happen to others also.
> 
> The same thing was fixed by Linus in e99332e7b4cd ("gcc-10: mark more 
> functions
> __init to avoid section mismatch warnings").

It sounds like this is part of "-finline-functions was retuned" on
?  So everyone should see it
(no matter what config or build system), and it is a good thing too :-)

Thanks for the confirmation,


Segher


Re: [PATCH] powerpc: fix function annotations to avoid section mismatch warnings with gcc-10

2020-07-29 Thread Segher Boessenkool
On Wed, Jul 29, 2020 at 03:37:41PM +0200, Vladis Dronov wrote:
> Certain warnings are emitted for powerpc code when building with a gcc-10
> toolset:
> 
> WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch in
> reference from the function remove_pmd_table() to the function
> .meminit.text:split_kernel_mapping()
> The function remove_pmd_table() references
> the function __meminit split_kernel_mapping().
> This is often because remove_pmd_table lacks a __meminit
> annotation or the annotation of split_kernel_mapping is wrong.
> 
> Add the appropriate __init and __meminit annotations to make modpost not
> complain. In all the cases there are just a single callsite from another
> __init or __meminit function:
> 
> __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table()
> __init prom_init() -> setup_secure_guest()
> __init xive_spapr_init() -> xive_spapr_disabled()

So what changed?  These functions were inlined with older compilers, but
not anymore?


Segher


Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-23 Thread Segher Boessenkool
On Thu, Jul 23, 2020 at 09:58:55PM +0200, pet...@infradead.org wrote:
>   asm ("addb  %[val], %b[var];"
>"cmovc %[sat], %[var];"
>: [var] "+r" (tmp)
>: [val] "ir" (val), [sat] "r" (sat)
>);

"var" (operand 0) needs an earlyclobber ("sat" is read after "var" is
written for the first time).


Segher


Re: [PATCH] powerpc/boot: Use address-of operator on section symbols

2020-07-20 Thread Segher Boessenkool
Hi!

On Sat, Jul 18, 2020 at 09:50:50AM +0200, Geert Uytterhoeven wrote:
> On Wed, Jun 24, 2020 at 6:02 AM Nathan Chancellor
>  wrote:
> > /* If we have an image attached to us, it overrides anything
> >  * supplied by the loader. */
> > -   if (_initrd_end > _initrd_start) {
> > +   if (&_initrd_end > &_initrd_start) {
> 
> Are you sure that fix is correct?
> 
> extern char _initrd_start[];
> extern char _initrd_end[];
> extern char _esm_blob_start[];
> extern char _esm_blob_end[];
> 
> Of course the result of their comparison is a constant, as the addresses
> are constant.  If clangs warns about it, perhaps that warning should be moved
> to W=1?
> 
> But adding "&" is not correct, according to C.

Why not?

6.5.3.2/3
The unary & operator yields the address of its operand.  [...]
Otherwise, the result is a pointer to the object or function designated
by its operand.

This is the same as using the name of an array without anything else,
yes.  It is a bit clearer if it would not be declared as array, perhaps,
but it is correct just fine like this.


Segher


Re: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register

2020-07-20 Thread Segher Boessenkool
On Mon, Jul 20, 2020 at 03:10:41PM -0500, Segher Boessenkool wrote:
> On Mon, Jul 20, 2020 at 11:39:56AM +0200, Laurent Dufour wrote:
> > Le 16/07/2020 à 10:32, Ram Pai a écrit :
> > >+  if (is_secure_guest()) {\
> > >+  __asm__ __volatile__("mfsprg0 %3;"  \
> > >+  "lnia %2;"  \
> > >+  "ld %2,12(%2);" \
> > >+  "mtsprg0 %2;"   \
> > >+  "sync;" \
> > >+  #insn" %0,%y1;" \
> > >+  "twi 0,%0,0;"   \
> > >+  "isync;"\
> > >+  "mtsprg0 %3"\
> > >+  : "=r" (ret)\
> > >+  : "Z" (*addr), "r" (0), "r" (0) \
> > 
> > I'm wondering if SPRG0 is restored to its original value.
> > You're using the same register (r0) for parameters 2 and 3, so when doing 
> > lnia %2, you're overwriting the SPRG0 value you saved in r0 just earlier.
> 
> It is putting the value 0 in the registers the compiler chooses for
> operands 2 and 3.  But operand 3 is written, while the asm says it is an
> input.  It needs an earlyclobber as well.
> 
> > It may be clearer to use explicit registers for %2 and %3 and to mark them 
> > as modified for the compiler.
> 
> That is not a good idea, imnsho.

(The explicit register number part, I mean; operand 2 should be an
output as well, yes.)


Segher


Re: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register

2020-07-20 Thread Segher Boessenkool
On Mon, Jul 20, 2020 at 11:39:56AM +0200, Laurent Dufour wrote:
> Le 16/07/2020 à 10:32, Ram Pai a écrit :
> >+if (is_secure_guest()) {\
> >+__asm__ __volatile__("mfsprg0 %3;"  \
> >+"lnia %2;"  \
> >+"ld %2,12(%2);" \
> >+"mtsprg0 %2;"   \
> >+"sync;" \
> >+#insn" %0,%y1;" \
> >+"twi 0,%0,0;"   \
> >+"isync;"\
> >+"mtsprg0 %3"\
> >+: "=r" (ret)\
> >+: "Z" (*addr), "r" (0), "r" (0) \
> 
> I'm wondering if SPRG0 is restored to its original value.
> You're using the same register (r0) for parameters 2 and 3, so when doing 
> lnia %2, you're overwriting the SPRG0 value you saved in r0 just earlier.

It is putting the value 0 in the registers the compiler chooses for
operands 2 and 3.  But operand 3 is written, while the asm says it is an
input.  It needs an earlyclobber as well.

> It may be clearer to use explicit registers for %2 and %3 and to mark them 
> as modified for the compiler.

That is not a good idea, imnsho.


Segher


Re: Failure to build librseq on ppc

2020-07-09 Thread Segher Boessenkool
On Thu, Jul 09, 2020 at 01:56:19PM -0400, Mathieu Desnoyers wrote:
> > Just to make sure I understand your recommendation. So rather than
> > hard coding r17 as the temporary registers, we could explicitly
> > declare the temporary register as a C variable, pass it as an
> > input operand to the inline asm, and then refer to it by operand
> > name in the macros using it. This way the compiler would be free
> > to perform its own register allocation.
> > 
> > If that is what you have in mind, then yes, I think it makes a
> > lot of sense.
> 
> Except that asm goto have this limitation with gcc: those cannot
> have any output operand, only inputs, clobbers and target labels.
> We cannot modify a temporary register received as input operand. So I don't
> see how to get a temporary register allocated by the compiler considering
> this limitation.

Heh, yet another reason not to obfuscate your inline asm: it didn't
register this is asm goto.

A clobber is one way, yes (those *are* allowed in asm goto).  Another
way is to not actually change that register: move the original value
back into there at the end of the asm!  (That isn't always easy to do,
it depends on your code).  So something like

long start = ...;
long tmp = start;
asm("stuff that modifies %0; ...; mr %0,%1" : : "r"(tmp), "r"(start));

is just fine: %0 isn't actually modified at all, as far as GCC is
concerned, and this isn't lying to it!


Segher


Re: Failure to build librseq on ppc

2020-07-09 Thread Segher Boessenkool
On Thu, Jul 09, 2020 at 01:42:56PM -0400, Mathieu Desnoyers wrote:
> > That works fine then, for a testcase.  Using r17 is not a great idea for
> > performance (it increases the active register footprint, and causes more
> > registers to be saved in the prologue of the functions, esp. on older
> > compilers), and it is easier to just let the compiler choose a good
> > register to use.  But maybe you want to see r17 in the generated
> > testcases, as eyecatcher or something, dunno :-)
> 
> Just to make sure I understand your recommendation. So rather than
> hard coding r17 as the temporary registers, we could explicitly
> declare the temporary register as a C variable, pass it as an
> input operand to the inline asm, and then refer to it by operand
> name in the macros using it. This way the compiler would be free
> to perform its own register allocation.
> 
> If that is what you have in mind, then yes, I think it makes a
> lot of sense.

You write to it as well, so an inout register ("+r" or such).  And yes,
you use a local var for it (like "long tmp;").  And then you can refer
to it like anything else in your asm, like "%3" or like
"%[a_long_name]"; and the compiler sees it as any other register,
exactly.


Segher


Re: Failure to build librseq on ppc

2020-07-09 Thread Segher Boessenkool
On Thu, Jul 09, 2020 at 09:43:47AM -0400, Mathieu Desnoyers wrote:
> > What protects r17 *after* this asm statement?
> 
> As discussed in the other leg of the thread (with the code example),
> r17 is in the clobber list of all asm statements using this macro, and
> is used as a temporary register within each inline asm.

That works fine then, for a testcase.  Using r17 is not a great idea for
performance (it increases the active register footprint, and causes more
registers to be saved in the prologue of the functions, esp. on older
compilers), and it is easier to just let the compiler choose a good
register to use.  But maybe you want to see r17 in the generated
testcases, as eyecatcher or something, dunno :-)


Segher


Re: Failure to build librseq on ppc

2020-07-09 Thread Segher Boessenkool
On Thu, Jul 09, 2020 at 09:33:18AM -0400, Mathieu Desnoyers wrote:
> > The way this all uses r17 will likely not work reliably.
> 
> r17 is only used as a temporary register within the inline assembler, and it 
> is
> in the clobber list. In which scenario would it not work reliably ?

This isn't clear at all, that is the problem.

> > The way multiple asm statements are used seems to have missing
> > dependencies between the statements.
> 
> I'm not sure I follow here. Note that we are injecting the CPP macros into
> a single inline asm statement as strings.

Yeah...  more trickiness.

> > And done macro-mess this, you want to be able to debug it, and you need
> > other people to be able to read it!
> 
> I understand that looking at macros can be cumbersome from the perspective
> of a reviewer only interested in a single architecture,

No, from the perspective of *any* reviewer.

> However, from my perspective, as a maintainer who must maintain similar code
> for x86 32/64, powerpc 32/64, arm, aarch64, s390, s390x, mips 32/64, and 
> likely
> other architectures in the future, the macros abstracting 32-bit and 64-bit
> allow to eliminate code duplication for each architecture with 32-bit and 
> 64-bit
> variants, which is better for maintainability.

IMNSHO it is MUCH better to just have simple separate implementations
for each.  They differ in *all* details.

Or have static inline functions, with proper dependencies, instead of
nasty text macros.

But it's your code, do what you want :-)


Segher


Re: powerpc: Incorrect stw operand modifier in __set_pte_at

2020-07-08 Thread Segher Boessenkool
On Wed, Jul 08, 2020 at 06:16:54PM +0200, Christophe Leroy wrote:
> Le 08/07/2020 à 16:45, Mathieu Desnoyers a écrit :
> >Reviewing use of the patterns "Un%Xn" with lwz and stw instructions
> >(where n should be the operand number) within the Linux kernel led
> >me to spot those 2 weird cases:
> >
> >arch/powerpc/include/asm/nohash/pgtable.h:__set_pte_at()
> >
> > __asm__ __volatile__("\
> > stw%U0%X0 %2,%0\n\
> > eieio\n\
> > stw%U0%X0 %L2,%1"
> > : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> > : "r" (pte) : "memory");
> >
> >I would have expected the stw to be:
> >
> > stw%U1%X1 %L2,%1"
> >
> >and:
> >arch/powerpc/include/asm/book3s/32/pgtable.h:__set_pte_at()
> >
> > __asm__ __volatile__("\
> > stw%U0%X0 %2,%0\n\
> > eieio\n\
> > stw%U0%X0 %L2,%1"
> > : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> > : "r" (pte) : "memory");
> >
> >where I would have expected:
> >
> > stw%U1%X1 %L2,%1"
> >
> >Is it a bug or am I missing something ?
> 
> Well spotted. I guess it's definitly a bug.

Yes :-)

> Introduced 12 years ago by commit 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9bf2b5cd
>  
> ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support").
> 
> It's gone unnoticed until now it seems.

Apparently it always could use offset form memory accesses?  Or even
when not, %0 and %1 are likely to use the same base register for
addressing :-)


Segher


Re: Failure to build librseq on ppc

2020-07-08 Thread Segher Boessenkool
On Wed, Jul 08, 2020 at 08:01:23PM -0400, Mathieu Desnoyers wrote:
> > > #define RSEQ_ASM_OP_CMPEQ(var, expect, label) 
> > >   \
> > > LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" 
> > >   \
> > 
> > The way this hardcodes r17 *will* break, btw.  The compiler will not
> > likely want to use r17 as long as your code (after inlining etc.!) stays
> > small, but there is Murphy's law.
> 
> r17 is in the clobber list, so it should be ok.

What protects r17 *after* this asm statement?


Segher


Re: Failure to build librseq on ppc

2020-07-08 Thread Segher Boessenkool
On Wed, Jul 08, 2020 at 10:32:20AM -0400, Mathieu Desnoyers wrote:
> > As far as I can see, %U is mentioned in
> > https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html in the
> > powerpc subpart, at the "m" constraint.
> 
> Yep, I did notice it, but mistakenly thought it was only needed for "m<>" 
> operand,
> not "m".

Historically, "m" meant what "m<>" does now (in inline asm).  Too many
people couldn't get it right ever (on other targets -- not that the
situation was great for PowerPC, heh), so in inline asm "m" now means
"no pre-modify or post-modify".


Segher


Re: Failure to build librseq on ppc

2020-07-08 Thread Segher Boessenkool
Hi!

On Wed, Jul 08, 2020 at 10:00:01AM -0400, Mathieu Desnoyers wrote:
> >> So perhaps you have code like
> >> 
> >>  int *p;
> >>  int x;
> >>  ...
> >>  asm ("lwz %0,%1" : "=r"(x) : "m"(*p));
> > 
> > We indeed have explicit "lwz" and "stw" instructions in there.
> > 
> >> 
> >> where that last line should actually read
> >> 
> >>  asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));
> > 
> > Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.
> > 
> > There has been some level of extra CPP macro coating around those 
> > instructions
> > to
> > support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is 
> > not
> > trivial.
> > Let me see what can be done here.
> 
> I did the following changes which appear to generate valid asm.
> See attached corresponding .S output.
> 
> I grepped for uses of "m" asm operand in Linux powerpc code and noticed it's 
> pretty much
> always used with e.g. "lwz%U1%X1". I could find one blog post discussing that 
> %U is about
> update flag, and nothing about %X. Are those documented ?

Historically, no machine-specific output modifiers were documented.
For GCC 10 i added a few (in
https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Machine-Constraints.html#Machine-Constraints
), but not all (that user code should use!) yet.

> Although it appears to generate valid asm, I have the feeling I'm relying on 
> undocumented
> features here. :-/

It is supported for 30 years or so now.  GCC itself uses this a *lot*
internally as well.  It works, and it will work forever.

> -#define STORE_WORD "std "
> -#define LOAD_WORD  "ld "
> -#define LOADX_WORD "ldx "
> +#define STORE_WORD(arg)"std%U[" __rseq_str(arg) "]%X[" 
> __rseq_str(arg) "] "/* To memory ("m" constraint) */
> +#define LOAD_WORD(arg) "lwd%U[" __rseq_str(arg) "]%X[" __rseq_str(arg) "] "  
>   /* From memory ("m" constraint) */

That cannot work (you typoed "ld" here).

Some more advice about this code, pretty generic stuff:

The way this all uses r17 will likely not work reliably.

The way multiple asm statements are used seems to have missing
dependencies between the statements.

Don't try to work *against* the compiler.  You will not win.

Alternatively, write assembler code, if that is what you actually want
to do?  Not C code.

And done macro-mess this, you want to be able to debug it, and you need
other people to be able to read it!


Segher


Re: Failure to build librseq on ppc

2020-07-08 Thread Segher Boessenkool
Hi!

On Wed, Jul 08, 2020 at 10:27:27PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > You'll have to show the actual failing machine code, and with enough
> > context that we can relate this to the source code.
> >
> > -save-temps helps, or use -S instead of -c, etc.
> 
> Attached below.

Thanks!

> I think that's from:
> 
> #define LOAD_WORD   "ld "
> 
> #define RSEQ_ASM_OP_CMPEQ(var, expect, label) 
>   \
> LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" 
>   \

The way this hardcodes r17 *will* break, btw.  The compiler will not
likely want to use r17 as long as your code (after inlining etc.!) stays
small, but there is Murphy's law.

Anyway...  something in rseq_str is wrong, missing %X.  This may
have to do with the abuse of inline asm here, making a fix harder :-(


Segher


Re: Failure to build librseq on ppc

2020-07-07 Thread Segher Boessenkool
Hi!

On Tue, Jul 07, 2020 at 03:17:10PM -0400, Mathieu Desnoyers wrote:
> I'm trying to build librseq at:
> 
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git
> 
> on powerpc, and I get these errors when building the rseq basic
> test mirrored from the kernel selftests code:
> 
> /tmp/ccieEWxU.s: Assembler messages:
> /tmp/ccieEWxU.s:118: Error: syntax error; found `,', expected `('
> /tmp/ccieEWxU.s:118: Error: junk at end of line: `,8'
> /tmp/ccieEWxU.s:121: Error: syntax error; found `,', expected `('
> /tmp/ccieEWxU.s:121: Error: junk at end of line: `,8'
> /tmp/ccieEWxU.s:626: Error: syntax error; found `,', expected `('
> /tmp/ccieEWxU.s:626: Error: junk at end of line: `,8'
> /tmp/ccieEWxU.s:629: Error: syntax error; found `,', expected `('
> /tmp/ccieEWxU.s:629: Error: junk at end of line: `,8'
> /tmp/ccieEWxU.s:735: Error: syntax error; found `,', expected `('
> /tmp/ccieEWxU.s:735: Error: junk at end of line: `,8'
> /tmp/ccieEWxU.s:738: Error: syntax error; found `,', expected `('
> /tmp/ccieEWxU.s:738: Error: junk at end of line: `,8'
> /tmp/ccieEWxU.s:741: Error: syntax error; found `,', expected `('
> /tmp/ccieEWxU.s:741: Error: junk at end of line: `,8'
> Makefile:581: recipe for target 'basic_percpu_ops_test.o' failed

You'll have to show the actual failing machine code, and with enough
context that we can relate this to the source code.

-save-temps helps, or use -S instead of -c, etc.

> I am using this compiler:
> 
> gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
> Target: powerpc-linux-gnu
> 
> So far, I got things to build by changing "m" operands to "Q" operands.
> Based on 
> https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> it seems that "Q" means "A memory operand addressed by just a base register."

Yup.

> I suspect that lwz and stw don't expect some kind of immediate offset which
> can be kept with "m", and "Q" fixes this. Is that the right fix ?
> 
> And should we change all operands passed to lwz and stw to a "Q" operand ?

No, lwz and stw exactly *do* take an immediate offset.

It sounds like the compiler passed memory addressed by indexed
addressing, instead.  Which is fine for "m", and also fine for those
insns... well, you need lwzx and stwx.

So perhaps you have code like

  int *p;
  int x;
  ...
  asm ("lwz %0,%1" : "=r"(x) : "m"(*p));

where that last line should actually read

  asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));

?


Segher


Re: [PATCH v2] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-06-30 Thread Segher Boessenkool
Hi again,

Thanks for your work so far!

On Tue, Jun 30, 2020 at 06:53:39PM +, Christophe Leroy wrote:
> On 06/30/2020 04:33 PM, Segher Boessenkool wrote:
> >>>+ make -s CC=powerpc64-linux-gnu-gcc -j 160
> >>>In file included from /linux/include/linux/uaccess.h:11:0,
> >>>  from /linux/include/linux/sched/task.h:11,
> >>>  from /linux/include/linux/sched/signal.h:9,
> >>>  from /linux/include/linux/rcuwait.h:6,
> >>>  from /linux/include/linux/percpu-rwsem.h:7,
> >>>  from /linux/include/linux/fs.h:33,
> >>>  from /linux/include/linux/huge_mm.h:8,
> >>>  from /linux/include/linux/mm.h:675,
> >>>  from /linux/arch/powerpc/kernel/signal_32.c:17:
> >>>/linux/arch/powerpc/kernel/signal_32.c: In function
> >>>'save_user_regs.isra.14.constprop':
> >>>/linux/arch/powerpc/include/asm/uaccess.h:161:2: error: 'asm' operand has
> >>>impossible constraints
> >>>   __asm__ __volatile__( \
> >>>   ^
> >>>/linux/arch/powerpc/include/asm/uaccess.h:197:12: note: in expansion of
> >>>macro '__put_user_asm'
> >>> case 4: __put_user_asm(x, ptr, retval, "stw"); break; \
> >>> ^
> >>>/linux/arch/powerpc/include/asm/uaccess.h:206:2: note: in expansion of
> >>>macro '__put_user_size_allowed'
> >>>   __put_user_size_allowed(x, ptr, size, retval);  \
> >>>   ^
> >>>/linux/arch/powerpc/include/asm/uaccess.h:220:2: note: in expansion of
> >>>macro '__put_user_size'
> >>>   __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \
> >>>   ^
> >>>/linux/arch/powerpc/include/asm/uaccess.h:96:2: note: in expansion of
> >>>macro '__put_user_nocheck'
> >>>   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
> >>>   ^
> >>>/linux/arch/powerpc/kernel/signal_32.c:120:7: note: in expansion of macro
> >>>'__put_user'
> >>>if (__put_user((unsigned int)gregs[i], >mc_gregs[i]))
> >>>^
> >
> >Can we see what that was after the macro jungle?  Like, the actual
> >preprocessed code?
> 
> Sorry for previous misunderstanding
> 
> Here is the code:
> 
> #define __put_user_asm(x, addr, err, op)  \
>   __asm__ __volatile__(   \
>   "1: " op "%U2%X2 %1,%2  # put_user\n"   \
>   "2:\n"  \
>   ".section .fixup,\"ax\"\n"  \
>   "3: li %0,%3\n" \
>   "   b 2b\n" \
>   ".previous\n"   \
>   EX_TABLE(1b, 3b)\
>   : "=r" (err)\
>   : "r" (x), "m<>" (*addr), "i" (-EFAULT), "0" (err))

Yeah I don't see it.  I'll have to look at compiler debug dumps, but I
don't have any working 4.9 around, and I cannot reproduce this with
either older or newer compilers.

It is complainig that constrain_operands just does not work *at all* on
this "m<>" constraint apparently, which doesn't make much sense.

I'll try later when I have more time, sorry :-/


Segher


Re: [PATCH v2] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-06-30 Thread Segher Boessenkool
On Tue, Jun 30, 2020 at 04:55:05PM +0200, Christophe Leroy wrote:
> Le 30/06/2020 à 03:19, Michael Ellerman a écrit :
> >Michael Ellerman  writes:
> >>Because it uses the "m<>" constraint which didn't work on GCC 4.6.
> >>
> >>https://github.com/linuxppc/issues/issues/297
> >>
> >>So we should be able to pick it up for v5.9 hopefully.
> >
> >It seems to break the build with the kernel.org 4.9.4 compiler and
> >corenet64_smp_defconfig:
> 
> Looks like 4.9.4 doesn't accept "m<>" constraint either.

The evidence contradicts this assertion.

> Changing it to "m" make it build.

But that just means something else is wrong.

> >+ make -s CC=powerpc64-linux-gnu-gcc -j 160
> >In file included from /linux/include/linux/uaccess.h:11:0,
> >  from /linux/include/linux/sched/task.h:11,
> >  from /linux/include/linux/sched/signal.h:9,
> >  from /linux/include/linux/rcuwait.h:6,
> >  from /linux/include/linux/percpu-rwsem.h:7,
> >  from /linux/include/linux/fs.h:33,
> >  from /linux/include/linux/huge_mm.h:8,
> >  from /linux/include/linux/mm.h:675,
> >  from /linux/arch/powerpc/kernel/signal_32.c:17:
> >/linux/arch/powerpc/kernel/signal_32.c: In function 
> >'save_user_regs.isra.14.constprop':
> >/linux/arch/powerpc/include/asm/uaccess.h:161:2: error: 'asm' operand has 
> >impossible constraints
> >   __asm__ __volatile__( \
> >   ^
> >/linux/arch/powerpc/include/asm/uaccess.h:197:12: note: in expansion of 
> >macro '__put_user_asm'
> > case 4: __put_user_asm(x, ptr, retval, "stw"); break; \
> > ^
> >/linux/arch/powerpc/include/asm/uaccess.h:206:2: note: in expansion of 
> >macro '__put_user_size_allowed'
> >   __put_user_size_allowed(x, ptr, size, retval);  \
> >   ^
> >/linux/arch/powerpc/include/asm/uaccess.h:220:2: note: in expansion of 
> >macro '__put_user_size'
> >   __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \
> >   ^
> >/linux/arch/powerpc/include/asm/uaccess.h:96:2: note: in expansion of 
> >macro '__put_user_nocheck'
> >   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
> >   ^
> >/linux/arch/powerpc/kernel/signal_32.c:120:7: note: in expansion of macro 
> >'__put_user'
> >if (__put_user((unsigned int)gregs[i], >mc_gregs[i]))
> >^

Can we see what that was after the macro jungle?  Like, the actual
preprocessed code?

Also, what GCC version *does* work on this?


Segher


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-06-12 Thread Segher Boessenkool
Hi!

On Fri, Jun 12, 2020 at 02:33:09PM -0700, Nick Desaulniers wrote:
> On Thu, Jun 11, 2020 at 4:53 PM Segher Boessenkool
>  wrote:
> > The PowerPC part of
> > https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> > (sorry, no anchor) documents %U.
> 
> I thought those were constraints, not output templates?  Oh,
> The asm statement must also use %U as a placeholder for the
> “update” flag in the corresponding load or store instruction.
> got it.

Traditionally, *all* constraints were documented here, including the
ones that are only meant for GCC's internal use.  And the output
modifiers were largely not documented at all.

For GCC 10, for Power, I changed it to only document the constraints
that should be public in gcc.info (and everything in gccint.info).  The
output modifiers can neatly be documented here as well, since it such a
short section now.  We're not quite there yet, but getting there.

> > Traditionally the source code is the documentation for this.  The code
> > here starts with the comment
> >   /* Write second word of DImode or DFmode reference.  Works on register
> >  or non-indexed memory only.  */
> > (which is very out-of-date itself, it works fine for e.g. TImode as well,
> > but alas).
> >
> > Unit tests are completely unsuitable for most compiler things like this.
> 
> What? No, surely one may write tests for output operands.  Grepping
> for `%L` in gcc/ was less fun than I was hoping.

You should look for 'L' instead (incl. those quotes) ;-)

Unit tests are 100x as much work, and gets <5% of the problems, compared
to regression tests.  Unit tests only test the stuff you should have
written *anyway*.  It is much more useful to test that much higher level
things work, IMNSHO.

> > HtH,
> 
> Yes, perfect, thank you so much!  So it looks like LLVM does not yet
> handle %L properly for memory operands.
> https://bugs.llvm.org/show_bug.cgi?id=46186#c4
> It's neat to see how this is implemented in GCC (and how many aren't
> implemented in LLVM, yikes :( ).  For reference, this is implemented
> in PPCAsmPrinter::PrintAsmOperand() and
> PPCAsmPrinter::PrintAsmMemoryOperand() in
> llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp.  GCC switches first on the
> modifier characters, then the operand type.

That is what the rs6000 backend currently does, yeah.  The print_operand
function just gets passed the modifier character (as "int code", or 0 if
there is no modifier).  Since there are so many modifiers there aren't
really any better options than just doing a "switch (code)" around
everything else (well, things can be factored, some helper functions,
etc., but this is mostly very old code, and it has grown organically).

> LLVM dispatches on operand type, then modifier.

That is neater, certainly for REG operands.

> When I was looking into LLVM's AsmPrinter class,
> I was surprised to see it's basically an assembler that just has
> complex logic to just do a bunch of prints, so it makes sense to see
> that pattern in GCC literally calling printf.

GCC always outputs assembler code.  This is usually a big advantage, for
things like output_operand.

> Some things I don't understand from PPC parlance is the "mode"
> (preinc, predec, premodify) and small data operands?

"mode" is "machine mode" -- SImode and the like.  PRE_DEC etc. are
*codes* (rtx codes), like,  (mem:DF (pre_dec:SI (reg:SI 39)))  (straight
from the manual).

> IIUC the bug report correctly, it looks like LLVM is failing for the
> __put_user_asm2_goto case for -m32.  A simple reproducer:
> https://godbolt.org/z/jBBF9b
> 
> void foo(long long in, long long* out) {
> asm volatile(
>   "stw%X1 %0, %1\n\t"
>   "stw%X1 %L0, %L1"
>   ::"r"(in), "m"(*out));
> }

This is wrong if operands[0] is a register, btw.  So it should use 'o'
as constraint (not 'm'), and then the 'X' output modifier has become
useless.

> prints (in GCC):
> foo:
>   stw 3, 0(5)
>   stw 4, 4(5)
>   blr
> (first time looking at ppc assembler, seems constants and registers
> are not as easy to distinguish,

The instruction mnemonic always tells you what types all arguments are.
Traditionally we don't write spaces after commas, either.  That is
actually easier to read -- well, if you are used to it, anyway! :-)

> https://developer.ibm.com/technologies/linux/articles/l-ppc/ say "Get
> used to it." LOL, ok).

Since quite a while you can write your assembler using register names as
well.  Not using the dangerous macros the Linux kernel had/has(with
which you can write "rN" in place of any "N", and it doesn't force you
to use the register name either, so you could write "li r3,r4" 

Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-06-11 Thread Segher Boessenkool
On Thu, Jun 11, 2020 at 03:43:55PM -0700, Nick Desaulniers wrote:
> Segher, Cristophe, I suspect Clang is missing support for the %L and %U
> output templates [1].

The arch/powerpc kernel first used the %U output modifier in 0c176fa80fdf
(from 2016), and %L in b8b572e1015f (2008).  include/asm-ppc (and ppc64)
have had %U since 2005 (1da177e4c3f4), and %L as well (0c541b4406a6).

> I've implemented support for some of these before
> in Clang via the documentation at [2], but these seem to be machine
> specific?

Yes, almost all output modifiers are.  Only %l, %a, %n, and part of %c
are generic (and %% and %= and on some targets, %{, %|, %}).

> Can you please point me to documentation/unit tests/source for
> these so that I can figure out what they should be doing, and look into
> implementing them in Clang?

The PowerPC part of
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
(sorry, no anchor) documents %U.

Traditionally the source code is the documentation for this.  The code
here starts with the comment
  /* Write second word of DImode or DFmode reference.  Works on register
 or non-indexed memory only.  */
(which is very out-of-date itself, it works fine for e.g. TImode as well,
but alas).

Unit tests are completely unsuitable for most compiler things like this.

The source code is gcc/config/rs6000/rs6000.c, easiest is to search for
'L' (with those quotes).  Function print_operand.

HtH,


Segher


Re: Linux powerpc new system call instruction and ABI

2020-06-11 Thread Segher Boessenkool
Hi!

On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote:
> Calling convention
> --
> The proposal is for scv 0 to provide the standard Linux system call ABI 
> with the following differences from sc convention[1]:
> 
> - lr is to be volatile across scv calls. This is necessary because the 
>   scv instruction clobbers lr. From previous discussion, this should be 
>   possible to deal with in GCC clobbers and CFI.
> 
> - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the
>   kernel system call exit to avoid restoring the volatile cr registers
>   (although we probably still would anyway to avoid information leaks).
> 
> - Error handling: The consensus among kernel, glibc, and musl is to move to
>   using negative return values in r3 rather than CR0[SO]=1 to indicate error,
>   which matches most other architectures, and is closer to a function call.

What about cr0 then?  Will it be volatile as well (exactly like for
function calls)?

> Notes
> -
> - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as
>   submitted currently preserves them. This is to leave room for deciding
>   which way to go with these.

The kernel has to set it to *something* that doesn't leak information ;-)


Segher


Re: [PATCH v10 6/6] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH

2020-06-06 Thread Segher Boessenkool
On Sat, Jun 06, 2020 at 06:04:11PM +0530, Vaibhav Jain wrote:
> >> +  /* update health struct with various flags derived from health bitmap */
> >> +  health = (struct nd_papr_pdsm_health) {
> >> +  .dimm_unarmed = p->health_bitmap & PAPR_PMEM_UNARMED_MASK,
> >> +  .dimm_bad_shutdown = p->health_bitmap & 
> >> PAPR_PMEM_BAD_SHUTDOWN_MASK,
> >> +  .dimm_bad_restore = p->health_bitmap & 
> >> PAPR_PMEM_BAD_RESTORE_MASK,
> >> +  .dimm_encrypted = p->health_bitmap & PAPR_PMEM_ENCRYPTED,
> >> +  .dimm_locked = p->health_bitmap & PAPR_PMEM_SCRUBBED_AND_LOCKED,
> >> +  .dimm_scrubbed = p->health_bitmap & 
> >> PAPR_PMEM_SCRUBBED_AND_LOCKED,
> >
> > Are you sure these work?  These are not assignments to a bool so I don't 
> > think
> > gcc will do what you want here.
> Yeah, somehow this slipped by and didnt show up in my tests. I checked
> the assembly dump and seems GCC was silently skipping initializing these
> fields without making any noise.

It's not "skipping" that, it initialises the field to 0, just like your
code said it should :-)

If you think GCC should warn for this, please open a PR?  It is *normal*
for bit-fields to be truncated from what is assigned to it, but maybe we
could warn for it in the 1-bit case (we currently don't seem to, even
when the bit-field type is _Bool).

Thanks,


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-05 Thread Segher Boessenkool
On Fri, Jun 05, 2020 at 11:59:32PM +0200, Daniel Kolesa wrote:
> On Fri, Jun 5, 2020, at 19:27, Segher Boessenkool wrote:
> > > Third party precompiled stuff doesn't really need to concern us, since 
> > > none really exists.
> > 
> > ... Yet.  And if you claim you support ELFv2, not mentioning the ways
> > your implementation deviates from it, users will be unhappy.
> 
> Will they?

Yes; not only *your* users, but also users of the "actual" ELFv2 (for
BE), if anything starts using that: the ABI is defined, and at least at
some point it worked, it is not unreasonable to think someone might
want to start using it).

Just don't pretend X and Y are the same if they are not, people do not
like to be misled.  Just be clear upfront what the differences are, and
everyone will be happy.  Confusing people and wasting their time won't
make you very popular though ;-)

> The system explicitly mentions that the minimum target in the binary packages 
> is 970. Users can't be expecting features that the hardware we support 
> doesn't support :)

But you are not the only user of ELFv2.

> Also, we're not the only ones that do this - there's musl of course, but 
> there's also the BSDs. FreeBSD 13 uses ELFv2, and supports all the old 
> hardware including processors without VMX, let alone VSX. OpenBSD is likely 
> to take the same path. But I'm not opposed to making this explicit, if that's 
> what it takes to make other people happy with it.

Yeah, please do :-)

> > > It's also still an upgrade over ELFv1 regardless (I mean, the same things 
> > > apply there).
> > 
> > Yeah, in mostly minor ways, but it all adds up for sure.
> 
> It's made my life simpler on numerous occasions,

Great to hear that!

> and allowed us to bring in software that'd otherwise take significant 
> patching (no software is complete until it has its own assembly 
> implementation of coroutines or something like that :P),

.. but does it have a mail client?

> > That depends on what you call the average case.  Code that is control
> > and memory-bound will not benefit much from *anything* :-)
> 
> Average case is, quite literally, an average case - i.e. the average of all 
> the software packages shipped in a distro :)

Ah, mostly boring stuff :-)

> > Yeah, but it helps quite a bit if your system (shared) libraries get all
> > improvements they can as well.
> 
> Well, glibc will still benefit automatically to a degree even if not built 
> for a modern baseline, since it has runtime checks in place already; as for 
> other things... well, at least for Void, I already mentioned before we're as 
> much of a source distro as a binary one - people can easily rebuild things 
> that bottleneck them, with modern CFLAGS, and still have things be 
> interoperable.

Yeah, good point there.

> > I'm not trying to dissuade you from not requiring VSX and 2.07 -- this
> > sounds like your best option, given the constraints.  I'm just saying
> > the cost is not trivial (even ignoring the ABI divergence).
> 
> Of course the cost is there - it's just not something I can do anything 
> about. I generally recommend that people who can run LE should run LE. We're 
> a bi-endian distribution, so there is a complete, fully functional, 
> ISA-2.07-baseline ppc64le variant (in fact, it's our best-supported port, 
> with greatest repo coverage and testing), as well as the 970-targeting ppc64 
> variant.

Ah, so your BE target is mostly for legacy hardware?  That took a while
to sink in, sorry!  :-)

> I know about the biarch case as well, and there is also multilib, as an even 
> more elaborate form of that. That's not directly related to what I originally 
> said, though

Biarch is why -m32 and -m64 can work at all (for building user binaries).
My point is that this does *not* work for most finer ABI (or OS) -related
points -- you really do need a toolchain built specifically for that
config.


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-05 Thread Segher Boessenkool
Hi!

On Fri, Jun 05, 2020 at 01:50:46PM -0400, Rich Felker wrote:
> On Fri, Jun 05, 2020 at 12:27:02PM -0500, Segher Boessenkool wrote:
> > > I'm also not really all that convinced that vectors make a huge 
> > > difference in non-specialized code (autovectorization still has a way to 
> > > go)
> > 
> > They do make a huge difference, depending on the application of course.
> > But VSX is not just vectors even: it also gives you twice as many
> > floating point scalars (64 now), and in newer versions of the ISA it can
> > be beneficially used for integer scalars even.
> 
> Vectorization is useful for a lot of things, and I'm sure there are
> specialized workloads that benefit from 64 scalars, but I've never
> encountered a place where having more than 16 registers made a
> practical difference.

20 years ago 32 FP registers was already often a limitation, making FFT
and similar kernels almost twice slower than they could otherwise be.
Things are only *worse* with short vectors, not better.  In general with
floating point data you need more registers (because you have more state
to look at concurrently) than with integer data.

*Of course* having 64 floating point registers does not matter if your
whole program only ever uses three floating point values, total, let
alone concurrently.

> The fact that there are specialized areas where this stuff matters
> does not imply there aren't huge domains where it's completely
> irrelevant.

There are very few domains where ISA 2.07 does not have significant
advantages over ISA 2.01.  That is Power8 vs. Power4.

> > No, that is exactly the point of requiring ISA 2.07.  Anything can use
> > ISA 2.07 (incl. VSX) without checking first, and without having a
> > fallback to some other implementation.  Going from ISA 2.01 to 2.07 is
> > more than a decade of improvements, it is not trivial at all.
> 
> This only affects code that's non-portable and PPC-specific, which a

No, it does not.  It is not only about vector registers, either.

> I think a lot of the unnecessary fighting on this topic is arising
> from differences of opinion over what an ABI entails. I would call
> what you're talking about a "platform" and more of a platform-specific
> *API* than an ABI -- it's about guarantees of interfaces available to
> the programmer, not implementation details of linkage.

No, this is very much about the ABI.  The B stands for Binary.  Which
is what this is about.


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-05 Thread Segher Boessenkool
On Fri, Jun 05, 2020 at 04:18:18AM +0200, Daniel Kolesa wrote:
> On Fri, Jun 5, 2020, at 01:35, Segher Boessenkool wrote:
> > > The thing is, I've yet to see in which way the ELFv2 ABI *actually* 
> > > requires VSX - I don't think compiling for 970 introduces any actual 
> > > differences. There will be omissions, yes - but then the more accurate 
> > > thing would be to say that a subset of ELFv2 is used, rather than it 
> > > being a different ABI per se.
> > 
> > Two big things are that binaries that someone else made are supposed to
> > work for you as well -- including binaries using VSX registers, or any
> > instructions that require ISA 2.07 (or some older ISA after 970).  This
> > includes DSOs (shared libraries).  So for a distribution this means that
> > they will not use VSX *anywhere*, or only in very specialised things.
> > That is a many-years setback, for people/situations where it could be
> > used.
> 
> Third party precompiled stuff doesn't really need to concern us, since none 
> really exists.

... Yet.  And if you claim you support ELFv2, not mentioning the ways
your implementation deviates from it, users will be unhappy.

> It's also still an upgrade over ELFv1 regardless (I mean, the same things 
> apply there).

Yeah, in mostly minor ways, but it all adds up for sure.

> I'm also not really all that convinced that vectors make a huge difference in 
> non-specialized code (autovectorization still has a way to go)

They do make a huge difference, depending on the application of course.
But VSX is not just vectors even: it also gives you twice as many
floating point scalars (64 now), and in newer versions of the ISA it can
be beneficially used for integer scalars even.

> and code written to use vector instructions should probably check auxval and 
> take those paths at runtime.

No, that is exactly the point of requiring ISA 2.07.  Anything can use
ISA 2.07 (incl. VSX) without checking first, and without having a
fallback to some other implementation.  Going from ISA 2.01 to 2.07 is
more than a decade of improvements, it is not trivial at all.


> As for other instructions, fair enough, but from my rough testing, it doesn't 
> make such a massive difference for average case

That depends on what you call the average case.  Code that is control
and memory-bound will not benefit much from *anything* :-)

> (and where it does, one can always rebuild their thing with 
> CFLAGS=-mcpu=power9)

Yeah, but it helps quite a bit if your system (shared) libraries get all
improvements they can as well.


I'm not trying to dissuade you from not requiring VSX and 2.07 -- this
sounds like your best option, given the constraints.  I'm just saying
the cost is not trivial (even ignoring the ABI divergence).


> > The target name allows to make such distinctions: this could for example
> > be  powerpc64-*-linux-void  (maybe I put the distinction in the wrong
> > part of the name here?  The glibc people will know better, and "void" is
> > probably not a great name anyway).
> 
> Hm, I'm not a huge fan of putting ABI specifics in the triplet, it feels 
> wrong - there is no precedent for it with POWER (ARM did it with EABI though),

Maybe look at what the various BSDs use?  We do have things like this.

> the last part should remain 'gnu' as it's still glibc; besides, gcc is 
> compiled for exactly one target triplet, and traditionally with ppc compilers 
> it's always been possible to target everything with just one compiler 
> (endian, 32bit, 64bit, abi...).

This isn't completely true.

Yes, the compiler allows you to change word size, endianness, ABI, some
more things.  That does not mean you can actually build working binaries
for all resulting combinations.  As a trivial example, it will still
pick up the same libraries from the same library paths usually, and
those will spectacularly fail to work.

We are biarch for some targets, which means that both powerpc-linux
targets and powerpc64-linux targets can actually handle both of those,
with just -m32 or -m64 needed to switch which configuration is used.
But you cannot magically transparently switch to many other
configurations: for those, you just build a separate toolchain for that
specfic (variant) configuration, in the general case.

> The best way would probably be adding a new -mabi, e.g. -mabi=elfv2-novsx 
> (just an example), which would behave exactly like -mabi=elfv2, except it'd 
> emit some extra detection macro

Yeah, that sounds like a good idea.  Patches welcome :-)

(A separate target name is still needed, but this will make development
simpler for sure).


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-04 Thread Segher Boessenkool
Hi!

On Fri, Jun 05, 2020 at 12:26:22AM +0200, Daniel Kolesa wrote:
> Either way I'll think about it some more and possibly prepare an RFC port. 
> I'm definitely willing to put in the work and later maintenance effort if 
> that's what it takes to make it happen.

Yeah, you'll need to convince all parties involved that it will not be
more work to them then they are willing to put in.  Initial development
and ongoing work, as you say.

For GCC it is probably an easy decision (but we'll see your proposed ABI
amendments): it shouldn't be much work at all, and it even benefits us
directly (it'll fill some holes in our testing matrix).


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-04 Thread Segher Boessenkool
Hi!

On Thu, Jun 04, 2020 at 10:08:02PM +, Joseph Myers wrote:
> > The ELFv2 document specifies things like passing of quadruple precision 
> > floats. Indeed, VSX is needed there, but that's not a concern if you 
> > *don't* use quadruple precision floats.
> 
> My understanding is that the registers used for argument passing are all 
> ones that exactly correspond to the Vector registers in earlier 
> instruction set versions.  In other words, you could *in principle* 
> produce an object, or a whole libm shared library,  [...]

And then there is the VRSAVE register, if your OS still uses that.
Let's hope not :-)

This is similar to what -mno-float128-hardware does (which actually
requires VSX hardware for the emulation library currently).


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-04 Thread Segher Boessenkool
Hi!

On Thu, Jun 04, 2020 at 11:43:53PM +0200, Daniel Kolesa wrote:
> The thing is, I've yet to see in which way the ELFv2 ABI *actually* requires 
> VSX - I don't think compiling for 970 introduces any actual differences. 
> There will be omissions, yes - but then the more accurate thing would be to 
> say that a subset of ELFv2 is used, rather than it being a different ABI per 
> se.

Two big things are that binaries that someone else made are supposed to
work for you as well -- including binaries using VSX registers, or any
instructions that require ISA 2.07 (or some older ISA after 970).  This
includes DSOs (shared libraries).  So for a distribution this means that
they will not use VSX *anywhere*, or only in very specialised things.
That is a many-years setback, for people/situations where it could be
used.

> The ELFv2 document specifies things like passing of quadruple precision 
> floats. Indeed, VSX is needed there, but that's not a concern if you *don't* 
> use quadruple precision floats.

As Joseph says, QP float is passed in the VRs, so that works just fine
*if* you have AltiVec.  If not, you probably should pass such values in
the GPRs, or however a struct of 16 bytes is passed in whatever ABI you
use (this is a much more general problem than just BE ELFv2 ;-) )  And
then you have some kind of "partial soft float".  Fun!  (Or not...)

> > If you always use -mcpu=970 (or similar), then not very much is
> > different for you most likely -- except of course there is no promise
> > to the user that they can use VSX and all instructions in ISA 2.07,
> > which is a very useful promise to have normally.
> 
> Yes, -mcpu=970 is used for default packages. *However*, it is possible that 
> the user compiles their own packages with -mcpu=power9 or something similar, 
> and then it'll be possible to utilize VSX and all, and it should still work 
> with the existing userland. When speaking of ABI, what matters is... well, 
> the binary interface, which is the same - so I believe this is still ELFv2. A 
> subset is always compliant with the whole.

The same calling convention will probably work, yes.  An ABI is more
than that though.


> > Btw, if you use GCC, *please* send in testresults?  :-)
> 
> Yes, it's all gcc (we do have clang, but compiling repo packages with clang 
> is generally frowned upon in the project, as we have vast majority of 
> packages cross-compilable, and our cross-compiling infrastructure is 
> gcc-centric, plus we enable certain things by default such as hardening flags 
> that clang does not support). I'll try to remember next time I'm running 
> tests.

Thanks in advance!

> I have a feeling that glibc would object to such port, since it means it 
> would have to exist in parallel with a potential different ELFv2 port that 
> does have a POWER8 minimal requirement; gcc would need a way to tell them 
> apart, too (based on what would they be told apart? the simplest way would 
> probably be something like, if --with-abi=elfv2 { if --with-cpu < power8 -> 
> use glibc-novsx else use glibc-vsx } ...)

The target name allows to make such distinctions: this could for example
be  powerpc64-*-linux-void  (maybe I put the distinction in the wrong
part of the name here?  The glibc people will know better, and "void" is
probably not a great name anyway).


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-04 Thread Segher Boessenkool
Hi!

On Thu, Jun 04, 2020 at 11:55:11PM +0200, Phil Blundell wrote:
> 1a. Define your own subset of ELFv2 which is interworkable with the full 
> ABI at the function call interface but doesn't make all the same 
> guarantees about binary compatibility.  That would mean that a binary 
> built with your toolchain and conforming to the subset ABI would run on 
> any system that implements the full ELFv2 ABI, but the opposite is not 
> necessarily true.  There should be no impediment to getting support for 
> such an ABI upstream in any part of the GNU toolchain where it's 
> required if you can demonstrate that there's a non-trivial userbase for 
> it.  The hardest part may be thinking of a name.

And you can only use shared objects also built for that subset ABI.  If
you use some binary distribution, then it will also have to be built for
that subset, practically anyway.

This is very similar to soft-float targets.  There are "standard" ways
to deal with this.  Distros usually balk at having to maintain multiple
variants of a target, and users do not usually want to be restricted to
the lowest common denominator.  There always is that tension.

> 1b. Or, if the missing instructions are severe enough that it simply 
> isn't possible to have an interworkable implementation, you just need to 
> define your own ABI that fits your needs.  You can still borrow as much 
> as necessary from ELFv2 but you definitely need to call it something 
> else at that point.  All the other comments from 1a above still apply.

A different name is handy in casual conversation then, yes; but also in
case 1a, it should be clear what is what somehow.

> 2. Implement kernel emulation for the missing instructions.  If they
> are seldom used in practice then this might be adequate.  Of course,
> binaries that use them intensively will be slow; you'd have to judge
> whether this is better or worse than having them not run at all.  If
> you do this then you can implement the full ELFv2 ABI; your own
> toolchain might still choose not to use the instructions that it knows
> are going to be emulated, but at least other binaries will still run
> and you can call yourself compatible.

But not just instructions, there are actual new registers!  This might
be way too much work in the case of VSX.

But it is possible that implementing QP float (binary128) this way is
a feasible way forward, _if_ you have AltiVec enabled.

> 3. Persuade whoever controls the ELFv2 ABI to relax their requirements.
> But I assume they didn't make the original decision capriciously so
> this might be hard/impossible.  ABI definitions from hardware vendors
> are always slightly political and we just have to accept this.

There is more process involved than most open source people are
comfortable with :-/

> FWIW, we faced a similar situation about 20 years ago when the then-new 
> ARM EABI was defined.  This essentially required implementations to 
> support the ARMv5T instruction set; the committee that defined the ABI 
> took the view that requiring implementations to cater for older 
> architectures would be too onerous.  It was entirely possible to 
> implement 99% of the EABI on older processors; such implementations 
> weren't strictly conforming but they were interworkable enough to be 
> useful in practice, and the "almost-EABI" was still significantly
> better than what had gone before.

Yeah, this situation is quite similar in some ways :-)

The compilers should be able to adjust to what you need pretty easily.
Since you seem to have a distribution on-board already, the biggest
hurdle left is getting glibc to accept the new port, I think.  I don't
know if it will be easy to them, or a lot of work instead.

Thanks,


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-04 Thread Segher Boessenkool
Hi!

On Thu, Jun 04, 2020 at 10:39:30PM +0200, Daniel Kolesa wrote:
> On Thu, Jun 4, 2020, at 19:33, Segher Boessenkool wrote:
> > It is the ABI.  If you think it should be different, make your own ABI,
> > don't pretend the existing ABI is different than what it is.  Thank you.
> 
> Well then - in that case, what do you suggest that I do?
> 
> Void currently ships an ELFv2 (or apparently not, I guess) 64-bit big endian 
> port that works on 970/G5 up. It is important to me that it stays that way (a 
> large amount of users are running 970s, so introducing a VSX dependency means 
> I might as well abandon the port entirely).

You can just clearly document what ABI changes you use, and try to make
sure that everyone who uses your distro / your tools / your ABI variant
knows about it.  Telling your users that it is ELFv2, without telling
them it is not compliant, namely X Y Z are different, is a bit of a
disservice to your users, and worse to everyone else involved.

If you always use -mcpu=970 (or similar), then not very much is
different for you most likely -- except of course there is no promise
to the user that they can use VSX and all instructions in ISA 2.07,
which is a very useful promise to have normally.

> It currently works out of box - there are no changes required in glibc, and 
> nearly the entire userland builds and works (about ~11500 out of ~12000 
> software packages, those that don't work either don't work on ppc64le either, 
> or have issues related to endianness, or some other unrelated reason).

Very nice!

> I'd like to eventually get this into a state where I don't have to worry 
> about glibc arbitrarily breaking it - which means it would be necessary to 
> stabilize it upstream. While I can probably maintain a downstream patchset 
> when it comes to it, I'd much prefer if I didn't have to - but this sounds 
> like an official ELFv2 glibc BE port would be impossible unless the VSX 
> requirement (and thus IEEE 128-bit long double and so on) was in place, which 
> would defeat the point of the port.
> 
> Is there *any* way I can take that would make upstreams of all parts of the 
> toolchain happy? I explicitly don't want to go back to ELFv1.

Oh absolutely, it sounds like things are in quite good shape already!
It will safe a lot of grief on all sides if you make clear this is not
"plain" ELFv2, and in what ways it differs.

Btw, if you use GCC, *please* send in testresults?  :-)

> While at it, I'd like to transition to ld64 long double format, to match musl 
> and improve software compatibility, which I feel will raise more objections 
> from IBM side.

I have no idea what "ld64 long double" is?  Is that just IEEE DP float?
Aka "long double is the same as double".  That is likely easier for new
ports than "double-double", yes, even if the eventual goal should be
IEEE QP float -- a much smoother transition.

Same goes here: document it!  If your users know that the ELFv2 variant
you give them is not *the* ELFv2, but it differs in some clear ways,
everyone will be happier :-)


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-04 Thread Segher Boessenkool
On Thu, Jun 04, 2020 at 01:18:44PM -0400, Rich Felker wrote:
> On Thu, Jun 04, 2020 at 12:12:32PM -0500, Segher Boessenkool wrote:
> > On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > > well, ppc64le already cannot be run on those, as far as I know (I
> > > don't think it's possible to build ppc64le userland without VSX in
> > > any configuration)
> > 
> > VSX is required by the ELFv2 ABI:
> > 
> > """
> > Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
> > compliant processors must implement the following categories:
> 
> This is not actually ABI but IBM policy laundered into an ABI
> document, which musl does not honor.

It is the ABI.  If you think it should be different, make your own ABI,
don't pretend the existing ABI is different than what it is.  Thank you.


Segher


Re: ppc64le and 32-bit LE userland compatibility

2020-06-04 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 05:27:24PM +0200, Michal Suchánek wrote:
> Naturally on POWER the first cpu that has LE support is POWER8 so you
> can count on all other POWER8 features to be present.

This is not true.

The oldest CPU the ELFv2 ABI (and so, powerpc64le-linux) supports is
POWER8, but most 6xx/7xx CPUs had a working LE mode already.  There are
very old ABIs that support LE as well.


Segher


Re: ppc64le and 32-bit LE userland compatibility

2020-06-04 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> well, ppc64le already cannot be run on those, as far as I know (I don't think 
> it's possible to build ppc64le userland without VSX in any configuration)

VSX is required by the ELFv2 ABI:

"""
Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
compliant processors must implement the following categories:

[...]

Vector-Scalar
"""


Segher


Re: [PATCH 1/4] powerpc: Add a ppc_inst_as_str() helper

2020-06-02 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 03:27:25PM +1000, Jordan Niethe wrote:
> There are quite a few places where instructions are printed, this is
> done using a '%x' format specifier. With the introduction of prefixed
> instructions, this does not work well. Currently in these places,
> ppc_inst_val() is used for the value for %x so only the first word of
> prefixed instructions are printed.
> 
> When the instructions are word instructions, only a single word should
> be printed. For prefixed instructions both the prefix and suffix should
> be printed. To accommodate both of these situations, instead of a '%x'
> specifier use '%s' and introduce a helper, __ppc_inst_as_str() which
> returns a char *. The char * __ppc_inst_as_str() returns is buffer that
> is passed to it by the caller.
> 
> It is cumbersome to require every caller of __ppc_inst_as_str() to now
> declare a buffer. To make it more convenient to use __ppc_inst_as_str(),
> wrap it in a macro that uses a compound statement to allocate a buffer
> on the caller's stack before calling it.
> 
> Signed-off-by: Jordan Niethe 
Acked-by: Segher Boessenkool 

> +#define PPC_INST_STR_LEN sizeof("0x 0x")

sizeof is not a function or anything function-like; it's just an
operator, like (unary) "+" or "-" or dereference "*".  The parentheses
are completely redundant here.  And it kind of matters, as written a
reader might think that a string object is actually constructed here.

But, so very many people do this, I'm not going to fight this fight ;-)


Segher


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 01:50:32PM +, Joseph Myers wrote:
> On Mon, 1 Jun 2020, Segher Boessenkool wrote:
> 
> > > The supported glibc ABIs are listed at 
> > > <https://sourceware.org/glibc/wiki/ABIList>.
> > 
> > powerpcle-linux already does work somewhat, and that should also he
> > worth something, official or not ;-)
> > 
> > (It has worked for very many years already...  That is, I have built it
> > a lot, I have no idea about running a full distro that way).
> 
> Greg McGary's patches as referenced at 
> <https://sourceware.org/legacy-ml/libc-hacker/2006-11/msg00013.html> were 
> never merged, and I don't know how big the changes to .S files were or if 
> they were ever posted.  Many files were subsequently fixed as part of 
> bringing up support for powerpc64le, but without actual 32-bit LE testing 
> as part of each release cycle there's not much evidence of correctness for 
> LE of code not used for powerpc64le.

Yeah sorry, I meant I have built the toolchain parts a lot, not libc.
GCC and binutils.  I reckon there is more work to do in libc, certainly
for configuration and similar.


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-01 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 04:12:26AM +0200, Daniel Kolesa wrote:
> On Tue, Jun 2, 2020, at 03:58, Segher Boessenkool wrote:
> > I recommend new ports that cannot jump to IEEE QP float directly to use
> > long double == double for the time being, avoiding the extra
> > complications that IBM double double would bring.  But you'll still have
> > a transition to IEEE 128 if you ever want to go there.
> > 
> > But if you already use double-double, I don't know if the cost changing
> > away from that is worth it now.
> 
> The transition cost is relatively low, which is why I'm thinking about this 
> in the first place. For one, relatively few things use long double in the 
> first place.

Then your cost switching to QP float later will be low as well.  I envy
you :-)

> For two, on ppc*, at least the bfd linker (which we use always in Void) 
> always tags ELFs with an FP ABI tag, and things not using long double (or 
> using 64-bit long double) don't receive this tag. It even tags *which* ABI is 
> used. See:

That works for statically linked stuff, sure.  That is the easy case :-/

> I went through this once already (I had the 64-bit ldbl transition nearly 
> done) and the number of packages to rebuild in the whole repo was about 
> 200-300 out of ~12000.

Cool!  Do you perchance have info you can share about which packages?
Offline, if you want.

> > > There is also one more thing while we're at this. The 64-bit big endian 
> > > Void port uses the ELFv2 ABI, even on glibc. This is not officially 
> > > supported on glibc as far as I can tell, but it does work out of box, 
> > > without any patching (things in general match little endian then, i.e. 
> > > ld64.so.2 etc, but they're big endian). Is there any chance of making 
> > > that support official?
> > 
> > (I don't talk for glibc).
> > 
> > The first thing needed is for "us" to have faith in it.  That starts
> > with seeing test results for the testsuites!
> > 
> > (Something similar goes for the GCC port -- there is no official support
> > for BE ELFv2, but of course it does work, and if we get test results we
> > may keep it that way, hint hint :-) )
> 
> Well, FreeBSD defaults to it since 13; OpenBSD's new powerpc64 port (which is 
> supposedly dual-endian) defaults to it; musl defaults to it on LE and BE.

... and no one ever has sent us (GCC) test results (nothing I have seen
anyway).  All we "officially" know is that Power7 BE ELFv2 was a
bring-up vehicle for the current powerpc64le-linux.  Everyone tries not
to break things without reason to, of course, and things are a little
bit tested anyway because it is convenient to build ELFv2 stuff on BE
systems as well (if only to figure out effortlessly if some bug is due
to the ABI or due to the endianness), but if we do not know something is
used and we never officially supported it, we might just want to take it
away if it is inconvenient.

> FreeBSD and OpenBSD have to, since they primarily target LLVM system 
> toolchain (with GCC in ports) and ld.lld doesn't support ELFv1 (at all). 
> Void's port was new (and any precompiled binaries would generally be 
> enterprisey stuff which doesn't concern us enough - people can just make a 
> chroot/container with say, Debian, if they really need to),

Yeah, enterprisey enough, then just rebuild :-)

> so I felt like it didn't make sense to go with the legacy ABI (besides, 
> function descriptors are gross ;)).

Descriptors are Great, you just do not understand the True Way!

> The situation in the overall userland has been improving too, so the patch 
> burden is actually very low nowadays.

Is that because long double just isn't used a lot?  Or are there more
reasons?


Segher


Re: ppc64le and 32-bit LE userland compatibility

2020-06-01 Thread Segher Boessenkool
On Mon, Jun 01, 2020 at 11:45:51PM +, Joseph Myers wrote:
> On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> > Are you sure this would be a new port? Glibc already works in this 
> > combination, as it seems to me it'd be best if it was just a variant of 
> > the existing 32-bit PowerPC port, sharing most conventions besides 
> > endianness with the BE port.
> 
> The supported glibc ABIs are listed at 
> .

powerpcle-linux already does work somewhat, and that should also he
worth something, official or not ;-)

(It has worked for very many years already...  That is, I have built it
a lot, I have no idea about running a full distro that way).

> > 128-bit IEEE long double would not work, since that relies on VSX being 
> > present (gcc will explicitly complain if it's not). I'd be all for using 
> 
> The minimum supported architecture for powerpc64le (POWER8) has VSX.  My 
> understanding was that the suggestion was for 32-bit userspace to run 
> under powerpc64le kernels running on POWER8 or later, meaning that such a 
> 32-bit LE port, and any ABI designed for such a port, can assume VSX is 
> available.  Or does VSX not work, at the hardware level, for 32-bit 
> POWER8?  (In which case you could pick another ABI for binary128 argument 
> passing and return.)

The current powerpcle-linux runs on anything 6xx, or maybe older.  It
isn't actually supported of course.

If the CPU is Power8, that does not mean VSX is available to you.

VSX works fine in 32-bit mode (with the standard gotcha that the GPRs
do not preserve the high part in all cases, so e.g. the m[ft]vsr insns
might not work as you want.

Passing IEEE QP float in GPRs would be natural for most ABIs, and it
should work fine indeed.  That isn't currently supported in GCC (needs
some libgcc work), and it might need __int128 to work on 32-bit ports
first.

> > There is also one more thing while we're at this. The 64-bit big endian 
> > Void port uses the ELFv2 ABI, even on glibc. This is not officially 
> > supported on glibc as far as I can tell, but it does work out of box, 
> > without any patching (things in general match little endian then, i.e. 
> > ld64.so.2 etc, but they're big endian). Is there any chance of making 
> > that support official?
> 
> If you want to support ELFv2 for 64-bit big endian in glibc, again that 
> should have a unique dynamic linker name, new symbol versions, only 
> binary128 long double, etc. - avoid all the legacy aspects of the existing 
> ELFv1 port rather than selectively saying that "ELFv1" itself is the only 
> legacy aspect and keeping the others (when it's the others that are 
> actually more problematic in glibc).

You should view it as a variant of the LE ELFv2 port, it has nothing
much to do with the other BE 64-bit PowerPC ports, other than being BE.


Segher


Re: ppc64le and 32-bit LE userland compatibility

2020-06-01 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 01:26:37AM +0200, Daniel Kolesa wrote:
> On Mon, Jun 1, 2020, at 23:28, Joseph Myers wrote:
> Are you sure this would be a new port? Glibc already works in this 
> combination, as it seems to me it'd be best if it was just a variant of the 
> existing 32-bit PowerPC port, sharing most conventions besides endianness 
> with the BE port.

That's right.  Except it isn't an "official" existing port, never has
been "officially" supported.

> 128-bit IEEE long double would not work, since that relies on VSX being 
> present (gcc will explicitly complain if it's not). I'd be all for using 
> 64-bit long double, though (musl already does, on all ppc ports).

The current IEEE QP float support requires VSX for its emulation, yes
(possibly even Power8?)  As Mike reminded me today, it also requires
__int128 support, which rules out anything 32-bit currently.  Without
that restriction, we could just make QP float passed in GPRs (use the
ABIs for any struct passed that way), and that'll just work out with
all ABIs, older or not.

> While we're at long double, I'd actually be interested in transitioning the 
> existing big endian ports in Void (64-bit and 32-bit, neither has VSX 
> baseline requirement in my case) to using 64-bit long double, abandoning the 
> IBM format altogether (little endian will transition to 128-bit IEEE long 
> double once it's ready on your side, as that assumes POWER8 baseline which 
> includes VSX).

I recommend new ports that cannot jump to IEEE QP float directly to use
long double == double for the time being, avoiding the extra
complications that IBM double double would bring.  But you'll still have
a transition to IEEE 128 if you ever want to go there.

But if you already use double-double, I don't know if the cost changing
away from that is worth it now.

> What would be the best way for me to proceed with that? I actually 
> experimented with this, using the old glibc compat symbols from pre-ibm128 
> times, and I mostly had it working, except I haven't managed to find a way to 
> switch the default symbols to 64-bit ones, which is problematic as linking 
> everything against nldbl_nonshared is fragile and potentially quirky (breaks 
> dlsym, function pointer equality across libraries, etc).

Yup.  "Rebuild the world" works :-/  I don't have any  better advice,
nothing you cannot figure out yourself.

> There is also one more thing while we're at this. The 64-bit big endian Void 
> port uses the ELFv2 ABI, even on glibc. This is not officially supported on 
> glibc as far as I can tell, but it does work out of box, without any patching 
> (things in general match little endian then, i.e. ld64.so.2 etc, but they're 
> big endian). Is there any chance of making that support official?

(I don't talk for glibc).

The first thing needed is for "us" to have faith in it.  That starts
with seeing test results for the testsuites!

(Something similar goes for the GCC port -- there is no official support
for BE ELFv2, but of course it does work, and if we get test results we
may keep it that way, hint hint :-) )


Segher


Re: ppc64le and 32-bit LE userland compatibility

2020-06-01 Thread Segher Boessenkool
Hi Joseph,

On Mon, Jun 01, 2020 at 09:28:25PM +, Joseph Myers wrote:
> On Fri, 29 May 2020, Will Springer via Binutils wrote:
> 
> > Hey all, a couple of us over in #talos-workstation on freenode have been
> > working on an effort to bring up a Linux PowerPC userland that runs in 
> > 32-bit
> > little-endian mode, aka ppcle. As far as we can tell, no ABI has ever been
> > designated for this (unless you count the patchset from a decade ago [1]), 
> > so
> > it's pretty much uncharted territory as far as Linux is concerned. We want 
> > to
> > sync up with libc and the relevant kernel folks to establish the best path
> > forward.
> 
> As a general comment on the glibc side of things, if this is considered 
> like a new port, and it probably is, the same principles that apply to new 
> ports apply here.
> 
> There's a general discussion at 
> , although much of that is 
> only applicable when adding new CPU architecture support.  More specific 
> points include that new 32-bit ports should default to 64-bit time and 
> file offsets from the start, with no support for 32-bit time or offsets 
> (meaning that if you want to use this with some kind of library call 
> translation, the library call translation will need to deal with 
> corresponding type size conversions).

Either that, or use the same as BE 32-bit PowerPC Linux, I'd say (it
won't make things worse, and if it is easier?)  But preferably the
newer, better, thing of course :-)

> And a new port should not be added 
> that uses the IBM long double format.  You can use IEEE binary128 long 
> double, possibly with an ABI similar to that used on powerpc64le, or can 
> use long double = double, but should not support IBM long double, and 
> preferably should only have one long double format rather than using the 
> glibc support for building with different options resulting in functions 
> for different long double formats being called.

You cannot use IEEE QP float ("binary128") here, but more on that in a
later post.

(I so very much agree about the problems having more than one long
double format -- on the other hand, you'll just share it with BE, and
with the existing powerpcle-linux (sup)port).


Segher


Re: [musl] Re: ppc64le and 32-bit LE userland compatibility

2020-06-01 Thread Segher Boessenkool
On Mon, Jun 01, 2020 at 12:29:56AM +0200, Daniel Kolesa wrote:
> On Sun, May 31, 2020, at 22:42, Segher Boessenkool wrote:
> > > There was just an assumption that LE == powerpc64le in libgo, spotted by 
> > > q66 (daniel@ on the CC). I just pushed the patch to [1].
> > 
> > Please send GCC patches to gcc-patches@ ?
> 
> FWIW, that patch alone is not very useful, we'd need to otherwise patch libgo 
> to recognize a new GOARCH (as right now it's likely to just use 'ppc' which 
> is wrong).

Gotcha.

> That said, we'll get back to you with any patches we have. One I can already 
> think of - we will need to update the dynamic linker name so that it uses 
> ld-musl-powerpcle.so instead of powerpc (musl needs to be updated the same 
> way by adding the subarch variable for the 'le' prefix).

Thanks!  That would be good progress.

> > > > Almost no project that used 32-bit PowerPC in LE mode has sent patches
> > > > to the upstreams.
> > > 
> > > Right, but I have heard concerns from at least one person familiar with 
> > > the ppc kernel about breaking existing users of this arch-endianness 
> > > combo, if any. It seems likely that none of those use upstream, though ^^;
> > 
> > So we don't care, because we *cannot* care.
> 
> Well, that's the reason this thread was opened in the first place - to call 
> out to any potential users, and synchronize with upstreams on a single way 
> forward that all upstreams can agree on, since this effort requires changes 
> in various parts of the stack. We don't want to hog changes locally or 
> otherwise do any changes that would be in conflict with upstream projects, as 
> that would mean needlessly diverging, which only means trouble later on.

Much appreciated!

I don't actually foresee any huge problems -- just lots of hard work ;-)


Segher


Re: ppc64le and 32-bit LE userland compatibility

2020-05-31 Thread Segher Boessenkool
On Sun, May 31, 2020 at 12:57:12AM +, Will Springer wrote:
> On Saturday, May 30, 2020 12:22:12 PM PDT Segher Boessenkool wrote:
> > The original sysv PowerPC supplement
> > http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf
> > supports LE as well, and most powerpcle ports use that.  But, the
> > big-endian Linux ABI differs in quite a few places, and it of course
> > makes a lot better sense if powerpcle-linux follows that.
> 
> Right, I should have clarified I was talking about Linux ABIs 
> specifically.

That was the link you deleted.

> > What patches did you need?  I regularly build >30 cross compilers (on
> > both BE and LE hosts; I haven't used 32-bit hosts for a long time, but
> > in the past those worked fine as well).  I also cross-built
> > powerpcle-linux-gcc quite a few times (from powerpc64le, from powerpc64,
> > from various x86).
> 
> There was just an assumption that LE == powerpc64le in libgo, spotted by 
> q66 (daniel@ on the CC). I just pushed the patch to [1].

Please send GCC patches to gcc-patches@ ?

> > Almost no project that used 32-bit PowerPC in LE mode has sent patches
> > to the upstreams.
> 
> Right, but I have heard concerns from at least one person familiar with 
> the ppc kernel about breaking existing users of this arch-endianness 
> combo, if any. It seems likely that none of those use upstream, though ^^;

So we don't care, because we *cannot* care.

> > A huge factor in having good GCC support for powerpcle-linux (or
> > anything else) is someone needs to regularly test it, and share test
> > results with us (via gcc-testresults@).  Hint hint hint :-)
> > 
> > That way we know it is in good shape, know when we are regressing it,
> > know there is interest in it.
> 
> Once I have more of a bootstrapped userland than a barely-functional 
> cross chroot, I'll get back to you on that :)

Cool!  Looking forward to it.

Thanks,


Segher


Re: ppc64le and 32-bit LE userland compatibility

2020-05-30 Thread Segher Boessenkool
Hi!

On Fri, May 29, 2020 at 07:03:48PM +, Will Springer wrote:
> Hey all, a couple of us over in #talos-workstation on freenode have been
> working on an effort to bring up a Linux PowerPC userland that runs in 32-bit
> little-endian mode, aka ppcle. As far as we can tell, no ABI has ever been
> designated for this

https://www.polyomino.org.uk/publications/2011/Power-Arch-32-bit-ABI-supp-1.0-Embedded.pdf

> (unless you count the patchset from a decade ago [1]), so

The original sysv PowerPC supplement
http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf
supports LE as well, and most powerpcle ports use that.  But, the
big-endian Linux ABI differs in quite a few places, and it of course
makes a lot better sense if powerpcle-linux follows that.

> it's pretty much uncharted territory as far as Linux is concerned. We want to
> sync up with libc and the relevant kernel folks to establish the best path
> forward.
> 
> The practical application that drove these early developments (as you might
> expect) is x86 emulation. The box86 project [2] implements a translation layer
> for ia32 library calls to native architecture ones; this way, emulation
> overhead is significantly reduced by relying on native libraries where
> possible (libc, libGL, etc.) instead of emulating an entire x86 userspace.
> box86 is primarily targeted at ARM, but it can be adapted to other
> architectures—so long as they match ia32's 32-bit, little-endian nature. Hence
> the need for a ppcle userland; modern POWER brought ppc64le as a supported
> configuration, but without a 32-bit equivalent there is no option for a 32/64
> multilib environment, as seen with ppc/ppc64 and arm/aarch64.
> 
> Surprisingly, beyond minor patching of gcc to get crosscompile going,

What patches did you need?  I regularly build >30 cross compilers (on
both BE and LE hosts; I haven't used 32-bit hosts for a long time, but
in the past those worked fine as well).  I also cross-built
powerpcle-linux-gcc quite a few times (from powerpc64le, from powerpc64,
from various x86).

> bootstrapping the initial userland was not much of a problem. The work has
> been done on top of the Void Linux PowerPC project [3], and much of that is
> now present in its source package tree [4].
> 
> The first issue with running the userland came from the ppc32 signal handler 
> forcing BE in the MSR, causing any 32LE process receiving a signal (such as a 
> shell receiving SIGCHLD) to terminate with SIGILL. This was trivially 
> patched, 

Heh :-)

> along with enabling the 32-bit vDSO on ppc64le kernels [5]. (Given that this 
> behavior has been in place since 2006, I don't think anyone has been using 
> the 
> kernel in this state to run ppcle userlands.)

Almost no project that used 32-bit PowerPC in LE mode has sent patches
to the upstreams.

> The next problem concerns the ABI more directly. The failure mode was `file`
> surfacing EINVAL from pread64 when invoked on an ELF; pread64 was passed a
> garbage value for `pos`, which didn't appear to be caused by anything in 
> `file`. Initially it seemed as though the 32-bit components of the arg were
> getting swapped, and we made hacky fixes to glibc and musl to put them in the
> "right order"; however, we weren't sure if that was the correct approach, or
> if there were knock-on effects we didn't know about. So we found the relevant
> compat code path in the kernel, at arch/powerpc/kernel/sys_ppc32.c, where
> there exists this comment:
> 
> > /*
> >  * long long munging:
> >  * The 32 bit ABI passes long longs in an odd even register pair.
> >  */
> 
> It seems that the opposite is true in LE mode, and something is expecting long
> longs to start on an even register. I realized this after I tried swapping hi/
> lo `u32`s here and didn't see an improvement. I whipped up a patch [6] that
> switches which syscalls use padding arguments depending on endianness, while
> hopefully remaining tidy enough to be unobtrusive. (I took some liberties with
> variable names/types so that the macro could be consistent.)

The ABI says long longs are passed in the same order in registers as it
would be in memory; so the high part and the low part are swapped between
BE and LE.  Which registers make up a pair is exactly the same between
the two.  (You can verify this with an existing powerpcle-* compiler, too;
I did, and we implement it correctly as far as I can see).

> This was enough to fix up the `file` bug. I'm no seasoned kernel hacker,
> though, and there is still concern over the right way to approach this,
> whether it should live in the kernel or libc, etc. Frankly, I don't know the
> ABI structure enough to understand why the register padding has to be
> different in this case, or what lower-level component is responsible for it. 
> For comparison, I had a look at the mips tree, since it's bi-endian and has a 
> similar 32/64 situation. There is a macro conditional upon endianness that is 
> responsible for munging long longs; it uses 

Re: [PATCH v2 6/7] powerpc/dt_cpu_ftrs: Add MMA feature

2020-05-19 Thread Segher Boessenkool
On Tue, May 19, 2020 at 10:22:40AM -0500, Paul A. Clarke wrote:
> On Tue, May 19, 2020 at 10:05:56AM -0500, Segher Boessenkool wrote:
> > On Tue, May 19, 2020 at 09:49:22AM -0500, Paul A. Clarke wrote:
> > > On Tue, May 19, 2020 at 10:31:56AM +1000, Alistair Popple wrote:
> > > > Matrix multiple accumulate (MMA) is a new feature added to ISAv3.1 and
> > > 
> > > nit: "Matrix-Multiply Accelerator".
> > 
> > "Matrix-Multiply Assist" in fact :-)
> 
> Not according to the ISA (p. 1129).

There is one mistake in the 3.1 ABI yes, in the description for the PCR
bit of this same name.  Everything else calls this "assist", like all
similar things are called as well (whereas an "accelerator" is an
external device).


Segher


Re: [PATCH v2 6/7] powerpc/dt_cpu_ftrs: Add MMA feature

2020-05-19 Thread Segher Boessenkool
On Tue, May 19, 2020 at 09:49:22AM -0500, Paul A. Clarke wrote:
> On Tue, May 19, 2020 at 10:31:56AM +1000, Alistair Popple wrote:
> > Matrix multiple accumulate (MMA) is a new feature added to ISAv3.1 and
> 
> nit: "Matrix-Multiply Accelerator".

"Matrix-Multiply Assist" in fact :-)


Segher


Re: [PATCH v3] powerpc/64: Option to use ELF V2 ABI for big-endian kernels

2020-05-18 Thread Segher Boessenkool
Hi!

On Mon, May 18, 2020 at 04:35:22PM +1000, Michael Ellerman wrote:
> Nicholas Piggin  writes:
> > Provide an option to build big-endian kernels using the ELF V2 ABI. This 
> > works
> > on GCC and clang (since about 2014). it is is not officially supported by 
> > the
> > GNU toolchain, but it can give big-endian kernels  some useful advantages of
> > the V2 ABI (e.g., less stack usage).

> This doesn't build with clang:
> 
>   /tmp/aesp8-ppc-dad624.s: Assembler messages:
>   /tmp/aesp8-ppc-dad624.s: Error: .size expression for aes_p8_set_encrypt_key 
> does not evaluate to a constant

What does this assembler code that clang doesn't swallow look like?  Is
that valid code?  Etc.


Segher


Re: [PATCH RFC 3/4] powerpc/microwatt: Add early debug UART support for Microwatt

2020-05-11 Thread Segher Boessenkool
Hi!

On Sat, May 09, 2020 at 03:03:40PM +1000, Paul Mackerras wrote:
> + __asm__ volatile("mtmsrd %3,0; ldcix %0,%1,%2; mtmsrd %4,0"
> +  : "=r" (val) : "b" (potato_uart_base), "r" (offset),
> +"r" (msr & ~MSR_DR), "r" (msr));

That should be  "="(val)  (an earlyclobber), because when %0 is
written, %4 will still be used later.

Looks fine otherwise.

Reviewed-by: Segher Boessenkool 


Segher


Re: [PATCH] powerpc/uaccess: Don't use "m<>" constraint

2020-05-07 Thread Segher Boessenkool
On Thu, May 07, 2020 at 10:33:24PM +1000, Michael Ellerman wrote:
> The "m<>" constraint breaks compilation with GCC 4.6.x era compilers.
> 
> The use of the constraint allows the compiler to use update-form
> instructions, however in practice current compilers never generate
> those forms for any of the current uses of __put_user_asm_goto().
> 
> We anticipate that GCC 4.6 will be declared unsupported for building
> the kernel in the not too distant future. So for now just switch to
> the "m" constraint.
> 
> Fixes: 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm 
> goto'")
> Signed-off-by: Michael Ellerman 

Acked-by: Segher Boessenkool 

Thanks!  So much trouble for what looked like such a simple change, all
those years ago :-(


Segher


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-06 Thread Segher Boessenkool
On Wed, May 06, 2020 at 08:10:57PM +0200, Christophe Leroy wrote:
> Le 06/05/2020 à 19:58, Segher Boessenkool a écrit :
> >>  #define __put_user_asm_goto(x, addr, label, op)   \
> >>asm volatile goto(  \
> >>-   "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> >>+   "1: " op "%X1 %0,%1 # put_user\n"   \
> >>EX_TABLE(1b, %l2)   \
> >>:   \
> >>-   : "r" (x), "m<>" (*addr)\
> >>+   : "r" (x), "m" (*addr)  \
> >>:   \
> >>: label)
> >
> >Like that.  But you will have to do that to *all* places we use the "<>"
> >constraints, or wait for more stuff to fail?  And, there probably are
> >places we *do* want update form insns used (they do help in some loops,
> >for example)?
> >
> 
> AFAICT, git grep "m<>" provides no result.

Ah, okay.

> However many places have %Ux:



Yes, all of those are from when "m" still meant what "m<>" is now.  For
seeing how many update form insns can be generated (and how much that
matters), these all should be fixed, at a minimum.


Segher


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-06 Thread Segher Boessenkool
On Wed, May 06, 2020 at 11:36:00AM +1000, Michael Ellerman wrote:
> >> As far as I understood that's mandatory on recent gcc to get the 
> >> pre-update form of the instruction. With older versions "m" was doing 
> >> the same, but not anymore.
> >
> > Yes.  How much that matters depends on the asm.  On older CPUs (6xx/7xx,
> > say) the update form was just as fast as the non-update form.  On newer
> > or bigger CPUs it is usually executed just the same as an add followed
> > by the memory access, so it just saves a bit of code size.
> 
> The update-forms are stdux, sthux etc. right?

And stdu, sthu, etc.  "x" is "indexed form" (reg+reg addressing).

> I don't see any change in the number of those with or without the
> constraint. That's using GCC 9.3.0.

It's most useful in loops (and happens more often there).  You probably
do not have many loops that allow update form insns.

> >> Should we ifdef the "m<>" or "m" based on GCC 
> >> version ?
> >
> > That will be a lot of churn.  Just make 4.8 minimum?
> 
> As I said in my other mail that's not really up to us. We could mandate
> a higher minimum for powerpc, but I'd rather not.

Yeah, I quite understand that.

> I think for now I'm inclined to just drop the "<>", and we can revisit
> in a release or two when hopefully GCC 4.8 has become the minimum.

An unhappy resolution, but it leaves a light on the horizon :-)

In that case, leave the "%Un", if you will but the "<>" back soonish?
Not much point removing it and putting it back later (it is harmless,
there are more instances of it in the kernel as well, since many years).

Thanks!


Segher


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-06 Thread Segher Boessenkool
On Wed, May 06, 2020 at 10:58:55AM +1000, Michael Ellerman wrote:
> >> The "m<>" here is breaking GCC 4.6.3, which we allegedly still support.
> >
> > [ You shouldn't use 4.6.3, there has been 4.6.4 since a while.  And 4.6
> >   is nine years old now.  Most projects do not support < 4.8 anymore, on
> >   any architecture.  ]
> 
> Moving up to 4.6.4 wouldn't actually help with this though would it?

Nope.  But 4.6.4 is a bug-fix release, 91 bugs fixed since 4.6.3, so you
should switch to it if you can :-)

> Also I have 4.6.3 compilers already built, I don't really have time to
> rebuild them for 4.6.4.
> 
> The kernel has a top-level minimum version, which I'm not in charge of, see:
> 
> https://www.kernel.org/doc/html/latest/process/changes.html?highlight=gcc

Yes, I know.  And it is much preferred not to have stricter requirements
for Power, I know that too.  Something has to give though :-/

> There were discussions about making 4.8 the minimum, but I'm not sure
> where they got to.

Yeah, just petered out I think?

All significant distros come with a 4.8 as system compiler.

> >> Plain "m" works, how much does the "<>" affect code gen in practice?
> >> 
> >> A quick diff here shows no difference from removing "<>".
> >
> > It will make it impossible to use update-form instructions here.  That
> > probably does not matter much at all, in this case.
> >
> > If you remove the "<>" constraints, also remove the "%Un" output modifier?
> 
> So like this?
> 
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index 62cc8d7640ec..ca847aed8e45 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -207,10 +207,10 @@ do {
> \
>  
>  #define __put_user_asm_goto(x, addr, label, op)  \
>   asm volatile goto(  \
> - "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> + "1: " op "%X1 %0,%1 # put_user\n"   \
>   EX_TABLE(1b, %l2)   \
>   :   \
> - : "r" (x), "m<>" (*addr)\
> + : "r" (x), "m" (*addr)  \
>   :   \
>   : label)

Like that.  But you will have to do that to *all* places we use the "<>"
constraints, or wait for more stuff to fail?  And, there probably are
places we *do* want update form insns used (they do help in some loops,
for example)?


Segher


Re: [RFC PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions

2020-05-05 Thread Segher Boessenkool
Hi!

On Thu, Apr 30, 2020 at 02:02:02PM +1000, Nicholas Piggin wrote:
> Add support for the scv instruction on POWER9 and later CPUs.

Looks good to me in general :-)

> For now this implements the zeroth scv vector 'scv 0', as identical
> to 'sc' system calls, with the exception that lr is not preserved, and
> it is 64-bit only. There may yet be changes made to this ABI, so it's
> for testing only.

What does it do with SF=0?  I don't see how it is obviously not a
security hole currently (but I didn't look too closely).


Segher


Re: New powerpc vdso calling convention

2020-05-05 Thread Segher Boessenkool
Hi!

On Wed, Apr 29, 2020 at 12:39:22PM +1000, Nicholas Piggin wrote:
> Excerpts from Adhemerval Zanella's message of April 27, 2020 11:09 pm:
> >> Right, I'm just talking about those comments -- it seems like the kernel 
> >> vdso should contain an .opd section with function descriptors in it for
> >> elfv1 calls, rather than the hack it has now of creating one in the 
> >> caller's .data section.
> >> 
> >> But all that function descriptor code is gated by
> >> 
> >> #if (defined(__PPC64__) || defined(__powerpc64__)) && _CALL_ELF != 2
> >> 
> >> So it seems PPC32 does not use function descriptors but a direct pointer 
> >> to the entry point like PPC64 with ELFv2.
> > 
> > Yes, this hack is only for ELFv1.  The missing ODP has not been an issue 
> > or glibc because it has been using the inline assembly to emulate the 
> > functions call since initial vDSO support (INTERNAL_VSYSCALL_CALL_TYPE).
> > It just has become an issue when I added a ifunc optimization to 
> > gettimeofday so it can bypass the libc.so and make plt branch to vDSO 
> > directly.
> 
> I can't understand if it's actually a problem for you or not.
> 
> Regardless if you can hack around it, it seems to me that if we're going 
> to add sane calling conventions to the vdso, then we should also just 
> have a .opd section for it as well, whether or not a particular libc 
> requires it.

An OPD ("official procedure descriptor") is required for every function,
to have proper C semantics, so that pointers to functions (which are
pointers to descriptors, in fact) are unique.  You can "manually" make
descriptors just fine, and use those to call functions -- but you cannot
(in general) use a pointer to such a "fake" descriptor as the "id" of
the function.

The way the ABIs define the OPDs makes them guaranteed unique.


Segher


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-05 Thread Segher Boessenkool
On Tue, May 05, 2020 at 05:40:21PM +0200, Christophe Leroy wrote:
> >>+#define __put_user_asm_goto(x, addr, label, op)\
> >>+   asm volatile goto(  \
> >>+   "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> >>+   EX_TABLE(1b, %l2)   \
> >>+   :   \
> >>+   : "r" (x), "m<>" (*addr)\
> >
> >The "m<>" here is breaking GCC 4.6.3, which we allegedly still support.
> >
> >Plain "m" works, how much does the "<>" affect code gen in practice?
> >
> >A quick diff here shows no difference from removing "<>".
> 
> It was recommended by Segher, there has been some discussion about it on 
> v1 of this patch, see 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/4fdc2aba6f5e51887d1cd0fee94be0989eada2cd.1586942312.git.christophe.le...@c-s.fr/
> 
> As far as I understood that's mandatory on recent gcc to get the 
> pre-update form of the instruction. With older versions "m" was doing 
> the same, but not anymore.

Yes.  How much that matters depends on the asm.  On older CPUs (6xx/7xx,
say) the update form was just as fast as the non-update form.  On newer
or bigger CPUs it is usually executed just the same as an add followed
by the memory access, so it just saves a bit of code size.

> Should we ifdef the "m<>" or "m" based on GCC 
> version ?

That will be a lot of churn.  Just make 4.8 minimum?


Segher


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-05 Thread Segher Boessenkool
Hi!

On Wed, May 06, 2020 at 12:27:58AM +1000, Michael Ellerman wrote:
> Christophe Leroy  writes:
> > unsafe_put_user() is designed to take benefit of 'asm goto'.
> >
> > Instead of using the standard __put_user() approach and branch
> > based on the returned error, use 'asm goto' and make the
> > exception code branch directly to the error label. There is
> > no code anymore in the fixup section.
> >
> > This change significantly simplifies functions using
> > unsafe_put_user()
> >
> ...
> >
> > Signed-off-by: Christophe Leroy 
> > ---
> >  arch/powerpc/include/asm/uaccess.h | 61 +-
> >  1 file changed, 52 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/uaccess.h 
> > b/arch/powerpc/include/asm/uaccess.h
> > index 9cc9c106ae2a..9365b59495a2 100644
> > --- a/arch/powerpc/include/asm/uaccess.h
> > +++ b/arch/powerpc/include/asm/uaccess.h
> > @@ -196,6 +193,52 @@ do {   
> > \
> >  })
> >  
> >  
> > +#define __put_user_asm_goto(x, addr, label, op)\
> > +   asm volatile goto(  \
> > +   "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> > +   EX_TABLE(1b, %l2)   \
> > +   :   \
> > +   : "r" (x), "m<>" (*addr)\
> 
> The "m<>" here is breaking GCC 4.6.3, which we allegedly still support.

[ You shouldn't use 4.6.3, there has been 4.6.4 since a while.  And 4.6
  is nine years old now.  Most projects do not support < 4.8 anymore, on
  any architecture.  ]

> Plain "m" works, how much does the "<>" affect code gen in practice?
> 
> A quick diff here shows no difference from removing "<>".

It will make it impossible to use update-form instructions here.  That
probably does not matter much at all, in this case.

If you remove the "<>" constraints, also remove the "%Un" output modifier?


Segher


Re: [PATCH v2] powerpc/64: BE option to use ELFv2 ABI for big endian kernels

2020-04-28 Thread Segher Boessenkool
Hi!

On Tue, Apr 28, 2020 at 09:25:17PM +1000, Nicholas Piggin wrote:
> +config BUILD_BIG_ENDIAN_ELF_V2
> + bool "Build big-endian kernel using ELFv2 ABI (EXPERIMENTAL)"
> + depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
> + default n
> + select BUILD_ELF_V2
> + help
> +   This builds the kernel image using the ELFv2 ABI, which has a
> +   reduced stack overhead and faster function calls. This does not
> +   affect the userspace ABIs.
> +
> +   ELFv2 is the standard ABI for little-endian, but for big-endian
> +   this is an experimental option that is less tested (kernel and
> +   toolchain). This requires gcc 4.9 or newer and binutils 2.24 or
> +   newer.

Is it clear that this is only for 64-bit?  Maybe this text should fit
that in somewhere?

It's not obvious to people who do not already know that ELFv2 is just
the (nick-)name of a particular ABI, not a new kind of ELF (it is just
version 1 ELF in fact), and that ABI is for 64-bit Power only.


Segher


Re: [PATCH] x86: Fix early boot crash on gcc-10, next try

2020-04-25 Thread Segher Boessenkool
On Sat, Apr 25, 2020 at 08:53:13PM +0200, Borislav Petkov wrote:
> On Sat, Apr 25, 2020 at 01:37:01PM -0500, Segher Boessenkool wrote:
> > That is a lot more typing then
> > asm("");
> 
> That's why a macro with a hopefully more descriptive name would be
> telling more than a mere asm("").

My point is that you should explain at *every use* of this why you cannot
have tail calls *there*.  This is very unusual, after all.

There are *very* few places where you want to prevent tail calls, that's
why there is no attribute for it.


Segher


Re: [PATCH] x86: Fix early boot crash on gcc-10, next try

2020-04-25 Thread Segher Boessenkool
On Sat, Apr 25, 2020 at 07:31:40PM +0200, Borislav Petkov wrote:
> > There's also the one in init/main.c which is used by multiple
> > architectures. On x86 at least, the call to arch_call_rest_init at the
> > end of start_kernel does not get tail-call optimized by gcc-10, but I
> > don't see anything that actually prevents that from happening. We should
> > add the asm("") there as well I think, unless the compiler guys see
> > something about this function that will always prevent the optimization?
> 
> Hmm, that's what I was afraid of - having to sprinkle this around. Yah, let's
> wait for compiler guys to have a look here and then maybe I'll convert that
> thing to a macro called
> 
>   compiler_prevent_tail_call_opt()
> 
> or so, so that it can be sprinkled around. ;-\

That is a lot more typing then
asm("");
but more seriously, you probably should explain why you do not want a
tail call *anyway*, and in such a comment you can say that is what the
asm is for.

I don't see anything that prevents the tailcall in current code either,
fwiw.


Segher


Re: [PATCH 1/2] powerpc: Add base support for ISA v3.1

2020-04-21 Thread Segher Boessenkool
Hi,

On Tue, Apr 21, 2020 at 11:58:31AM +1000, Oliver O'Halloran wrote:
> On Tue, Apr 21, 2020 at 11:53 AM Alistair Popple  
> wrote:
> >
> > On Tuesday, 21 April 2020 11:30:52 AM AEST Alistair Popple wrote:
> > > On Saturday, 4 April 2020 2:32:08 AM AEST Segher Boessenkool wrote:
> > > > Hi!
> > > >
> > > > On Fri, Apr 03, 2020 at 03:10:54PM +1100, Alistair Popple wrote:
> > > > > +#define   PCR_ARCH_300   0x10/* Architecture 3.00 */
> > > >
> > > > It's called 3.0, not 3.00?
> > >
> > > Thanks. I'll fix that up.
> >
> > Actually we have already defined it upstream as CPU_FTR_ARCH_300 in arch/
> > powerpc/include/asm/cputable.h and PVR_ARCH_300 in arch/powerpc/include/asm/
> > reg.h so for consistency we should make it the same here. So either we leave
> > this patch as is or we change it to PCR_ARCH_30 along with the existing
> > upstream #defines. Thoughts?
> 
> I used 300 for consistency with the existing three digit definitions
> when I added the CPU_FTR macro. It doesn't really matter since these
> are all internal definitions, but I SAY THE BIKESHED SHOULD BE THREE
> DIGITS LONG.

You can do in your own symbols whatever you want, but the comment is
refering to an existing real-world version of the ISA, namely, 3.0 :-)

(It's important to get names right, and especially in the canonical
places, like this one.  IMO of course).


Segher


Re: [PATCH 2/5] powerpc: Replace _ALIGN_DOWN() by ALIGN_DOWN()

2020-04-21 Thread Segher Boessenkool
Hi!

On Tue, Apr 21, 2020 at 01:04:05AM +, Joel Stanley wrote:
> On Mon, 20 Apr 2020 at 18:38, Christophe Leroy  
> wrote:
> > _ALIGN_DOWN() is specific to powerpc
> > ALIGN_DOWN() is generic and does the same
> >
> > Replace _ALIGN_DOWN() by ALIGN_DOWN()
> 
> This one is a bit less obvious. It becomes (leaving the typeof's alone
> for clarity):
> 
> -((addr)&(~((typeof(addr))(size)-1)))
> +addr) - ((size) - 1)) + ((typeof(addr))(size) - 1)) &
> ~((typeof(addr))(size)-1))
> 
> Which I assume the compiler will sort out?

[ This is line-wrapped, something in your mailer?  Took me a bit to figure
  out the - and + are diff -u things :-) ]

In the common case where size is a constant integer power of two, the
compiler will have no problem with this.  But why do so complicated?

Why are the casts there, btw?


Segher


Re: How to use "y" constraint in GCC inline powerpc assembly ?

2020-04-18 Thread Segher Boessenkool
Hi!

On Sat, Apr 18, 2020 at 08:28:53AM +, Christophe Leroy wrote:
> I'd like to use cr instead of gpr to return error condition from 
> __get_user().
> 
> I saw in GCC doc 
> (https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html) that it is 
> possible to use "y" as constraint to refer to "Any condition register 
> field, cr0…cr7".
> 
> I tried the test below, but it fails with "error: impossible register 
> constraint in 'asm'"
> 
> How does "y" has to be used ?

The same as "x".  You cannot really use these constraints in asm, it's
internal only.  I'll remove it from the inline asm documentation.  Thanks!

(You should put the cr fields you use in an inline asm in its clobber
list, i.e. "cr0" or "cr7").


Segher


Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-16 Thread Segher Boessenkool
On Thu, Apr 16, 2020 at 08:34:42PM -0400, Rich Felker wrote:
> On Thu, Apr 16, 2020 at 06:02:35PM -0500, Segher Boessenkool wrote:
> > On Thu, Apr 16, 2020 at 08:12:19PM +0200, Florian Weimer wrote:
> > > > I think my choice would be just making the inline syscall be a single
> > > > call insn to an asm source file that out-of-lines the loading of TOC
> > > > pointer and call through it or branch based on hwcap so that it's not
> > > > repeated all over the place.
> > > 
> > > I don't know how problematic control flow out of an inline asm is on
> > > POWER.  But this is basically the -moutline-atomics approach.
> > 
> > Control flow out of inline asm (other than with "asm goto") is not
> > allowed at all, just like on any other target (and will not work in
> > practice, either -- just like on any other target).  But the suggestion
> > was to use actual assembler code, not inline asm?
> 
> Calling it control flow out of inline asm is something of a misnomer.
> The enclosing state is not discarded or altered; the asm statement
> exits normally, reaching the next instruction in the enclosing
> block/function as soon as the call from the asm statement returns,
> with all register/clobber constraints satisfied.

Ah.  That should always Just Work, then -- our ABIs guarantee you can.

> Control flow out of inline asm would be more like longjmp, and it can
> be valid -- for instance, you can implement coroutines this way
> (assuming you switch stack correctly) or do longjmp this way (jumping
> to the location saved by setjmp). But it's not what'd be happening
> here.

Yeah, you cannot do that in C, not without making assumptions about what
machine code the compiler generates.  GCC explicitly disallows it, too:

 'asm' statements may not perform jumps into other 'asm' statements,
 only to the listed GOTOLABELS.  GCC's optimizers do not know about
 other jumps; therefore they cannot take account of them when
 deciding how to optimize.


Segher


Re: [PATCH] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-04-16 Thread Segher Boessenkool
Hi!

On Thu, Apr 16, 2020 at 07:50:00AM +0200, Christophe Leroy wrote:
> Le 16/04/2020 à 00:06, Segher Boessenkool a écrit :
> >On Wed, Apr 15, 2020 at 09:20:26AM +, Christophe Leroy wrote:
> >>At the time being, __put_user()/__get_user() and friends only use
> >>register indirect with immediate index addressing, with the index
> >>set to 0. Ex:
> >>
> >>lwz reg1, 0(reg2)
> >
> >This is called a "D-form" instruction, or sometimes "offset addressing".
> >Don't talk about an "index", it confuses things, because the *other*
> >kind is called "indexed" already, also in the ISA docs!  (X-form, aka
> >indexed addressing, [reg+reg], where D-form does [reg+imm], and both
> >forms can do [reg]).
> 
> In the "Programming Environments Manual for 32-Bit Implementations of 
> the PowerPC™ Architecture", they list the following addressing modes:
> 
> Load and store operations have three categories of effective address 
> generation that depend on the
> operands specified:
> • Register indirect with immediate index mode
> • Register indirect with index mode
> • Register indirect mode

Huh.  I must have pushed all that confusing terminology to the back of
my head :-)

> >%Un on an "m" operand doesn't do much: you need to make it "m<>" if you
> >want pre-modify ("update") insns to be generated.  (You then will want
> >to make sure that operand is used in a way GCC can understand; since it
> >is used only once here, that works fine).
> 
> Ah ? Indeed I got the idea from include/asm/io.h where there is:
> 
> #define DEF_MMIO_IN_D(name, size, insn)   \
> static inline u##size name(const volatile u##size __iomem *addr)  \
> { \
>   u##size ret;\
>   __asm__ __volatile__("sync;"#insn"%U1%X1 %0,%1;twi 0,%0,0;isync"\
>   : "=r" (ret) : "m" (*addr) : "memory"); \
>   return ret; \
> }
> 
> It should be "m<>" there as well ?

Yes, that will work here.

Long ago, "m" in inline assembler code did the same as "m<>" now does
(and "m" internal in GCC still does).  To get a memory without pre-modify
addressing, you used "es".

Since people kept getting that wrong (it *is* surprising), it was changed
to the current scheme.  But the kernel uses weren't updated (and no one
seems to have missed it).

> >>  #else /* __powerpc64__ */
> >>  #define __put_user_asm2(x, addr, err) \
> >>__asm__ __volatile__(   \
> >>-   "1: stw %1,0(%2)\n" \
> >>-   "2: stw %1+1,4(%2)\n"   \
> >>+   "1: stw%U2%X2 %1,%2\n"  \
> >>+   "2: stw%U2%X2 %L1,%L2\n"\
> >>"3:\n"  \
> >>".section .fixup,\"ax\"\n"  \
> >>"4: li %0,%3\n" \
> >>@@ -140,7 +140,7 @@ extern long __put_user_bad(void);
> >>EX_TABLE(1b, 4b)\
> >>EX_TABLE(2b, 4b)\
> >>: "=r" (err)\
> >>-   : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> >>+   : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))
> >
> >Here, it doesn't work.  You don't want two consecutive update insns in
> >any case.  Easiest is to just not use "m<>", and then, don't use %Un
> >(which won't do anything, but it is confusing).
> 
> Can't we leave the Un on the second stw ?

That cannot work.  You can use it on only the *first* though :-)

And this doesn't work on LE I think?  (But the asm doesn't anyway?)

Or, you can decide this is all way too tricky, and not worth it.


Segher


Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-16 Thread Segher Boessenkool
On Thu, Apr 16, 2020 at 08:12:19PM +0200, Florian Weimer wrote:
> > I think my choice would be just making the inline syscall be a single
> > call insn to an asm source file that out-of-lines the loading of TOC
> > pointer and call through it or branch based on hwcap so that it's not
> > repeated all over the place.
> 
> I don't know how problematic control flow out of an inline asm is on
> POWER.  But this is basically the -moutline-atomics approach.

Control flow out of inline asm (other than with "asm goto") is not
allowed at all, just like on any other target (and will not work in
practice, either -- just like on any other target).  But the suggestion
was to use actual assembler code, not inline asm?


Segher


Re: [PATCH v2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-04-16 Thread Segher Boessenkool
On Thu, Apr 16, 2020 at 02:41:56PM +0200, Christophe Leroy wrote:
> Le 16/04/2020 à 00:37, Segher Boessenkool a écrit :
> >>+   __put_user_nocheck_goto((__typeof__(*(ptr)))(x), (ptr), 
> >>sizeof(*(ptr)), label)
> >
> >This line gets too long, can you break it up somehow?
> 
> This line has 86 chars.

(And your mail program has wrapped it ;-) )

> powerpc arch tolerates lines with up to 90 chars, see 
> arch/powerpc/tools/checkpatch.sh

I *tolerate* it as well, sure, but long lines are bad for readability.
Like, I noticed it because it wrapped :-)

That "90" thing is just dumb, we should get rid of it.  Sometimes you
can have long lines, if that is better than the alternatives.  There
does not need to be a ridiculous "rule" that is unhappy *both* ways!

(This is true for many things in checkpatch, btw...  Rules of thumb,
not rules).


Segher


Re: [PATCH v2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-04-15 Thread Segher Boessenkool
Hi!

On Wed, Apr 15, 2020 at 09:25:59AM +, Christophe Leroy wrote:
> +#define __put_user_goto(x, ptr, label) \
> + __put_user_nocheck_goto((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), 
> label)

This line gets too long, can you break it up somehow?

> +#define __put_user_asm_goto(x, addr, label, op)  \
> + asm volatile goto(  \
> + "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> + EX_TABLE(1b, %l2)   \
> + :   \
> + : "r" (x), "m" (*addr)  \
> + :   \
> + : label)

Same "%Un" problem as in the other patch.  You could use "m<>" here,
but maybe just dropping "%Un" is better.

> +#ifdef __powerpc64__
> +#define __put_user_asm2_goto(x, ptr, label)  \
> + __put_user_asm_goto(x, ptr, label, "std")
> +#else /* __powerpc64__ */
> +#define __put_user_asm2_goto(x, addr, label) \
> + asm volatile goto(  \
> + "1: stw%U1%X1 %0, %1\n" \
> + "2: stw%U1%X1 %L0, %L1\n"   \
> + EX_TABLE(1b, %l2)   \
> + EX_TABLE(2b, %l2)   \
> + :   \
> + : "r" (x), "m" (*addr)  \
> +     :   \
> + : label)
> +#endif /* __powerpc64__ */

Here, you should drop it for sure.

Rest looks fine.

Reviewed-by: Segher Boessenkool 


Segher


Re: [PATCH] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-04-15 Thread Segher Boessenkool
Hi!

On Wed, Apr 15, 2020 at 09:20:26AM +, Christophe Leroy wrote:
> At the time being, __put_user()/__get_user() and friends only use
> register indirect with immediate index addressing, with the index
> set to 0. Ex:
> 
>   lwz reg1, 0(reg2)

This is called a "D-form" instruction, or sometimes "offset addressing".
Don't talk about an "index", it confuses things, because the *other*
kind is called "indexed" already, also in the ISA docs!  (X-form, aka
indexed addressing, [reg+reg], where D-form does [reg+imm], and both
forms can do [reg]).

> Give the compiler the opportunity to use other adressing modes
> whenever possible, to get more optimised code.

Great :-)

> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -114,7 +114,7 @@ extern long __put_user_bad(void);
>   */
>  #define __put_user_asm(x, addr, err, op) \
>   __asm__ __volatile__(   \
> - "1: " op " %1,0(%2) # put_user\n"   \
> + "1: " op "%U2%X2 %1,%2  # put_user\n"   \
>   "2:\n"  \
>   ".section .fixup,\"ax\"\n"  \
>   "3: li %0,%3\n" \
> @@ -122,7 +122,7 @@ extern long __put_user_bad(void);
>   ".previous\n"   \
>   EX_TABLE(1b, 3b)\
>   : "=r" (err)\
> - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> + : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))

%Un on an "m" operand doesn't do much: you need to make it "m<>" if you
want pre-modify ("update") insns to be generated.  (You then will want
to make sure that operand is used in a way GCC can understand; since it
is used only once here, that works fine).

> @@ -130,8 +130,8 @@ extern long __put_user_bad(void);
>  #else /* __powerpc64__ */
>  #define __put_user_asm2(x, addr, err)\
>   __asm__ __volatile__(   \
> - "1: stw %1,0(%2)\n" \
> - "2: stw %1+1,4(%2)\n"   \
> + "1: stw%U2%X2 %1,%2\n"  \
> + "2: stw%U2%X2 %L1,%L2\n"\
>   "3:\n"  \
>   ".section .fixup,\"ax\"\n"  \
>   "4: li %0,%3\n" \
> @@ -140,7 +140,7 @@ extern long __put_user_bad(void);
>   EX_TABLE(1b, 4b)\
>   EX_TABLE(2b, 4b)\
>   : "=r" (err)        \
> - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> + : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))

Here, it doesn't work.  You don't want two consecutive update insns in
any case.  Easiest is to just not use "m<>", and then, don't use %Un
(which won't do anything, but it is confusing).

Same for the reads.

Rest looks fine, and update should be good with that fixed as said.

Reviewed-by: Segher Boessenkool 


Segher


Re: [PATCH v5 05/21] powerpc: Use a function for getting the instruction op code

2020-04-08 Thread Segher Boessenkool
Hi!

On Mon, Apr 06, 2020 at 06:09:20PM +1000, Jordan Niethe wrote:
> +static inline int ppc_inst_opcode(u32 x)
> +{
> + return x >> 26;
> +}

Maybe you should have "primary opcode" in this function name?


Segher


Re: [PATCH v5 18/21] powerpc64: Add prefixed instructions to instruction data type

2020-04-08 Thread Segher Boessenkool
On Mon, Apr 06, 2020 at 12:25:27PM +0200, Christophe Leroy wrote:
> > if (ppc_inst_prefixed(x) != ppc_inst_prefixed(y))
> > return false;
> > else if (ppc_inst_prefixed(x))
> > return !memcmp(, , sizeof(struct ppc_inst));
> 
> Are we sure memcmp() is a good candidate for the comparison ? Can we do 
> simpler ? Especially, I understood a prefixed instruction is a 64 bits 
> properly aligned instruction, can we do a simple u64 compare ? Or is GCC 
> intelligent enough to do that without calling memcmp() function which is 
> heavy ?

A prefixed insn is *not* 8-byte aligned, it is 4-byte aligned, fwiw.

memcmp() isn't as heavy as you fear, not with a non-ancient GCC at least.
But this could be written in a nicer way, sure :-)


Segher


Re: [PATCH 1/2] powerpc: Add base support for ISA v3.1

2020-04-03 Thread Segher Boessenkool
Hi!

On Fri, Apr 03, 2020 at 03:10:54PM +1100, Alistair Popple wrote:
> +#define   PCR_ARCH_300   0x10/* Architecture 3.00 */

It's called 3.0, not 3.00?


Segher


Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-03-31 Thread Segher Boessenkool
On Tue, Mar 31, 2020 at 08:56:23AM +0200, Christophe Leroy wrote:
> While we are at it, can we also remove the 601 ? This one is also full 
> of workarounds and diverges a bit from other 6xx.
> 
> I'm unable to find its end of life date, but it was on the market in 
> 1994, so I guess it must be outdated by more than 10-15 yr old now ?

There probably are still some people running Linux on 601 powermacs.


Segher


Re: [PATCH v2] powerpc/boot: Delete unneeded .globl _zimage_start

2020-03-27 Thread Segher Boessenkool
On Fri, Mar 27, 2020 at 09:50:54AM -0700, Fangrui Song wrote:
> We aim for compatibility with GNU in many aspects to make it easier for
> people to switch over. However, just because there is a subtle behavior
> in GNU toolchain does not mean we need to emulate that behavior.

It isn't subtle.  It is the explicit documented behaviour.

> With
> all due respect, there are a large quantity of legacy behaviors we don't
> want to support.

And it isn't legacy at all.  It is simply the defined semantics.

It is semantics that real code relies on (and has for ages) as well.


Segher


Re: [PATCH v2] powerpc xmon: use `dcbf` inplace of `dcbi` instruction for 64bit Book3S

2020-03-27 Thread Segher Boessenkool
On Fri, Mar 27, 2020 at 04:12:13PM +0100, Christophe Leroy wrote:
> Maybe you could also change invalidate_dcache_range():
> 
>   for (i = 0; i < size >> shift; i++, addr += bytes) {
>   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
>   dcbf(addr);
>   else
>   dcbi(addr);
>   }

But please note that flushing is pretty much the opposite from
invalidating (a flush (dcbf) makes sure that what is in the cache now
ends up in memory, while an invalidate (dcbi) makes sure it will *not*
end up in memory).  (Both will remove the addressed cache line from the
data caches).

So you cannot blindly replace them; in all cases you need to look and
see if it does what you need here.

(dcbi is much harder to use correctly -- it can race very easily -- so
in practice you will be fine most of the time; but be careful around
startup code and the like).


Segher


Re: [PATCH] x86: Alias memset to __builtin_memset.

2020-03-27 Thread Segher Boessenkool
Hi!

On Thu, Mar 26, 2020 at 01:38:39PM +0100, Clement Courbet wrote:
> --- a/arch/powerpc/include/asm/setjmp.h
> +++ b/arch/powerpc/include/asm/setjmp.h
> @@ -12,7 +12,9 @@
> 
>  #define JMP_BUF_LEN23
> -extern long setjmp(long *);
> -extern void longjmp(long *, long);
> +typedef long * jmp_buf;
> +
> +extern int setjmp(jmp_buf);
> +extern void longjmp(jmp_buf, int);
> 
> I'm happy to send a patch for this, and get rid of more -ffreestanding.
> Opinions ?

Pedantically, jmp_buf should be an array type.  But, this will probably
work fine, and it certainly is better than what we had before.

You could do
  typedef long jmp_buf[JMP_BUF_LEN];
perhaps?

Thanks,


Segher


Re: [PATCH v2] powerpc/boot: Delete unneeded .globl _zimage_start

2020-03-27 Thread Segher Boessenkool
On Thu, Mar 26, 2020 at 03:26:12PM -0700, Fangrui Song wrote:
> On 2020-03-26, Segher Boessenkool wrote:
> >On Wed, Mar 25, 2020 at 09:42:57AM -0700, Fangrui Song wrote:
> >>.globl sets the symbol binding to STB_GLOBAL while .weak sets the
> >>binding to STB_WEAK. GNU as let .weak override .globl since binutils-gdb
> >>5ca547dc2399a0a5d9f20626d4bf5547c3ccfddd (1996). Clang integrated
> >>assembler let the last win but it may error in the future.
> >
> >GNU AS works for more than just ELF.  The way the assembler language
> >is defined, it is not .weak overriding .globl -- instead, .weak sets a
> >symbol attribute.  On an existing symbol (but it creates on if there is
> >none yet).
> >
> >Clang is buggy if it does not allow valid (and perfectly normal)
> >assembler code like this.
> 
> https://sourceware.org/pipermail/binutils/2020-March/110399.html
> 
> Alan: "I think it is completely fine for you to make the llvm assembler
> error on inconsistent binding, or the last directive win.  Either of
> those behaviours is logical and good, but you quite possibly will run
> into a need to fix more user assembly.

This would be fine and consistent behaviour, of course.  But it is not
appropriate if you want to pretend to be compatible to GNU toolchains.

Which is exactly why you want this kernel patch at all.  And the kernel
can (in this case) accommodate your buggy assembler, sure, but are you
going to "fix" all other programs with this "problem" as well?


Segher


Re: [PATCH v2] powerpc/boot: Delete unneeded .globl _zimage_start

2020-03-26 Thread Segher Boessenkool
On Wed, Mar 25, 2020 at 09:42:57AM -0700, Fangrui Song wrote:
> .globl sets the symbol binding to STB_GLOBAL while .weak sets the
> binding to STB_WEAK. GNU as let .weak override .globl since binutils-gdb
> 5ca547dc2399a0a5d9f20626d4bf5547c3ccfddd (1996). Clang integrated
> assembler let the last win but it may error in the future.

GNU AS works for more than just ELF.  The way the assembler language
is defined, it is not .weak overriding .globl -- instead, .weak sets a
symbol attribute.  On an existing symbol (but it creates on if there is
none yet).

Clang is buggy if it does not allow valid (and perfectly normal)
assembler code like this.


Segher


Re: [PATCH] powerpc/boot: Delete unneeded .globl _zimage_start

2020-03-25 Thread Segher Boessenkool
On Tue, Mar 24, 2020 at 10:18:20PM -0700, Fangrui Song wrote:
> .globl sets the symbol binding to STB_GLOBAL while .weak sets the
> binding to STB_WEAK. They should not be used together. It is accidetal
> rather then intentional that GNU as let .weak override .globl while
> clang integrated assembler let the last win.

Nothing is "overridden".

The as manual says (.weak):

  This directive sets the weak attribute on the comma separated list of
  symbol 'names'.  If the symbols do not already exist, they will be
  created.

so this behaviour is obviously as intended (or was later documented in
any case), so LLVM has a bug to fix (whether you like this (much saner)
behaviour or not).


Segher


Re: [PATCH] powerpc xmon: drop the option `i` in cacheflush

2020-03-23 Thread Segher Boessenkool
On Mon, Mar 23, 2020 at 04:55:48PM +0530, Balamuruhan S wrote:
> Data Cache Block Invalidate (dcbi) instruction implemented in 32-bit
> designs prior to PowerPC architecture version 2.01 and got obsolete
> from version 2.01.

It was added back in 2.03.  It also exists in 64-bit designs (using
category embedded), in 2.07 still even.


Segher


Re: [PATCH] arch/powerpc/64: Avoid isync in flush_dcache_range

2020-03-20 Thread Segher Boessenkool
On Fri, Mar 20, 2020 at 08:38:42PM +0530, Aneesh Kumar K.V wrote:
> On 3/20/20 8:35 PM, Segher Boessenkool wrote:
> >On Fri, Mar 20, 2020 at 04:02:42PM +0530, Aneesh Kumar K.V wrote:
> >>As per ISA and isync is only needed on instruction cache
> >>block invalidate. Remove the same from dcache invalidate.
> >
> >Is that true on older CPUs?
> >
> 
> That is what I found by checking with hardware team.

Oh, the comment right before this function says "does not invalidat
the corresponding insncache blocks", so this looks fine, sorry for not
looking closely enough before.

> One thing i was not 
> able to get full confirmation about was the usage of 'sync' before 'dcbf'.

Yeah, this looks like something that would matter on some implementations.
Would it make anything measurably faster if you would remove that sync?


Segher


Re: [PATCH] arch/powerpc/64: Avoid isync in flush_dcache_range

2020-03-20 Thread Segher Boessenkool
On Fri, Mar 20, 2020 at 04:02:42PM +0530, Aneesh Kumar K.V wrote:
> As per ISA and isync is only needed on instruction cache
> block invalidate. Remove the same from dcache invalidate.

Is that true on older CPUs?


Segher


Re: [PATCH 12/15] powerpc/watchpoint: Prepare handler to handle more than one watcnhpoint

2020-03-18 Thread Segher Boessenkool
On Wed, Mar 18, 2020 at 12:44:52PM +0100, Christophe Leroy wrote:
> Le 18/03/2020 à 12:35, Michael Ellerman a écrit :
> >Christophe Leroy  writes:
> >>Le 09/03/2020 à 09:58, Ravi Bangoria a écrit :
> >>>Currently we assume that we have only one watchpoint supported by hw.
> >>>Get rid of that assumption and use dynamic loop instead. This should
> >>>make supporting more watchpoints very easy.
> >>
> >>I think using 'we' is to be avoided in commit message.
> >
> >Hmm, is it?
> >
> >I use 'we' all the time. Which doesn't mean it's correct, but I think it
> >reads OK.
> >
> >cheers
> 
> From 
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html :
> 
> Describe your changes in imperative mood, e.g. “make xyzzy do frotz” 
> instead of “[This patch] makes xyzzy do frotz” or “[I] changed xyzzy 
> to do frotz”, as if you are giving orders to the codebase to change its 
> behaviour.

That is what is there already?  "Get rid of ...".

You cannot describe the current situation with an imperative.


Segher


Re: [PATCH 02/15] powerpc/watchpoint: Add SPRN macros for second DAWR

2020-03-18 Thread Segher Boessenkool
On Tue, Mar 17, 2020 at 11:16:34AM +0100, Christophe Leroy wrote:
> 
> 
> Le 09/03/2020 à 09:57, Ravi Bangoria a écrit :
> >Future Power architecture is introducing second DAWR. Add SPRN_ macros
> >for the same.
> 
> I'm not sure this is called 'macros'. For me a macro is something more 
> complex.

It is called "macros" in the C standard, and in common usage as well.
"Object-like macros", as opposed to "function-like macros": there are
no arguments.

> For me those are 'constants'.

That would be more like "static const" in C since 1990 ;-)


Segher


Re: [PATCH 00/15] powerpc/watchpoint: Preparation for more than one watchpoint

2020-03-16 Thread Segher Boessenkool
On Mon, Mar 16, 2020 at 04:05:01PM +0100, Christophe Leroy wrote:
> Some book3s (e300 family for instance, I think G2 as well) already have 
> a DABR2 in addition to DABR.

The original "G2" (meaning 603 and 604) do not have DABR2.  The newer
"G2" (meaning e300) does have it.  e500 and e600 do not have it either.

Hope I got that right ;-)


Segher


  1   2   3   4   5   6   7   8   9   10   >