Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 03:31:53PM +, David Laight wrote:
> > >   asm volatile ("" : "+r" (eax));
> > >   // So here eax must contain the value set by the "x" instructions.
> > 
> > No, the register eax will contain the value of the eax variable.  In the
> > asm; it might well be there before or after the asm as well, but none of
> > that is guaranteed.
> 
> Perhaps not 'guaranteed', but very unlikely to be wrong.
> It doesn't give gcc much scope for not generating the desired code.

Wanna bet?  :-)

Correct is correct.  Anything else is not.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Wed, Sep 09, 2020 at 02:33:36PM -0700, Linus Torvalds wrote:
> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
>  wrote:
> >
> > It will not work like this in GCC, no.  The LLVM people know about that.
> > I do not know why they insist on pushing this, being incompatible and
> > everything.
> 
> Umm. Since they'd be the ones supporting this, *gcc* would be the
> incompatible one, not clang.

This breaks the basic requirements of asm goto.

> So I'd phrase it differently. If gcc is planning on doing some
> different model for asm goto with outputs, that would be the
> incompatible case.

If we will do asm goto with outputs, the asm will still be a jump
instruction!  (It is not in LLVM!)

We probably *can* make asm goto have outputs (jump instructions can have
outputs just fine!  Just output reloads on jump instructions are hard,
because not always they are *possible*; but for asm goto it should be
fine).

Doing as LLVM does, and making the asm a "trapping" instruction, makes
it not a jump insn, and opens up whole new cans of worms (including
inferior code quality).  Since it has very different semantics, and we
might want to keep the semantics of asm goto as well anyway, this should
be called something different ("asm break" or "asm __anything" for
example).

It would be nice if they talked to us about it, too.  LLVM claims it
implements the GCC inline asm extension.  It already only is compatible
for the simplest of cases, but this would be much worse still :-(

> and honestly, (b) is actually inferior for the error cases, even if to
> a compiler person it might feel like the "RightThing(tm)" to do.
> Because when an exception happens, the outputs simply won't be
> initialized.

Sure, that is fine, and quite possible useful, but it is not the same as
asm goto.  asm goto is not some exception handling construct: it is a
jump instruction.

> Anyway, for either of those cases, the kernel won't care either way.
> We'll have to support the non-goto case for many years even if
> everybody were to magically implement it today, so it's not like this
> is a "you have to do it" thing.

Yes.

I'm just annoyed because of all the extra work created by people not
communicating.


Segher


RE: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread David Laight



> -Original Message-
> From: Segher Boessenkool 
> Sent: 10 September 2020 16:21
> To: David Laight 
> Cc: 'Christophe Leroy' ; 'Linus Torvalds' 
>  foundation.org>; linux-arch ; Kees Cook 
> ; the
> arch/x86 maintainers ; Nick Desaulniers 
> ; Linux Kernel
> Mailing List ; Alexey Dobriyan 
> ; Luis Chamberlain
> ; Al Viro ; linux-fsdevel 
> ;
> linuxppc-dev ; Christoph Hellwig 
> Subject: Re: remove the last set_fs() in common code, and remove it for x86 
> and powerpc v3
> 
> On Thu, Sep 10, 2020 at 12:26:53PM +, David Laight wrote:
> > Actually this is pretty sound:
> > __label__ label;
> > register int eax asm ("eax");
> > // Ensure eax can't be reloaded from anywhere
> > // In particular it can't be reloaded after the asm goto line
> > asm volatile ("" : "=r" (eax));
> 
> This asm is fine.  It says it writes the "eax" variable, which lives in
> the eax register *in that asm* (so *not* guaranteed after it!).
> 
> > // Provided gcc doesn't save eax here...
> > asm volatile goto ("x" ::: "eax" : label);
> 
> So this is incorrect.

>From the other email:

> It is neither input nor output operand here!  Only *then* is a local
> register asm guaranteed to be in the given reg: as input or output to an
> inline asm.

Ok, so adding '"r" (eax)' to the input section helps a bit.

> > // ... and reload the saved value here.
> > // The input value here will be that modified by the 'asm goto'.
> > // Since this modifies eax it can't be moved before the 'asm goto'.
> > asm volatile ("" : "+r" (eax));
> > // So here eax must contain the value set by the "x" instructions.
> 
> No, the register eax will contain the value of the eax variable.  In the
> asm; it might well be there before or after the asm as well, but none of
> that is guaranteed.

Perhaps not 'guaranteed', but very unlikely to be wrong.
It doesn't give gcc much scope for not generating the desired code.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 12:26:53PM +, David Laight wrote:
> Actually this is pretty sound:
>   __label__ label;
>   register int eax asm ("eax");
>   // Ensure eax can't be reloaded from anywhere
>   // In particular it can't be reloaded after the asm goto line
>   asm volatile ("" : "=r" (eax));

This asm is fine.  It says it writes the "eax" variable, which lives in
the eax register *in that asm* (so *not* guaranteed after it!).

>   // Provided gcc doesn't save eax here...
>   asm volatile goto ("x" ::: "eax" : label);

So this is incorrect.

>   // ... and reload the saved value here.
>   // The input value here will be that modified by the 'asm goto'.
>   // Since this modifies eax it can't be moved before the 'asm goto'.
>   asm volatile ("" : "+r" (eax));
>   // So here eax must contain the value set by the "x" instructions.

No, the register eax will contain the value of the eax variable.  In the
asm; it might well be there before or after the asm as well, but none of
that is guaranteed.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 09:26:28AM +, David Laight wrote:
> From: Christophe Leroy
> > Sent: 10 September 2020 09:14
> > 
> > Le 10/09/2020 à 10:04, David Laight a écrit :
> > > From: Linus Torvalds
> > >> Sent: 09 September 2020 22:34
> > >> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
> > >>  wrote:
> > >>>
> > >>> It will not work like this in GCC, no.  The LLVM people know about that.
> > >>> I do not know why they insist on pushing this, being incompatible and
> > >>> everything.
> > >>
> > >> Umm. Since they'd be the ones supporting this, *gcc* would be the
> > >> incompatible one, not clang.
> > >
> > > I had an 'interesting' idea.
> > >
> > > Can you use a local asm register variable as an input and output to
> > > an 'asm volatile goto' statement?
> > >
> > > Well you can - but is it guaranteed to work :-)
> > >
> > 
> > With gcc at least it should work according to
> > https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
> > 
> > They even explicitely tell: "The only supported use for this feature is
> > to specify registers for input and output operands when calling Extended
> > asm "
> 
> A quick test isn't good
> 
> int bar(char *z)
> {
> __label__ label;
> register int eax asm ("eax") = 6;
> asm volatile goto (" mov $1, %%eax" ::: "eax" : label);
> 
> label:
> return eax;
> }

It is neither input nor output operand here!  Only *then* is a local
register asm guaranteed to be in the given reg: as input or output to an
inline asm.


Segher


RE: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread David Laight
From: David Laight
> Sent: 10 September 2020 10:26
...
> > > I had an 'interesting' idea.
> > >
> > > Can you use a local asm register variable as an input and output to
> > > an 'asm volatile goto' statement?
> > >
> > > Well you can - but is it guaranteed to work :-)
> > >
> >
> > With gcc at least it should work according to
> > https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
> >
> > They even explicitely tell: "The only supported use for this feature is
> > to specify registers for input and output operands when calling Extended
> > asm "
> 
> A quick test isn't good
> 
> int bar(char *z)
> {
> __label__ label;
> register int eax asm ("eax") = 6;
> asm volatile goto (" mov $1, %%eax" ::: "eax" : label);
> label:
> return eax;
> }
> 
> 0040 :
>   40:   b8 01 00 00 00  mov$0x1,%eax
>   45:   b8 06 00 00 00  mov$0x6,%eax
>   4a:   c3  retq
> 
> although adding:
> asm volatile ("" : "+r" (eax));
> either side of the 'asm volatile goto' does fix it.

Actually this is pretty sound:
__label__ label;
register int eax asm ("eax");
// Ensure eax can't be reloaded from anywhere
// In particular it can't be reloaded after the asm goto line
asm volatile ("" : "=r" (eax));
// Provided gcc doesn't save eax here...
asm volatile goto ("x" ::: "eax" : label);
// ... and reload the saved value here.
// The input value here will be that modified by the 'asm goto'.
// Since this modifies eax it can't be moved before the 'asm goto'.
asm volatile ("" : "+r" (eax));
// So here eax must contain the value set by the "x" instructions.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


RE: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread David Laight
From: Christophe Leroy
> Sent: 10 September 2020 09:14
> 
> Le 10/09/2020 à 10:04, David Laight a écrit :
> > From: Linus Torvalds
> >> Sent: 09 September 2020 22:34
> >> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
> >>  wrote:
> >>>
> >>> It will not work like this in GCC, no.  The LLVM people know about that.
> >>> I do not know why they insist on pushing this, being incompatible and
> >>> everything.
> >>
> >> Umm. Since they'd be the ones supporting this, *gcc* would be the
> >> incompatible one, not clang.
> >
> > I had an 'interesting' idea.
> >
> > Can you use a local asm register variable as an input and output to
> > an 'asm volatile goto' statement?
> >
> > Well you can - but is it guaranteed to work :-)
> >
> 
> With gcc at least it should work according to
> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
> 
> They even explicitely tell: "The only supported use for this feature is
> to specify registers for input and output operands when calling Extended
> asm "

A quick test isn't good

int bar(char *z)
{
__label__ label;
register int eax asm ("eax") = 6;
asm volatile goto (" mov $1, %%eax" ::: "eax" : label);

label:
return eax;
}

0040 :
  40:   b8 01 00 00 00  mov$0x1,%eax
  45:   b8 06 00 00 00  mov$0x6,%eax
  4a:   c3  retq

although adding:
asm volatile ("" : "+r" (eax));
either side of the 'asm volatile goto' does fix it.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread Christophe Leroy




Le 10/09/2020 à 10:04, David Laight a écrit :

From: Linus Torvalds

Sent: 09 September 2020 22:34
On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
 wrote:


It will not work like this in GCC, no.  The LLVM people know about that.
I do not know why they insist on pushing this, being incompatible and
everything.


Umm. Since they'd be the ones supporting this, *gcc* would be the
incompatible one, not clang.


I had an 'interesting' idea.

Can you use a local asm register variable as an input and output to
an 'asm volatile goto' statement?

Well you can - but is it guaranteed to work :-)



With gcc at least it should work according to 
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html


They even explicitely tell: "The only supported use for this feature is 
to specify registers for input and output operands when calling Extended 
asm "


Christophe


RE: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-10 Thread David Laight
From: Linus Torvalds
> Sent: 09 September 2020 22:34
> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
>  wrote:
> >
> > It will not work like this in GCC, no.  The LLVM people know about that.
> > I do not know why they insist on pushing this, being incompatible and
> > everything.
> 
> Umm. Since they'd be the ones supporting this, *gcc* would be the
> incompatible one, not clang.

I had an 'interesting' idea.

Can you use a local asm register variable as an input and output to
an 'asm volatile goto' statement?

Well you can - but is it guaranteed to work :-)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-09 Thread Linus Torvalds
On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
 wrote:
>
> It will not work like this in GCC, no.  The LLVM people know about that.
> I do not know why they insist on pushing this, being incompatible and
> everything.

Umm. Since they'd be the ones supporting this, *gcc* would be the
incompatible one, not clang.

Like it or not, clang is becoming a major kernel compiler. It's
already basically used for all android uses afaik.

So I'd phrase it differently. If gcc is planning on doing some
different model for asm goto with outputs, that would be the
incompatible case.

I'm not sure how gcc could do it differently. The only possible
difference I see is

 (a) not doing it at all

 (b) doing the "all goto targets have the outputs" case

and honestly, (b) is actually inferior for the error cases, even if to
a compiler person it might feel like the "RightThing(tm)" to do.
Because when an exception happens, the outputs simply won't be
initialized.

Anyway, for either of those cases, the kernel won't care either way.
We'll have to support the non-goto case for many years even if
everybody were to magically implement it today, so it's not like this
is a "you have to do it" thing.

   Linus


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-09 Thread Segher Boessenkool
On Wed, Sep 09, 2020 at 10:31:34AM -0700, Linus Torvalds wrote:
> And apparently there are people working on this on the gcc side too,
> so it won't just be clang-specific. Nor kernel-specific in that Nick
> tells me some other projects are looking at using that asm goto with
> outputs too.

It will not work like this in GCC, no.  The LLVM people know about that.
I do not know why they insist on pushing this, being incompatible and
everything.


Segher


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-09 Thread Linus Torvalds
On Thu, Sep 3, 2020 at 7:28 AM Al Viro  wrote:
>
> I can live with this series; do you want that in vfs.git#for-next?

Well, it's apparently there now (at least it's in your base.set_fs
branch, I didn't check actual -next).

So this is just a heads-up that I plan to merge the "asm goto" changes
on top of this during 5.10. Nick did the patch to make my patch-set
work either with or without the actual asm goto support, and I've been
running it privately now for several months.

And apparently there are people working on this on the gcc side too,
so it won't just be clang-specific. Nor kernel-specific in that Nick
tells me some other projects are looking at using that asm goto with
outputs too.

Anyway, the actual patch to use asm goto with outputs is fairly small
and not that interesting to people (since no released compiler
supports it), but part of the infrastructure to make it tiny is to
just get rid of the inlined "__get_user()" and "__put_user()" stuff.
I've ranted against those functions for a few years by now, so part of
this is to stop inlining them and make people think they are "good",
but part of it is also that those macros and inline functions are the
main remaining ones that mess with this all.

I'm attaching the two __get_user/__put_user patches here in case
anybody cares, but these are the pre-rebased ones, I'll make them work
with the new world order as it happens. The main change is:

 (a) unify around a common special calling convention:
   - %bx is clobbered
   - %cx contains the user address on input, and the error value on output
   - %ax/%dx contains the actual value (input for put, output for get,
of course)

 (b) unify around using just a "call", using the model that
get/put_user already did.
   - use "*_nocheck" for the double-underscore versions
   - this still has to use inline asm because the calling convention is odd
   - otherwise basically just a "call __{get,put}_user_[nocheck_]X"
where X is the size.

IOW, we unify around one single calling convention., and one single
model for actually getting things done.

I still want to remove the double-underscore versions entirely some
day - they have absolutely zero advantages compared to the full "do
address_ok as part of the operation" - but that's a separate thing. At
least they can be unified.

And the reason for this all is obviously that now *only* the
"unsafe_{get,put}_user()" cases with the error label output are the
"fast inlined" cases. They are the only ones that _can_ be done
quickly inline, since the slow clac/stac is not part of them. Plus
they already have that unified usage model of the error label, even if
unsafe_get_user() currently does it manually because "asm goto" with
outputs doesn't work in existing compilers.

Comments?

I suspect people won't care, but I thought I'd post these so that
there won't be any surprises during the next merge window when I apply
them after merging the set_fs() removal branch..

 Linus
From 52c7574a0d15722df52158a3d766803662d9a6ff Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Wed, 8 Apr 2020 12:50:01 -0700
Subject: [PATCH 1/6] x86: Make __get_user() generate an out-of-line call

Instead of inlining the whole stac/lfence/mov/clac sequence (which also
requires individual exception table entries and several asm instruction
alternatives entries), just generate "call __get_user_nocheck_X" for the
__get_user() cases.

We can use all the same infrastructure that we already do for the
regular "get_user()", and the end result is simpler source code, and
much simpler code generation.

It also measn that when I introduce asm goto with input for
"unsafe_get_user()", there are no nasty interactions with the
__get_user() code.

Cc: Al Viro 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Peter Zijlstra 
Signed-off-by: Linus Torvalds 
---
 arch/x86/include/asm/uaccess.h | 128 ++---
 arch/x86/lib/getuser.S |  60 
 2 files changed, 114 insertions(+), 74 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index ecefaffd15d4..cf5a3f61db3b 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -96,25 +96,14 @@ static inline bool pagefault_disabled(void);
 	likely(!__range_not_ok(addr, size, user_addr_max()));		\
 })
 
-/*
- * These are the main single-value transfer routines.  They automatically
- * use the right size if we just have the right pointer type.
- *
- * This gets kind of ugly. We want to return _two_ values in "get_user()"
- * and yet we don't want to do any pointers, because that is too much
- * of a performance impact. Thus we have a few rather ugly macros here,
- * and hide all the ugliness from the user.
- *
- * The "__xxx" versions of the user access functions are versions that
- * do not verify the address space, that must have been done previously
- * with a separate "access_ok()" call (this is used when we do multiple
- * accesse

RE: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-05 Thread David Laight
From: Christophe Leroy
> Sent: 05 September 2020 08:16
> 
> Le 04/09/2020 à 23:01, David Laight a écrit :
> > From: Alexey Dobriyan
> >> Sent: 04 September 2020 18:58
...
> > What is this strange %fs register you are talking about.
> > Figure 2-4 only has CS, DS, SS and ES.
> >
> 
> Intel added registers FS and GS in the i386

I know, I've got both the 'iAPX 286 Programmer's Reference Manual'
and the '80386 Programmer's Reference Manual' on my shelf.

I don't have the 8088 book though - which I used in 1982.

The old books are a lot easier to read if, for instance,
you are trying to work out how to back and forth to real mode
to do bios calls.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-05 Thread Christophe Leroy




Le 04/09/2020 à 23:01, David Laight a écrit :

From: Alexey Dobriyan

Sent: 04 September 2020 18:58

On Fri, Sep 04, 2020 at 08:00:24AM +0200, Ingo Molnar wrote:

* Christoph Hellwig  wrote:

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.


Cool! For the x86 bits:

   Acked-by: Ingo Molnar 


set_fs() is older than some kernel hackers!

$ cd linux-0.11/
$ find . -type f -name '*.h' | xargs grep -e set_fs -w -n -A3
./include/asm/segment.h:61:extern inline void set_fs(unsigned long val)
./include/asm/segment.h-62-{
./include/asm/segment.h-63- __asm__("mov %0,%%fs"::"a" ((unsigned 
short) val));
./include/asm/segment.h-64-}


What is this strange %fs register you are talking about.
Figure 2-4 only has CS, DS, SS and ES.



Intel added registers FS and GS in the i386

Christophe


RE: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-04 Thread David Laight
From: Alexey Dobriyan
> Sent: 04 September 2020 18:58
> 
> On Fri, Sep 04, 2020 at 08:00:24AM +0200, Ingo Molnar wrote:
> > * Christoph Hellwig  wrote:
> > > this series removes the last set_fs() used to force a kernel address
> > > space for the uaccess code in the kernel read/write/splice code, and then
> > > stops implementing the address space overrides entirely for x86 and
> > > powerpc.
> >
> > Cool! For the x86 bits:
> >
> >   Acked-by: Ingo Molnar 
> 
> set_fs() is older than some kernel hackers!
> 
>   $ cd linux-0.11/
>   $ find . -type f -name '*.h' | xargs grep -e set_fs -w -n -A3
>   ./include/asm/segment.h:61:extern inline void set_fs(unsigned long val)
>   ./include/asm/segment.h-62-{
>   ./include/asm/segment.h-63- __asm__("mov %0,%%fs"::"a" ((unsigned 
> short) val));
>   ./include/asm/segment.h-64-}

What is this strange %fs register you are talking about.
Figure 2-4 only has CS, DS, SS and ES.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-04 Thread Linus Torvalds
On Fri, Sep 4, 2020 at 10:58 AM Alexey Dobriyan  wrote:
>
> set_fs() is older than some kernel hackers!
>
> $ cd linux-0.11/
> $ find . -type f -name '*.h' | xargs grep -e set_fs -w -n -A3

Oh, it's older than that. It was there (as set_fs) in 0.10, and may
even predate that. But sadly, I don't have tar-balls for 0.02 and
0.03, so can't check.

The actual use of %fs as the user space segment is already there in
0.01, but there was no 'set_fs()'. That was a simpler and more direct
time, and "get_fs()" looked like this back then:

  #define _fs() ({ \
  register unsigned short __res; \
  __asm__("mov %%fs,%%ax":"=a" (__res):); \
  __res;})

and all the setting was basically part of the kernel entry asm and. Lovely.

 Linus


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-04 Thread Alexey Dobriyan
On Fri, Sep 04, 2020 at 08:00:24AM +0200, Ingo Molnar wrote:
> * Christoph Hellwig  wrote:
> > this series removes the last set_fs() used to force a kernel address
> > space for the uaccess code in the kernel read/write/splice code, and then
> > stops implementing the address space overrides entirely for x86 and
> > powerpc.
> 
> Cool! For the x86 bits:
> 
>   Acked-by: Ingo Molnar 

set_fs() is older than some kernel hackers!

$ cd linux-0.11/
$ find . -type f -name '*.h' | xargs grep -e set_fs -w -n -A3
./include/asm/segment.h:61:extern inline void set_fs(unsigned long val)
./include/asm/segment.h-62-{
./include/asm/segment.h-63- __asm__("mov %0,%%fs"::"a" ((unsigned 
short) val));
./include/asm/segment.h-64-}


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Ingo Molnar


* Christoph Hellwig  wrote:

> Hi all,
> 
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.

Cool! For the x86 bits:

  Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Linus Torvalds
On Thu, Sep 3, 2020 at 7:22 AM Christoph Hellwig  wrote:
>
> [Note to Linus: I'd like to get this into linux-next rather earlier
> than later.  Do you think it is ok to add this tree to linux-next?]

This whole series looks really good to me now, with each patch looking
like a clear cleanup on its own.

So ack on the whole series, and yes, please get this into linux-next
and ready for 5.10. Whether through Al or something else.

Thanks,

   Linus


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Christoph Hellwig
On Thu, Sep 03, 2020 at 03:36:29PM +0100, Al Viro wrote:
> FWIW, vfs.git#for-next is always a merge of independent branches; I don't
> put stuff directly into #for-next - too easy to lose that way.
> 
> IOW, that would be something like #base.set_fs, included into #for-next
> merge set.  And I've no problem with never-rebased branches...
> 
> Where in the mainline are the most recent prereqs of this series?

I can't think of anything past -rc1, but I haven't actually checked.


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Al Viro
On Thu, Sep 03, 2020 at 04:30:03PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 03, 2020 at 03:28:03PM +0100, Al Viro wrote:
> > On Thu, Sep 03, 2020 at 04:22:28PM +0200, Christoph Hellwig wrote:
> > 
> > > Besides x86 and powerpc I plan to eventually convert all other
> > > architectures, although this will be a slow process, starting with the
> > > easier ones once the infrastructure is merged.  The process to convert
> > > architectures is roughtly:
> > > 
> > >  (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
> > >  (2) implement __get_kernel_nofault and __put_kernel_nofault
> > >  (3) remove the arch specific address limitation functionality
> > 
> > The one to really watch out for is sparc; I have something in that
> > direction, will resurrect as soon as I'm done with eventpoll analysis...
> > 
> > I can live with this series; do you want that in vfs.git#for-next?
> 
> Either that or a separate tree is fine with me.  It would be good to
> eventually have a non-rebased stable tree so that other arch trees
> can work from it, though.

FWIW, vfs.git#for-next is always a merge of independent branches; I don't
put stuff directly into #for-next - too easy to lose that way.

IOW, that would be something like #base.set_fs, included into #for-next
merge set.  And I've no problem with never-rebased branches...

Where in the mainline are the most recent prereqs of this series?


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Christoph Hellwig
On Thu, Sep 03, 2020 at 03:28:03PM +0100, Al Viro wrote:
> On Thu, Sep 03, 2020 at 04:22:28PM +0200, Christoph Hellwig wrote:
> 
> > Besides x86 and powerpc I plan to eventually convert all other
> > architectures, although this will be a slow process, starting with the
> > easier ones once the infrastructure is merged.  The process to convert
> > architectures is roughtly:
> > 
> >  (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
> >  (2) implement __get_kernel_nofault and __put_kernel_nofault
> >  (3) remove the arch specific address limitation functionality
> 
> The one to really watch out for is sparc; I have something in that
> direction, will resurrect as soon as I'm done with eventpoll analysis...
> 
> I can live with this series; do you want that in vfs.git#for-next?

Either that or a separate tree is fine with me.  It would be good to
eventually have a non-rebased stable tree so that other arch trees
can work from it, though.


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Al Viro
On Thu, Sep 03, 2020 at 04:22:28PM +0200, Christoph Hellwig wrote:

> Besides x86 and powerpc I plan to eventually convert all other
> architectures, although this will be a slow process, starting with the
> easier ones once the infrastructure is merged.  The process to convert
> architectures is roughtly:
> 
>  (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
>  (2) implement __get_kernel_nofault and __put_kernel_nofault
>  (3) remove the arch specific address limitation functionality

The one to really watch out for is sparc; I have something in that
direction, will resurrect as soon as I'm done with eventpoll analysis...

I can live with this series; do you want that in vfs.git#for-next?


remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Christoph Hellwig
Hi all,

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.

[Note to Linus: I'd like to get this into linux-next rather earlier
than later.  Do you think it is ok to add this tree to linux-next?]

The file system part has been posted a few times, and the read/write side
has been pretty much unchanced.  For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them.  The reasons for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it.  All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.

Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged.  The process to convert
architectures is roughtly:

 (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
 (2) implement __get_kernel_nofault and __put_kernel_nofault
 (3) remove the arch specific address limitation functionality

Changes since v2:
 - add back the patch to support splice through read_iter/write iter
   on /proc/sys/*
 - entirely remove the tests that depend on set_fs.  Note that for
   lkdtm the maintainer (Kees) disagrees with this request from Linus
 - fix a wrong check in the powerpc access_ok, and drop a few spurious
   cleanups there

Changes since v1:
 - drop the patch to remove the non-iter ops for /dev/zero and
   /dev/null as they caused a performance regression
 - don't enable user access in __get_kernel on powerpc
 - xfail the set_fs() based lkdtm tests

Diffstat:


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-02 Thread Christoph Hellwig
On Tue, Sep 01, 2020 at 06:25:12PM +0100, Al Viro wrote:
> On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:
> 
> > 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero
> 
> Interesting...  Could you get an instruction-level profile inside 
> iov_iter_zero(),
> along with the disassembly of that sucker?

So the interesting thing here is with that none of these code paths
should have changed at all, and the biggest items on the profile look
the same modulo some minor reordering.


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Christophe Leroy




Le 01/09/2020 à 19:25, Al Viro a écrit :

On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:


 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero


Interesting...  Could you get an instruction-level profile inside 
iov_iter_zero(),
along with the disassembly of that sucker?



As a comparison, hereunder is the perf annotate of the 5.9-rc2 without 
the series:


 Percent |	Source code & Disassembly of vmlinux for cpu-clock (2581 
samples)

-
 :
 :
 :
 :  Disassembly of section .text:
 :
 :  c02cbb80 :
 :  iov_iter_zero():
3.22 :c02cbb80:   stwur1,-80(r1)
3.25 :c02cbb84:   stw r30,72(r1)
0.00 :c02cbb88:   mr  r30,r4
2.91 :c02cbb8c:   stw r31,76(r1)
0.00 :c02cbb90:   mr  r31,r3
0.19 :c02cbb94:   stw r27,60(r1)
 :  iov_iter_type():
1.82 :c02cbb98:   lwz r10,0(r4)
0.54 :c02cbb9c:   rlwinm  r9,r10,0,0,30
 :  iov_iter_zero():
1.98 :c02cbba0:   cmpwi   r9,32
0.00 :c02cbba4:   lwz r9,624(r2)
0.35 :c02cbba8:   stw r9,28(r1)
0.00 :c02cbbac:   li  r9,0
0.00 :c02cbbb0:   beq c02cbd00 
2.67 :c02cbbb4:   lwz r9,8(r4)
1.98 :c02cbbb8:   cmplw   r9,r3
0.00 :c02cbbbc:   mr  r27,r9
0.00 :c02cbbc0:   bgt c02cbce8 
0.31 :c02cbbc4:   cmpwi   r9,0
0.00 :c02cbbc8:   beq c02cbcbc 
3.22 :c02cbbcc:   andi.   r8,r10,16
1.70 :c02cbbd0:   lwz r31,4(r30)
0.00 :c02cbbd4:   bne c02cbe10 
0.31 :c02cbbd8:   andi.   r8,r10,8
0.00 :c02cbbdc:   bne c02cbf64 
1.82 :c02cbbe0:   andi.   r10,r10,64
0.00 :c02cbbe4:   bne c02cc080 
0.27 :c02cbbe8:   stw r29,68(r1)
1.94 :c02cbbec:   stw r28,64(r1)
1.98 :c02cbbf0:   lwz r28,12(r30)
0.31 :c02cbbf4:   lwz r7,4(r28)
2.13 :c02cbbf8:   subfr29,r31,r7
1.78 :c02cbbfc:   cmplw   r29,r27
0.08 :c02cbc00:   bgt c02cbcf8 
   28.24 :c02cbc04:   cmpwi   r29,0
0.00 :c02cbc08:   beq c02cc08c 
2.01 :c02cbc0c:   lwz r3,0(r28)
3.10 :c02cbc10:   lwz r10,1208(r2)
0.00 :c02cbc14:   add r3,r3,r31
 :  __access_ok():
0.00 :c02cbc18:   cmplw   r3,r10
0.00 :c02cbc1c:   bgt c02cbc7c 
3.37 :c02cbc20:   subfr10,r3,r10
0.00 :c02cbc24:   addir8,r29,-1
3.14 :c02cbc28:   cmplw   r8,r10
0.08 :c02cbc2c:   mflrr0
0.00 :c02cbc30:   stw r0,84(r1)
0.00 :c02cbc34:   bgt c02cbd40 
 :  clear_user():
0.00 :c02cbc38:   mr  r4,r29
2.40 :c02cbc3c:   bl  c001a428 <__arch_clear_user>
 :  iov_iter_zero():
1.55 :c02cbc40:   add r31,r31,r29
0.00 :c02cbc44:   cmpwi   r3,0
1.94 :c02cbc48:   subfr29,r29,r27
0.00 :c02cbc4c:   subfr31,r3,r31
0.00 :c02cbc50:   add r29,r29,r3
0.00 :c02cbc54:   beq c02cc0ac 
0.00 :c02cbc58:   lwz r9,8(r30)
0.00 :c02cbc5c:   subfr10,r27,r29
0.00 :c02cbc60:   lwz r0,84(r1)
0.00 :c02cbc64:   subfr27,r29,r27
0.00 :c02cbc68:   add r9,r10,r9
0.00 :c02cbc6c:   lwz r7,4(r28)
0.00 :c02cbc70:   lwz r10,12(r30)
0.00 :c02cbc74:   mtlrr0
0.00 :c02cbc78:   b   c02cbc84 
 :  __access_ok():
0.00 :c02cbc7c:   li  r27,0
0.00 :c02cbc80:   mr  r10,r28
 :  iov_iter_zero():
0.00 :c02cbc84:   cmplw   r31,r7
0.00 :c02cbc88:   bne c02cbc94 
0.93 :c02cbc8c:   addir28,r28,8
0.00 :c02cbc90:   li  r31,0
1.28 :c02cbc94:   lwz r8,16(r30)
0.00 :c02cbc98:   subfr10,r10,r28
1.05 :c02cbc9c:   srawi   r10,r10,3
0.00 :c02cbca0:   stw r28,12(r30)
0.00 :c02cbca4:   subfr10,r10,r8
0.93 :c02cbca8:   stw r10,16(r30)
0.04 :c02cbcac:   lwz r28,64(r1)
0.00 :c02cbcb0:   lwz r29,68(r1)
1.05 :c02cbcb4:   stw r9,8(r30)
0.00 :c02cbcb8: 

Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Christophe Leroy




Le 01/09/2020 à 19:25, Al Viro a écrit :

On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:


 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero


Interesting...  Could you get an instruction-level profile inside 
iov_iter_zero(),
along with the disassembly of that sucker?



Output of perf annotate:


 Percent |	Source code & Disassembly of vmlinux for cpu-clock (3579 
samples)

-
 :
 :
 :
 :  Disassembly of section .text:
 :
 :  c02cb3a4 :
 :  iov_iter_zero():
2.24 :c02cb3a4:   stwur1,-80(r1)
0.31 :c02cb3a8:   stw r30,72(r1)
0.00 :c02cb3ac:   mr  r30,r4
0.11 :c02cb3b0:   stw r31,76(r1)
0.00 :c02cb3b4:   mr  r31,r3
1.06 :c02cb3b8:   stw r27,60(r1)
 :  iov_iter_type():
0.03 :c02cb3bc:   lwz r10,0(r4)
0.06 :c02cb3c0:   rlwinm  r9,r10,0,0,30
 :  iov_iter_zero():
0.03 :c02cb3c4:   cmpwi   r9,32
0.00 :c02cb3c8:   lwz r9,624(r2)
2.15 :c02cb3cc:   stw r9,28(r1)
0.00 :c02cb3d0:   li  r9,0
0.00 :c02cb3d4:   beq c02cb520 
0.14 :c02cb3d8:   lwz r9,8(r4)
0.08 :c02cb3dc:   cmplw   r9,r3
0.00 :c02cb3e0:   mr  r27,r9
0.03 :c02cb3e4:   bgt c02cb4fc 
1.34 :c02cb3e8:   cmpwi   r9,0
0.00 :c02cb3ec:   beq c02cb4d0 
0.11 :c02cb3f0:   andi.   r8,r10,16
0.17 :c02cb3f4:   lwz r31,4(r30)
1.79 :c02cb3f8:   bne c02cb61c 
0.00 :c02cb3fc:   andi.   r8,r10,8
0.06 :c02cb400:   bne c02cb770 
0.22 :c02cb404:   andi.   r10,r10,64
0.03 :c02cb408:   bne c02cb88c 
0.11 :c02cb40c:   stw r29,68(r1)
1.59 :c02cb410:   stw r28,64(r1)
0.03 :c02cb414:   lwz r28,12(r30)
0.00 :c02cb418:   lwz r7,4(r28)
1.87 :c02cb41c:   subfr29,r31,r7
0.28 :c02cb420:   cmplw   r29,r27
0.03 :c02cb424:   bgt c02cb50c 
0.03 :c02cb428:   cmpwi   r29,0
0.00 :c02cb42c:   beq c02cb898 
1.34 :c02cb430:   lwz r3,0(r28)
 :  __access_ok():
0.00 :c02cb434:   lis r10,-16384
 :  iov_iter_zero():
0.36 :c02cb438:   add r3,r3,r31
 :  __access_ok():
0.03 :c02cb43c:   cmplw   r3,r10
1.79 :c02cb440:   bge c02cb514 
   13.19 :c02cb444:   subfr10,r3,r10
 :  clear_user():
0.00 :c02cb448:   cmplw   r29,r10
4.41 :c02cb44c:   mflrr0
0.00 :c02cb450:   stw r0,84(r1)
0.00 :c02cb454:   bgt c02cb8c4 
0.00 :c02cb458:   mr  r4,r29
0.00 :c02cb45c:   bl  c001a41c <__arch_clear_user>
 :  iov_iter_zero():
0.70 :c02cb460:   add r31,r31,r29
0.00 :c02cb464:   cmpwi   r3,0
   17.13 :c02cb468:   subfr29,r29,r27
0.00 :c02cb46c:   subfr31,r3,r31
1.20 :c02cb470:   add r29,r29,r3
0.00 :c02cb474:   beq c02cb8b8 
0.00 :c02cb478:   lwz r9,8(r30)
0.00 :c02cb47c:   subfr10,r27,r29
0.00 :c02cb480:   lwz r0,84(r1)
0.00 :c02cb484:   subfr27,r29,r27
0.00 :c02cb488:   add r9,r10,r9
0.00 :c02cb48c:   lwz r7,4(r28)
0.00 :c02cb490:   lwz r10,12(r30)
0.00 :c02cb494:   mtlrr0
1.65 :c02cb498:   cmplw   r31,r7
   14.61 :c02cb49c:   bne c02cb4a8 
1.65 :c02cb4a0:   addir28,r28,8
0.00 :c02cb4a4:   li  r31,0
   14.92 :c02cb4a8:   lwz r8,16(r30)
0.00 :c02cb4ac:   subfr10,r10,r28
1.12 :c02cb4b0:   srawi   r10,r10,3
0.56 :c02cb4b4:   stw r28,12(r30)
0.00 :c02cb4b8:   subfr10,r10,r8
1.23 :c02cb4bc:   stw r10,16(r30)
0.00 :c02cb4c0:   lwz r28,64(r1)
0.56 :c02cb4c4:   lwz r29,68(r1)
0.00 :c02cb4c8:   stw r9,8(r30)
2.12 :c02cb4cc:   stw r31,4(r30)
0.00 :c02cb4d0:   lwz r9,28(r1)
0.61 :c02cb4d4:   lwz r10,624(r2)
0.00 :c02cb4d8:   xor.r9,r9,r10
0.00 :c02cb4dc:   li  r10,0
0.00 :c02cb4e0:   b

Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Matthew Wilcox
On Tue, Sep 01, 2020 at 06:25:12PM +0100, Al Viro wrote:
> On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:
> 
> > 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero
> 
> Interesting...  Could you get an instruction-level profile inside 
> iov_iter_zero(),
> along with the disassembly of that sucker?

Also, does [1] make any difference?  Probably not since it's translating
O flags into IOCB flags instead of RWF flags into IOCB flags.  I wonder
if there's a useful trick we can play here ... something like:

static inline int iocb_flags(struct file *file)
{
int res = 0;
if (likely(!file->f_flags & O_APPEND | O_DIRECT | O_DSYNC | __O_SYNC)) 
&& !IS_SYNC(file->f_mapping->host))
return res;
if (file->f_flags & O_APPEND)
res |= IOCB_APPEND;
if (file->f_flags & O_DIRECT)
res |= IOCB_DIRECT;
if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
res |= IOCB_DSYNC;
if (file->f_flags & __O_SYNC)
res |= IOCB_SYNC;
return res;
}

Can we do something like force O_DSYNC to be set if the inode IS_SYNC()
at the time of open?  Or is setting the sync bit on the inode required
to affect currently-open files?

[1] 
https://lore.kernel.org/linux-fsdevel/95de7ce4-9254-39f1-304f-4455f66bf...@kernel.dk/
 


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Al Viro
On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:

> 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero

Interesting...  Could you get an instruction-level profile inside 
iov_iter_zero(),
along with the disassembly of that sucker?


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Christophe Leroy

Hi Christoph,

Le 27/08/2020 à 17:00, Christoph Hellwig a écrit :

Hi all,

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.

The file system part has been posted a few times, and the read/write side
has been pretty much unchanced.  For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them.  The reasons for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it.  All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.

Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged.  The process to convert
architectures is roughtly:

  (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
  (2) implement __get_kernel_nofault and __put_kernel_nofault
  (3) remove the arch specific address limitation functionality

Changes since v1:
  - drop the patch to remove the non-iter ops for /dev/zero and
/dev/null as they caused a performance regression
  - don't enable user access in __get_kernel on powerpc
  - xfail the set_fs() based lkdtm tests

Diffstat:




I'm still sceptic with the results I get.

With 5.9-rc2:

root@vgoippro:~# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 5.585880 seconds, 91.7MB/s
real0m 5.59s
user0m 1.40s
sys 0m 4.19s


With your series:

root@vgoippro:/tmp# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 7.780540 seconds, 65.8MB/s
real0m 7.79s
user0m 2.12s
sys 0m 5.66s




Top of perf report of a standard perf record:

With 5.9-rc2:

20.31%  dd   [kernel.kallsyms]  [k] __arch_clear_user
 8.37%  dd   [kernel.kallsyms]  [k] transfer_to_syscall
 7.37%  dd   [kernel.kallsyms]  [k] __fsnotify_parent
 6.95%  dd   [kernel.kallsyms]  [k] iov_iter_zero
 5.72%  dd   [kernel.kallsyms]  [k] new_sync_read
 4.87%  dd   [kernel.kallsyms]  [k] vfs_write
 4.47%  dd   [kernel.kallsyms]  [k] vfs_read
 3.07%  dd   [kernel.kallsyms]  [k] ksys_write
 2.77%  dd   [kernel.kallsyms]  [k] ksys_read
 2.65%  dd   [kernel.kallsyms]  [k] __fget_light
 2.37%  dd   [kernel.kallsyms]  [k] __fdget_pos
 2.35%  dd   [kernel.kallsyms]  [k] memset
 1.53%  dd   [kernel.kallsyms]  [k] rw_verify_area
 1.52%  dd   [kernel.kallsyms]  [k] read_iter_zero

With your series:
19.60%  dd   [kernel.kallsyms]  [k] __arch_clear_user
10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero
 9.50%  dd   [kernel.kallsyms]  [k] vfs_write
 8.97%  dd   [kernel.kallsyms]  [k] __fsnotify_parent
 5.46%  dd   [kernel.kallsyms]  [k] transfer_to_syscall
 5.42%  dd   [kernel.kallsyms]  [k] vfs_read
 3.58%  dd   [kernel.kallsyms]  [k] ksys_read
 2.84%  dd   [kernel.kallsyms]  [k] read_iter_zero
 2.24%  dd   [kernel.kallsyms]  [k] ksys_write
 1.80%  dd   [kernel.kallsyms]  [k] __fget_light
 1.34%  dd   [kernel.kallsyms]  [k] __fdget_pos
 0.91%  dd   [kernel.kallsyms]  [k] memset
 0.91%  dd   [kernel.kallsyms]  [k] rw_verify_area

Christophe


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-08-27 Thread Christoph Hellwig
> Diffstat:

Actually no diffstat here as David Howells pointed out.  Here we go:

 arch/Kconfig   |3 
 arch/alpha/Kconfig |1 
 arch/arc/Kconfig   |1 
 arch/arm/Kconfig   |1 
 arch/arm64/Kconfig |1 
 arch/c6x/Kconfig   |1 
 arch/csky/Kconfig  |1 
 arch/h8300/Kconfig |1 
 arch/hexagon/Kconfig   |1 
 arch/ia64/Kconfig  |1 
 arch/m68k/Kconfig  |1 
 arch/microblaze/Kconfig|1 
 arch/mips/Kconfig  |1 
 arch/nds32/Kconfig |1 
 arch/nios2/Kconfig |1 
 arch/openrisc/Kconfig  |1 
 arch/parisc/Kconfig|1 
 arch/powerpc/include/asm/processor.h   |7 -
 arch/powerpc/include/asm/thread_info.h |5 -
 arch/powerpc/include/asm/uaccess.h |   78 ---
 arch/powerpc/kernel/signal.c   |3 
 arch/powerpc/lib/sstep.c   |6 -
 arch/riscv/Kconfig |1 
 arch/s390/Kconfig  |1 
 arch/sh/Kconfig|1 
 arch/sparc/Kconfig |1 
 arch/um/Kconfig|1 
 arch/x86/ia32/ia32_aout.c  |1 
 arch/x86/include/asm/page_32_types.h   |   11 ++
 arch/x86/include/asm/page_64_types.h   |   38 +
 arch/x86/include/asm/processor.h   |   60 ---
 arch/x86/include/asm/thread_info.h |2 
 arch/x86/include/asm/uaccess.h |   26 --
 arch/x86/kernel/asm-offsets.c  |3 
 arch/x86/lib/getuser.S |   28 ---
 arch/x86/lib/putuser.S |   21 +++--
 arch/xtensa/Kconfig|1 
 drivers/misc/lkdtm/bugs.c  |4 +
 drivers/misc/lkdtm/usercopy.c  |4 +
 fs/read_write.c|   69 ++---
 fs/splice.c|  130 +++--
 include/linux/fs.h |2 
 include/linux/uaccess.h|   18 
 lib/test_bitmap.c  |   10 ++
 44 files changed, 235 insertions(+), 316 deletions(-)


remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-08-27 Thread Christoph Hellwig
Hi all,

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.

The file system part has been posted a few times, and the read/write side
has been pretty much unchanced.  For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them.  The reasons for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it.  All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.

Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged.  The process to convert
architectures is roughtly:

 (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
 (2) implement __get_kernel_nofault and __put_kernel_nofault
 (3) remove the arch specific address limitation functionality

Changes since v1:
 - drop the patch to remove the non-iter ops for /dev/zero and
   /dev/null as they caused a performance regression
 - don't enable user access in __get_kernel on powerpc
 - xfail the set_fs() based lkdtm tests

Diffstat:


iter and normal ops on /dev/zero & co, was Re: remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 09:16:59AM +0200, Christophe Leroy wrote:
> I made a test with only the first patch of your series: That's definitely 
> the culprit. With only that patch applies, the duration is 6.64 seconds, 
> that's a 25% degradation.

For the record: the first patch is:

 mem: remove duplicate ops for /dev/zero and /dev/null

So these micro-optimizations matter at least for some popular
benchmarks.  It would be easy to drop, but that means we either:

 - can't support kernel_read/write on these files, which should not
   matter

or
 
 - have to drop the check for both ops being present

Al, what do you think?


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-19 Thread Christophe Leroy




Le 18/08/2020 à 20:23, Christophe Leroy a écrit :



Le 18/08/2020 à 20:05, Christoph Hellwig a écrit :

On Tue, Aug 18, 2020 at 07:46:22PM +0200, Christophe Leroy wrote:

I gave it a go on my powerpc mpc832x. I tested it on top of my newest
series that reworks the 32 bits signal handlers (see
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278) with 


the microbenchmark test used is that series.

With KUAP activated, on top of signal32 rework, performance is 
boosted as
system time for the microbenchmark goes from 1.73s down to 1.56s, 
that is

10% quicker

Surprisingly, with the kernel as is today without my signal's series, 
your
series degrades performance slightly (from 2.55s to 2.64s ie 3.5% 
slower).



I also observe, in both cases, a degradation on

dd if=/dev/zero of=/dev/null count=1M

Without your series, it runs in 5.29 seconds.
With your series, it runs in 5.82 seconds, that is 10% more time.


That's pretty strage, I wonder if some kernel text cache line
effects come into play here?

The kernel access side is only used in slow path code, so it should
not make a difference, and the uaccess code is simplified and should be
(marginally) faster.

Btw, was this with the __{get,put}_user_allowed cockup that you noticed
fixed?



Yes it is with the __get_user_size() replaced by __get_user_size_allowed().


I made a test with only the first patch of your series: That's 
definitely the culprit. With only that patch applies, the duration is 
6.64 seconds, that's a 25% degradation.


A perf record provides the following without the patch:
41.91%  dd   [kernel.kallsyms]  [k] __arch_clear_user
 7.02%  dd   [kernel.kallsyms]  [k] vfs_read
 6.86%  dd   [kernel.kallsyms]  [k] new_sync_read
 6.68%  dd   [kernel.kallsyms]  [k] iov_iter_zero
 6.03%  dd   [kernel.kallsyms]  [k] transfer_to_syscall
 3.39%  dd   [kernel.kallsyms]  [k] memset
 3.07%  dd   [kernel.kallsyms]  [k] __fsnotify_parent
 2.68%  dd   [kernel.kallsyms]  [k] ksys_read
 2.09%  dd   [kernel.kallsyms]  [k] read_iter_zero
 2.01%  dd   [kernel.kallsyms]  [k] __fget_light
 1.84%  dd   [kernel.kallsyms]  [k] __fdget_pos
 1.35%  dd   [kernel.kallsyms]  [k] rw_verify_area
 1.32%  dd   libc-2.23.so   [.] __GI___libc_write
 1.21%  dd   [kernel.kallsyms]  [k] vfs_write
...
 0.03%  dd   [kernel.kallsyms]  [k] write_null

And the following with the patch:

15.54%  dd   [kernel.kallsyms]  [k] __arch_clear_user
 9.17%  dd   [kernel.kallsyms]  [k] vfs_read
 6.54%  dd   [kernel.kallsyms]  [k] new_sync_write
 6.31%  dd   [kernel.kallsyms]  [k] transfer_to_syscall
 6.29%  dd   [kernel.kallsyms]  [k] __fsnotify_parent
 6.20%  dd   [kernel.kallsyms]  [k] new_sync_read
 5.47%  dd   [kernel.kallsyms]  [k] memset
 5.13%  dd   [kernel.kallsyms]  [k] vfs_write
 4.44%  dd   [kernel.kallsyms]  [k] iov_iter_zero
 2.95%  dd   [kernel.kallsyms]  [k] write_iter_null
 2.82%  dd   [kernel.kallsyms]  [k] ksys_read
 2.46%  dd   [kernel.kallsyms]  [k] __fget_light
 2.34%  dd   libc-2.23.so   [.] __GI___libc_read
 1.89%  dd   [kernel.kallsyms]  [k] iov_iter_advance
 1.76%  dd   [kernel.kallsyms]  [k] __fdget_pos
 1.65%  dd   [kernel.kallsyms]  [k] rw_verify_area
 1.63%  dd   [kernel.kallsyms]  [k] read_iter_zero
 1.60%  dd   [kernel.kallsyms]  [k] iov_iter_init
 1.22%  dd   [kernel.kallsyms]  [k] ksys_write
 1.14%  dd   libc-2.23.so   [.] __GI___libc_write

Christophe



Christophe


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-18 Thread Christophe Leroy




Le 18/08/2020 à 20:05, Christoph Hellwig a écrit :

On Tue, Aug 18, 2020 at 07:46:22PM +0200, Christophe Leroy wrote:

I gave it a go on my powerpc mpc832x. I tested it on top of my newest
series that reworks the 32 bits signal handlers (see
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278) with
the microbenchmark test used is that series.

With KUAP activated, on top of signal32 rework, performance is boosted as
system time for the microbenchmark goes from 1.73s down to 1.56s, that is
10% quicker

Surprisingly, with the kernel as is today without my signal's series, your
series degrades performance slightly (from 2.55s to 2.64s ie 3.5% slower).


I also observe, in both cases, a degradation on

dd if=/dev/zero of=/dev/null count=1M

Without your series, it runs in 5.29 seconds.
With your series, it runs in 5.82 seconds, that is 10% more time.


That's pretty strage, I wonder if some kernel text cache line
effects come into play here?

The kernel access side is only used in slow path code, so it should
not make a difference, and the uaccess code is simplified and should be
(marginally) faster.

Btw, was this with the __{get,put}_user_allowed cockup that you noticed
fixed?



Yes it is with the __get_user_size() replaced by __get_user_size_allowed().

Christophe


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-18 Thread Christoph Hellwig
On Tue, Aug 18, 2020 at 07:46:22PM +0200, Christophe Leroy wrote:
> I gave it a go on my powerpc mpc832x. I tested it on top of my newest 
> series that reworks the 32 bits signal handlers (see 
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278) with 
> the microbenchmark test used is that series.
>
> With KUAP activated, on top of signal32 rework, performance is boosted as 
> system time for the microbenchmark goes from 1.73s down to 1.56s, that is 
> 10% quicker
>
> Surprisingly, with the kernel as is today without my signal's series, your 
> series degrades performance slightly (from 2.55s to 2.64s ie 3.5% slower).
>
>
> I also observe, in both cases, a degradation on
>
>   dd if=/dev/zero of=/dev/null count=1M
>
> Without your series, it runs in 5.29 seconds.
> With your series, it runs in 5.82 seconds, that is 10% more time.

That's pretty strage, I wonder if some kernel text cache line
effects come into play here?

The kernel access side is only used in slow path code, so it should
not make a difference, and the uaccess code is simplified and should be
(marginally) faster.

Btw, was this with the __{get,put}_user_allowed cockup that you noticed
fixed?


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-18 Thread Christophe Leroy




Le 17/08/2020 à 09:32, Christoph Hellwig a écrit :

Hi all,

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.

The file system part has been posted a few times, and the read/write side
has been pretty much unchanced.  For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them.  The reasons for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it.  All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.


I like this series.

I gave it a go on my powerpc mpc832x. I tested it on top of my newest 
series that reworks the 32 bits signal handlers (see 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278) 
with the microbenchmark test used is that series.


With KUAP activated, on top of signal32 rework, performance is boosted 
as system time for the microbenchmark goes from 1.73s down to 1.56s, 
that is 10% quicker


Surprisingly, with the kernel as is today without my signal's series, 
your series degrades performance slightly (from 2.55s to 2.64s ie 3.5% 
slower).



I also observe, in both cases, a degradation on

dd if=/dev/zero of=/dev/null count=1M

Without your series, it runs in 5.29 seconds.
With your series, it runs in 5.82 seconds, that is 10% more time.

Christophe




Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged.  The process to convert
architectures is roughtly:

  - ensure there is no set_fs(KERNEL_DS) left in arch specific code
  - implement __get_kernel_nofault and __put_kernel_nofault
  - remove the arch specific address limitation functionality

Diffstat:
  arch/Kconfig   |3
  arch/alpha/Kconfig |1
  arch/arc/Kconfig   |1
  arch/arm/Kconfig   |1
  arch/arm64/Kconfig |1
  arch/c6x/Kconfig   |1
  arch/csky/Kconfig  |1
  arch/h8300/Kconfig |1
  arch/hexagon/Kconfig   |1
  arch/ia64/Kconfig  |1
  arch/m68k/Kconfig  |1
  arch/microblaze/Kconfig|1
  arch/mips/Kconfig  |1
  arch/nds32/Kconfig |1
  arch/nios2/Kconfig |1
  arch/openrisc/Kconfig  |1
  arch/parisc/Kconfig|1
  arch/powerpc/include/asm/processor.h   |7 -
  arch/powerpc/include/asm/thread_info.h |5 -
  arch/powerpc/include/asm/uaccess.h |   78 ---
  arch/powerpc/kernel/signal.c   |3
  arch/powerpc/lib/sstep.c   |6 -
  arch/riscv/Kconfig |1
  arch/s390/Kconfig  |1
  arch/sh/Kconfig|1
  arch/sparc/Kconfig |1
  arch/um/Kconfig|1
  arch/x86/ia32/ia32_aout.c  |1
  arch/x86/include/asm/page_32_types.h   |   11 ++
  arch/x86/include/asm/page_64_types.h   |   38 +
  arch/x86/include/asm/processor.h   |   60 ---
  arch/x86/include/asm/thread_info.h |2
  arch/x86/include/asm/uaccess.h |   26 --
  arch/x86/kernel/asm-offsets.c  |3
  arch/x86/lib/getuser.S |   28 ---
  arch/x86/lib/putuser.S |   21 +++--
  arch/xtensa/Kconfig|1
  drivers/char/mem.c |   16 
  drivers/misc/lkdtm/bugs.c  |2
  drivers/misc/lkdtm/core.c  |4 +
  drivers/misc/lkdtm/usercopy.c  |2
  fs/read_write.c|   69 ++---
  fs/splice.c|  130 
+++--
  include/linux/fs.h |2
  include/linux/uaccess.h|   18 
  lib/test_bitmap.c  |   10 ++
  46 files changed, 235 insertions(+), 332 deletions(-)



Re: remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-17 Thread Christoph Hellwig
Adding Linus as I forgot to add him to the patch bomb, sorry..

On Mon, Aug 17, 2020 at 09:32:01AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.
> 
> The file system part has been posted a few times, and the read/write side
> has been pretty much unchanced.  For splice this series drops the
> conversion of the seq_file and sysctl code to the iter ops, and thus loses
> the splice support for them.  The reasons for that is that it caused a lot
> of churn for not much use - splice for these small files really isn't much
> of a win, even if existing userspace uses it.  All callers I found do the
> proper fallback, but if this turns out to be an issue the conversion can
> be resurrected.
> 
> Besides x86 and powerpc I plan to eventually convert all other
> architectures, although this will be a slow process, starting with the
> easier ones once the infrastructure is merged.  The process to convert
> architectures is roughtly:
> 
>  - ensure there is no set_fs(KERNEL_DS) left in arch specific code
>  - implement __get_kernel_nofault and __put_kernel_nofault
>  - remove the arch specific address limitation functionality
> 
> Diffstat:
>  arch/Kconfig   |3 
>  arch/alpha/Kconfig |1 
>  arch/arc/Kconfig   |1 
>  arch/arm/Kconfig   |1 
>  arch/arm64/Kconfig |1 
>  arch/c6x/Kconfig   |1 
>  arch/csky/Kconfig  |1 
>  arch/h8300/Kconfig |1 
>  arch/hexagon/Kconfig   |1 
>  arch/ia64/Kconfig  |1 
>  arch/m68k/Kconfig  |1 
>  arch/microblaze/Kconfig|1 
>  arch/mips/Kconfig  |1 
>  arch/nds32/Kconfig |1 
>  arch/nios2/Kconfig |1 
>  arch/openrisc/Kconfig  |1 
>  arch/parisc/Kconfig|1 
>  arch/powerpc/include/asm/processor.h   |7 -
>  arch/powerpc/include/asm/thread_info.h |5 -
>  arch/powerpc/include/asm/uaccess.h |   78 ---
>  arch/powerpc/kernel/signal.c   |3 
>  arch/powerpc/lib/sstep.c   |6 -
>  arch/riscv/Kconfig |1 
>  arch/s390/Kconfig  |1 
>  arch/sh/Kconfig|1 
>  arch/sparc/Kconfig |1 
>  arch/um/Kconfig|1 
>  arch/x86/ia32/ia32_aout.c  |1 
>  arch/x86/include/asm/page_32_types.h   |   11 ++
>  arch/x86/include/asm/page_64_types.h   |   38 +
>  arch/x86/include/asm/processor.h   |   60 ---
>  arch/x86/include/asm/thread_info.h |2 
>  arch/x86/include/asm/uaccess.h |   26 --
>  arch/x86/kernel/asm-offsets.c  |3 
>  arch/x86/lib/getuser.S |   28 ---
>  arch/x86/lib/putuser.S |   21 +++--
>  arch/xtensa/Kconfig|1 
>  drivers/char/mem.c |   16 
>  drivers/misc/lkdtm/bugs.c  |2 
>  drivers/misc/lkdtm/core.c  |4 +
>  drivers/misc/lkdtm/usercopy.c  |2 
>  fs/read_write.c|   69 ++---
>  fs/splice.c|  130 
> +++--
>  include/linux/fs.h |2 
>  include/linux/uaccess.h|   18 
>  lib/test_bitmap.c  |   10 ++
>  46 files changed, 235 insertions(+), 332 deletions(-)
---end quoted text---


remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-17 Thread Christoph Hellwig
Hi all,

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.

The file system part has been posted a few times, and the read/write side
has been pretty much unchanced.  For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them.  The reasons for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it.  All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.

Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged.  The process to convert
architectures is roughtly:

 - ensure there is no set_fs(KERNEL_DS) left in arch specific code
 - implement __get_kernel_nofault and __put_kernel_nofault
 - remove the arch specific address limitation functionality

Diffstat:
 arch/Kconfig   |3 
 arch/alpha/Kconfig |1 
 arch/arc/Kconfig   |1 
 arch/arm/Kconfig   |1 
 arch/arm64/Kconfig |1 
 arch/c6x/Kconfig   |1 
 arch/csky/Kconfig  |1 
 arch/h8300/Kconfig |1 
 arch/hexagon/Kconfig   |1 
 arch/ia64/Kconfig  |1 
 arch/m68k/Kconfig  |1 
 arch/microblaze/Kconfig|1 
 arch/mips/Kconfig  |1 
 arch/nds32/Kconfig |1 
 arch/nios2/Kconfig |1 
 arch/openrisc/Kconfig  |1 
 arch/parisc/Kconfig|1 
 arch/powerpc/include/asm/processor.h   |7 -
 arch/powerpc/include/asm/thread_info.h |5 -
 arch/powerpc/include/asm/uaccess.h |   78 ---
 arch/powerpc/kernel/signal.c   |3 
 arch/powerpc/lib/sstep.c   |6 -
 arch/riscv/Kconfig |1 
 arch/s390/Kconfig  |1 
 arch/sh/Kconfig|1 
 arch/sparc/Kconfig |1 
 arch/um/Kconfig|1 
 arch/x86/ia32/ia32_aout.c  |1 
 arch/x86/include/asm/page_32_types.h   |   11 ++
 arch/x86/include/asm/page_64_types.h   |   38 +
 arch/x86/include/asm/processor.h   |   60 ---
 arch/x86/include/asm/thread_info.h |2 
 arch/x86/include/asm/uaccess.h |   26 --
 arch/x86/kernel/asm-offsets.c  |3 
 arch/x86/lib/getuser.S |   28 ---
 arch/x86/lib/putuser.S |   21 +++--
 arch/xtensa/Kconfig|1 
 drivers/char/mem.c |   16 
 drivers/misc/lkdtm/bugs.c  |2 
 drivers/misc/lkdtm/core.c  |4 +
 drivers/misc/lkdtm/usercopy.c  |2 
 fs/read_write.c|   69 ++---
 fs/splice.c|  130 +++--
 include/linux/fs.h |2 
 include/linux/uaccess.h|   18 
 lib/test_bitmap.c  |   10 ++
 46 files changed, 235 insertions(+), 332 deletions(-)