Re: [git pull] uaccess-related bits of vfs.git

2017-05-14 Thread Al Viro
On Sun, May 14, 2017 at 08:13:56PM +0200, Ingo Molnar wrote:

> I'd say that the CLAC/STAC addition pretty much killed any argument in favor 
> of 
> "optimized" __get_user() code, so I'd be very happy to see these interfaces 
> gone 
> altogether.

You and everybody else - these interfaces suck.  If anything, we want paired
brackets around a series of accesses instead of a single check in front of it.

> So as far as x86 usage goes:
> 
>   Acked-by: Ingo Molnar 

Umm...  Could you elaborate the situation with xen/page.h stuff?  I don't
see any obvious reasons that would guaratee that addresses passed to
__get_user() and __put_user() there would match the set_fs() state.

It might very well be true, but it's not obvious from that code...

BTW, does anybody have a suggestion regarding a test load that would hit
wait4/waitid as hard as possible?  I've turned sys_wait4/sys_waitid into
long kernel_wait4(pid_t upid, int *stat_addr, int options, struct rusage *ru)
and
static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
int options, struct rusage *ru)
(with struct waitid_info {
pid_t pid;
uid_t uid;
int status;   
int why;
};), so that all copying to userland is done in sys_wait4() and friends.
It seems to survive testing without any noticable slowdowns, but that's
just LTP and xfstests - and a bug in my earlier version of that was _not_
caught by the LTP side; xfstests caught it...  So any extra tests (both
for correctness and timing) would be very much appreciated...


Re: [git pull] uaccess-related bits of vfs.git

2017-05-14 Thread Al Viro
On Sun, May 14, 2017 at 08:13:56PM +0200, Ingo Molnar wrote:

> I'd say that the CLAC/STAC addition pretty much killed any argument in favor 
> of 
> "optimized" __get_user() code, so I'd be very happy to see these interfaces 
> gone 
> altogether.

You and everybody else - these interfaces suck.  If anything, we want paired
brackets around a series of accesses instead of a single check in front of it.

> So as far as x86 usage goes:
> 
>   Acked-by: Ingo Molnar 

Umm...  Could you elaborate the situation with xen/page.h stuff?  I don't
see any obvious reasons that would guaratee that addresses passed to
__get_user() and __put_user() there would match the set_fs() state.

It might very well be true, but it's not obvious from that code...

BTW, does anybody have a suggestion regarding a test load that would hit
wait4/waitid as hard as possible?  I've turned sys_wait4/sys_waitid into
long kernel_wait4(pid_t upid, int *stat_addr, int options, struct rusage *ru)
and
static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
int options, struct rusage *ru)
(with struct waitid_info {
pid_t pid;
uid_t uid;
int status;   
int why;
};), so that all copying to userland is done in sys_wait4() and friends.
It seems to survive testing without any noticable slowdowns, but that's
just LTP and xfstests - and a bug in my earlier version of that was _not_
caught by the LTP side; xfstests caught it...  So any extra tests (both
for correctness and timing) would be very much appreciated...


Re: [git pull] uaccess-related bits of vfs.git

2017-05-14 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Fri, May 12, 2017 at 11:57 PM, Al Viro  wrote:
> >
> > First, some stats: there's a thousand-odd callers of __get_user().  Out of
> > those, about 70% are in arch/, mostly in sigframe-related code.
> 
> Sure. And they can be trivially converted, and none of them should care at 
> all.
> 
> > IOW, we have
> > * most of users in arch/* (heavily dominated by signal-related code,
> > both loads and stores).  Those need careful massage; maybe unsafe-based
> > solution, maybe something else, but it's obviously per-architecture work
> > and these paths are sensitive.
> 
> Why are they sensitive?
> 
> Why not just do this:
> 
>   git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86
> :^arch/x86/include/asm/uaccess.h
> | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g'
> 
> which converts all the x86 uses in one go.
> 
> Anybody who *relies* on not checking the address_limit is so broken as
> to be not even funny. And anything that is so performance-sensitive
> that anybody can even measure the effect of the above we can convert
> later.

I'd say that the CLAC/STAC addition pretty much killed any argument in favor of 
"optimized" __get_user() code, so I'd be very happy to see these interfaces 
gone 
altogether.

So as far as x86 usage goes:

  Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [git pull] uaccess-related bits of vfs.git

2017-05-14 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Fri, May 12, 2017 at 11:57 PM, Al Viro  wrote:
> >
> > First, some stats: there's a thousand-odd callers of __get_user().  Out of
> > those, about 70% are in arch/, mostly in sigframe-related code.
> 
> Sure. And they can be trivially converted, and none of them should care at 
> all.
> 
> > IOW, we have
> > * most of users in arch/* (heavily dominated by signal-related code,
> > both loads and stores).  Those need careful massage; maybe unsafe-based
> > solution, maybe something else, but it's obviously per-architecture work
> > and these paths are sensitive.
> 
> Why are they sensitive?
> 
> Why not just do this:
> 
>   git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86
> :^arch/x86/include/asm/uaccess.h
> | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g'
> 
> which converts all the x86 uses in one go.
> 
> Anybody who *relies* on not checking the address_limit is so broken as
> to be not even funny. And anything that is so performance-sensitive
> that anybody can even measure the effect of the above we can convert
> later.

I'd say that the CLAC/STAC addition pretty much killed any argument in favor of 
"optimized" __get_user() code, so I'd be very happy to see these interfaces 
gone 
altogether.

So as far as x86 usage goes:

  Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 01:52:29PM -0700, Linus Torvalds wrote:
> I wouldn't change the existing "memdup_user()" interface itself, but
> if there really are users that can validly pass in a maxbyte value,
> why not add a new helper:
> 
>   void *memdup_user_limit(userptr, nmember, nsize, maxsize);
> 
> and then have
> 
>   #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1)
> 
> or something. I definitely see a couple of memdup_user() people who do
> that "num*size" multiplication by hand, and it's very easy to get
> wrong and have an overflow.
> 
> And for a kvmalloc/kvfree() interface, you *definitely* want that
> maxsize thing, since it absolutely needs an upper limit.

*nod*

Speaking of insanities around open-coded memdup_user()...  Enjoy:

ias_opt = kmalloc(sizeof(struct irda_ias_set), GFP_ATOMIC);
if (ias_opt == NULL) {
err = -ENOMEM;
goto out;
}

/* Copy query to the driver. */
if (copy_from_user(ias_opt, optval, optlen)) {

Can't have it block, sir, has to be GFP_ATOMIC...  Whaddya mean, "what
if copy_from_user() blocks?"  As far as I can see, that came in circa
2.4.6, when local sturct irda_ias_set got switched to dynamic allocation...


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 01:52:29PM -0700, Linus Torvalds wrote:
> I wouldn't change the existing "memdup_user()" interface itself, but
> if there really are users that can validly pass in a maxbyte value,
> why not add a new helper:
> 
>   void *memdup_user_limit(userptr, nmember, nsize, maxsize);
> 
> and then have
> 
>   #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1)
> 
> or something. I definitely see a couple of memdup_user() people who do
> that "num*size" multiplication by hand, and it's very easy to get
> wrong and have an overflow.
> 
> And for a kvmalloc/kvfree() interface, you *definitely* want that
> maxsize thing, since it absolutely needs an upper limit.

*nod*

Speaking of insanities around open-coded memdup_user()...  Enjoy:

ias_opt = kmalloc(sizeof(struct irda_ias_set), GFP_ATOMIC);
if (ias_opt == NULL) {
err = -ENOMEM;
goto out;
}

/* Copy query to the driver. */
if (copy_from_user(ias_opt, optval, optlen)) {

Can't have it block, sir, has to be GFP_ATOMIC...  Whaddya mean, "what
if copy_from_user() blocks?"  As far as I can see, that came in circa
2.4.6, when local sturct irda_ias_set got switched to dynamic allocation...


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
On Sat, May 13, 2017 at 1:37 PM, Al Viro  wrote:
>
> That's a valid point and it might apply to memdup_user() callers out there.
> Potential variants:
> * add an explicit upper bound on the size and turn that into
> memdup_user() (and check that all memdup_user() callers are bounded).
> * have memdup_user() itself pass __GFP_NOWARN.
> * add kvmemdup_user() that would use kvmalloc() (with its callers
> expected to use kvfree()); see who else might benefit from conversion.

All of the above sound reasonable.

I wouldn't change the existing "memdup_user()" interface itself, but
if there really are users that can validly pass in a maxbyte value,
why not add a new helper:

  void *memdup_user_limit(userptr, nmember, nsize, maxsize);

and then have

  #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1)

or something. I definitely see a couple of memdup_user() people who do
that "num*size" multiplication by hand, and it's very easy to get
wrong and have an overflow.

And for a kvmalloc/kvfree() interface, you *definitely* want that
maxsize thing, since it absolutely needs an upper limit.

 Linus


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
On Sat, May 13, 2017 at 1:37 PM, Al Viro  wrote:
>
> That's a valid point and it might apply to memdup_user() callers out there.
> Potential variants:
> * add an explicit upper bound on the size and turn that into
> memdup_user() (and check that all memdup_user() callers are bounded).
> * have memdup_user() itself pass __GFP_NOWARN.
> * add kvmemdup_user() that would use kvmalloc() (with its callers
> expected to use kvfree()); see who else might benefit from conversion.

All of the above sound reasonable.

I wouldn't change the existing "memdup_user()" interface itself, but
if there really are users that can validly pass in a maxbyte value,
why not add a new helper:

  void *memdup_user_limit(userptr, nmember, nsize, maxsize);

and then have

  #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1)

or something. I definitely see a couple of memdup_user() people who do
that "num*size" multiplication by hand, and it's very easy to get
wrong and have an overflow.

And for a kvmalloc/kvfree() interface, you *definitely* want that
maxsize thing, since it absolutely needs an upper limit.

 Linus


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 10:32:31PM +0200, Geert Uytterhoeven wrote:

> You better run that one through linux-spi, to avoid conflicts, cfr.
> https://patchwork.kernel.org/patch/9714993/

What I'm going to do is never-rebased #for-spi (well, never after -rc1)
merged into #work.uaccess and proposed for pull to linux-spi.

The local tree is a mess right now - bloody lot of branches, huge unsorted
pile, etc.  This was from #unsorted, actually - should've picked the one
from #for-spi (a couple of brainos fixed in the version there)...


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 10:32:31PM +0200, Geert Uytterhoeven wrote:

> You better run that one through linux-spi, to avoid conflicts, cfr.
> https://patchwork.kernel.org/patch/9714993/

What I'm going to do is never-rebased #for-spi (well, never after -rc1)
merged into #work.uaccess and proposed for pull to linux-spi.

The local tree is a mess right now - bloody lot of branches, huge unsorted
pile, etc.  This was from #unsorted, actually - should've picked the one
from #for-spi (a couple of brainos fixed in the version there)...


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote:
> From: Linus Torvalds 
> Date: Tue, 24 Mar 2015 10:42:18 -0700
> >
> > So I'd suggest we should just do a wholesale replacement of
> > __copy_to/from_user() with the non-underlined cases. Then, we could
> > switch insividual ones back - with reasoning of why they matter, and
> > with pointers to how it does access_ok() two lines before.
> >
> > We should probably even consider looking at __get_user/__put_user().
> > Few of them are actually performance-critical.
> 
> Look at that date. It's over two years ago. In the intervening two
> years, how many of those conversions have happened?

Speaking of killing that kind of crap off: there was a question left from the
last cycle that hadn't been sorted out.

SCTP does this in a couple of places:
/* Check the user passed a healthy pointer.  */
if (unlikely(!access_ok(VERIFY_READ, addrs, addrs_size)))
return -EFAULT;

/* Alloc space for the address array in kernel memory.  */
kaddrs = kmalloc(addrs_size, GFP_USER | __GFP_NOWARN);
if (unlikely(!kaddrs))
return -ENOMEM;

if (__copy_from_user(kaddrs, addrs, addrs_size)) {
kfree(kaddrs);
return -EFAULT;
}
instead of memdup_user().  Part of the rationale is pretty weak (access_ok()
as sanity check to prevent user-triggerable attempts to allocate too much -
it still can trivially trigger 2G, so it's not worth much), part is more
interesting.  Namely, that whining into the syslog shouldn't be that easy
to trigger.

That's a valid point and it might apply to memdup_user() callers out there.
Potential variants:
* add an explicit upper bound on the size and turn that into
memdup_user() (and check that all memdup_user() callers are bounded).
* have memdup_user() itself pass __GFP_NOWARN.
* add kvmemdup_user() that would use kvmalloc() (with its callers
expected to use kvfree()); see who else might benefit from conversion.

Preferences?


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote:
> From: Linus Torvalds 
> Date: Tue, 24 Mar 2015 10:42:18 -0700
> >
> > So I'd suggest we should just do a wholesale replacement of
> > __copy_to/from_user() with the non-underlined cases. Then, we could
> > switch insividual ones back - with reasoning of why they matter, and
> > with pointers to how it does access_ok() two lines before.
> >
> > We should probably even consider looking at __get_user/__put_user().
> > Few of them are actually performance-critical.
> 
> Look at that date. It's over two years ago. In the intervening two
> years, how many of those conversions have happened?

Speaking of killing that kind of crap off: there was a question left from the
last cycle that hadn't been sorted out.

SCTP does this in a couple of places:
/* Check the user passed a healthy pointer.  */
if (unlikely(!access_ok(VERIFY_READ, addrs, addrs_size)))
return -EFAULT;

/* Alloc space for the address array in kernel memory.  */
kaddrs = kmalloc(addrs_size, GFP_USER | __GFP_NOWARN);
if (unlikely(!kaddrs))
return -ENOMEM;

if (__copy_from_user(kaddrs, addrs, addrs_size)) {
kfree(kaddrs);
return -EFAULT;
}
instead of memdup_user().  Part of the rationale is pretty weak (access_ok()
as sanity check to prevent user-triggerable attempts to allocate too much -
it still can trivially trigger 2G, so it's not worth much), part is more
interesting.  Namely, that whining into the syslog shouldn't be that easy
to trigger.

That's a valid point and it might apply to memdup_user() callers out there.
Potential variants:
* add an explicit upper bound on the size and turn that into
memdup_user() (and check that all memdup_user() callers are bounded).
* have memdup_user() itself pass __GFP_NOWARN.
* add kvmemdup_user() that would use kvmalloc() (with its callers
expected to use kvfree()); see who else might benefit from conversion.

Preferences?


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Geert Uytterhoeven
Hi Al,

On Sat, May 13, 2017 at 10:08 PM, Al Viro  wrote:
> On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote:
>
>> FWIW, just this cycle (this one I remembered off-hand, there might be
>> more):
>
> And looking through my queue (will be pushed to -next as soon as -rc1 goes
> out):
>
> commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09
> Author: Al Viro 
> Date:   Thu Apr 20 15:47:34 2017 -0400
>
> spidev: quit messing with access_ok()
>
> Signed-off-by: Al Viro 
>
> diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
> index 9e2e099baf8c..8dd22de5e3b5 100644
> --- a/drivers/spi/spidev.c
> +++ b/drivers/spi/spidev.c

You better run that one through linux-spi, to avoid conflicts, cfr.
https://patchwork.kernel.org/patch/9714993/

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Geert Uytterhoeven
Hi Al,

On Sat, May 13, 2017 at 10:08 PM, Al Viro  wrote:
> On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote:
>
>> FWIW, just this cycle (this one I remembered off-hand, there might be
>> more):
>
> And looking through my queue (will be pushed to -next as soon as -rc1 goes
> out):
>
> commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09
> Author: Al Viro 
> Date:   Thu Apr 20 15:47:34 2017 -0400
>
> spidev: quit messing with access_ok()
>
> Signed-off-by: Al Viro 
>
> diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
> index 9e2e099baf8c..8dd22de5e3b5 100644
> --- a/drivers/spi/spidev.c
> +++ b/drivers/spi/spidev.c

You better run that one through linux-spi, to avoid conflicts, cfr.
https://patchwork.kernel.org/patch/9714993/

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote:

> FWIW, just this cycle (this one I remembered off-hand, there might be
> more):

And looking through my queue (will be pushed to -next as soon as -rc1 goes
out):

commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09
Author: Al Viro 
Date:   Thu Apr 20 15:47:34 2017 -0400

spidev: quit messing with access_ok()

Signed-off-by: Al Viro 

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 9e2e099baf8c..8dd22de5e3b5 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -254,10 +254,6 @@ static int spidev_message(struct spidev_data *spidev,
goto done;
}
k_tmp->rx_buf = rx_buf;
-   if (!access_ok(VERIFY_WRITE, (u8 __user *)
-   (uintptr_t) u_tmp->rx_buf,
-   u_tmp->len))
-   goto done;
rx_buf += k_tmp->len;
}
if (u_tmp->tx_buf) {
@@ -305,7 +301,7 @@ static int spidev_message(struct spidev_data *spidev,
rx_buf = spidev->rx_buffer;
for (n = n_xfers, u_tmp = u_xfers; n; n--, u_tmp++) {
if (u_tmp->rx_buf) {
-   if (__copy_to_user((u8 __user *)
+   if (copy_to_user((u8 __user *)
(uintptr_t) u_tmp->rx_buf, rx_buf,
u_tmp->len)) {
status = -EFAULT;
@@ -325,8 +321,7 @@ static struct spi_ioc_transfer *
 spidev_get_ioc_message(unsigned int cmd, struct spi_ioc_transfer __user *u_ioc,
unsigned *n_ioc)
 {
-   struct spi_ioc_transfer *ioc;
-   u32 tmp;
+   u32 size;
 
/* Check type, command number and direction */
if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC
@@ -334,22 +329,15 @@ spidev_get_ioc_message(unsigned int cmd, struct 
spi_ioc_transfer __user *u_ioc,
|| _IOC_DIR(cmd) != _IOC_WRITE)
return ERR_PTR(-ENOTTY);
 
-   tmp = _IOC_SIZE(cmd);
+   size = _IOC_SIZE(cmd);
if ((tmp % sizeof(struct spi_ioc_transfer)) != 0)
return ERR_PTR(-EINVAL);
-   *n_ioc = tmp / sizeof(struct spi_ioc_transfer);
+   *n_ioc = size / sizeof(struct spi_ioc_transfer);
if (*n_ioc == 0)
return NULL;
 
/* copy into scratch area */
-   ioc = kmalloc(tmp, GFP_KERNEL);
-   if (!ioc)
-   return ERR_PTR(-ENOMEM);
-   if (__copy_from_user(ioc, u_ioc, tmp)) {
-   kfree(ioc);
-   return ERR_PTR(-EFAULT);
-   }
-   return ioc;
+   return memdup_user(u_ioc, size);
 }
 
 static long
@@ -367,19 +355,6 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned 
long arg)
if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC)
return -ENOTTY;
 
-   /* Check access direction once here; don't repeat below.
-* IOC_DIR is from the user perspective, while access_ok is
-* from the kernel perspective; so they look reversed.
-*/
-   if (_IOC_DIR(cmd) & _IOC_READ)
-   err = !access_ok(VERIFY_WRITE,
-   (void __user *)arg, _IOC_SIZE(cmd));
-   if (err == 0 && _IOC_DIR(cmd) & _IOC_WRITE)
-   err = !access_ok(VERIFY_READ,
-   (void __user *)arg, _IOC_SIZE(cmd));
-   if (err)
-   return -EFAULT;
-
/* guard against device removal before, or while,
 * we issue this ioctl.
 */
@@ -402,31 +377,31 @@ spidev_ioctl(struct file *filp, unsigned int cmd, 
unsigned long arg)
switch (cmd) {
/* read requests */
case SPI_IOC_RD_MODE:
-   retval = __put_user(spi->mode & SPI_MODE_MASK,
+   retval = put_user(spi->mode & SPI_MODE_MASK,
(__u8 __user *)arg);
break;
case SPI_IOC_RD_MODE32:
-   retval = __put_user(spi->mode & SPI_MODE_MASK,
+   retval = put_user(spi->mode & SPI_MODE_MASK,
(__u32 __user *)arg);
break;
case SPI_IOC_RD_LSB_FIRST:
-   retval = __put_user((spi->mode & SPI_LSB_FIRST) ?  1 : 0,
+   retval = put_user((spi->mode & SPI_LSB_FIRST) ?  1 : 0,
(__u8 __user *)arg);
break;
case SPI_IOC_RD_BITS_PER_WORD:
-   retval = __put_user(spi->bits_per_word, (__u8 __user *)arg);
+   retval = put_user(spi->bits_per_word, (__u8 __user *)arg);
break;
case SPI_IOC_RD_MAX_SPEED_HZ:
-   retval = __put_user(spidev->speed_hz, (__u32 __user *)arg);
+   retval = put_user(spidev->speed_hz, (__u32 

Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote:

> FWIW, just this cycle (this one I remembered off-hand, there might be
> more):

And looking through my queue (will be pushed to -next as soon as -rc1 goes
out):

commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09
Author: Al Viro 
Date:   Thu Apr 20 15:47:34 2017 -0400

spidev: quit messing with access_ok()

Signed-off-by: Al Viro 

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 9e2e099baf8c..8dd22de5e3b5 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -254,10 +254,6 @@ static int spidev_message(struct spidev_data *spidev,
goto done;
}
k_tmp->rx_buf = rx_buf;
-   if (!access_ok(VERIFY_WRITE, (u8 __user *)
-   (uintptr_t) u_tmp->rx_buf,
-   u_tmp->len))
-   goto done;
rx_buf += k_tmp->len;
}
if (u_tmp->tx_buf) {
@@ -305,7 +301,7 @@ static int spidev_message(struct spidev_data *spidev,
rx_buf = spidev->rx_buffer;
for (n = n_xfers, u_tmp = u_xfers; n; n--, u_tmp++) {
if (u_tmp->rx_buf) {
-   if (__copy_to_user((u8 __user *)
+   if (copy_to_user((u8 __user *)
(uintptr_t) u_tmp->rx_buf, rx_buf,
u_tmp->len)) {
status = -EFAULT;
@@ -325,8 +321,7 @@ static struct spi_ioc_transfer *
 spidev_get_ioc_message(unsigned int cmd, struct spi_ioc_transfer __user *u_ioc,
unsigned *n_ioc)
 {
-   struct spi_ioc_transfer *ioc;
-   u32 tmp;
+   u32 size;
 
/* Check type, command number and direction */
if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC
@@ -334,22 +329,15 @@ spidev_get_ioc_message(unsigned int cmd, struct 
spi_ioc_transfer __user *u_ioc,
|| _IOC_DIR(cmd) != _IOC_WRITE)
return ERR_PTR(-ENOTTY);
 
-   tmp = _IOC_SIZE(cmd);
+   size = _IOC_SIZE(cmd);
if ((tmp % sizeof(struct spi_ioc_transfer)) != 0)
return ERR_PTR(-EINVAL);
-   *n_ioc = tmp / sizeof(struct spi_ioc_transfer);
+   *n_ioc = size / sizeof(struct spi_ioc_transfer);
if (*n_ioc == 0)
return NULL;
 
/* copy into scratch area */
-   ioc = kmalloc(tmp, GFP_KERNEL);
-   if (!ioc)
-   return ERR_PTR(-ENOMEM);
-   if (__copy_from_user(ioc, u_ioc, tmp)) {
-   kfree(ioc);
-   return ERR_PTR(-EFAULT);
-   }
-   return ioc;
+   return memdup_user(u_ioc, size);
 }
 
 static long
@@ -367,19 +355,6 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned 
long arg)
if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC)
return -ENOTTY;
 
-   /* Check access direction once here; don't repeat below.
-* IOC_DIR is from the user perspective, while access_ok is
-* from the kernel perspective; so they look reversed.
-*/
-   if (_IOC_DIR(cmd) & _IOC_READ)
-   err = !access_ok(VERIFY_WRITE,
-   (void __user *)arg, _IOC_SIZE(cmd));
-   if (err == 0 && _IOC_DIR(cmd) & _IOC_WRITE)
-   err = !access_ok(VERIFY_READ,
-   (void __user *)arg, _IOC_SIZE(cmd));
-   if (err)
-   return -EFAULT;
-
/* guard against device removal before, or while,
 * we issue this ioctl.
 */
@@ -402,31 +377,31 @@ spidev_ioctl(struct file *filp, unsigned int cmd, 
unsigned long arg)
switch (cmd) {
/* read requests */
case SPI_IOC_RD_MODE:
-   retval = __put_user(spi->mode & SPI_MODE_MASK,
+   retval = put_user(spi->mode & SPI_MODE_MASK,
(__u8 __user *)arg);
break;
case SPI_IOC_RD_MODE32:
-   retval = __put_user(spi->mode & SPI_MODE_MASK,
+   retval = put_user(spi->mode & SPI_MODE_MASK,
(__u32 __user *)arg);
break;
case SPI_IOC_RD_LSB_FIRST:
-   retval = __put_user((spi->mode & SPI_LSB_FIRST) ?  1 : 0,
+   retval = put_user((spi->mode & SPI_LSB_FIRST) ?  1 : 0,
(__u8 __user *)arg);
break;
case SPI_IOC_RD_BITS_PER_WORD:
-   retval = __put_user(spi->bits_per_word, (__u8 __user *)arg);
+   retval = put_user(spi->bits_per_word, (__u8 __user *)arg);
break;
case SPI_IOC_RD_MAX_SPEED_HZ:
-   retval = __put_user(spidev->speed_hz, (__u32 __user *)arg);
+   retval = put_user(spidev->speed_hz, (__u32 __user *)arg);
break;
 
/* 

Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote:
> > We should probably even consider looking at __get_user/__put_user().
> > Few of them are actually performance-critical.
> 
> Look at that date. It's over two years ago. In the intervening two
> years, how many of those conversions have happened?
> 
> Here's a hint: it's a very very round number.

FWIW, just this cycle (this one I remembered off-hand, there might be
more):

commit edb88cef0570914375d461107759cf0d6d677ed5
Author: Arnd Bergmann 
Date:   Sat Apr 22 00:02:31 2017 +0200

scsi: pmcraid: use normal copy_from_user

As pointed out by Al Viro for my previous series, the driver has no need
to call access_ok() and __copy_from_user()/__copy_to_user(). Changing
it to regular copy_from_user()/copy_to_user() simplifies the code without
any real downsides, making it less error-prone at best.

This patch by itself also addresses the warning about the access_ok()
macro on MIPS, but both fixes improve the code, so ideally we apply
them both.

Signed-off-by: Arnd Bergmann 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Martin K. Petersen 



Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote:
> > We should probably even consider looking at __get_user/__put_user().
> > Few of them are actually performance-critical.
> 
> Look at that date. It's over two years ago. In the intervening two
> years, how many of those conversions have happened?
> 
> Here's a hint: it's a very very round number.

FWIW, just this cycle (this one I remembered off-hand, there might be
more):

commit edb88cef0570914375d461107759cf0d6d677ed5
Author: Arnd Bergmann 
Date:   Sat Apr 22 00:02:31 2017 +0200

scsi: pmcraid: use normal copy_from_user

As pointed out by Al Viro for my previous series, the driver has no need
to call access_ok() and __copy_from_user()/__copy_to_user(). Changing
it to regular copy_from_user()/copy_to_user() simplifies the code without
any real downsides, making it less error-prone at best.

This patch by itself also addresses the warning about the access_ok()
macro on MIPS, but both fixes improve the code, so ideally we apply
them both.

Signed-off-by: Arnd Bergmann 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Martin K. Petersen 



Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 08:11:42PM +0100, Al Viro wrote:
> It's not impossible to review; the problem is in figuring out which codepaths
> are hot enough to worry about them.  And I'm even more convinced that
> bulk search-and-replace is the right approach here; we probably _can_ get
^- not
apologies for sloppy editing...

> rid of that shit this cycle (or, at least, reduce its use to very few places
> in arch/*), but that'll require some cooperation from architecture 
> maintainers.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 08:11:42PM +0100, Al Viro wrote:
> It's not impossible to review; the problem is in figuring out which codepaths
> are hot enough to worry about them.  And I'm even more convinced that
> bulk search-and-replace is the right approach here; we probably _can_ get
^- not
apologies for sloppy editing...

> rid of that shit this cycle (or, at least, reduce its use to very few places
> in arch/*), but that'll require some cooperation from architecture 
> maintainers.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote:
> On Sat, May 13, 2017 at 11:04 AM, Al Viro  wrote:
> >
> >
> > My point is, this stuff needs looking at.
> 
> No.
> 
> You don't have a point. I've tried to explain that there's no
> performance difference, and there is no way in hell that the current
> "__" versions are better.
> 
> So what's the point in looking? All you are ever going to come up with
> is theoretical "this might" cases.

Umm...  Just during this subthread - a couple of likely breakages in
xen (which would need looking into, right?) and "oopsen on alpha would
stop actually printing code", which is better to spot early.  The first
one might be theoretical, but it's worth giving heads-up to xen folks;
I'm fairly sure that this spot will break.  The second is not theoretical
at all - it *will* break.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote:
> On Sat, May 13, 2017 at 11:04 AM, Al Viro  wrote:
> >
> >
> > My point is, this stuff needs looking at.
> 
> No.
> 
> You don't have a point. I've tried to explain that there's no
> performance difference, and there is no way in hell that the current
> "__" versions are better.
> 
> So what's the point in looking? All you are ever going to come up with
> is theoretical "this might" cases.

Umm...  Just during this subthread - a couple of likely breakages in
xen (which would need looking into, right?) and "oopsen on alpha would
stop actually printing code", which is better to spot early.  The first
one might be theoretical, but it's worth giving heads-up to xen folks;
I'm fairly sure that this spot will break.  The second is not theoretical
at all - it *will* break.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 07:26:56PM +0100, Al Viro wrote:
> BTW, even in arch/* they tend to nest.  E.g. arch/alpha has 133 callers
> total.  Distribution by files:
>  35 arch/alpha/kernel/osf_sys.c
>  92 arch/alpha/kernel/signal.c
>   1 arch/alpha/kernel/traps.c
>   4 arch/alpha/lib/csum_partial_copy.c
>   1 arch/alpha/mm/fault.c

Another large pile is on sparc (208 callers).  Except that on sparc64
access_ok() is constant 1, which reduces it to 42 callers.
3 arch/sparc/kernel/ptrace_32.c (all in arch_ptrace())
   31 arch/sparc/kernel/signal_32.c (5 in do_sigreturn(),
 6 in do_rt_sigreturn(),
 8 in setup_frame(),
 11 in setup_rt_frame(),
 1 in do_sys_sigstack())
7 arch/sparc/kernel/sigutil_32.c (2 in save_fpu_state(),
  2 in restore_fpu_state(),
  2 in save_rwin_state(),
  1 in restore_rwin_state()
1 arch/sparc/mm/fault_32.c (in compute_si_addr())

The last one ignores -EFAULT, BTW - not sure what should it have done
in that case, though.  It's calculation of ->si_addr on SIGSEGV and SIGBUS
in case of data access SIGSEGV or SIGBUS; it wants to peek into insn to
figure out the effective address.  arch_ptrace() one is zeroing several
fields in userland struct fps for PTRACE_GETFPREGS.  Could've been
put_user() (or clear_user(), for that matter); we'd just done
copy_regset_to_user() on adjacent addresses, so the lack of access_ok() is not
a security hole there).  Everything else is in sigframe handling...

Other large piles are on mips, ppc and itanic.  parisc is next, but there
access_ok() is constant 1 as well.  Same for s390 and m68k.  nios2 and
unicore32 are a bit under 80 callers each and the next are tile (~60),
sh (~50) and then it drops off fast.

It's not impossible to review; the problem is in figuring out which codepaths
are hot enough to worry about them.  And I'm even more convinced that
bulk search-and-replace is the right approach here; we probably _can_ get
rid of that shit this cycle (or, at least, reduce its use to very few places
in arch/*), but that'll require some cooperation from architecture maintainers.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 07:26:56PM +0100, Al Viro wrote:
> BTW, even in arch/* they tend to nest.  E.g. arch/alpha has 133 callers
> total.  Distribution by files:
>  35 arch/alpha/kernel/osf_sys.c
>  92 arch/alpha/kernel/signal.c
>   1 arch/alpha/kernel/traps.c
>   4 arch/alpha/lib/csum_partial_copy.c
>   1 arch/alpha/mm/fault.c

Another large pile is on sparc (208 callers).  Except that on sparc64
access_ok() is constant 1, which reduces it to 42 callers.
3 arch/sparc/kernel/ptrace_32.c (all in arch_ptrace())
   31 arch/sparc/kernel/signal_32.c (5 in do_sigreturn(),
 6 in do_rt_sigreturn(),
 8 in setup_frame(),
 11 in setup_rt_frame(),
 1 in do_sys_sigstack())
7 arch/sparc/kernel/sigutil_32.c (2 in save_fpu_state(),
  2 in restore_fpu_state(),
  2 in save_rwin_state(),
  1 in restore_rwin_state()
1 arch/sparc/mm/fault_32.c (in compute_si_addr())

The last one ignores -EFAULT, BTW - not sure what should it have done
in that case, though.  It's calculation of ->si_addr on SIGSEGV and SIGBUS
in case of data access SIGSEGV or SIGBUS; it wants to peek into insn to
figure out the effective address.  arch_ptrace() one is zeroing several
fields in userland struct fps for PTRACE_GETFPREGS.  Could've been
put_user() (or clear_user(), for that matter); we'd just done
copy_regset_to_user() on adjacent addresses, so the lack of access_ok() is not
a security hole there).  Everything else is in sigframe handling...

Other large piles are on mips, ppc and itanic.  parisc is next, but there
access_ok() is constant 1 as well.  Same for s390 and m68k.  nios2 and
unicore32 are a bit under 80 callers each and the next are tile (~60),
sh (~50) and then it drops off fast.

It's not impossible to review; the problem is in figuring out which codepaths
are hot enough to worry about them.  And I'm even more convinced that
bulk search-and-replace is the right approach here; we probably _can_ get
rid of that shit this cycle (or, at least, reduce its use to very few places
in arch/*), but that'll require some cooperation from architecture maintainers.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
On Sat, May 13, 2017 at 11:04 AM, Al Viro  wrote:
>
>
> My point is, this stuff needs looking at.

No.

You don't have a point. I've tried to explain that there's no
performance difference, and there is no way in hell that the current
"__" versions are better.

So what's the point in looking? All you are ever going to come up with
is theoretical "this might" cases.

The only sane forward is to just get rid of them. At that point, you
*may* end up actually noticing something, but if you do, it's not a
"you might" issue.

There's literally no upside to this "needs looking at". There are only
downsides.

And one major downside of your "needs looking at". is this one:

From: Linus Torvalds 
Date: Tue, 24 Mar 2015 10:42:18 -0700
>
> So I'd suggest we should just do a wholesale replacement of
> __copy_to/from_user() with the non-underlined cases. Then, we could
> switch insividual ones back - with reasoning of why they matter, and
> with pointers to how it does access_ok() two lines before.
>
> We should probably even consider looking at __get_user/__put_user().
> Few of them are actually performance-critical.

Look at that date. It's over two years ago. In the intervening two
years, how many of those conversions have happened?

Here's a hint: it's a very very round number.

 Linus


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
On Sat, May 13, 2017 at 11:04 AM, Al Viro  wrote:
>
>
> My point is, this stuff needs looking at.

No.

You don't have a point. I've tried to explain that there's no
performance difference, and there is no way in hell that the current
"__" versions are better.

So what's the point in looking? All you are ever going to come up with
is theoretical "this might" cases.

The only sane forward is to just get rid of them. At that point, you
*may* end up actually noticing something, but if you do, it's not a
"you might" issue.

There's literally no upside to this "needs looking at". There are only
downsides.

And one major downside of your "needs looking at". is this one:

From: Linus Torvalds 
Date: Tue, 24 Mar 2015 10:42:18 -0700
>
> So I'd suggest we should just do a wholesale replacement of
> __copy_to/from_user() with the non-underlined cases. Then, we could
> switch insividual ones back - with reasoning of why they matter, and
> with pointers to how it does access_ok() two lines before.
>
> We should probably even consider looking at __get_user/__put_user().
> Few of them are actually performance-critical.

Look at that date. It's over two years ago. In the intervening two
years, how many of those conversions have happened?

Here's a hint: it's a very very round number.

 Linus


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 07:04:13PM +0100, Al Viro wrote:

> My point is, this stuff needs looking at.  Even this quick look in arch/x86
> has shown several fairly different classes of that stuff, probably needing
> different approaches.  And that - on an architecture that had tons of TLC
> around signal delivery; I'm not saying that result is optimal (asm-goto sounds
> potentially useful there), but it had a lot of attention given to it...

BTW, even in arch/* they tend to nest.  E.g. arch/alpha has 133 callers
total.  Distribution by files:
 35 arch/alpha/kernel/osf_sys.c
 92 arch/alpha/kernel/signal.c
  1 arch/alpha/kernel/traps.c
  4 arch/alpha/lib/csum_partial_copy.c
  1 arch/alpha/mm/fault.c
Distribution by functions:
  1 osf_getdomainname() [1]
  2 osf_sigstack()
  2 get_tv32()
  2 put_tv32()
  4 get_it32()
  4 put_it32()
  2 osf_select()
 18 osf_wait4() [2]
  6 osf_sigaction()
 34 restore_sigcontext()
  1 do_sigreturn()
 42 setup_sigcontext()
  3 setup_frame()
  6 setup_rt_frame()
  1 dik_show_code() [3]
  2 csum_partial_cfu_aligned()
  2 csum_partial_cfu_src_aligned()
  1 do_page_fault() [4]

[1] insane, BTW - should be strnlen() + copy_to_user(); should report -EFAULT
on failure, while we are at it.
[2] with fairly disgusting use of set_fs() in the mix.
[3] would break with get_user() - it's oopser fetching code to printk.
[4] this:
/* As of EV6, a load into $31/$f31 is a prefetch, and never faults
   (or is suppressed by the PALcode).  Support that for older CPUs
   by ignoring such an instruction.  */
if (cause == 0) {
unsigned int insn;
__get_user(insn, (unsigned int __user *)regs->pc);
if ((insn >> 21 & 0x1f) == 0x1f &&
/* ldq ldl ldt lds ldg ldf ldwu ldbu */
(1ul << (insn >> 26) & 0x30f1400ul)) {
regs->pc += 4;
return;
}
}


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 07:04:13PM +0100, Al Viro wrote:

> My point is, this stuff needs looking at.  Even this quick look in arch/x86
> has shown several fairly different classes of that stuff, probably needing
> different approaches.  And that - on an architecture that had tons of TLC
> around signal delivery; I'm not saying that result is optimal (asm-goto sounds
> potentially useful there), but it had a lot of attention given to it...

BTW, even in arch/* they tend to nest.  E.g. arch/alpha has 133 callers
total.  Distribution by files:
 35 arch/alpha/kernel/osf_sys.c
 92 arch/alpha/kernel/signal.c
  1 arch/alpha/kernel/traps.c
  4 arch/alpha/lib/csum_partial_copy.c
  1 arch/alpha/mm/fault.c
Distribution by functions:
  1 osf_getdomainname() [1]
  2 osf_sigstack()
  2 get_tv32()
  2 put_tv32()
  4 get_it32()
  4 put_it32()
  2 osf_select()
 18 osf_wait4() [2]
  6 osf_sigaction()
 34 restore_sigcontext()
  1 do_sigreturn()
 42 setup_sigcontext()
  3 setup_frame()
  6 setup_rt_frame()
  1 dik_show_code() [3]
  2 csum_partial_cfu_aligned()
  2 csum_partial_cfu_src_aligned()
  1 do_page_fault() [4]

[1] insane, BTW - should be strnlen() + copy_to_user(); should report -EFAULT
on failure, while we are at it.
[2] with fairly disgusting use of set_fs() in the mix.
[3] would break with get_user() - it's oopser fetching code to printk.
[4] this:
/* As of EV6, a load into $31/$f31 is a prefetch, and never faults
   (or is suppressed by the PALcode).  Support that for older CPUs
   by ignoring such an instruction.  */
if (cause == 0) {
unsigned int insn;
__get_user(insn, (unsigned int __user *)regs->pc);
if ((insn >> 21 & 0x1f) == 0x1f &&
/* ldq ldl ldt lds ldg ldf ldwu ldbu */
(1ul << (insn >> 26) & 0x30f1400ul)) {
regs->pc += 4;
return;
}
}


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 10:18:22AM -0700, Linus Torvalds wrote:

> > x86 actually tends to use get_user_ex/put_user_ex instead.
> 
> Yes. Because anybody who *really* cared about performance will have
> converted already. The real cost has been stac/clac for a few years
> now.

On x86.  Which has (not counting arch/x86/um, which definitely needs
a careful look - there __..._user() is the same as ..._user() and costly
as hell) all of 12 callers of __get_user() and 31 callers of __put_user().
More than a half of the latter - in cp_stat64() and I would at least try
and see if "convert on stack, then copy_to_user" is better for that one.
Other than that, all we have is:

arch/x86/entry/common.c:372:__get_user(*(u32 *)>bp,
arch/x86/ia32/ia32_signal.c:124:if (__get_user(set.sig[0], 
>sc.oldmask)
arch/x86/ia32/ia32_signal.c:275:if (__put_user(sig, >sig))
arch/x86/include/asm/xen/page.h:86: return __put_user(val, (unsigned long 
__user *)addr);
arch/x86/include/asm/xen/page.h:91: return __get_user(*val, (unsigned long 
__user *)addr);
arch/x86/kernel/fpu/signal.c:100:   err |= __get_user(xfeatures, (__u32 
*)>header.xfeatures);
arch/x86/kernel/fpu/signal.c:115:   err |= __put_user(xfeatures, (__u32 
*)>header.xfeatures);
arch/x86/kernel/fpu/signal.c:46:if (__get_user(magic2, (__u32 __user 
*)(fpstate + fx_sw->xstate_size))
arch/x86/kernel/fpu/signal.c:66:__put_user(xsave->i387.swd, 
>status) ||
arch/x86/kernel/fpu/signal.c:67:__put_user(X86_FXSR_MAGIC, 
>magic))
arch/x86/kernel/fpu/signal.c:72:if (__get_user(swd, >swd) 
|| __put_user(swd, >status))
arch/x86/kernel/fpu/signal.c:72:if (__get_user(swd, >swd) 
|| __put_user(swd, >status))
arch/x86/kernel/fpu/signal.c:93:err |= __put_user(FP_XSTATE_MAGIC2,
arch/x86/kernel/ptrace.c:1043:  if (__put_user(word, u++))
arch/x86/kernel/ptrace.c:1070:  ret = __get_user(word, u++);
arch/x86/kernel/ptrace.c:472:   if (__put_user(getreg(target, 
pos), u++))
arch/x86/kernel/ptrace.c:499:   ret = __get_user(word, u++);
arch/x86/kernel/signal.c:326:   if (__put_user(sig, >sig))
arch/x86/kernel/signal.c:347:   err |= __put_user(restorer, >pretcode);
arch/x86/kernel/signal.c:356:   err |= __put_user(*((u64 *)), (u64 
*)frame->retcode);
arch/x86/kernel/signal.c:613:   if (__get_user(set.sig[0], >sc.oldmask) 
|| (_NSIG_WORDS > 1
arch/x86/kernel/signal.c:647:   if (__get_user(uc_flags, >uc.uc_flags))
arch/x86/kernel/signal.c:870:   if (__get_user(uc_flags, >uc.uc_flags))
arch/x86/lib/csum-wrappers_64.c:103:*errp = 
__put_user(val16, (__u16 __user *)dst);
arch/x86/lib/csum-wrappers_64.c:45: if (__get_user(val16, 
(const __u16 __user *)src))

A bunch of strays in signal.c (extending ..._ex use might be a good idea)
xen_safe_{write,read}_ulong() (might very well break from adding access_ok() -
or be security holes; I'm not familiar enough with that code to tell) and
csum_partial_copy_{to,from}_user() - those would need to extend stac/clac
pairs already there and switch to unsafe_...

Plus several loops in ptrace - might or might not be sensitive, no idea.

Plus do_fast_syscall_32().  Might be sensitive, Andy would be the guy to
talk to, AFAICS.

> And some of the existing cases are just disgusting. There's no excuse
> for compat code or for ioctl to use the "__" versions. That seems to
> be the bulk of those uses too.

Sure.

My point is, this stuff needs looking at.  Even this quick look in arch/x86
has shown several fairly different classes of that stuff, probably needing
different approaches.  And that - on an architecture that had tons of TLC
around signal delivery; I'm not saying that result is optimal (asm-goto sounds
potentially useful there), but it had a lot of attention given to it...


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 10:18:22AM -0700, Linus Torvalds wrote:

> > x86 actually tends to use get_user_ex/put_user_ex instead.
> 
> Yes. Because anybody who *really* cared about performance will have
> converted already. The real cost has been stac/clac for a few years
> now.

On x86.  Which has (not counting arch/x86/um, which definitely needs
a careful look - there __..._user() is the same as ..._user() and costly
as hell) all of 12 callers of __get_user() and 31 callers of __put_user().
More than a half of the latter - in cp_stat64() and I would at least try
and see if "convert on stack, then copy_to_user" is better for that one.
Other than that, all we have is:

arch/x86/entry/common.c:372:__get_user(*(u32 *)>bp,
arch/x86/ia32/ia32_signal.c:124:if (__get_user(set.sig[0], 
>sc.oldmask)
arch/x86/ia32/ia32_signal.c:275:if (__put_user(sig, >sig))
arch/x86/include/asm/xen/page.h:86: return __put_user(val, (unsigned long 
__user *)addr);
arch/x86/include/asm/xen/page.h:91: return __get_user(*val, (unsigned long 
__user *)addr);
arch/x86/kernel/fpu/signal.c:100:   err |= __get_user(xfeatures, (__u32 
*)>header.xfeatures);
arch/x86/kernel/fpu/signal.c:115:   err |= __put_user(xfeatures, (__u32 
*)>header.xfeatures);
arch/x86/kernel/fpu/signal.c:46:if (__get_user(magic2, (__u32 __user 
*)(fpstate + fx_sw->xstate_size))
arch/x86/kernel/fpu/signal.c:66:__put_user(xsave->i387.swd, 
>status) ||
arch/x86/kernel/fpu/signal.c:67:__put_user(X86_FXSR_MAGIC, 
>magic))
arch/x86/kernel/fpu/signal.c:72:if (__get_user(swd, >swd) 
|| __put_user(swd, >status))
arch/x86/kernel/fpu/signal.c:72:if (__get_user(swd, >swd) 
|| __put_user(swd, >status))
arch/x86/kernel/fpu/signal.c:93:err |= __put_user(FP_XSTATE_MAGIC2,
arch/x86/kernel/ptrace.c:1043:  if (__put_user(word, u++))
arch/x86/kernel/ptrace.c:1070:  ret = __get_user(word, u++);
arch/x86/kernel/ptrace.c:472:   if (__put_user(getreg(target, 
pos), u++))
arch/x86/kernel/ptrace.c:499:   ret = __get_user(word, u++);
arch/x86/kernel/signal.c:326:   if (__put_user(sig, >sig))
arch/x86/kernel/signal.c:347:   err |= __put_user(restorer, >pretcode);
arch/x86/kernel/signal.c:356:   err |= __put_user(*((u64 *)), (u64 
*)frame->retcode);
arch/x86/kernel/signal.c:613:   if (__get_user(set.sig[0], >sc.oldmask) 
|| (_NSIG_WORDS > 1
arch/x86/kernel/signal.c:647:   if (__get_user(uc_flags, >uc.uc_flags))
arch/x86/kernel/signal.c:870:   if (__get_user(uc_flags, >uc.uc_flags))
arch/x86/lib/csum-wrappers_64.c:103:*errp = 
__put_user(val16, (__u16 __user *)dst);
arch/x86/lib/csum-wrappers_64.c:45: if (__get_user(val16, 
(const __u16 __user *)src))

A bunch of strays in signal.c (extending ..._ex use might be a good idea)
xen_safe_{write,read}_ulong() (might very well break from adding access_ok() -
or be security holes; I'm not familiar enough with that code to tell) and
csum_partial_copy_{to,from}_user() - those would need to extend stac/clac
pairs already there and switch to unsafe_...

Plus several loops in ptrace - might or might not be sensitive, no idea.

Plus do_fast_syscall_32().  Might be sensitive, Andy would be the guy to
talk to, AFAICS.

> And some of the existing cases are just disgusting. There's no excuse
> for compat code or for ioctl to use the "__" versions. That seems to
> be the bulk of those uses too.

Sure.

My point is, this stuff needs looking at.  Even this quick look in arch/x86
has shown several fairly different classes of that stuff, probably needing
different approaches.  And that - on an architecture that had tons of TLC
around signal delivery; I'm not saying that result is optimal (asm-goto sounds
potentially useful there), but it had a lot of attention given to it...


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
On Sat, May 13, 2017 at 10:00 AM, Al Viro  wrote:
> On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote:
>> > IOW, we have
>> > * most of users in arch/* (heavily dominated by signal-related 
>> > code,
>> > both loads and stores).  Those need careful massage; maybe unsafe-based
>> > solution, maybe something else, but it's obviously per-architecture work
>> > and these paths are sensitive.
>>
>> Why are they sensitive?
>
> Because there we do have tons of back-to-back __get_user() (on sigreturn())
> or __put_user() (on signal setup).

It doesn't actually matter. Regular "put_user()" doesn't generate
noticeably worse code.

It's not a normal function call, it's still an inline asm - so it
basically generates the exact same code as __xyz_user(), except it's a
call to a fixed copy of the code.

So the only difference tends to be
 (a) the extra call/ret instructions, possibly frame pointers
 (b) fixed registers
 (c) the added addr_limit checks

but (a) is cheap, and (b) isn't a big deal since with "asm volatile"
you can't re-order those things against each other anyway. So (b) ends
up being approximately the cost of "one extra lea instruction" for the
address generation that would otherwise be in the load/store
instruction.

And (c) is actually a reason *for* doing it. We've had bugs due to
people not getting it right.

So there really is basically no performance downside.  Even with
consecutive uses.  The code that uses a function call might even be
smaller (ie the 1.7kB savings isn't necessarily all in the out-of-line
exception handling cases: the stac/clac instructions are six bytes for
each use, so it more than makes up for the call instruction).

> x86 actually tends to use get_user_ex/put_user_ex instead.

Yes. Because anybody who *really* cared about performance will have
converted already. The real cost has been stac/clac for a few years
now.

So we pretty much know that none of the remaining users are really all
that critical. Certainly not "five cycles" kind of critical.

>> Anybody who *relies* on not checking the address_limit is so broken as
>> to be not even funny.
>
> Except for open-coded probe_kernel_read() somewhere in arch/*; I have not
> read through those 700+ callers, so I don't know if there's any such.

.. and those need to be fixed.

But I'm not seeing the point in wasting valuable human time in reading
through thousands of cases, when we can just automate it and see if
anything breaks.

Because it will break in a *safe* direction. You'll get an error from
bad uses that shouldn't have worked to begin with.

And some of the existing cases are just disgusting. There's no excuse
for compat code or for ioctl to use the "__" versions. That seems to
be the bulk of those uses too.

 Linus


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
On Sat, May 13, 2017 at 10:00 AM, Al Viro  wrote:
> On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote:
>> > IOW, we have
>> > * most of users in arch/* (heavily dominated by signal-related 
>> > code,
>> > both loads and stores).  Those need careful massage; maybe unsafe-based
>> > solution, maybe something else, but it's obviously per-architecture work
>> > and these paths are sensitive.
>>
>> Why are they sensitive?
>
> Because there we do have tons of back-to-back __get_user() (on sigreturn())
> or __put_user() (on signal setup).

It doesn't actually matter. Regular "put_user()" doesn't generate
noticeably worse code.

It's not a normal function call, it's still an inline asm - so it
basically generates the exact same code as __xyz_user(), except it's a
call to a fixed copy of the code.

So the only difference tends to be
 (a) the extra call/ret instructions, possibly frame pointers
 (b) fixed registers
 (c) the added addr_limit checks

but (a) is cheap, and (b) isn't a big deal since with "asm volatile"
you can't re-order those things against each other anyway. So (b) ends
up being approximately the cost of "one extra lea instruction" for the
address generation that would otherwise be in the load/store
instruction.

And (c) is actually a reason *for* doing it. We've had bugs due to
people not getting it right.

So there really is basically no performance downside.  Even with
consecutive uses.  The code that uses a function call might even be
smaller (ie the 1.7kB savings isn't necessarily all in the out-of-line
exception handling cases: the stac/clac instructions are six bytes for
each use, so it more than makes up for the call instruction).

> x86 actually tends to use get_user_ex/put_user_ex instead.

Yes. Because anybody who *really* cared about performance will have
converted already. The real cost has been stac/clac for a few years
now.

So we pretty much know that none of the remaining users are really all
that critical. Certainly not "five cycles" kind of critical.

>> Anybody who *relies* on not checking the address_limit is so broken as
>> to be not even funny.
>
> Except for open-coded probe_kernel_read() somewhere in arch/*; I have not
> read through those 700+ callers, so I don't know if there's any such.

.. and those need to be fixed.

But I'm not seeing the point in wasting valuable human time in reading
through thousands of cases, when we can just automate it and see if
anything breaks.

Because it will break in a *safe* direction. You'll get an error from
bad uses that shouldn't have worked to begin with.

And some of the existing cases are just disgusting. There's no excuse
for compat code or for ioctl to use the "__" versions. That seems to
be the bulk of those uses too.

 Linus


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 06:00:56PM +0100, Al Viro wrote:

> > But I don't see the excuse for not just doing it. If nobody notices,
> > it's an obvious improvement. And if somebody *does* notice, we know
> > how to do it properly with unsafe_xyz_user(), because "__xyz_user()"
> > most definitely isn't it.
> 
> I think we ought to actually look through those places - there are few
> enough of them (outside of arch/, that is) and stac/clac overhead is
> not the only problem they tend to have.

PS: just to make it clear - I do _not_ propose to keep that shit around
indefinitely; I want __get_user()/__put_user() gone by the end of that
work.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 06:00:56PM +0100, Al Viro wrote:

> > But I don't see the excuse for not just doing it. If nobody notices,
> > it's an obvious improvement. And if somebody *does* notice, we know
> > how to do it properly with unsafe_xyz_user(), because "__xyz_user()"
> > most definitely isn't it.
> 
> I think we ought to actually look through those places - there are few
> enough of them (outside of arch/, that is) and stac/clac overhead is
> not the only problem they tend to have.

PS: just to make it clear - I do _not_ propose to keep that shit around
indefinitely; I want __get_user()/__put_user() gone by the end of that
work.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote:
> > IOW, we have
> > * most of users in arch/* (heavily dominated by signal-related code,
> > both loads and stores).  Those need careful massage; maybe unsafe-based
> > solution, maybe something else, but it's obviously per-architecture work
> > and these paths are sensitive.
> 
> Why are they sensitive?

Because there we do have tons of back-to-back __get_user() (on sigreturn())
or __put_user() (on signal setup).

> Why not just do this:
> 
>   git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86
> :^arch/x86/include/asm/uaccess.h
> | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g'
> 
> which converts all the x86 uses in one go.

x86 actually tends to use get_user_ex/put_user_ex instead.

> Anybody who *relies* on not checking the address_limit is so broken as
> to be not even funny.

Except for open-coded probe_kernel_read() somewhere in arch/*; I have not
read through those 700+ callers, so I don't know if there's any such.
Sure, those won't be performance-sensitive.

> And anything that is so performance-sensitive
> that anybody can even measure the effect of the above we can convert
> later.

> Sure, do it in pieces (eg each architecture separately, then
> "drivers", then "networking", then whatever, until all done), just so
> that *if* something actually depends on semantics (and that really
> shouldn't be the case), we have at least some information from a
> bisect.
>
> But I don't see the excuse for not just doing it. If nobody notices,
> it's an obvious improvement. And if somebody *does* notice, we know
> how to do it properly with unsafe_xyz_user(), because "__xyz_user()"
> most definitely isn't it.

I think we ought to actually look through those places - there are few
enough of them (outside of arch/, that is) and stac/clac overhead is
not the only problem they tend to have.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote:
> > IOW, we have
> > * most of users in arch/* (heavily dominated by signal-related code,
> > both loads and stores).  Those need careful massage; maybe unsafe-based
> > solution, maybe something else, but it's obviously per-architecture work
> > and these paths are sensitive.
> 
> Why are they sensitive?

Because there we do have tons of back-to-back __get_user() (on sigreturn())
or __put_user() (on signal setup).

> Why not just do this:
> 
>   git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86
> :^arch/x86/include/asm/uaccess.h
> | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g'
> 
> which converts all the x86 uses in one go.

x86 actually tends to use get_user_ex/put_user_ex instead.

> Anybody who *relies* on not checking the address_limit is so broken as
> to be not even funny.

Except for open-coded probe_kernel_read() somewhere in arch/*; I have not
read through those 700+ callers, so I don't know if there's any such.
Sure, those won't be performance-sensitive.

> And anything that is so performance-sensitive
> that anybody can even measure the effect of the above we can convert
> later.

> Sure, do it in pieces (eg each architecture separately, then
> "drivers", then "networking", then whatever, until all done), just so
> that *if* something actually depends on semantics (and that really
> shouldn't be the case), we have at least some information from a
> bisect.
>
> But I don't see the excuse for not just doing it. If nobody notices,
> it's an obvious improvement. And if somebody *does* notice, we know
> how to do it properly with unsafe_xyz_user(), because "__xyz_user()"
> most definitely isn't it.

I think we ought to actually look through those places - there are few
enough of them (outside of arch/, that is) and stac/clac overhead is
not the only problem they tend to have.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 02:05:27PM +0200, Adam Borowski wrote:

> As someone from the peanuts gallery, I took a look for __put_user() in my
> usual haunt, drivers/tty/vt/
> 
> * use 1: con_[gs]et_trans_*():
>   Copies a linear array of 256 bytes/shorts, one by one.
>   The obvious patch has 9 insertions(+), 22 deletions(-).
> 
> * use 2: con_[gs]et_unimap():
>   Ditto, up to 65535*2 shorts, also in a nice linear array.
> 
> * use 3: tioclinux():
>   Does a __put into a place that was checked only for read.  This got me
>   frightened as it initially looked like something that can allow an user to
>   write where they shouldn't.  Fortunately, it turns out the first argument
>   to access_ok() is ignored on every single architecture -- why does it even
>   exist then?  I imagined it's there for some odd arch that allows writes
>   when in privileged mode, but unlike what the docs say, VERIFY_WRITE is
>   exactly same as VERIFY_READ.

It's a remnant of old kludge that never properly worked in the first place.
access_ok() should have been called userland_range() - that's all it
checks and that's all it *can* check.

As it is, each of those __get_user() can bloody well yield -EFAULT.  Despite
having "passed" access_ok().  Again, the only thing access_ok() checks is
that (on architecture with shared address space for userland and kernel)
the addresses given are on the userland side.  That's _it_ - they can
be unmapped, mmapped to broken floppy, whatever; you'll find out when
you try to actually copy bytes from from it.

What the kludge used to attempt was "let's check that we are not trying
to copy into read-only mmapped area - 80386 MMU is fucked in head and won't
fault on such stores in ring 0".  It had always been racy.  Look:
thread A: going to copy something to user-supplied address.  Do access_ok().
thread A: looks like it's mapped writable, let's go ahead and copy
thread A: do something blocking before actually doing __put_user() or
__copy_to_user() or whatever it's going to be.
thread B: munmap(), mmap() something read-only here.
thread A: get to actual __put_user()/__copy_to_user().
80386: hey, it's ring 0, no need to look at the write protect bit in page
tables.  What can go wrong, anyway?

You can't move any non-static checks to access_ok().  On any architecture.
Anything that could change between access_ok() and actual copying can't
be checked in access_ok().  As it is, access_ok() has actively misleading
calling conventions:
* the name implies that having passed access_ok() you don't have
to worry about EFAULT
* 'direction' argument of that thing reinforces that impression
*and* has to be produced by the caller.  Most simply pass a constant,
which immediately gets dropped (as an aside, take a look at 4b4554f6d - it's
amusing), but in some cases it's calculated elsewhere and carefully passed
through several levels of call chain.  Only to be discarded by access_ok(),
of course...

> Ie, every use in this sample is wrong.  I suspect the rest of the kernel
> should be similar.

Looking through vt...

* con_set_trans_old(): copy_from_user() + loop for doing or with
  UNI_DIRECT_BASE.  Almost certainly will be faster that way - on *any*
  architecture.
* con_get_trans_old(): copy_to_user() would be an obvious optimization.
* con_set_trans_new(): copy_from_user().
* con_get_trans_new(): copy_to_user().
* con_set_unimap(): memdup_user() instead of the entire kmalloc_array +
loop copying the sucker member by member.  With ushort ct you don't need
overflow-related logics of kmalloc_array() anyway...
* con_get_unimap(): copy_to_user() + put_user() (for uct)
* set_selection(): just copy_from_user() into local struct tiocl_selection.
* tioclinux(): use put_user().  Yes, you will repeat the same check twice.
Once per ioctl(), that involves enough work to make that "recalculate
access_ok() once for no reason" non-issue on any architecture.
* vt_ioctl(): turn that ushort ll,cc,vlin,clin,vcol,ccol; into
  struct vt_consize size; or something like that and use copy_from_user()

And AFAICS you can lose each and every access_ok() call in there.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Sat, May 13, 2017 at 02:05:27PM +0200, Adam Borowski wrote:

> As someone from the peanuts gallery, I took a look for __put_user() in my
> usual haunt, drivers/tty/vt/
> 
> * use 1: con_[gs]et_trans_*():
>   Copies a linear array of 256 bytes/shorts, one by one.
>   The obvious patch has 9 insertions(+), 22 deletions(-).
> 
> * use 2: con_[gs]et_unimap():
>   Ditto, up to 65535*2 shorts, also in a nice linear array.
> 
> * use 3: tioclinux():
>   Does a __put into a place that was checked only for read.  This got me
>   frightened as it initially looked like something that can allow an user to
>   write where they shouldn't.  Fortunately, it turns out the first argument
>   to access_ok() is ignored on every single architecture -- why does it even
>   exist then?  I imagined it's there for some odd arch that allows writes
>   when in privileged mode, but unlike what the docs say, VERIFY_WRITE is
>   exactly same as VERIFY_READ.

It's a remnant of old kludge that never properly worked in the first place.
access_ok() should have been called userland_range() - that's all it
checks and that's all it *can* check.

As it is, each of those __get_user() can bloody well yield -EFAULT.  Despite
having "passed" access_ok().  Again, the only thing access_ok() checks is
that (on architecture with shared address space for userland and kernel)
the addresses given are on the userland side.  That's _it_ - they can
be unmapped, mmapped to broken floppy, whatever; you'll find out when
you try to actually copy bytes from from it.

What the kludge used to attempt was "let's check that we are not trying
to copy into read-only mmapped area - 80386 MMU is fucked in head and won't
fault on such stores in ring 0".  It had always been racy.  Look:
thread A: going to copy something to user-supplied address.  Do access_ok().
thread A: looks like it's mapped writable, let's go ahead and copy
thread A: do something blocking before actually doing __put_user() or
__copy_to_user() or whatever it's going to be.
thread B: munmap(), mmap() something read-only here.
thread A: get to actual __put_user()/__copy_to_user().
80386: hey, it's ring 0, no need to look at the write protect bit in page
tables.  What can go wrong, anyway?

You can't move any non-static checks to access_ok().  On any architecture.
Anything that could change between access_ok() and actual copying can't
be checked in access_ok().  As it is, access_ok() has actively misleading
calling conventions:
* the name implies that having passed access_ok() you don't have
to worry about EFAULT
* 'direction' argument of that thing reinforces that impression
*and* has to be produced by the caller.  Most simply pass a constant,
which immediately gets dropped (as an aside, take a look at 4b4554f6d - it's
amusing), but in some cases it's calculated elsewhere and carefully passed
through several levels of call chain.  Only to be discarded by access_ok(),
of course...

> Ie, every use in this sample is wrong.  I suspect the rest of the kernel
> should be similar.

Looking through vt...

* con_set_trans_old(): copy_from_user() + loop for doing or with
  UNI_DIRECT_BASE.  Almost certainly will be faster that way - on *any*
  architecture.
* con_get_trans_old(): copy_to_user() would be an obvious optimization.
* con_set_trans_new(): copy_from_user().
* con_get_trans_new(): copy_to_user().
* con_set_unimap(): memdup_user() instead of the entire kmalloc_array +
loop copying the sucker member by member.  With ushort ct you don't need
overflow-related logics of kmalloc_array() anyway...
* con_get_unimap(): copy_to_user() + put_user() (for uct)
* set_selection(): just copy_from_user() into local struct tiocl_selection.
* tioclinux(): use put_user().  Yes, you will repeat the same check twice.
Once per ioctl(), that involves enough work to make that "recalculate
access_ok() once for no reason" non-issue on any architecture.
* vt_ioctl(): turn that ushort ll,cc,vlin,clin,vcol,ccol; into
  struct vt_consize size; or something like that and use copy_from_user()

And AFAICS you can lose each and every access_ok() call in there.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
Oops.

*Really* adding the x86 guys now.

Blush.

Linus

On Sat, May 13, 2017 at 9:15 AM, Linus Torvalds
 wrote:
> On Fri, May 12, 2017 at 11:57 PM, Al Viro  wrote:
>>
>> First, some stats: there's a thousand-odd callers of __get_user().  Out of
>> those, about 70% are in arch/, mostly in sigframe-related code.
>
> Sure. And they can be trivially converted, and none of them should care at 
> all.
>
>> IOW, we have
>> * most of users in arch/* (heavily dominated by signal-related code,
>> both loads and stores).  Those need careful massage; maybe unsafe-based
>> solution, maybe something else, but it's obviously per-architecture work
>> and these paths are sensitive.
>
> Why are they sensitive?
>
> Why not just do this:
>
>   git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86
> :^arch/x86/include/asm/uaccess.h
> | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g'
>
> which converts all the x86 uses in one go.
>
> Anybody who *relies* on not checking the address_limit is so broken as
> to be not even funny. And anything that is so performance-sensitive
> that anybody can even measure the effect of the above we can convert
> later.
>
> Sure, do it in pieces (eg each architecture separately, then
> "drivers", then "networking", then whatever, until all done), just so
> that *if* something actually depends on semantics (and that really
> shouldn't be the case), we have at least some information from a
> bisect.
>
> But I don't see the excuse for not just doing it. If nobody notices,
> it's an obvious improvement. And if somebody *does* notice, we know
> how to do it properly with unsafe_xyz_user(), because "__xyz_user()"
> most definitely isn't it.
>
> An example of something that *should* be converted is
> "csum_partial_copy_from_user()", but it really does need to use
> "user_access_begin()" and friends, because right now it's using
> stac/clac for each 16-bit word. That's *expensive*, and it's expensive
> whether you use __get_user() or get_user().
>
> Adding x86 people just to see how they react to the above "patch".
>
> For me, in my fairly minimal personal config, the above cleanup patch
> shrinks the text size of the resulting kernel by 1.7kB. Yes, most of
> it is the out-of-line code, but still..
>
> Interestingly, the signal handling code didn't change at all. It looks
> like only the 32-bit code uses __put_user() for the frame setup. The
> regular code uses put_user_try/put_user_catch, which is the
> x86-specific early try at the unsafe versions, but it would actually
> be improved by using "unsafe_put_user()" and my patch to make that use
> "asm goto".
>
>Linus
>
> PS. That "patch" depends on modern git - with older versions of git
> you need to do the path negation with ":!pattern", and then you need
> to quote it too since '!' is special for shell.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
Oops.

*Really* adding the x86 guys now.

Blush.

Linus

On Sat, May 13, 2017 at 9:15 AM, Linus Torvalds
 wrote:
> On Fri, May 12, 2017 at 11:57 PM, Al Viro  wrote:
>>
>> First, some stats: there's a thousand-odd callers of __get_user().  Out of
>> those, about 70% are in arch/, mostly in sigframe-related code.
>
> Sure. And they can be trivially converted, and none of them should care at 
> all.
>
>> IOW, we have
>> * most of users in arch/* (heavily dominated by signal-related code,
>> both loads and stores).  Those need careful massage; maybe unsafe-based
>> solution, maybe something else, but it's obviously per-architecture work
>> and these paths are sensitive.
>
> Why are they sensitive?
>
> Why not just do this:
>
>   git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86
> :^arch/x86/include/asm/uaccess.h
> | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g'
>
> which converts all the x86 uses in one go.
>
> Anybody who *relies* on not checking the address_limit is so broken as
> to be not even funny. And anything that is so performance-sensitive
> that anybody can even measure the effect of the above we can convert
> later.
>
> Sure, do it in pieces (eg each architecture separately, then
> "drivers", then "networking", then whatever, until all done), just so
> that *if* something actually depends on semantics (and that really
> shouldn't be the case), we have at least some information from a
> bisect.
>
> But I don't see the excuse for not just doing it. If nobody notices,
> it's an obvious improvement. And if somebody *does* notice, we know
> how to do it properly with unsafe_xyz_user(), because "__xyz_user()"
> most definitely isn't it.
>
> An example of something that *should* be converted is
> "csum_partial_copy_from_user()", but it really does need to use
> "user_access_begin()" and friends, because right now it's using
> stac/clac for each 16-bit word. That's *expensive*, and it's expensive
> whether you use __get_user() or get_user().
>
> Adding x86 people just to see how they react to the above "patch".
>
> For me, in my fairly minimal personal config, the above cleanup patch
> shrinks the text size of the resulting kernel by 1.7kB. Yes, most of
> it is the out-of-line code, but still..
>
> Interestingly, the signal handling code didn't change at all. It looks
> like only the 32-bit code uses __put_user() for the frame setup. The
> regular code uses put_user_try/put_user_catch, which is the
> x86-specific early try at the unsafe versions, but it would actually
> be improved by using "unsafe_put_user()" and my patch to make that use
> "asm goto".
>
>Linus
>
> PS. That "patch" depends on modern git - with older versions of git
> you need to do the path negation with ":!pattern", and then you need
> to quote it too since '!' is special for shell.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
On Fri, May 12, 2017 at 11:57 PM, Al Viro  wrote:
>
> First, some stats: there's a thousand-odd callers of __get_user().  Out of
> those, about 70% are in arch/, mostly in sigframe-related code.

Sure. And they can be trivially converted, and none of them should care at all.

> IOW, we have
> * most of users in arch/* (heavily dominated by signal-related code,
> both loads and stores).  Those need careful massage; maybe unsafe-based
> solution, maybe something else, but it's obviously per-architecture work
> and these paths are sensitive.

Why are they sensitive?

Why not just do this:

  git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86
:^arch/x86/include/asm/uaccess.h
| xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g'

which converts all the x86 uses in one go.

Anybody who *relies* on not checking the address_limit is so broken as
to be not even funny. And anything that is so performance-sensitive
that anybody can even measure the effect of the above we can convert
later.

Sure, do it in pieces (eg each architecture separately, then
"drivers", then "networking", then whatever, until all done), just so
that *if* something actually depends on semantics (and that really
shouldn't be the case), we have at least some information from a
bisect.

But I don't see the excuse for not just doing it. If nobody notices,
it's an obvious improvement. And if somebody *does* notice, we know
how to do it properly with unsafe_xyz_user(), because "__xyz_user()"
most definitely isn't it.

An example of something that *should* be converted is
"csum_partial_copy_from_user()", but it really does need to use
"user_access_begin()" and friends, because right now it's using
stac/clac for each 16-bit word. That's *expensive*, and it's expensive
whether you use __get_user() or get_user().

Adding x86 people just to see how they react to the above "patch".

For me, in my fairly minimal personal config, the above cleanup patch
shrinks the text size of the resulting kernel by 1.7kB. Yes, most of
it is the out-of-line code, but still..

Interestingly, the signal handling code didn't change at all. It looks
like only the 32-bit code uses __put_user() for the frame setup. The
regular code uses put_user_try/put_user_catch, which is the
x86-specific early try at the unsafe versions, but it would actually
be improved by using "unsafe_put_user()" and my patch to make that use
"asm goto".

   Linus

PS. That "patch" depends on modern git - with older versions of git
you need to do the path negation with ":!pattern", and then you need
to quote it too since '!' is special for shell.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Linus Torvalds
On Fri, May 12, 2017 at 11:57 PM, Al Viro  wrote:
>
> First, some stats: there's a thousand-odd callers of __get_user().  Out of
> those, about 70% are in arch/, mostly in sigframe-related code.

Sure. And they can be trivially converted, and none of them should care at all.

> IOW, we have
> * most of users in arch/* (heavily dominated by signal-related code,
> both loads and stores).  Those need careful massage; maybe unsafe-based
> solution, maybe something else, but it's obviously per-architecture work
> and these paths are sensitive.

Why are they sensitive?

Why not just do this:

  git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86
:^arch/x86/include/asm/uaccess.h
| xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g'

which converts all the x86 uses in one go.

Anybody who *relies* on not checking the address_limit is so broken as
to be not even funny. And anything that is so performance-sensitive
that anybody can even measure the effect of the above we can convert
later.

Sure, do it in pieces (eg each architecture separately, then
"drivers", then "networking", then whatever, until all done), just so
that *if* something actually depends on semantics (and that really
shouldn't be the case), we have at least some information from a
bisect.

But I don't see the excuse for not just doing it. If nobody notices,
it's an obvious improvement. And if somebody *does* notice, we know
how to do it properly with unsafe_xyz_user(), because "__xyz_user()"
most definitely isn't it.

An example of something that *should* be converted is
"csum_partial_copy_from_user()", but it really does need to use
"user_access_begin()" and friends, because right now it's using
stac/clac for each 16-bit word. That's *expensive*, and it's expensive
whether you use __get_user() or get_user().

Adding x86 people just to see how they react to the above "patch".

For me, in my fairly minimal personal config, the above cleanup patch
shrinks the text size of the resulting kernel by 1.7kB. Yes, most of
it is the out-of-line code, but still..

Interestingly, the signal handling code didn't change at all. It looks
like only the 32-bit code uses __put_user() for the frame setup. The
regular code uses put_user_try/put_user_catch, which is the
x86-specific early try at the unsafe versions, but it would actually
be improved by using "unsafe_put_user()" and my patch to make that use
"asm goto".

   Linus

PS. That "patch" depends on modern git - with older versions of git
you need to do the path negation with ":!pattern", and then you need
to quote it too since '!' is special for shell.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Brian Gerst
On Sat, May 13, 2017 at 8:05 AM, Adam Borowski  wrote:
> On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote:
>> On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote:
>> > But the *first* thing I'd like to do would be to just get rid of
>> > __get_user/__put_user as a thing. It really does generate nasty code,
>> > and we might as well just make it do that function call.
>>
>> But IMO the first step is not to convert __get_user()/__put_user() to
>> aliases of get_user()/put_user() - it's to get rid of their infestations
>> outside of arch/*.  They are concentrated in relatively few places, so
>> we can deal with those individually and then just fucking ban their use
>> outside of arch/*.  That's easy enough to watch for - simple git grep will 
>> do,
>> so anything creeping back will be immediately spotted.
>
> As someone from the peanuts gallery, I took a look for __put_user() in my
> usual haunt, drivers/tty/vt/
>
> * use 1: con_[gs]et_trans_*():
>   Copies a linear array of 256 bytes/shorts, one by one.
>   The obvious patch has 9 insertions(+), 22 deletions(-).
>
> * use 2: con_[gs]et_unimap():
>   Ditto, up to 65535*2 shorts, also in a nice linear array.
>
> * use 3: tioclinux():
>   Does a __put into a place that was checked only for read.  This got me
>   frightened as it initially looked like something that can allow an user to
>   write where they shouldn't.  Fortunately, it turns out the first argument
>   to access_ok() is ignored on every single architecture -- why does it even
>   exist then?  I imagined it's there for some odd arch that allows writes
>   when in privileged mode, but unlike what the docs say, VERIFY_WRITE is
>   exactly same as VERIFY_READ.

The original i386 (no longer supported) ignored the write-protect bit
when in supervisor mode.

--
Brian Gerst


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Brian Gerst
On Sat, May 13, 2017 at 8:05 AM, Adam Borowski  wrote:
> On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote:
>> On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote:
>> > But the *first* thing I'd like to do would be to just get rid of
>> > __get_user/__put_user as a thing. It really does generate nasty code,
>> > and we might as well just make it do that function call.
>>
>> But IMO the first step is not to convert __get_user()/__put_user() to
>> aliases of get_user()/put_user() - it's to get rid of their infestations
>> outside of arch/*.  They are concentrated in relatively few places, so
>> we can deal with those individually and then just fucking ban their use
>> outside of arch/*.  That's easy enough to watch for - simple git grep will 
>> do,
>> so anything creeping back will be immediately spotted.
>
> As someone from the peanuts gallery, I took a look for __put_user() in my
> usual haunt, drivers/tty/vt/
>
> * use 1: con_[gs]et_trans_*():
>   Copies a linear array of 256 bytes/shorts, one by one.
>   The obvious patch has 9 insertions(+), 22 deletions(-).
>
> * use 2: con_[gs]et_unimap():
>   Ditto, up to 65535*2 shorts, also in a nice linear array.
>
> * use 3: tioclinux():
>   Does a __put into a place that was checked only for read.  This got me
>   frightened as it initially looked like something that can allow an user to
>   write where they shouldn't.  Fortunately, it turns out the first argument
>   to access_ok() is ignored on every single architecture -- why does it even
>   exist then?  I imagined it's there for some odd arch that allows writes
>   when in privileged mode, but unlike what the docs say, VERIFY_WRITE is
>   exactly same as VERIFY_READ.

The original i386 (no longer supported) ignored the write-protect bit
when in supervisor mode.

--
Brian Gerst


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Adam Borowski
On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote:
> On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote:
> > But the *first* thing I'd like to do would be to just get rid of
> > __get_user/__put_user as a thing. It really does generate nasty code,
> > and we might as well just make it do that function call.
> 
> But IMO the first step is not to convert __get_user()/__put_user() to
> aliases of get_user()/put_user() - it's to get rid of their infestations
> outside of arch/*.  They are concentrated in relatively few places, so
> we can deal with those individually and then just fucking ban their use
> outside of arch/*.  That's easy enough to watch for - simple git grep will do,
> so anything creeping back will be immediately spotted.

As someone from the peanuts gallery, I took a look for __put_user() in my
usual haunt, drivers/tty/vt/

* use 1: con_[gs]et_trans_*():
  Copies a linear array of 256 bytes/shorts, one by one.
  The obvious patch has 9 insertions(+), 22 deletions(-).

* use 2: con_[gs]et_unimap():
  Ditto, up to 65535*2 shorts, also in a nice linear array.

* use 3: tioclinux():
  Does a __put into a place that was checked only for read.  This got me
  frightened as it initially looked like something that can allow an user to
  write where they shouldn't.  Fortunately, it turns out the first argument
  to access_ok() is ignored on every single architecture -- why does it even
  exist then?  I imagined it's there for some odd arch that allows writes
  when in privileged mode, but unlike what the docs say, VERIFY_WRITE is
  exactly same as VERIFY_READ.

Ie, every use in this sample is wrong.  I suspect the rest of the kernel
should be similar.


Meow!
-- 
Don't be racist.  White, amber or black, all beers should be judged based
solely on their merits.  Heck, even if occasionally a cider applies for a
beer's job, why not?
On the other hand, corpo lager is not a race.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Adam Borowski
On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote:
> On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote:
> > But the *first* thing I'd like to do would be to just get rid of
> > __get_user/__put_user as a thing. It really does generate nasty code,
> > and we might as well just make it do that function call.
> 
> But IMO the first step is not to convert __get_user()/__put_user() to
> aliases of get_user()/put_user() - it's to get rid of their infestations
> outside of arch/*.  They are concentrated in relatively few places, so
> we can deal with those individually and then just fucking ban their use
> outside of arch/*.  That's easy enough to watch for - simple git grep will do,
> so anything creeping back will be immediately spotted.

As someone from the peanuts gallery, I took a look for __put_user() in my
usual haunt, drivers/tty/vt/

* use 1: con_[gs]et_trans_*():
  Copies a linear array of 256 bytes/shorts, one by one.
  The obvious patch has 9 insertions(+), 22 deletions(-).

* use 2: con_[gs]et_unimap():
  Ditto, up to 65535*2 shorts, also in a nice linear array.

* use 3: tioclinux():
  Does a __put into a place that was checked only for read.  This got me
  frightened as it initially looked like something that can allow an user to
  write where they shouldn't.  Fortunately, it turns out the first argument
  to access_ok() is ignored on every single architecture -- why does it even
  exist then?  I imagined it's there for some odd arch that allows writes
  when in privileged mode, but unlike what the docs say, VERIFY_WRITE is
  exactly same as VERIFY_READ.

Ie, every use in this sample is wrong.  I suspect the rest of the kernel
should be similar.


Meow!
-- 
Don't be racist.  White, amber or black, all beers should be judged based
solely on their merits.  Heck, even if occasionally a cider applies for a
beer's job, why not?
On the other hand, corpo lager is not a race.


Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote:
> So I should have asked earlier, but I was feeling rather busy during
> the early merge window..

> So I'm actually more interested to hear if you have any pending work
> on cleaning up the __get/put_user() mess we have. Is that on your
> radar at all?

Yes, it is.

> In particular, right now, both __get_user() and __put_user() are nasty
> and broken interfaces.
>
> It *used* to be that they were designed to just generate a single
> instruction. That's not what they currently do at all, due to the
> whole SMAP/PAN on x86 and arm.
> 
> For example, on x86, right now a single __put_user() call ends up
> generating something like

[snip]
 
> But even that small thing is rather debatable from a performance
> angle: the actual cost of __put_user() these days is not the function
> call, but the clac/stac (on modern x86) and the PAN bit games (on
> arm).
> 
> So I'm actually inclined to just say "We should make
> __get_user/__put_user() just be aliases for the normal
> get_user/put_user() model".

> which is more boilerplate, but ends up generating much better code.
> And for "unsafe_put_user()" in particular, we actually could teach gcc
> to use "asm goto" to really improve code generation. I have patches
> for that if you want to look.
> 
> I'm attaching an example patch for "filldir()" that does that modern
> thing. It almost had the right form as-is anyway, and under some loads
> (eg "find") filldir actually shows up in profiles too.\

> But the *first* thing I'd like to do would be to just get rid of
> __get_user/__put_user as a thing. It really does generate nasty code,
> and we might as well just make it do that function call.
> 
> Because what we do now isn't right. If we care about performance, the
> "__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And
> if you don't care about performance, you should use the regular
> xyz_user() functions that do an ok job by just calling the right-sized
> helper function instead of inlining the crud.
> 
> Hmm?

First, some stats: there's a thousand-odd callers of __get_user().  Out of
those, about 70% are in arch/, mostly in sigframe-related code.  Take
a look at the output of
git grep -n -w __get_user|grep -v '^arch' | sed -e 's/:.*//'|uniq -c|sort -nr
 55 drivers/gpu/drm/drm_ioc32.c
 43 drivers/staging/comedi/comedi_compat32.c
 35 kernel/compat.c
 27 net/compat.c
 27 block/compat_ioctl.c
 15 drivers/usb/core/devio.c
 13 drivers/char/ipmi/ipmi_devintf.c
 11 kernel/signal.c
 10 fs/fcntl.c
  8 ipc/compat.c
  8 drivers/video/fbdev/sbuslib.c
  7 net/socket.c
  7 drivers/gpu/drm/mga/mga_ioc32.c
  6 fs/select.c
  6 drivers/tty/vt/vt_ioctl.c
  5 drivers/tty/vt/selection.c
  5 drivers/spi/spidev.c
  5 drivers/pci/proc.c
  4 kernel/ptrace.c
  4 ipc/compat_mq.c
  4 drivers/tty/vt/consolemap.c
  3 sound/oss/sys_timer.c
  3 drivers/media/usb/uvc/uvc_v4l2.c
  3 drivers/macintosh/ans-lcd.c
and then we have a smallish bunch of files with one or two callers.  For
__put_user() we have ~1800 callers total, ~1100 of them in arch/* and
the rest goes like this:
 73 drivers/gpu/drm/drm_ioc32.c
 66 ipc/compat.c
 58 drivers/gpu/drm/radeon/radeon_ioc32.c
 55 block/compat_ioctl.c
 52 kernel/compat.c
 49 kernel/signal.c
 43 drivers/staging/comedi/comedi_compat32.c
 31 drivers/gpu/drm/r128/r128_ioc32.c
 30 drivers/gpu/drm/mga/mga_ioc32.c
 27 fs/signalfd.c
 25 fs/readdir.c
 24 fs/statfs.c
 19 kernel/sys.c
 17 net/compat.c
 11 drivers/scsi/sg.c
 10 fs/fcntl.c
  8 sound/core/pcm_native.c
  8 fs/binfmt_flat.c
  7 fs/binfmt_elf_fdpic.c
  6 net/socket.c
  6 drivers/char/ipmi/ipmi_devintf.c
  5 sound/oss/sys_timer.c
  5 fs/binfmt_elf.c
  5 drivers/video/fbdev/sbuslib.c
  5 drivers/tty/vt/consolemap.c
  5 drivers/spi/spidev.c
  5 drivers/pci/proc.c
  4 kernel/ptrace.c
  4 ipc/compat_mq.c
  3 fs/select.c
  3 drivers/tty/vt/vt.c
  ...

IOW, we have
* most of users in arch/* (heavily dominated by signal-related code,
both loads and stores).  Those need careful massage; maybe unsafe-based
solution, maybe something else, but it's obviously per-architecture work
and these paths are sensitive.
* few places outside of arch/* where these are abused; absolute
majority are in ioctl compat code and they are _bad_.  Bad on x86, bad on
s390, etc.  I'm not sure if switch to unsafe is the right solution for
those, actually - depends on how we end up dealing with compat ioctls of
that sort.  Might be better to do bulk copy to/from userland, combined with
conversion from 32bit to native kernel-side and passing pointers to kernel
objects to functions doing actual work.  Really depends upon the details.
* some places in fairly common codepaths where __get_user() and
__put_user() are being played 

Re: [git pull] uaccess-related bits of vfs.git

2017-05-13 Thread Al Viro
On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote:
> So I should have asked earlier, but I was feeling rather busy during
> the early merge window..

> So I'm actually more interested to hear if you have any pending work
> on cleaning up the __get/put_user() mess we have. Is that on your
> radar at all?

Yes, it is.

> In particular, right now, both __get_user() and __put_user() are nasty
> and broken interfaces.
>
> It *used* to be that they were designed to just generate a single
> instruction. That's not what they currently do at all, due to the
> whole SMAP/PAN on x86 and arm.
> 
> For example, on x86, right now a single __put_user() call ends up
> generating something like

[snip]
 
> But even that small thing is rather debatable from a performance
> angle: the actual cost of __put_user() these days is not the function
> call, but the clac/stac (on modern x86) and the PAN bit games (on
> arm).
> 
> So I'm actually inclined to just say "We should make
> __get_user/__put_user() just be aliases for the normal
> get_user/put_user() model".

> which is more boilerplate, but ends up generating much better code.
> And for "unsafe_put_user()" in particular, we actually could teach gcc
> to use "asm goto" to really improve code generation. I have patches
> for that if you want to look.
> 
> I'm attaching an example patch for "filldir()" that does that modern
> thing. It almost had the right form as-is anyway, and under some loads
> (eg "find") filldir actually shows up in profiles too.\

> But the *first* thing I'd like to do would be to just get rid of
> __get_user/__put_user as a thing. It really does generate nasty code,
> and we might as well just make it do that function call.
> 
> Because what we do now isn't right. If we care about performance, the
> "__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And
> if you don't care about performance, you should use the regular
> xyz_user() functions that do an ok job by just calling the right-sized
> helper function instead of inlining the crud.
> 
> Hmm?

First, some stats: there's a thousand-odd callers of __get_user().  Out of
those, about 70% are in arch/, mostly in sigframe-related code.  Take
a look at the output of
git grep -n -w __get_user|grep -v '^arch' | sed -e 's/:.*//'|uniq -c|sort -nr
 55 drivers/gpu/drm/drm_ioc32.c
 43 drivers/staging/comedi/comedi_compat32.c
 35 kernel/compat.c
 27 net/compat.c
 27 block/compat_ioctl.c
 15 drivers/usb/core/devio.c
 13 drivers/char/ipmi/ipmi_devintf.c
 11 kernel/signal.c
 10 fs/fcntl.c
  8 ipc/compat.c
  8 drivers/video/fbdev/sbuslib.c
  7 net/socket.c
  7 drivers/gpu/drm/mga/mga_ioc32.c
  6 fs/select.c
  6 drivers/tty/vt/vt_ioctl.c
  5 drivers/tty/vt/selection.c
  5 drivers/spi/spidev.c
  5 drivers/pci/proc.c
  4 kernel/ptrace.c
  4 ipc/compat_mq.c
  4 drivers/tty/vt/consolemap.c
  3 sound/oss/sys_timer.c
  3 drivers/media/usb/uvc/uvc_v4l2.c
  3 drivers/macintosh/ans-lcd.c
and then we have a smallish bunch of files with one or two callers.  For
__put_user() we have ~1800 callers total, ~1100 of them in arch/* and
the rest goes like this:
 73 drivers/gpu/drm/drm_ioc32.c
 66 ipc/compat.c
 58 drivers/gpu/drm/radeon/radeon_ioc32.c
 55 block/compat_ioctl.c
 52 kernel/compat.c
 49 kernel/signal.c
 43 drivers/staging/comedi/comedi_compat32.c
 31 drivers/gpu/drm/r128/r128_ioc32.c
 30 drivers/gpu/drm/mga/mga_ioc32.c
 27 fs/signalfd.c
 25 fs/readdir.c
 24 fs/statfs.c
 19 kernel/sys.c
 17 net/compat.c
 11 drivers/scsi/sg.c
 10 fs/fcntl.c
  8 sound/core/pcm_native.c
  8 fs/binfmt_flat.c
  7 fs/binfmt_elf_fdpic.c
  6 net/socket.c
  6 drivers/char/ipmi/ipmi_devintf.c
  5 sound/oss/sys_timer.c
  5 fs/binfmt_elf.c
  5 drivers/video/fbdev/sbuslib.c
  5 drivers/tty/vt/consolemap.c
  5 drivers/spi/spidev.c
  5 drivers/pci/proc.c
  4 kernel/ptrace.c
  4 ipc/compat_mq.c
  3 fs/select.c
  3 drivers/tty/vt/vt.c
  ...

IOW, we have
* most of users in arch/* (heavily dominated by signal-related code,
both loads and stores).  Those need careful massage; maybe unsafe-based
solution, maybe something else, but it's obviously per-architecture work
and these paths are sensitive.
* few places outside of arch/* where these are abused; absolute
majority are in ioctl compat code and they are _bad_.  Bad on x86, bad on
s390, etc.  I'm not sure if switch to unsafe is the right solution for
those, actually - depends on how we end up dealing with compat ioctls of
that sort.  Might be better to do bulk copy to/from userland, combined with
conversion from 32bit to native kernel-side and passing pointers to kernel
objects to functions doing actual work.  Really depends upon the details.
* some places in fairly common codepaths where __get_user() and
__put_user() are being played 

Re: [git pull] uaccess-related bits of vfs.git

2017-05-12 Thread Linus Torvalds
So I should have asked earlier, but I was feeling rather busy during
the early merge window..

On Sun, Apr 30, 2017 at 8:45 PM, Al Viro  wrote:
> uaccess unification pile.  It's _not_ the end of uaccess work, but
> the next batch of that will go into the next cycle.  This one mostly takes
> copy_from_user() and friends out of arch/* and gets the zero-padding behaviour
> in sync for all architectures.
> Dealing with the nocache/writethrough mess is for the next cycle;
> fortunately, that's x86-only.

So I'm actually more interested to hear if you have any pending work
on cleaning up the __get/put_user() mess we have. Is that on your
radar at all?

In particular, right now, both __get_user() and __put_user() are nasty
and broken interfaces.

It *used* to be that they were designed to just generate a single
instruction. That's not what they currently do at all, due to the
whole SMAP/PAN on x86 and arm.

For example, on x86, right now a single __put_user() call ends up
generating something like

#APP
# 58 "./arch/x86/include/asm/smap.h" 1
661:

662:
.skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90
663:
.pushsection .altinstructions,"a"
 .long 661b - .
 .long 6641f - .
 .word ( 9*32+20)
 .byte 663b-661b
 .byte 6651f-6641f
 .byte 663b-662b
.popsection
.pushsection .altinstr_replacement, "ax"
6641:
.byte 0x0f,0x01,0xcb
6651:
.popsection
# 0 "" 2
#NO_APP
xorl%eax, %eax  # __pu_err
#APP
# 269 "fs/readdir.c" 1

1:  movq %rcx,8(%rdi)   # offset, MEM[(struct __large_struct *)_16]
2:
.section .fixup,"ax"
3:  mov $-14,%eax   #, __pu_err
jmp 2b
.previous
 .pushsection "__ex_table","a"
 .balign 4
 .long (1b) - .
 .long (3b) - .
 .long (ex_handler_default) - .
 .popsection

# 0 "" 2
# 52 "./arch/x86/include/asm/smap.h" 1
661:

662:
.skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90
663:
.pushsection .altinstructions,"a"
 .long 661b - .
 .long 6641f - .
 .word ( 9*32+20)
 .byte 663b-661b
 .byte 6651f-6641f
 .byte 663b-662b
.popsection
.pushsection .altinstr_replacement, "ax"
6641:
.byte 0x0f,0x01,0xca
6651:
.popsection
# 0 "" 2
#NO_APP

and while much of that is out-of-line and in different sections, it
basically means that it's insane how we inline these things.

Yes, yes, the inlined part of the above horror-story ends up being just

clac
xor%eax,%eax
mov%rcx,0x8(%rdi)
stac

and everything else is just boiler-plate for the alt-instruction
handling and the exception handling.

But even that small thing is rather debatable from a performance
angle: the actual cost of __put_user() these days is not the function
call, but the clac/stac (on modern x86) and the PAN bit games (on
arm).

So I'm actually inclined to just say "We should make
__get_user/__put_user() just be aliases for the normal
get_user/put_user() model".

The *correct* way to do fast user space accesses when you hoist error
checking outside is to use

if (!access_ok(..))
goto efault;
user_access_begin();
..
... loop over or otherwise do multiple "raw" accessess ...
unsafe_get/put_user(c, ptr, got_fault);
unsafe_get/put_user(c, ptr, got_fault);
...
user_access_end();
return 0;

got_fault:
user_access_end();
efault:
return -EFAULT;

which is more boilerplate, but ends up generating much better code.
And for "unsafe_put_user()" in particular, we actually could teach gcc
to use "asm goto" to really improve code generation. I have patches
for that if you want to look.

I'm attaching an example patch for "filldir()" that does that modern
thing. It almost had the right form as-is anyway, and under some loads
(eg "find") filldir actually shows up in profiles too.\

And just from a code generation standpoint, look at what the attached
patch does:

[torvalds@i7 linux]$ diff -u fs/readdir.s ~/readdir.s | diffstat
 readdir.s |  548 ++
 1 file changed, 201 insertions(+), 347 deletions(-)

just from getting rid of that crazy repeated SMAP overhead.

Yeah, yeah, doing diffstat on generated assembly files is all kinds of
crazy, but it's an example of what kind of odd garbage we currently
generate.

But the *first* thing I'd like to do would be to just get rid of
__get_user/__put_user as a thing. It really does generate nasty code,
and we might as well just make it do that function call.

Because what we do now isn't right. If we care about performance, the
"__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And
if you don't care about performance, you should use the 

Re: [git pull] uaccess-related bits of vfs.git

2017-05-12 Thread Linus Torvalds
So I should have asked earlier, but I was feeling rather busy during
the early merge window..

On Sun, Apr 30, 2017 at 8:45 PM, Al Viro  wrote:
> uaccess unification pile.  It's _not_ the end of uaccess work, but
> the next batch of that will go into the next cycle.  This one mostly takes
> copy_from_user() and friends out of arch/* and gets the zero-padding behaviour
> in sync for all architectures.
> Dealing with the nocache/writethrough mess is for the next cycle;
> fortunately, that's x86-only.

So I'm actually more interested to hear if you have any pending work
on cleaning up the __get/put_user() mess we have. Is that on your
radar at all?

In particular, right now, both __get_user() and __put_user() are nasty
and broken interfaces.

It *used* to be that they were designed to just generate a single
instruction. That's not what they currently do at all, due to the
whole SMAP/PAN on x86 and arm.

For example, on x86, right now a single __put_user() call ends up
generating something like

#APP
# 58 "./arch/x86/include/asm/smap.h" 1
661:

662:
.skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90
663:
.pushsection .altinstructions,"a"
 .long 661b - .
 .long 6641f - .
 .word ( 9*32+20)
 .byte 663b-661b
 .byte 6651f-6641f
 .byte 663b-662b
.popsection
.pushsection .altinstr_replacement, "ax"
6641:
.byte 0x0f,0x01,0xcb
6651:
.popsection
# 0 "" 2
#NO_APP
xorl%eax, %eax  # __pu_err
#APP
# 269 "fs/readdir.c" 1

1:  movq %rcx,8(%rdi)   # offset, MEM[(struct __large_struct *)_16]
2:
.section .fixup,"ax"
3:  mov $-14,%eax   #, __pu_err
jmp 2b
.previous
 .pushsection "__ex_table","a"
 .balign 4
 .long (1b) - .
 .long (3b) - .
 .long (ex_handler_default) - .
 .popsection

# 0 "" 2
# 52 "./arch/x86/include/asm/smap.h" 1
661:

662:
.skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90
663:
.pushsection .altinstructions,"a"
 .long 661b - .
 .long 6641f - .
 .word ( 9*32+20)
 .byte 663b-661b
 .byte 6651f-6641f
 .byte 663b-662b
.popsection
.pushsection .altinstr_replacement, "ax"
6641:
.byte 0x0f,0x01,0xca
6651:
.popsection
# 0 "" 2
#NO_APP

and while much of that is out-of-line and in different sections, it
basically means that it's insane how we inline these things.

Yes, yes, the inlined part of the above horror-story ends up being just

clac
xor%eax,%eax
mov%rcx,0x8(%rdi)
stac

and everything else is just boiler-plate for the alt-instruction
handling and the exception handling.

But even that small thing is rather debatable from a performance
angle: the actual cost of __put_user() these days is not the function
call, but the clac/stac (on modern x86) and the PAN bit games (on
arm).

So I'm actually inclined to just say "We should make
__get_user/__put_user() just be aliases for the normal
get_user/put_user() model".

The *correct* way to do fast user space accesses when you hoist error
checking outside is to use

if (!access_ok(..))
goto efault;
user_access_begin();
..
... loop over or otherwise do multiple "raw" accessess ...
unsafe_get/put_user(c, ptr, got_fault);
unsafe_get/put_user(c, ptr, got_fault);
...
user_access_end();
return 0;

got_fault:
user_access_end();
efault:
return -EFAULT;

which is more boilerplate, but ends up generating much better code.
And for "unsafe_put_user()" in particular, we actually could teach gcc
to use "asm goto" to really improve code generation. I have patches
for that if you want to look.

I'm attaching an example patch for "filldir()" that does that modern
thing. It almost had the right form as-is anyway, and under some loads
(eg "find") filldir actually shows up in profiles too.\

And just from a code generation standpoint, look at what the attached
patch does:

[torvalds@i7 linux]$ diff -u fs/readdir.s ~/readdir.s | diffstat
 readdir.s |  548 ++
 1 file changed, 201 insertions(+), 347 deletions(-)

just from getting rid of that crazy repeated SMAP overhead.

Yeah, yeah, doing diffstat on generated assembly files is all kinds of
crazy, but it's an example of what kind of odd garbage we currently
generate.

But the *first* thing I'd like to do would be to just get rid of
__get_user/__put_user as a thing. It really does generate nasty code,
and we might as well just make it do that function call.

Because what we do now isn't right. If we care about performance, the
"__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And
if you don't care about performance, you should use the regular
xyz_user() functions