Re: [PATCH kernel v3 1/2] dma: Allow mixing bypass and mapped DMA operation

2020-10-28 Thread Christoph Hellwig
On Wed, Oct 28, 2020 at 06:00:29PM +1100, Alexey Kardashevskiy wrote:
> At the moment we allow bypassing DMA ops only when we can do this for
> the entire RAM. However there are configs with mixed type memory
> where we could still allow bypassing IOMMU in most cases;
> POWERPC with persistent memory is one example.
> 
> This adds an arch hook to determine where bypass can still work and
> we invoke direct DMA API. The following patch checks the bus limit
> on POWERPC to allow or disallow direct mapping.
> 
> This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make arch_
> hooks no-op by default.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  kernel/dma/mapping.c | 24 
>  kernel/dma/Kconfig   |  4 
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 51bb8fa8eb89..a0bc9eb876ed 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -137,6 +137,18 @@ static inline bool dma_map_direct(struct device *dev,
>   return dma_go_direct(dev, *dev->dma_mask, ops);
>  }
>  
> +#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
> +bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
> +bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
> +bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int 
> nents);
> +bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, 
> int nents);
> +#else
> +#define arch_dma_map_page_direct(d, a) (0)
> +#define arch_dma_unmap_page_direct(d, a) (0)
> +#define arch_dma_map_sg_direct(d, s, n) (0)
> +#define arch_dma_unmap_sg_direct(d, s, n) (0)
> +#endif

A bunch of overly long lines here.  Except for that this looks ok to me.
If you want me to queue up the series I can just fix it up.


Re: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation

2020-10-28 Thread Christoph Hellwig
On Wed, Oct 28, 2020 at 05:55:23PM +1100, Alexey Kardashevskiy wrote:
>
> It is passing an address of the end of the mapped area so passing a page 
> struct means passing page and offset which is an extra parameter and we do 
> not want to do anything with the page in those hooks anyway so I'd keep it 
> as is.
>
>
>> and
>> maybe even hide the dma_map_direct inside it.
>
> Call dma_map_direct() from arch_dma_map_page_direct() if 
> arch_dma_map_page_direct() is defined? Seems suboptimal as it is going to 
> be bypass=true in most cases and we save one call by avoiding calling 
> arch_dma_map_page_direct(). Unless I missed something?

C does not even evaluate the right hand side of a || expression if the
left hand evaluates to true.
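
A standalone user-space illustration of that point (not kernel code; the
fake_* helpers are made-up stand-ins for dma_map_direct() and the arch
hook): when the left-hand side is true, the hook is never called, so
folding it into the same condition costs nothing in the common bypass
case.

```c
#include <stdbool.h>

/* Counts how often the (hypothetical) arch hook is actually invoked. */
static int arch_hook_calls;

/* Stand-in for dma_map_direct(): true in the common full-bypass case. */
static bool fake_dma_map_direct(bool bypass)
{
	return bypass;
}

/* Stand-in for an arch hook; only reached when the left-hand side is false. */
static bool fake_arch_map_page_direct(void)
{
	arch_hook_calls++;
	return true;
}

/* Same shape as the suggested condition: lhs || rhs. */
static bool use_direct_mapping(bool bypass)
{
	return fake_dma_map_direct(bypass) || fake_arch_map_page_direct();
}
```

Calling use_direct_mapping(true) leaves arch_hook_calls untouched; only
use_direct_mapping(false) increments it.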


Re: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation

2020-10-27 Thread Christoph Hellwig
> +static inline bool dma_handle_direct(struct device *dev, dma_addr_t 
> dma_handle)
> +{
> +   return dma_handle >= dev->archdata.dma_offset;
> +}

This won't compile except for powerpc, and directly accessing arch members
in common code is a bad idea.  Maybe both your helpers need to be
supplied by arch code to better abstract this out.

>   if (dma_map_direct(dev, ops))
>   addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
> + else if (dev->bus_dma_limit &&
> +  can_map_direct(dev, (phys_addr_t) page_to_phys(page) + offset 
> + size))
> + addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
> +#endif

I don't think page_to_phys needs a phys_addr_t cast on the return value.
I'd also much prefer if we make this a little more beautiful, here
are a few suggestions:

 - hide the bus_dma_limit check inside can_map_direct, and provide a
   stub so that we can avoid the ifdef
 - use a better name for can_map_direct, and maybe also a better calling
   convention by passing the page (the sg code also has the page), and
   maybe even hide the dma_map_direct inside it.

	if (dma_map_direct(dev, ops) ||
	    arch_dma_map_page_direct(dev, page, offset, size))
		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
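
A rough user-space model of that shape, to show how the stub removes the
ifdef from the caller.  Everything here is an assumption made for
illustration: the struct device below is a two-field mock, the
ARCH_HAS_DMA_MAP_DIRECT switch is a plain macro rather than a Kconfig
symbol, and the "end of buffer must not exceed the bus limit" test is
only a guess at what an arch implementation might check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Mock with only the fields this sketch needs; not the kernel's struct device. */
struct device {
	bool bypass;			/* models dma_map_direct() being true */
	phys_addr_t bus_dma_limit;	/* 0 means no limit was configured */
};

/* Remove this define to build with the no-op stub instead. */
#define ARCH_HAS_DMA_MAP_DIRECT 1

#ifdef ARCH_HAS_DMA_MAP_DIRECT
/* Arch-side check: the bus_dma_limit test lives here, not in the caller. */
static bool arch_dma_map_page_direct(const struct device *dev, phys_addr_t end)
{
	return dev->bus_dma_limit && end <= dev->bus_dma_limit;
}
#else
/* Stub for everyone else: the caller compiles unchanged. */
static inline bool arch_dma_map_page_direct(const struct device *dev,
					    phys_addr_t end)
{
	(void)dev;
	(void)end;
	return false;
}
#endif

enum map_path { MAP_DIRECT, MAP_IOMMU };

/* Caller side: one condition, no ifdef. */
static enum map_path pick_map_path(const struct device *dev, phys_addr_t phys,
				   size_t size)
{
	if (dev->bypass || arch_dma_map_page_direct(dev, phys + size))
		return MAP_DIRECT;
	return MAP_IOMMU;
}
```

With a limit of 0x1000, a buffer ending at 0x900 takes the direct path
and one ending at 0x1080 falls back to the IOMMU path.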

>   BUG_ON(!valid_dma_direction(dir));
>   if (dma_map_direct(dev, ops))
>   dma_direct_unmap_page(dev, addr, size, dir, attrs);
> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
> + else if (dev->bus_dma_limit && dma_handle_direct(dev, addr + size))
> + dma_direct_unmap_page(dev, addr, size, dir, attrs);
> +#endif

Same here.

>   if (dma_map_direct(dev, ops))
>   ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
> + else if (dev->bus_dma_limit) {
> + struct scatterlist *s;
> + bool direct = true;
> + int i;
> +
> + for_each_sg(sg, s, nents, i) {
> + direct = can_map_direct(dev, sg_phys(s) + s->offset + 
> s->length);
> + if (!direct)
> + break;
> + }
> + if (direct)
> + ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
> + else
> + ents = ops->map_sg(dev, sg, nents, dir, attrs);
> + }
> +#endif

This needs to go into a helper as well.  I think the same style as
above would work pretty nicely as well:

	if (dma_map_direct(dev, ops) ||
	    arch_dma_map_sg_direct(dev, sg, nents))
		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
	else
		ents = ops->map_sg(dev, sg, nents, dir, attrs);
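
For the scatterlist case, the loop from the quoted patch would move
into the helper.  A user-space sketch under stated assumptions: the
scatterlist is modelled as a plain array of (phys, length) pairs, struct
device is a one-field mock, and the limit comparison is illustrative
rather than what any architecture actually implements.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Simplified scatterlist entry; the real struct scatterlist is richer. */
struct sg_entry {
	phys_addr_t phys;
	unsigned int length;
};

/* Mock device carrying only the bus limit. */
struct device {
	phys_addr_t bus_dma_limit;	/* 0 means no limit was configured */
};

/*
 * Direct mapping is only allowed if every segment ends at or below the
 * limit; a single segment above it sends the whole list to the IOMMU path.
 */
static bool arch_dma_map_sg_direct(const struct device *dev,
				   const struct sg_entry *sg, int nents)
{
	int i;

	if (!dev->bus_dma_limit)
		return false;

	for (i = 0; i < nents; i++) {
		if (sg[i].phys + sg[i].length > dev->bus_dma_limit)
			return false;
	}
	return true;
}
```

The all-or-nothing result matches the quoted patch, which never mixes
direct and IOMMU mappings within one scatterlist.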

> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
> + if (dev->bus_dma_limit) {
> + struct scatterlist *s;
> + bool direct = true;
> + int i;
> +
> + for_each_sg(sg, s, nents, i) {
> + direct = dma_handle_direct(dev, s->dma_address + 
> s->length);
> + if (!direct)
> + break;
> + }
> + if (direct) {
> + dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
> + return;
> + }
> + }
> +#endif

One more time here..


Re: [PATCH kernel 0/2] powerpc/dma: Fallback to dma_ops when persistent memory present

2020-10-27 Thread Christoph Hellwig
On Wed, Oct 21, 2020 at 02:20:24PM +1100, Alexey Kardashevskiy wrote:
> This allows mixing direct DMA (to/from RAM) and
> IOMMU (to/from persistent memory) on the PPC64/pseries
> platform. This was supposed to be a single patch but
> unexpected move of direct DMA functions happened.
> 
> This is based on sha1
> 7cf726a59435 Linus Torvalds "Merge tag 'linux-kselftest-kunit-5.10-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest".
> 
> Please comment. Thanks.

I really don't like your revert.  I'm almost ready to kill off
dma-direct.h, and I really want it private in kernel/dma/, as people
keep adding abuses to drivers.

We have two options here:

 (1) duplicate the code in arch/powerpc/
 (2) add a hook to kernel/dma/

I've not been a fan of (2) in the past, but now that the code is out
of line, and we could make it dependent on a config option only set by
powerpc, I see it as the lesser evil now.


Re: [PATCH 02/10] fs: don't allow splice read/write without explicit ops

2020-10-27 Thread Christoph Hellwig
On Tue, Oct 27, 2020 at 09:51:34AM +, David Howells wrote:
> David Howells  wrote:
> 
> > > default_file_splice_write is the last piece of generic code that uses
> > > set_fs to make the uaccess routines operate on kernel pointers.  It
> > > implements a "fallback loop" for splicing from files that do not actually
> > > provide a proper splice_read method.  The usual file systems and other
> > > high bandwidth instances all provide a ->splice_read, so this just removes
> > > support for various device drivers and procfs/debugfs files.  If splice
> > > support for any of those turns out to be important it can be added back
> > > by switching them to the iter ops and using generic_file_splice_read.
> > 
> > Hmmm...  this causes the copy_file_range() syscall to fail with EINVAL in some
> > places where before it used to work.
> > 
> > For my part, it causes the generic/112 xfstest to fail with afs, but there may
> > be other places.
> > 
> > Is this a regression we need to fix in the VFS core?  Or is it something we
> > need to fix in xfstests and assume userspace will fallback to doing it itself?
> 
> That said, for afs at least, the fix seems to be just this:

And that is the correct fix, I was about to send it to you.

We can't have a "generic" splice using ->read/->write without set_fs,
in addition to the iter_file_splice_write based version being a lot
more efficient than what you had before.


Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"

2020-10-22 Thread Christoph Hellwig
On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> My thinking: if the compiler that calls import_iovec() has garbage in
> the upper 32 bit
> 
> a) gcc will zero it out and not rely on it being zero.
> b) clang will not zero it out, assuming it is zero.
> 
> But
> 
> a) will zero it out when calling the !inlined variant
> b) clang will zero it out when calling the !inlined variant
> 
> When inlining, b) strikes. We access garbage. That would mean that we
> have calling code that's not generated by clang/gcc IIUC.

Most callchains of import_iovec start with the assembly syscall wrappers.


Re: [PATCH 3/8] powerpc: Mark functions called inside uaccess blocks w/ 'notrace'

2020-10-16 Thread Christoph Hellwig
On Thu, Oct 15, 2020 at 10:01:54AM -0500, Christopher M. Riedl wrote:
> Functions called between user_*_access_begin() and user_*_access_end()
> should be either inlined or marked 'notrace' to prevent leaving
> userspace access exposed. Mark any such functions relevant to signal
> handling so that subsequent patches can call them inside uaccess blocks.

I don't think running this much code with uaccess enabled is a good
idea.  Please refactor the code to reduce the critical sections with
uaccess enabled.

Btw, does powerpc already have the objtool validation that we don't
accidentally jump out of unsafe uaccess critical sections?


Re: [PATCH 1/8] powerpc/uaccess: Add unsafe_copy_from_user

2020-10-16 Thread Christoph Hellwig
On Thu, Oct 15, 2020 at 10:01:52AM -0500, Christopher M. Riedl wrote:
> Implement raw_copy_from_user_allowed() which assumes that userspace read
> access is open. Use this new function to implement raw_copy_from_user().
> Finally, wrap the new function to follow the usual "unsafe_" convention
> of taking a label argument. The new raw_copy_from_user_allowed() calls
> __copy_tofrom_user() internally, but this is still safe to call in user
> access blocks formed with user_*_access_begin()/user_*_access_end()
> since asm functions are not instrumented for tracing.

Please also add a fallback unsafe_copy_from_user to linux/uaccess.h
so this can be used as a generic API.
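
To make the shape of such a fallback concrete, here is a user-space
mock.  It is only a sketch: mock_raw_copy_from_user() is invented for
the demonstration (a NULL source stands in for a faulting user pointer),
and the real fallback would sit on top of the kernel's own copy
routines.  What it shows is the "unsafe_" convention of taking an error
label and jumping to it on failure.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Mock copy routine: returns the number of bytes NOT copied, like the
 * kernel's raw copy helpers.  A NULL source models a faulting pointer.
 */
static size_t mock_raw_copy_from_user(void *dst, const void *src, size_t len)
{
	if (!src)
		return len;
	memcpy(dst, src, len);
	return 0;
}

/* Generic fallback, used only if the architecture did not define its own. */
#ifndef unsafe_copy_from_user
#define unsafe_copy_from_user(dst, src, len, label)		\
	do {							\
		if (mock_raw_copy_from_user(dst, src, len))	\
			goto label;				\
	} while (0)
#endif

/* Example caller: several copies share a single error exit. */
static int read_two_words(int out[2], const int *ua, const int *ub)
{
	unsafe_copy_from_user(&out[0], ua, sizeof(int), efault);
	unsafe_copy_from_user(&out[1], ub, sizeof(int), efault);
	return 0;
efault:
	return -EFAULT;
}
```

The #ifndef guard is what lets an architecture override the generic
version with an optimized one.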


Re: [RFC v1 0/2] Plumbing to support multiple secure memory backends.

2020-10-14 Thread Christoph Hellwig
Please don't add an abstraction without a second implementation.
Once we have the implementation we can consider the tradeoffs.  E.g.
if expensive indirect function calls are really needed vs simple
branches.


Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops

2020-10-13 Thread Christoph Hellwig
On Sat, Oct 10, 2020 at 01:55:24AM +, Alexander Viro wrote:
> FWIW, I hadn't pushed that branch out (or merged it into #for-next yet);
> for one thing, uml part (mconsole) is simply broken, for another...
> IMO ##5--8 are asking for kernel_pread() and if you look at binfmt_elf.c,
> you'll see elf_read() being pretty much that.  acct.c, keys and usermode
> parts are asking for kernel_pwrite() as well.
> 
> I've got stuck looking through the drivers/target stuff - it would've
> been another kernel_pwrite() candidate, but it smells like its use of
> filp_open() is really asking for trouble, starting with symlink attacks.
> Not sure - I'm not familiar with the area, but...

Can you just pull in the minimal fix so that the branch gets fixed
for this merge window?  All the cleanups can come later.


Re: [PATCH RFC PKS/PMEM 24/58] fs/freevxfs: Utilize new kmap_thread()

2020-10-13 Thread Christoph Hellwig
> - kaddr = kmap(pp);
> + kaddr = kmap_thread(pp);
>   memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
> - kunmap(pp);
> + kunmap_thread(pp);

You only Cced me on this particular patch, which means I have absolutely
no idea what kmap_thread and kunmap_thread actually do, and thus can't
provide an informed review.

That being said I think your life would be a lot easier if you add
helpers for the above code sequence and its counterpart that copies
to a potential highmem page first, as that hides the implementation
details from most users.
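
A user-space mock of what such helpers could look like.  The names
copy_to_page() and copy_from_page(), the mock_page structure and the
mock_kmap()/mock_kunmap() pair are all invented for this sketch; the
point is only that callers no longer see which mapping primitive is
used underneath.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096

/* Stand-in for struct page plus its backing memory. */
struct mock_page {
	unsigned char data[MOCK_PAGE_SIZE];
	int mapped;	/* outstanding mappings, to check map/unmap pairing */
};

/* Stand-ins for the mapping primitives (kmap, kmap_thread, ...). */
static void *mock_kmap(struct mock_page *page)
{
	page->mapped++;
	return page->data;
}

static void mock_kunmap(struct mock_page *page)
{
	page->mapped--;
}

/* Copy len bytes from a kernel buffer into the page at offset. */
static void copy_to_page(struct mock_page *page, size_t offset,
			 const void *from, size_t len)
{
	unsigned char *kaddr = mock_kmap(page);

	memcpy(kaddr + offset, from, len);
	mock_kunmap(page);
}

/* Counterpart: copy len bytes out of the page at offset. */
static void copy_from_page(void *to, struct mock_page *page, size_t offset,
			   size_t len)
{
	unsigned char *kaddr = mock_kmap(page);

	memcpy(to, kaddr + offset, len);
	mock_kunmap(page);
}
```

Switching the mapping primitive then means touching the two helpers
rather than every file system.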


Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length

2020-10-08 Thread Christoph Hellwig
On Wed, Oct 07, 2020 at 04:42:55PM +0200, Jann Horn wrote:
> > > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t 
> > > len,
> > >  {
> > >   long ret = -EINVAL;
> > >
> > > - if (!arch_validate_prot(prot, addr))
> > > + if (!arch_validate_prot(prot, addr, len))
> >
> > This call isn't under mmap lock.  I also find it rather weird as the
> > generic code only calls arch_validate_prot from mprotect, only powerpc
> > also calls it from mmap.
> >
> > This seems to go back to commit ef3d3246a0d0
> > ("powerpc/mm: Add Strong Access Ordering support")
> 
> I'm _guessing_ the idea in the generic case might be that mmap()
> doesn't check unknown bits in the protection flags, and therefore
> maybe people wanted to avoid adding new error cases that could be
> caused by random high bits being set? So while the mprotect() case
> checks the flags and refuses unknown values, the mmap() code just lets
> the architecture figure out which bits are actually valid to set (via
> arch_calc_vm_prot_bits()) and silently ignores the rest?
> 
> And powerpc apparently decided that they do want to error out on bogus
> prot values passed to their version of mmap(), and in exchange, assume
> in arch_calc_vm_prot_bits() that the protection bits are valid?

The problem really is that now programs behave differently on powerpc
compared to all other architectures.

> powerpc's arch_validate_prot() doesn't actually need the mmap lock, so
> I think this is fine-ish for now (as in, while the code is a bit
> unclean, I don't think I'm making it worse, and I don't think it's
> actually buggy). In theory, we could move the arch_validate_prot()
> call over into the mmap guts, where we're holding the lock, and gate
> it on the architecture or on some feature CONFIG that powerpc can
> activate in its Kconfig. But I'm not sure whether that'd be helping or
> making things worse, so when I sent this patch, I deliberately left
> the powerpc stuff as-is.

For now I'd just duplicate the trivial logic from arch_validate_prot
in the powerpc version of do_mmap2 and add a comment that this check
causes a gratuitous incompatibility with all other architectures.  And then
hope that the powerpc maintainers fix it up :)


Re: [PATCH 2/2] sparc: Check VMA range in sparc_validate_prot()

2020-10-07 Thread Christoph Hellwig
> +++ b/arch/sparc/include/asm/mman.h
> @@ -60,31 +60,41 @@ static inline int sparc_validate_prot(unsigned long prot, 
> unsigned long addr,
>   if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
>   return 0;
>   if (prot & PROT_ADI) {
> + struct vm_area_struct *vma, *next;
> +

I'd split all the ADI logic into a separate, preferably out-of-line
helper.

> + /* reached the end of the range without errors? */
> + if (addr+len <= vma->vm_end)

Missing whitespace around the arithmetic operator.


Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length

2020-10-07 Thread Christoph Hellwig
On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
> diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> index 078608ec2e92..b1fabb97d138 100644
> --- a/arch/powerpc/kernel/syscalls.c
> +++ b/arch/powerpc/kernel/syscalls.c
> @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
>  {
>   long ret = -EINVAL;
>  
> - if (!arch_validate_prot(prot, addr))
> + if (!arch_validate_prot(prot, addr, len))

This call isn't under mmap lock.  I also find it rather weird as the
generic code only calls arch_validate_prot from mprotect, only powerpc
also calls it from mmap.

This seems to go back to commit ef3d3246a0d0
("powerpc/mm: Add Strong Access Ordering support")


Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask

2020-09-24 Thread Christoph Hellwig
On Thu, Sep 24, 2020 at 05:03:11PM +1000, Alexey Kardashevskiy wrote:
> May be... The current behavior is not wrong (after the fix) but not
> optimal either. Even with legacy PCI it should just result in failing
> attempt to set 64bit mask which drivers should still handle, i.e. choose
> a shorter mask.

Err, no.

> Why not ditch the whole dma_get_required_mask() and just fail on setting
> a bigger mask? Are these failures not handled in some drivers? Or there
> are cases when a shorter mask is better? Thanks,

Because that is a complete pain.  Think of it, the device/driver knows
what it supports.  For 98% of the modern devices that means all 64
bits, and for most others this means 32 bits, with a few wackos that
support 48 bits or something like that.  The 98% just take any address
thrown at them, and the others just care that they never see an
address larger than what they support.  They could not care any less
if the system supports 31, 36, 41, 48, 52, 55, 61 or 63-bit addressing,
and they most certainly should not implement stupid boilerplate code to
guess what addressing mode the system implements.  They just declare
what they support.

Then you have the 12 drivers for devices that can do optimizations if
they never see large DMA addresses.  They use the somewhat misnamed
dma_get_required_mask API to query what the largest address they might
see is and act based on that, while not putting any burden on all the
sane devices/drivers.
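
A small user-space model of that division of labour.  DMA_BIT_MASK()
mirrors the kernel macro of the same name; model_required_mask() and
can_use_short_descriptors() are hypothetical helpers written for this
sketch and are not how any platform computes its required mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel macro: n low bits set, 64 handled specially. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical model of a "required mask": the smallest all-ones mask
 * that covers the highest DMA address the system might hand out.
 */
static uint64_t model_required_mask(uint64_t highest_addr)
{
	uint64_t mask = 0;

	while (mask < highest_addr)
		mask = (mask << 1) | 1;
	return mask;
}

/*
 * One of the few drivers with an optimization: it may use a compact
 * 32-bit descriptor format if it will never see a larger address.
 */
static bool can_use_short_descriptors(uint64_t highest_addr)
{
	return model_required_mask(highest_addr) <= DMA_BIT_MASK(32);
}
```

The ordinary driver only ever states DMA_BIT_MASK(64) or DMA_BIT_MASK(32)
for what it supports; only the optimizing driver asks the second
question.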


[PATCH 1/9] compat.h: fix a spelling error in

2020-09-24 Thread Christoph Hellwig
There is no compat_sys_readv64v2 syscall, only a compat_sys_preadv64v2
one.

Signed-off-by: Christoph Hellwig 
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..654c1ec36671a4 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
const struct compat_iovec __user *vec,
unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0



[PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c

2020-09-24 Thread Christoph Hellwig
From: David Laight 

This lets the compiler inline it into import_iovec() generating
much better code.

Signed-off-by: David Laight 
Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c | 179 
 lib/iov_iter.c  | 176 +++
 2 files changed, 176 insertions(+), 179 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..e5e891a88442ef 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,185 +752,6 @@ static ssize_t do_loop_readv_writev(struct file *filp, 
struct iov_iter *iter,
return ret;
 }
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- * into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- * either @fast_pointer, a newly allocated kernel array, or NULL,
- * depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer)
-{
-   unsigned long seg;
-   ssize_t ret;
-   struct iovec *iov = fast_pointer;
-
-   /*
-* SuS says "The readv() function *may* fail if the iovcnt argument
-* was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-* traditionally returned zero for zero segments, so...
-*/
-   if (nr_segs == 0) {
-   ret = 0;
-   goto out;
-   }
-
-   /*
-* First get the "struct iovec" from user memory and
-* verify all the pointers
-*/
-   if (nr_segs > UIO_MAXIOV) {
-   ret = -EINVAL;
-   goto out;
-   }
-   if (nr_segs > fast_segs) {
-   iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-   if (iov == NULL) {
-   ret = -ENOMEM;
-   goto out;
-   }
-   }
-   if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-   ret = -EFAULT;
-   goto out;
-   }
-
-   /*
-* According to the Single Unix Specification we should return EINVAL
-* if an element length is < 0 when cast to ssize_t or if the
-* total length would overflow the ssize_t return value of the
-* system call.
-*
-* Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
-* overflow case.
-*/
-   ret = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   void __user *buf = iov[seg].iov_base;
-   ssize_t len = (ssize_t)iov[seg].iov_len;
-
-   /* see if we we're about to use an invalid len or if
-* it's about to overflow ssize_t */
-   if (len < 0) {
-   ret = -EINVAL;
-   goto out;
-   }
-   if (type >= 0
-   && unlikely(!access_ok(buf, len))) {
-   ret = -EFAULT;
-   goto out;
-   }
-   if (len > MAX_RW_COUNT - ret) {
-   len = MAX_RW_COUNT - ret;
-   iov[seg].iov_len = len;
-   }
-   ret += len;
-   }
-out:
-   *ret_pointer = iov;
-   return ret;
-}
-
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-   const struct compat_iovec __user *uvector, unsigned long 
nr_segs,
-   unsigned long fast_segs, struct iovec *fast_pointer,
-   struct iovec **ret_pointer)
-{
-   compat_ssize_t tot_len;
-   struct iovec *iov = *ret_pointer = fast_pointer;
-   ssize_t ret = 0;
-

[PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec

2020-09-24 Thread Christoph Hellwig
Split rw_copy_check_uvector into two new helpers with more sensible
calling conventions:

 - iovec_from_user copies an iovec from userspace either into the provided
   stack buffer if it fits, or allocates a new buffer for it.  Returns
   the actually used iovec.  It also verifies that iov_len does fit a
   signed type, and handles compat iovecs if the compat flag is set.
 - __import_iovec consolidates the native and compat versions of
   import_iovec. It calls iovec_from_user, then validates each iovec
   actually points to user addresses, and ensures the total length
   doesn't overflow.

This has two major implications:

 - the access_process_vm case loses the total length checking, which
   wasn't required anyway, given that each call receives two iovecs
   for the local and remote side of the operation, and it verifies
   the total length on the local side already.
 - instead of a single loop there now are two loops over the iovecs.
   Given that the iovecs are cache hot this doesn't make a major
   difference.

Signed-off-by: Christoph Hellwig 
---
 include/linux/compat.h |   6 -
 include/linux/fs.h |  13 --
 include/linux/uio.h|  12 +-
 lib/iov_iter.c | 300 -
 mm/process_vm_access.c |  34 +++--
 5 files changed, 138 insertions(+), 227 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 654c1ec36671a4..b930de791ff16b 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -451,12 +451,6 @@ extern long compat_arch_ptrace(struct task_struct *child, 
compat_long_t request,
 
 struct epoll_event;/* fortunately, this one is fixed-layout */
 
-extern ssize_t compat_rw_copy_check_uvector(int type,
-   const struct compat_iovec __user *uvector,
-   unsigned long nr_segs,
-   unsigned long fast_segs, struct iovec *fast_pointer,
-   struct iovec **ret_pointer);
-
 extern void __user *compat_alloc_user_space(unsigned long len);
 
 int compat_restore_altstack(const compat_stack_t __user *uss);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..e69b45b6cc7b5f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -178,14 +178,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t 
offset,
 /* File supports async buffered reads */
 #define FMODE_BUF_RASYNC   ((__force fmode_t)0x4000)
 
-/*
- * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
- * that indicates that they should check the contents of the iovec are
- * valid, but not check the memory that the iovec elements
- * points too.
- */
-#define CHECK_IOVEC_ONLY -1
-
 /*
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
@@ -1887,11 +1879,6 @@ static inline int call_mmap(struct file *file, struct 
vm_area_struct *vma)
return file->f_op->mmap(file, vma);
 }
 
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer);
-
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 3835a8a8e9eae0..92c11fe41c6228 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -266,9 +266,15 @@ bool csum_and_copy_from_iter_full(void *addr, size_t 
bytes, __wsum *csum, struct
 size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i);
 
-ssize_t import_iovec(int type, const struct iovec __user * uvector,
-unsigned nr_segs, unsigned fast_segs,
-struct iovec **iov, struct iov_iter *i);
+struct iovec *iovec_from_user(const struct iovec __user *uvector,
+   unsigned long nr_segs, unsigned long fast_segs,
+   struct iovec *fast_iov, bool compat);
+ssize_t import_iovec(int type, const struct iovec __user *uvec,
+unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+struct iov_iter *i);
+ssize_t __import_iovec(int type, const struct iovec __user *uvec,
+unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+struct iov_iter *i, bool compat);
 
 #ifdef CONFIG_COMPAT
 struct compat_iovec;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ccea9db3f72be8..d5d8afe31fca16 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1650,107 +1651,133 @@ const void *dup_iter(struct iov_iter *new, struct 
iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
-/**
- * rw_copy_check_uvector() - Copy an array of  iovec from use

[PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov

2020-09-24 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 security/keys/compat.c   | 36 ++--
 security/keys/internal.h |  5 -
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include 
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-   key_serial_t id,
-   const struct compat_iovec __user *_payload_iov,
-   unsigned ioc,
-   key_serial_t ringid)
-{
-   struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-   struct iov_iter from;
-   long ret;
-
-   if (!_payload_iov)
-   ioc = 0;
-
-   ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-  ioc, ARRAY_SIZE(iovstack), &iov, &from);
-   if (ret < 0)
-   return ret;
-
-   ret = keyctl_instantiate_key_common(id, &from, ringid);
-   kfree(iov);
-   return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
case KEYCTL_INSTANTIATE_IOV:
-   return compat_keyctl_instantiate_key_iov(
-   arg2, compat_ptr(arg3), arg4, arg5);
+   return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+ arg5);
 
case KEYCTL_INVALIDATE:
return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
   const struct iovec __user *,
   unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
- struct iov_iter *,
- key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
const char __user *_type,
const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
   struct iov_iter *from,
   key_serial_t ringid)
 {
-- 
2.28.0



let import_iovec deal with compat_iovecs as well v4

2020-09-24 Thread Christoph Hellwig
Hi Al,

this series changes import_iovec to transparently deal with compat iovec
structures, and then cleans up a lot of code duplication.

Changes since v3:
 - fix up changed prototypes in compat.h as well

Changes since v2:
 - revert the switch of the access process vm syscalls to iov_iter
 - refactor the import_iovec internals differently
 - switch aio to use __import_iovec

Changes since v1:
 - improve a commit message
 - drop a pointless unlikely
 - drop the PF_FORCE_COMPAT flag
 - add a few more cleanups (including two from David Laight)

Diffstat:
 arch/arm64/include/asm/unistd32.h  |   10 
 arch/mips/kernel/syscalls/syscall_n32.tbl  |   10 
 arch/mips/kernel/syscalls/syscall_o32.tbl  |   10 
 arch/parisc/kernel/syscalls/syscall.tbl|   10 
 arch/powerpc/kernel/syscalls/syscall.tbl   |   10 
 arch/s390/kernel/syscalls/syscall.tbl  |   10 
 arch/sparc/kernel/syscalls/syscall.tbl |   10 
 arch/x86/entry/syscall_x32.c   |5 
 arch/x86/entry/syscalls/syscall_32.tbl |   10 
 arch/x86/entry/syscalls/syscall_64.tbl |   10 
 block/scsi_ioctl.c |   12 
 drivers/scsi/sg.c  |9 
 fs/aio.c   |   38 --
 fs/io_uring.c  |   20 -
 fs/read_write.c|  362 +
 fs/splice.c|   57 ---
 include/linux/compat.h |   24 -
 include/linux/fs.h |   11 
 include/linux/uio.h|   10 
 include/uapi/asm-generic/unistd.h  |   12 
 lib/iov_iter.c |  161 +++--
 mm/process_vm_access.c |   85 
 net/compat.c   |4 
 security/keys/compat.c |   37 --
 security/keys/internal.h   |5 
 security/keys/keyctl.c |2 
 tools/include/uapi/asm-generic/unistd.h|   12 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |   10 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|   10 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |   10 
 30 files changed, 280 insertions(+), 706 deletions(-)


[PATCH 7/9] fs: remove compat_sys_vmsplice

2020-09-24 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  2 +-
 arch/x86/entry/syscall_x32.c  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl|  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 fs/splice.c   | 57 +--
 include/linux/compat.h|  4 --
 include/uapi/asm-generic/unistd.h |  2 +-
 tools/include/uapi/asm-generic/unistd.h   |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267  n32  splice           sys_splice
 268  n32  sync_file_range  sys_sync_file_range
 269  n32  tee              sys_tee
-270  n32  vmsplice         compat_sys_vmsplice
+270  n32  vmsplice         sys_vmsplice
 271  n32  move_pages       compat_sys_move_pages
 272  n32  set_robust_list  compat_sys_set_robust_list
 273  n32  get_robust_list  compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304  o32  splice           sys_splice
 305  o32  sync_file_range  sys_sync_file_range  sys32_sync_file_range
 306  o32  tee              sys_tee
-307  o32  vmsplice         sys_vmsplice         compat_sys_vmsplice
+307  o32  vmsplice         sys_vmsplice
 308  o32  move_pages       sys_move_pages       compat_sys_move_pages
 309  o32  set_robust_list  sys_set_robust_list  compat_sys_set_robust_list
 310  o32  get_robust_list  sys_get_robust_list  compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 292  32      sync_file_range  parisc_sync_file_range
 292  64      sync_file_range  sys_sync_file_range
 293  common  tee              sys_tee
-294  common  vmsplice         sys_vmsplice     compat_sys_vmsplice
+294  common  vmsplice         sys_vmsplice
 295  common  move_pages       sys_move_pages   compat_sys_move_pages
 296  common  getcpu           sys_getcpu
 297  common  epoll_pwait      sys_epoll_pwait  compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282  common  unshare   sys_unshare
 283  common  splice    sys_splice
 284  common  tee       sys_tee
-285  common  vmsplice  sys_vmsplice  compat_sys_vmsplice
+285  common  vmsplice  sys_vmsplice
 286  common  openat    sys_openat    compat_sys_openat
 287  common

[PATCH 5/9] fs: remove various compat readv/writev helpers

2020-09-24 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs as well, all the duplicated
code in the compat readv/writev helpers is not needed.  Remove them
and switch the compat syscall handlers to use the native helpers.
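
For illustration only: the compat preadv and preadv2 handlers kept by this
patch still receive the file offset as two 32-bit halves and assemble it
before calling the native helper.  A minimal userspace sketch of that
assembly follows.  The name compat_pos_sketch is made up for this example,
and the shift is done on an unsigned value here only to keep the sketch
free of signed-shift concerns; the kernel code uses loff_t directly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine the low and high 32-bit halves of a file
 * offset into one signed 64-bit value, as the compat preadv handlers do
 * before calling the native helper.
 */
static int64_t compat_pos_sketch(uint32_t pos_low, uint32_t pos_high)
{
	return (int64_t)(((uint64_t)pos_high << 32) | pos_low);
}

/* Usage: two all-ones halves give -1, the "use the current position" value. */
void compat_pos_selfcheck(void)
{
	assert(compat_pos_sketch(0xffffffffu, 0xffffffffu) == -1);
	assert(compat_pos_sketch(1, 1) == 0x100000001LL);
}
```

The -1 case is the one preadv2 checks for before falling back to do_readv.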

Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c| 179 +++--
 include/linux/compat.h |  20 ++---
 2 files changed, 40 insertions(+), 159 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 0a68037580b455..eab427b7cc0a3f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1068,226 +1068,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Various compat syscalls.  Note that they all pretend to take a native
+ * iovec - import_iovec will properly treat those as compat_iovecs based on
+ * in_compat_syscall().
+ */
 #ifdef CONFIG_COMPAT
-static size_t compat_readv(struct file *file,
-  const struct compat_iovec __user *vec,
-  unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *iov = iovstack;
-   struct iov_iter iter;
-   ssize_t ret;
-
-   ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
-  UIO_FASTIOV, &iov, &iter);
-   if (ret >= 0) {
-   ret = do_iter_read(file, &iter, pos, flags);
-   kfree(iov);
-   }
-   if (ret > 0)
-   add_rchar(current, ret);
-   inc_syscr(current);
-   return ret;
-}
-
-static size_t do_compat_readv(compat_ulong_t fd,
-const struct compat_iovec __user *vec,
-compat_ulong_t vlen, rwf_t flags)
-{
-   struct fd f = fdget_pos(fd);
-   ssize_t ret;
-   loff_t pos;
-
-   if (!f.file)
-   return -EBADF;
-   pos = f.file->f_pos;
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   if (ret >= 0)
-   f.file->f_pos = pos;
-   fdput_pos(f);
-   return ret;
-
-}
-
 COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen)
 {
-   return do_compat_readv(fd, vec, vlen, 0);
-}
-
-static long do_compat_preadv64(unsigned long fd,
- const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos, rwf_t flags)
-{
-   struct fd f;
-   ssize_t ret;
-
-   if (pos < 0)
-   return -EINVAL;
-   f = fdget(fd);
-   if (!f.file)
-   return -EBADF;
-   ret = -ESPIPE;
-   if (f.file->f_mode & FMODE_PREAD)
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   fdput(f);
-   return ret;
+   return do_readv(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos)
 {
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
+   return do_readv(fd, vec, vlen, flags);
+   return do_preadv(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
rwf_t, flags)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
-}
-
-static size_t compat_writev(struct file *file,
-   const struct compat_iovec __user *vec,
-   unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UI

[PATCH 6/9] fs: remove the compat readv/writev syscalls

2020-09-24 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl  |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl  |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl|  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl   |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl  |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl |  4 ++--
 arch/x86/entry/syscall_x32.c   |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl |  4 ++--
 fs/read_write.c| 14 --
 include/linux/compat.h |  4 
 include/uapi/asm-generic/unistd.h  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h|  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15  n32  ioctl       compat_sys_ioctl
 16  n32  pread64     sys_pread64
 17  n32  pwrite64    sys_pwrite64
-18  n32  readv       compat_sys_readv
-19  n32  writev      compat_sys_writev
+18  n32  readv       sys_readv
+19  n32  writev      sys_writev
 20  n32  access      sys_access
 21  n32  pipe        sysm_pipe
 22  n32  _newselect  compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142  o32  _newselect  sys_select      compat_sys_select
 143  o32  flock       sys_flock
 144  o32  msync       sys_msync
-145  o32  readv       sys_readv       compat_sys_readv
-146  o32  writev      sys_writev      compat_sys_writev
+145  o32  readv       sys_readv
+146  o32  writev      sys_writev
 147  o32  cacheflush  sys_cacheflush
 148  o32  cachectl    sys_cachectl
 149  o32  sysmips     __sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142  common  _newselect  sys_select     compat_sys_select
 143  common  flock       sys_flock
 144  common  msync       sys_msync
-145  common  readv       sys_readv      compat_sys_readv
-146  common  writev      sys_writev     compat_sys_writev
+145  common  readv       sys_readv
+146  common  writev      sys_writev
 147  common  getsid      sys_getsid
 148  common  fdatasync   sys_fdatasync
 149  common  _sysctl     sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142  common  _newselect  sys_select  compat_sys_select
 143  common

[PATCH 8/9] mm: remove compat_process_vm_{readv,writev}

2020-09-24 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  4 +-
 arch/x86/entry/syscall_x32.c  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl|  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 include/linux/compat.h|  8 ---
 include/uapi/asm-generic/unistd.h |  6 +-
 mm/process_vm_access.c| 69 ---
 tools/include/uapi/asm-generic/unistd.h   |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 17 files changed, 30 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306  n32  syncfs             sys_syncfs
 307  n32  sendmmsg           compat_sys_sendmmsg
 308  n32  setns              sys_setns
-309  n32  process_vm_readv   compat_sys_process_vm_readv
-310  n32  process_vm_writev  compat_sys_process_vm_writev
+309  n32  process_vm_readv   sys_process_vm_readv
+310  n32  process_vm_writev  sys_process_vm_writev
 311  n32  kcmp               sys_kcmp
 312  n32  finit_module       sys_finit_module
 313  n32  sched_setattr      sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342  o32  syncfs             sys_syncfs
 343  o32  sendmmsg           sys_sendmmsg           compat_sys_sendmmsg
 344  o32  setns              sys_setns
-345  o32  process_vm_readv   sys_process_vm_readv   compat_sys_process_vm_readv
-346  o32  process_vm_writev  sys_process_vm_writev  compat_sys_process_vm_writev
+345  o32  process_vm_readv   sys_process_vm_readv
+346  o32  process_vm_writev  sys_process_vm_writev
 347  o32  kcmp               sys_kcmp
 348  o32  finit_module       sys_finit_module
 349  o32  sched_setattr      sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327  common  syncfs             sys_syncfs
 328  common  setns              sys_setns
 329  common  sendmmsg           sys_sendmmsg           compat_sys_sendmmsg
-330  common  process_vm_readv   sys_process_vm_readv   compat_sys_process_vm_readv
-331  common  process_vm_writev  sys_process_vm_writev  compat_sys_process_vm_writev
+330  common  process_vm_readv   sys_process_vm_readv
+331  common  process_vm_writev  sys_process_vm_writev
 332  common  kcmp               sys_kcmp
 333  common  finit_module       sys_finit_module
 334  common  sched_setattr      sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls

[PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec

2020-09-24 Thread Christoph Hellwig
Use in_compat_syscall() to import either native or compat iovecs, and
remove the now superfluous compat_import_iovec.

This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.
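
As a rough illustration of the idea (not the kernel implementation), the
userspace sketch below widens 32-bit iovec descriptors into native ones
when a compat flag is set.  struct compat_iovec_sketch and
import_iovec_sketch are names invented for this example; the real
import_iovec additionally copies from user space, validates the segments
and may allocate, all of which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for the layout a 32-bit task uses for an iovec. */
struct compat_iovec_sketch {
	uint32_t iov_base;	/* user pointer as seen by a 32-bit task */
	uint32_t iov_len;	/* segment length as seen by a 32-bit task */
};

/*
 * Copy nr_segs descriptors from uvec into out[], widening each one when
 * compat is true.  Returns the total number of bytes described.
 */
static size_t import_iovec_sketch(const void *uvec, size_t nr_segs,
				  bool compat, struct iovec *out)
{
	size_t total = 0;
	size_t i;

	assert(uvec != NULL && out != NULL);
	for (i = 0; i < nr_segs; i++) {
		if (compat) {
			const struct compat_iovec_sketch *c = uvec;

			out[i].iov_base = (void *)(uintptr_t)c[i].iov_base;
			out[i].iov_len = c[i].iov_len;
		} else {
			const struct iovec *n = uvec;

			out[i] = n[i];
		}
		total += out[i].iov_len;
	}
	return total;
}
```

A caller that already knows which layout it was handed passes the flag
explicitly, which is the role the bool compat parameter plays for
__import_iovec; other callers would derive it from the syscall context.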

Signed-off-by: Christoph Hellwig 
---
 block/scsi_ioctl.c | 12 ++--
 drivers/scsi/sg.c  |  9 +
 fs/aio.c   |  8 ++--
 fs/io_uring.c  | 20 
 fs/read_write.c|  6 --
 fs/splice.c|  2 +-
 include/linux/uio.h|  8 
 lib/iov_iter.c | 14 ++
 mm/process_vm_access.c |  3 ++-
 net/compat.c   |  4 ++--
 security/keys/compat.c |  5 ++---
 11 files changed, 26 insertions(+), 65 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
struct iov_iter i;
struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   ret = compat_import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, &iov, &i);
-   else
-#endif
-   ret = import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, &iov, &i);
+   ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+  hdr->iovec_count, 0, &iov, &i);
if (ret < 0)
goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
struct iovec *iov = NULL;
struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   res = compat_import_iovec(rw, hp->dxferp, iov_count,
- 0, &iov, &i);
-   else
-#endif
-   res = import_iovec(rw, hp->dxferp, iov_count,
-  0, &iov, &i);
+   res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
if (res < 0)
return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..c45c20d875388c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1489,12 +1489,8 @@ static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
*iovec = NULL;
return ret;
}
-#ifdef CONFIG_COMPAT
-   if (compat)
-   return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-   iter);
-#endif
-   return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
+
+   return __import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter, compat);
 }
 
 static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8b426aa29668cb..8c27dc28da182a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2852,13 +2852,8 @@ static ssize_t __io_import_iovec(int rw, struct io_kiocb *req,
return ret;
}
 
-#ifdef CONFIG_COMPAT
-   if (req->ctx->compat)
-   return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV,
-   iovec, iter);
-#endif
-
-   return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);
+   return __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter,
+ req->ctx->compat);
 }
 
 static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
@@ -4200,8 +4195,9 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
sr->len);
iomsg->iov = NULL;
} else {
-   ret = import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
-   &iomsg->iov, &iomsg->msg.msg_iter);
+   ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
+&iomsg->iov, &iomsg->msg.msg_iter,
+false);
if (ret > 0)
ret = 0;
}
@@ -4241,9 +4237,9 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
sr->len = iomsg->iov[0].iov_len;
iomsg->iov = NULL;
} else {
-   ret = compat_import_iovec(READ, uiov, len, UIO_FASTIOV,
-   &iomsg->iov,
-   &iomsg->msg.msg_iter);
+   ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
+

[PATCH] powerpc: switch 85xx defconfigs from legacy ide to libata

2020-09-23 Thread Christoph Hellwig
Switch the 85xx defconfigs from the soon to be removed legacy ide
driver to libata.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/configs/85xx/mpc85xx_cds_defconfig | 6 +++---
 arch/powerpc/configs/85xx/tqm8540_defconfig | 6 +++---
 arch/powerpc/configs/85xx/tqm8541_defconfig | 6 +++---
 arch/powerpc/configs/85xx/tqm8555_defconfig | 6 +++---
 arch/powerpc/configs/85xx/tqm8560_defconfig | 6 +++---
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/configs/85xx/mpc85xx_cds_defconfig b/arch/powerpc/configs/85xx/mpc85xx_cds_defconfig
index 0683d8c292a89b..cea72e85ed2614 100644
--- a/arch/powerpc/configs/85xx/mpc85xx_cds_defconfig
+++ b/arch/powerpc/configs/85xx/mpc85xx_cds_defconfig
@@ -29,9 +29,9 @@ CONFIG_SYN_COOKIES=y
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_SIZE=32768
-CONFIG_IDE=y
-CONFIG_BLK_DEV_GENERIC=y
-CONFIG_BLK_DEV_VIA82CXXX=y
+CONFIG_ATA=y
+CONFIG_ATA_GENERIC=y
+CONFIG_PATA_VIA=y
 CONFIG_NETDEVICES=y
 CONFIG_GIANFAR=y
 CONFIG_E1000=y
diff --git a/arch/powerpc/configs/85xx/tqm8540_defconfig b/arch/powerpc/configs/85xx/tqm8540_defconfig
index 98982a0e82d804..bbf040aa1f9aa7 100644
--- a/arch/powerpc/configs/85xx/tqm8540_defconfig
+++ b/arch/powerpc/configs/85xx/tqm8540_defconfig
@@ -30,9 +30,9 @@ CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_SIZE=32768
-CONFIG_IDE=y
-CONFIG_BLK_DEV_GENERIC=y
-CONFIG_BLK_DEV_VIA82CXXX=y
+CONFIG_ATA=y
+CONFIG_ATA_GENERIC=y
+CONFIG_PATA_VIA=y
 CONFIG_NETDEVICES=y
 CONFIG_GIANFAR=y
 CONFIG_E100=y
diff --git a/arch/powerpc/configs/85xx/tqm8541_defconfig b/arch/powerpc/configs/85xx/tqm8541_defconfig
index a6e21db1dafe3a..523ad8dcfd9d08 100644
--- a/arch/powerpc/configs/85xx/tqm8541_defconfig
+++ b/arch/powerpc/configs/85xx/tqm8541_defconfig
@@ -30,9 +30,9 @@ CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_SIZE=32768
-CONFIG_IDE=y
-CONFIG_BLK_DEV_GENERIC=y
-CONFIG_BLK_DEV_VIA82CXXX=y
+CONFIG_ATA=y
+CONFIG_ATA_GENERIC=y
+CONFIG_PATA_VIA=y
 CONFIG_NETDEVICES=y
 CONFIG_GIANFAR=y
 CONFIG_E100=y
diff --git a/arch/powerpc/configs/85xx/tqm8555_defconfig b/arch/powerpc/configs/85xx/tqm8555_defconfig
index ca1de3979474ba..0032ce1e8c9cbf 100644
--- a/arch/powerpc/configs/85xx/tqm8555_defconfig
+++ b/arch/powerpc/configs/85xx/tqm8555_defconfig
@@ -30,9 +30,9 @@ CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_SIZE=32768
-CONFIG_IDE=y
-CONFIG_BLK_DEV_GENERIC=y
-CONFIG_BLK_DEV_VIA82CXXX=y
+CONFIG_ATA=y
+CONFIG_ATA_GENERIC=y
+CONFIG_PATA_VIA=y
 CONFIG_NETDEVICES=y
 CONFIG_GIANFAR=y
 CONFIG_E100=y
diff --git a/arch/powerpc/configs/85xx/tqm8560_defconfig b/arch/powerpc/configs/85xx/tqm8560_defconfig
index ca3b8c8ef30f9c..a80b971f7d6e0b 100644
--- a/arch/powerpc/configs/85xx/tqm8560_defconfig
+++ b/arch/powerpc/configs/85xx/tqm8560_defconfig
@@ -30,9 +30,9 @@ CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_SIZE=32768
-CONFIG_IDE=y
-CONFIG_BLK_DEV_GENERIC=y
-CONFIG_BLK_DEV_VIA82CXXX=y
+CONFIG_ATA=y
+CONFIG_ATA_GENERIC=y
+CONFIG_PATA_VIA=y
 CONFIG_NETDEVICES=y
 CONFIG_GIANFAR=y
 CONFIG_E100=y
-- 
2.28.0



Re: [PATCH 5/9] fs: remove various compat readv/writev helpers

2020-09-23 Thread Christoph Hellwig
On Wed, Sep 23, 2020 at 06:05:27PM +0100, Al Viro wrote:
> On Wed, Sep 23, 2020 at 05:38:31PM +0100, Al Viro wrote:
> > On Wed, Sep 23, 2020 at 03:59:01PM +0100, Al Viro wrote:
> > 
> > > > That's a very good question.  But it does not just compile but actually
> > > > works.  Probably because all the syscall wrappers mean that we don't
> > > > actually generate the normal names.  I just tried this:
> > > > 
> > > > --- a/include/linux/syscalls.h
> > > > +++ b/include/linux/syscalls.h
> > > > @@ -468,7 +468,7 @@ asmlinkage long sys_lseek(unsigned int fd, off_t 
> > > > offset,
> > > >  asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t 
> > > > count);
> > > >  asmlinkage long sys_write(unsigned int fd, const char __user *buf,
> > > > size_t count);
> > > > -asmlinkage long sys_readv(unsigned long fd,
> > > > +asmlinkage long sys_readv(void *fd,
> > > > 
> > > > for fun, and the compiler doesn't care either..
> > > 
> > > Try to build it for sparc or ppc...
> > 
> > FWIW, declarations in syscalls.h used to serve 4 purposes:
> > 1) syscall table initializers needed symbols declared
> > 2) direct calls needed the same
> > 3) catching mismatches between the declarations and definitions
> > 4) centralized list of all syscalls
> > 
> > (2) has been (thankfully) reduced for some time; in any case, ksys_... is
> > used for the remaining ones.
> > 
> > (1) and (3) are served by syscalls.h in architectures other than x86, arm64
> > and s390.  On those 3 (1) is done otherwise (near the syscall table 
> > initializer)
> > and (3) is not done at all.
> > 
> > I wonder if we should do something like
> > 
> > SYSCALL_DECLARE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> >  unsigned long, vlen);
> > in syscalls.h instead, and not under that ifdef.
> > 
> > Let it expand to declaration of sys_...() in generic case and, on x86, into
> > __do_sys_...() and __ia32_sys_...()/__x64_sys_...(), with types matching
> > what SYSCALL_DEFINE ends up using.
> > 
> > Similar macro would cover compat_sys_...() declarations.  That would
> > restore mismatch checking for x86 and friends.  AFAICS, the cost wouldn't
> > be terribly high - cpp would have more to chew through in syscalls.h,
> > but it shouldn't be all that costly.  Famous last words, of course...
> > 
> > Does anybody see fundamental problems with that?
> 
> Just to make it clear - I do not propose to fold that into this series;
> there we just need to keep those declarations in sync with fs/read_write.c

Agreed.  The above idea generally sounds sane to me.
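
To make the quoted proposal concrete, here is a minimal sketch of how a
declaration macro could share its argument list with the definition
macro, so that a type mismatch between the two becomes a compile error.
SYSCALL_DECLARE3 does not exist in the kernel as of this thread; the
_SKETCH macros, SC_PARAMS3 and sys_demo_sum below are all invented for
illustration, and only the generic (non-wrapper) expansion is shown.

```c
#include <assert.h>

/* Expand three (type, name) pairs into an ordinary parameter list. */
#define SC_PARAMS3(t1, a1, t2, a2, t3, a3) t1 a1, t2 a2, t3 a3

/* Hypothetical macros: both expand the same pairs in the same way. */
#define SYSCALL_DECLARE3_SKETCH(name, ...) \
	long sys_##name(SC_PARAMS3(__VA_ARGS__))
#define SYSCALL_DEFINE3_SKETCH(name, ...) \
	long sys_##name(SC_PARAMS3(__VA_ARGS__))

/* What a header would carry. */
SYSCALL_DECLARE3_SKETCH(demo_sum, unsigned long, fd,
			unsigned long, a, unsigned long, b);

/*
 * What the .c file would carry.  Changing any type here without changing
 * the declaration above makes the compiler report conflicting types.
 */
SYSCALL_DEFINE3_SKETCH(demo_sum, unsigned long, fd,
		       unsigned long, a, unsigned long, b)
{
	return (long)(fd + a + b);
}

/* Usage: the generated function is called like any other. */
void demo_sum_selfcheck(void)
{
	assert(sys_demo_sum(1, 2, 3) == 6);
}
```

On architectures with syscall wrappers the declaration macro would have to
expand to the wrapper names instead, as described in the quoted mail.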


Re: [PATCH 5/9] fs: remove various compat readv/writev helpers

2020-09-23 Thread Christoph Hellwig
On Wed, Sep 23, 2020 at 03:25:49PM +0100, Al Viro wrote:
> On Wed, Sep 23, 2020 at 08:05:43AM +0200, Christoph Hellwig wrote:
> >  COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
> > -   const struct compat_iovec __user *,vec,
> > +   const struct iovec __user *, vec,
> 
> Um...  Will it even compile?
> 
> >  #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
> >  COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
> > -   const struct compat_iovec __user *,vec,
> > +   const struct iovec __user *, vec,
> 
> Ditto.  Look into include/linux/compat.h and you'll see
> 
> asmlinkage long compat_sys_preadv64(unsigned long fd,
> const struct compat_iovec __user *vec,
> unsigned long vlen, loff_t pos);
> 
> How does that manage to avoid the compiler screaming bloody
> murder?

That's a very good question.  But it does not just compile but actually
works.  Probably because all the syscall wrappers mean that we don't
actually generate the normal names.  I just tried this:

--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -468,7 +468,7 @@ asmlinkage long sys_lseek(unsigned int fd, off_t offset,
 asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t count);
 asmlinkage long sys_write(unsigned int fd, const char __user *buf,
size_t count);
-asmlinkage long sys_readv(unsigned long fd,
+asmlinkage long sys_readv(void *fd,

for fun, and the compiler doesn't care either..
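
A small standalone sketch of why such a mismatch can go unnoticed, under
the assumption that the definition macro emits a prefixed symbol name:
the header declares one name, the definition produces another, and the
compiler never gets to compare the two.  SYSCALL_DEFINE1_SKETCH and the
wrapped_ prefix are invented for this example and only mimic the effect
of the real wrappers.

```c
#include <assert.h>

/* Header-style declaration with a deliberately wrong argument type. */
long sys_demo_readv(void *fd);

/*
 * Hypothetical wrapper macro: the function that actually gets defined
 * carries a prefixed name, so it never meets the declaration above.
 */
#define SYSCALL_DEFINE1_SKETCH(name, type, arg) \
	long wrapped_sys_##name(type arg)

SYSCALL_DEFINE1_SKETCH(demo_readv, unsigned long, fd)
{
	return (long)fd;
}

/* Usage: this builds without any diagnostic about sys_demo_readv. */
void demo_readv_selfcheck(void)
{
	assert(wrapped_sys_demo_readv(7) == 7);
}
```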


Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask

2020-09-23 Thread Christoph Hellwig
On Tue, Sep 22, 2020 at 12:26:18PM +1000, Alexey Kardashevskiy wrote:
> > Well, the original intent of dma_get_required_mask is to return the
> > mask that the driver then uses to figure out what to set, so what aacraid
> > does fits that use case. 
> 
> What was the original intent exactly? The driver asks for the minimum or
> maximum DMA mask the platform supports?
> 
> As for now, we (ppc64/powernv) can do:
> 1. bypass (==64bit)
> 2. a DMA window which used to be limited by 2GB but not anymore.
> 
> I can understand if the driver asked for required mask in expectation to
> receive "less or equal than 32bit" and "more than 32 bit" and choose.
> And this probably was the intent as at the time when the bug was
> introduced, the window was always smaller than 4GB.
> 
> But today the window is bigger than than (44 bits now, or a similar
> value, depends on max page order) so the returned mask is >32. Which
> still enables that DAC in aacraid but I suspect this is accidental.

I think for powernv returning 64-bit always would make a lot of sense.
AFAIK all of powernv is PCIe and not legacy PCI, so returning anything
less isn't going to help to optimize anything.

> > Of course that idea is pretty bogus for
> > PCIe devices.
> 
> Why? From the PHB side, there are windows. From the device side, there
> are many crippled devices, like, no GPU I saw in last years supported
> more than 48bit.

Yes, but dma_get_required_mask is misnamed - the mask is not required,
it is the optimal mask.  Even if the window is smaller we handle it
some way, usually by using swiotlb, or by iommu tricks in your case.

> > I suspect the right fix is to just not query dma_get_required_mask for
> > PCIe devices in aacraid (and other drivers that do something similar).
> 
> May be, if you write nice and big comment next to
> dma_get_required_mask() explaining exactly what it does, then I will
> realize I am getting this all wrong and we will move to fixing the
> drivers :)

Yes, it needs a comment or two, and should probably be renamed to
dma_get_optimal_dma_mask, along with a cleanup of most users.  I've added it
to my ever growing TODO list, but I would not be unhappy if someone
else gives it a spin.
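
To illustrate the driver-side pattern discussed above, here is a minimal
userspace sketch of a driver deciding between 32-bit and 64-bit
descriptors from the mask it is handed.  DMA_BIT_MASK_SKETCH mirrors the
shape of the kernel's DMA_BIT_MASK() macro; want_64bit_descriptors is a
made-up helper and does not reproduce what aacraid actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(n), redefined for the sketch. */
#define DMA_BIT_MASK_SKETCH(n) \
	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical driver helper: choose the larger descriptor format only
 * when the reported mask reaches beyond 32 bits.
 */
static bool want_64bit_descriptors(uint64_t reported_mask)
{
	return reported_mask > DMA_BIT_MASK_SKETCH(32);
}

/* Usage: a 44-bit window already selects the 64-bit format. */
void dma_mask_selfcheck(void)
{
	assert(want_64bit_descriptors(DMA_BIT_MASK_SKETCH(44)));
	assert(!want_64bit_descriptors(DMA_BIT_MASK_SKETCH(32)));
}
```

With a check of this kind, any window larger than 32 bits gives the same
answer as full 64-bit bypass, which matches the observation quoted above
about the 44-bit window.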


[PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec

2020-09-23 Thread Christoph Hellwig
Use in_compat_syscall() to import either native or compat iovecs, and
remove the now superfluous compat_import_iovec.

This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.

Signed-off-by: Christoph Hellwig 
---
 block/scsi_ioctl.c | 12 ++--
 drivers/scsi/sg.c  |  9 +
 fs/aio.c   |  8 ++--
 fs/io_uring.c  | 20 
 fs/read_write.c|  6 --
 fs/splice.c|  2 +-
 include/linux/uio.h|  8 
 lib/iov_iter.c | 14 ++
 mm/process_vm_access.c |  3 ++-
 net/compat.c   |  4 ++--
 security/keys/compat.c |  5 ++---
 11 files changed, 26 insertions(+), 65 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
struct iov_iter i;
struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   ret = compat_import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, &iov, &i);
-   else
-#endif
-   ret = import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, &iov, &i);
+   ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+  hdr->iovec_count, 0, &iov, &i);
if (ret < 0)
goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
struct iovec *iov = NULL;
struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   res = compat_import_iovec(rw, hp->dxferp, iov_count,
- 0, &iov, &i);
-   else
-#endif
-   res = import_iovec(rw, hp->dxferp, iov_count,
-  0, &iov, &i);
+   res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
if (res < 0)
return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..c45c20d875388c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1489,12 +1489,8 @@ static ssize_t aio_setup_rw(int rw, const struct iocb 
*iocb,
*iovec = NULL;
return ret;
}
-#ifdef CONFIG_COMPAT
-   if (compat)
-   return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-   iter);
-#endif
-   return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
+
+   return __import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter, compat);
 }
 
 static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3790c7fe9fee22..ba84ecea7cb1a4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2837,13 +2837,8 @@ static ssize_t __io_import_iovec(int rw, struct io_kiocb 
*req,
return ret;
}
 
-#ifdef CONFIG_COMPAT
-   if (req->ctx->compat)
-   return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV,
-   iovec, iter);
-#endif
-
-   return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);
+   return __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter,
+ req->ctx->compat);
 }
 
 static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
@@ -4179,8 +4174,9 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
sr->len);
iomsg->iov = NULL;
} else {
-   ret = import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
-   &iomsg->iov, &iomsg->msg.msg_iter);
+   ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
+&iomsg->iov, &iomsg->msg.msg_iter,
+false);
if (ret > 0)
ret = 0;
}
@@ -4220,9 +4216,9 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb 
*req,
sr->len = iomsg->iov[0].iov_len;
iomsg->iov = NULL;
} else {
-   ret = compat_import_iovec(READ, uiov, len, UIO_FASTIOV,
-   &iomsg->iov,
-   &iomsg->msg.msg_iter);
+   ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
+

[PATCH 1/9] compat.h: fix a spelling error in

2020-09-23 Thread Christoph Hellwig
There is no compat_sys_readv64v2 syscall, only a compat_sys_preadv64v2
one.

Signed-off-by: Christoph Hellwig 
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..654c1ec36671a4 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
const struct compat_iovec __user *vec,
unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0



[PATCH 5/9] fs: remove various compat readv/writev helpers

2020-09-23 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs as well, all the duplicated
code in the compat readv/writev helpers is not needed.  Remove them
and switch the compat syscall handlers to use the native helpers.

Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c | 179 
 1 file changed, 30 insertions(+), 149 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 0a68037580b455..eab427b7cc0a3f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1068,226 +1068,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const 
struct iovec __user *, vec,
return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Various compat syscalls.  Note that they all pretend to take a native
+ * iovec - import_iovec will properly treat those as compat_iovecs based on
+ * in_compat_syscall().
+ */
 #ifdef CONFIG_COMPAT
-static size_t compat_readv(struct file *file,
-  const struct compat_iovec __user *vec,
-  unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *iov = iovstack;
-   struct iov_iter iter;
-   ssize_t ret;
-
-   ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
-  UIO_FASTIOV, &iov, &iter);
-   if (ret >= 0) {
-   ret = do_iter_read(file, &iter, pos, flags);
-   kfree(iov);
-   }
-   if (ret > 0)
-   add_rchar(current, ret);
-   inc_syscr(current);
-   return ret;
-}
-
-static size_t do_compat_readv(compat_ulong_t fd,
-const struct compat_iovec __user *vec,
-compat_ulong_t vlen, rwf_t flags)
-{
-   struct fd f = fdget_pos(fd);
-   ssize_t ret;
-   loff_t pos;
-
-   if (!f.file)
-   return -EBADF;
-   pos = f.file->f_pos;
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   if (ret >= 0)
-   f.file->f_pos = pos;
-   fdput_pos(f);
-   return ret;
-
-}
-
 COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen)
 {
-   return do_compat_readv(fd, vec, vlen, 0);
-}
-
-static long do_compat_preadv64(unsigned long fd,
- const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos, rwf_t flags)
-{
-   struct fd f;
-   ssize_t ret;
-
-   if (pos < 0)
-   return -EINVAL;
-   f = fdget(fd);
-   if (!f.file)
-   return -EBADF;
-   ret = -ESPIPE;
-   if (f.file->f_mode & FMODE_PREAD)
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   fdput(f);
-   return ret;
+   return do_readv(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos)
 {
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
+   return do_readv(fd, vec, vlen, flags);
+   return do_preadv(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
rwf_t, flags)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
-}
-
-static size_t compat_writev(struct file *file,
-   const struct compat_iovec __user *vec,
-   unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *io

[PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c

2020-09-23 Thread Christoph Hellwig
From: David Laight 

This lets the compiler inline it into import_iovec() generating
much better code.

Signed-off-by: David Laight 
Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c | 179 
 lib/iov_iter.c  | 176 +++
 2 files changed, 176 insertions(+), 179 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..e5e891a88442ef 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,185 +752,6 @@ static ssize_t do_loop_readv_writev(struct file *filp, 
struct iov_iter *iter,
return ret;
 }
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- * into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- * either @fast_pointer, a newly allocated kernel array, or NULL,
- * depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer)
-{
-   unsigned long seg;
-   ssize_t ret;
-   struct iovec *iov = fast_pointer;
-
-   /*
-* SuS says "The readv() function *may* fail if the iovcnt argument
-* was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-* traditionally returned zero for zero segments, so...
-*/
-   if (nr_segs == 0) {
-   ret = 0;
-   goto out;
-   }
-
-   /*
-* First get the "struct iovec" from user memory and
-* verify all the pointers
-*/
-   if (nr_segs > UIO_MAXIOV) {
-   ret = -EINVAL;
-   goto out;
-   }
-   if (nr_segs > fast_segs) {
-   iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-   if (iov == NULL) {
-   ret = -ENOMEM;
-   goto out;
-   }
-   }
-   if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-   ret = -EFAULT;
-   goto out;
-   }
-
-   /*
-* According to the Single Unix Specification we should return EINVAL
-* if an element length is < 0 when cast to ssize_t or if the
-* total length would overflow the ssize_t return value of the
-* system call.
-*
-* Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
-* overflow case.
-*/
-   ret = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   void __user *buf = iov[seg].iov_base;
-   ssize_t len = (ssize_t)iov[seg].iov_len;
-
-   /* see if we we're about to use an invalid len or if
-* it's about to overflow ssize_t */
-   if (len < 0) {
-   ret = -EINVAL;
-   goto out;
-   }
-   if (type >= 0
-   && unlikely(!access_ok(buf, len))) {
-   ret = -EFAULT;
-   goto out;
-   }
-   if (len > MAX_RW_COUNT - ret) {
-   len = MAX_RW_COUNT - ret;
-   iov[seg].iov_len = len;
-   }
-   ret += len;
-   }
-out:
-   *ret_pointer = iov;
-   return ret;
-}
-
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-   const struct compat_iovec __user *uvector, unsigned long 
nr_segs,
-   unsigned long fast_segs, struct iovec *fast_pointer,
-   struct iovec **ret_pointer)
-{
-   compat_ssize_t tot_len;
-   struct iovec *iov = *ret_pointer = fast_pointer;
-   ssize_t ret = 0;
-

[PATCH 8/9] mm: remove compat_process_vm_{readv,writev}

2020-09-23 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  4 +-
 arch/x86/entry/syscall_x32.c  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl|  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 include/linux/compat.h|  8 ---
 include/uapi/asm-generic/unistd.h |  6 +-
 mm/process_vm_access.c| 69 ---
 tools/include/uapi/asm-generic/unistd.h   |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 17 files changed, 30 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306n32 syncfs  sys_syncfs
 307n32 sendmmsgcompat_sys_sendmmsg
 308n32 setns   sys_setns
-309n32 process_vm_readvcompat_sys_process_vm_readv
-310n32 process_vm_writev   compat_sys_process_vm_writev
+309n32 process_vm_readvsys_process_vm_readv
+310n32 process_vm_writev   sys_process_vm_writev
 311n32 kcmpsys_kcmp
 312n32 finit_modulesys_finit_module
 313n32 sched_setattr   sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342o32 syncfs  sys_syncfs
 343o32 sendmmsgsys_sendmmsg
compat_sys_sendmmsg
 344o32 setns   sys_setns
-345o32 process_vm_readvsys_process_vm_readv
compat_sys_process_vm_readv
-346o32 process_vm_writev   sys_process_vm_writev   
compat_sys_process_vm_writev
+345o32 process_vm_readvsys_process_vm_readv
+346o32 process_vm_writev   sys_process_vm_writev
 347o32 kcmpsys_kcmp
 348o32 finit_modulesys_finit_module
 349o32 sched_setattr   sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327common  syncfs  sys_syncfs
 328common  setns   sys_setns
 329common  sendmmsgsys_sendmmsg
compat_sys_sendmmsg
-330common  process_vm_readvsys_process_vm_readv
compat_sys_process_vm_readv
-331common  process_vm_writev   sys_process_vm_writev   
compat_sys_process_vm_writev
+330common  process_vm_readvsys_process_vm_readv
+331common  process_vm_writev   sys_process_vm_writev
 332common  kcmpsys_kcmp
 333common  finit_modulesys_finit_module
 334common  sched_setattr   sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls

let import_iovec deal with compat_iovecs as well v3

2020-09-23 Thread Christoph Hellwig
Hi Al,

this series changes import_iovec to transparently deal with compat iovec
structures, and then cleans up a lot of code duplication.

Changes since v2:
 - revert the switch of the access process vm syscalls to iov_iter
 - refactor the import_iovec internals differently
 - switch aio to use __import_iovec

Changes since v1:
 - improve a commit message
 - drop a pointless unlikely
 - drop the PF_FORCE_COMPAT flag
 - add a few more cleanups (including two from David Laight)

Diffstat:
 arch/arm64/include/asm/unistd32.h  |   10 
 arch/mips/kernel/syscalls/syscall_n32.tbl  |   10 
 arch/mips/kernel/syscalls/syscall_o32.tbl  |   10 
 arch/parisc/kernel/syscalls/syscall.tbl|   10 
 arch/powerpc/kernel/syscalls/syscall.tbl   |   10 
 arch/s390/kernel/syscalls/syscall.tbl  |   10 
 arch/sparc/kernel/syscalls/syscall.tbl |   10 
 arch/x86/entry/syscall_x32.c   |5 
 arch/x86/entry/syscalls/syscall_32.tbl |   10 
 arch/x86/entry/syscalls/syscall_64.tbl |   10 
 block/scsi_ioctl.c |   12 
 drivers/scsi/sg.c  |9 
 fs/aio.c   |   38 --
 fs/io_uring.c  |   20 -
 fs/read_write.c|  362 +
 fs/splice.c|   57 ---
 include/linux/compat.h |   24 -
 include/linux/fs.h |   11 
 include/linux/uio.h|   10 
 include/uapi/asm-generic/unistd.h  |   12 
 lib/iov_iter.c |  161 +++--
 mm/process_vm_access.c |   85 
 net/compat.c   |4 
 security/keys/compat.c |   37 --
 security/keys/internal.h   |5 
 security/keys/keyctl.c |2 
 tools/include/uapi/asm-generic/unistd.h|   12 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |   10 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|   10 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |   10 
 30 files changed, 280 insertions(+), 706 deletions(-)
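
As a rough illustration of what "transparently dealing with compat iovec
structures" involves, here is a userspace sketch that copies an iovec array
and widens 32-bit entries when a compat flag is set.  Every name below
(sketch_compat_iovec, sketch_iovec, sketch_copy_iovecs) is a hypothetical
stand-in invented for this note; it is not the kernel code from this series,
and the copy from user memory is replaced by plain memory access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 32-bit (compat) iovec entry. */
struct sketch_compat_iovec {
	uint32_t iov_base;	/* 32-bit user pointer */
	int32_t  iov_len;	/* 32-bit signed length */
};

/* Hypothetical model of a native 64-bit iovec entry. */
struct sketch_iovec {
	uint64_t iov_base;
	int64_t  iov_len;
};

/*
 * Copy nr_segs entries from src into dst.  When compat is true, src is
 * treated as an array of 32-bit entries and each one is widened; otherwise
 * src is already in the native layout and is copied as is.
 *
 * Returns 0 on success, -1 if a compat entry has a negative length.
 */
static int sketch_copy_iovecs(struct sketch_iovec *dst, const void *src,
			      size_t nr_segs, bool compat)
{
	const struct sketch_compat_iovec *csrc = src;
	size_t i;

	if (!compat) {
		memcpy(dst, src, nr_segs * sizeof(*dst));
		return 0;
	}

	for (i = 0; i < nr_segs; i++) {
		if (csrc[i].iov_len < 0)
			return -1;
		dst[i].iov_base = csrc[i].iov_base;
		dst[i].iov_len = csrc[i].iov_len;
	}
	return 0;
}
```

With a helper shaped like this, a caller only has to pass the right flag
instead of carrying a separate compat code path, which is the kind of
duplication the diffstat above removes.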


[PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov

2020-09-23 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 security/keys/compat.c   | 36 ++--
 security/keys/internal.h |  5 -
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include 
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-   key_serial_t id,
-   const struct compat_iovec __user *_payload_iov,
-   unsigned ioc,
-   key_serial_t ringid)
-{
-   struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-   struct iov_iter from;
-   long ret;
-
-   if (!_payload_iov)
-   ioc = 0;
-
-   ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-  ioc, ARRAY_SIZE(iovstack), &iov, &from);
-   if (ret < 0)
-   return ret;
-
-   ret = keyctl_instantiate_key_common(id, &from, ringid);
-   kfree(iov);
-   return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
case KEYCTL_INSTANTIATE_IOV:
-   return compat_keyctl_instantiate_key_iov(
-   arg2, compat_ptr(arg3), arg4, arg5);
+   return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+ arg5);
 
case KEYCTL_INVALIDATE:
return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
   const struct iovec __user *,
   unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
- struct iov_iter *,
- key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
const char __user *_type,
const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
   struct iov_iter *from,
   key_serial_t ringid)
 {
-- 
2.28.0



[PATCH 7/9] fs: remove compat_sys_vmsplice

2020-09-23 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  2 +-
 arch/x86/entry/syscall_x32.c  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl|  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 fs/splice.c   | 57 +--
 include/linux/compat.h|  4 --
 include/uapi/asm-generic/unistd.h |  2 +-
 tools/include/uapi/asm-generic/unistd.h   |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, 
compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267n32 splice  sys_splice
 268n32 sync_file_range sys_sync_file_range
 269n32 tee sys_tee
-270n32 vmsplicecompat_sys_vmsplice
+270n32 vmsplicesys_vmsplice
 271n32 move_pages  compat_sys_move_pages
 272n32 set_robust_list compat_sys_set_robust_list
 273n32 get_robust_list compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304o32 splice  sys_splice
 305o32 sync_file_range sys_sync_file_range 
sys32_sync_file_range
 306o32 tee sys_tee
-307o32 vmsplicesys_vmsplice
compat_sys_vmsplice
+307o32 vmsplicesys_vmsplice
 308o32 move_pages  sys_move_pages  
compat_sys_move_pages
 309o32 set_robust_list sys_set_robust_list 
compat_sys_set_robust_list
 310o32 get_robust_list sys_get_robust_list 
compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 29232  sync_file_range parisc_sync_file_range
 29264  sync_file_range sys_sync_file_range
 293common  tee sys_tee
-294common  vmsplicesys_vmsplice
compat_sys_vmsplice
+294common  vmsplicesys_vmsplice
 295common  move_pages  sys_move_pages  
compat_sys_move_pages
 296common  getcpu  sys_getcpu
 297common  epoll_pwait sys_epoll_pwait 
compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282common  unshare sys_unshare
 283common  splice  sys_splice
 284common  tee sys_tee
-285common  vmsplicesys_vmsplice
compat_sys_vmsplice
+285common  vmsplicesys_vmsplice
 286common  openat  sys_openat  
compat_sys_openat
 287common

[PATCH 6/9] fs: remove the compat readv/writev syscalls

2020-09-23 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl  |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl  |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl|  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl   |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl  |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl |  4 ++--
 arch/x86/entry/syscall_x32.c   |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl |  4 ++--
 fs/read_write.c| 14 --
 include/linux/compat.h |  4 
 include/uapi/asm-generic/unistd.h  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h|  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15 n32 ioctl   compat_sys_ioctl
 16 n32 pread64 sys_pread64
 17 n32 pwrite64sys_pwrite64
-18 n32 readv   compat_sys_readv
-19 n32 writev  compat_sys_writev
+18 n32 readv   sys_readv
+19 n32 writev  sys_writev
 20 n32 access  sys_access
 21 n32 pipesysm_pipe
 22 n32 _newselect  compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142o32 _newselect  sys_select  
compat_sys_select
 143o32 flock   sys_flock
 144o32 msync   sys_msync
-145o32 readv   sys_readv   
compat_sys_readv
-146o32 writev  sys_writev  
compat_sys_writev
+145o32 readv   sys_readv
+146o32 writev  sys_writev
 147o32 cacheflush  sys_cacheflush
 148o32 cachectlsys_cachectl
 149o32 sysmips __sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142common  _newselect  sys_select  
compat_sys_select
 143common  flock   sys_flock
 144common  msync   sys_msync
-145common  readv   sys_readv   
compat_sys_readv
-146common  writev  sys_writev  
compat_sys_writev
+145common  readv   sys_readv
+146common  writev  sys_writev
 147common  getsid  sys_getsid
 148common  fdatasync   sys_fdatasync
 149common  _sysctl sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142common  _newselect  sys_select  
compat_sys_select
 143common

[PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec

2020-09-23 Thread Christoph Hellwig
Split rw_copy_check_uvector into two new helpers with more sensible
calling conventions:

 - iovec_from_user copies an iovec from userspace either into the provided
   stack buffer if it fits, or allocates a new buffer for it.  Returns
   the actually used iovec.  It also verifies that iov_len does fit a
   signed type, and handles compat iovecs if the compat flag is set.
 - __import_iovec consolidates the native and compat versions of
   import_iovec. It calls iovec_from_user, then validates each iovec
   actually points to user addresses, and ensures the total length
   doesn't overflow.

This has two major implications:

 - the access_process_vm case loses the total length checking, which
   wasn't required anyway, given that each call receives two iovecs
   for the local and remote side of the operation, and it verifies
   the total length on the local side already.
 - instead of a single loop there now are two loops over the iovecs.
   Given that the iovecs are cache hot this doesn't make a major
   difference.

Signed-off-by: Christoph Hellwig 
---
 include/linux/compat.h |   6 -
 include/linux/fs.h |  13 --
 include/linux/uio.h|  12 +-
 lib/iov_iter.c | 300 -
 mm/process_vm_access.c |  34 +++--
 5 files changed, 138 insertions(+), 227 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 654c1ec36671a4..b930de791ff16b 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -451,12 +451,6 @@ extern long compat_arch_ptrace(struct task_struct *child, 
compat_long_t request,
 
 struct epoll_event;/* fortunately, this one is fixed-layout */
 
-extern ssize_t compat_rw_copy_check_uvector(int type,
-   const struct compat_iovec __user *uvector,
-   unsigned long nr_segs,
-   unsigned long fast_segs, struct iovec *fast_pointer,
-   struct iovec **ret_pointer);
-
 extern void __user *compat_alloc_user_space(unsigned long len);
 
 int compat_restore_altstack(const compat_stack_t __user *uss);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..e69b45b6cc7b5f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -178,14 +178,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t 
offset,
 /* File supports async buffered reads */
 #define FMODE_BUF_RASYNC   ((__force fmode_t)0x4000)
 
-/*
- * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
- * that indicates that they should check the contents of the iovec are
- * valid, but not check the memory that the iovec elements
- * points too.
- */
-#define CHECK_IOVEC_ONLY -1
-
 /*
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
@@ -1887,11 +1879,6 @@ static inline int call_mmap(struct file *file, struct 
vm_area_struct *vma)
return file->f_op->mmap(file, vma);
 }
 
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer);
-
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 3835a8a8e9eae0..92c11fe41c6228 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -266,9 +266,15 @@ bool csum_and_copy_from_iter_full(void *addr, size_t 
bytes, __wsum *csum, struct
 size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i);
 
-ssize_t import_iovec(int type, const struct iovec __user * uvector,
-unsigned nr_segs, unsigned fast_segs,
-struct iovec **iov, struct iov_iter *i);
+struct iovec *iovec_from_user(const struct iovec __user *uvector,
+   unsigned long nr_segs, unsigned long fast_segs,
+   struct iovec *fast_iov, bool compat);
+ssize_t import_iovec(int type, const struct iovec __user *uvec,
+unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+struct iov_iter *i);
+ssize_t __import_iovec(int type, const struct iovec __user *uvec,
+unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+struct iov_iter *i, bool compat);
 
 #ifdef CONFIG_COMPAT
 struct compat_iovec;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ccea9db3f72be8..d5d8afe31fca16 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1650,107 +1651,133 @@ const void *dup_iter(struct iov_iter *new, struct 
iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
-/**
- * rw_copy_check_uvector() - Copy an array of  iovec from use

Re: [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()

2020-09-21 Thread Christoph Hellwig
On Mon, Sep 21, 2020 at 04:29:37PM +0100, Al Viro wrote:
> On Mon, Sep 21, 2020 at 03:21:35PM +, David Laight wrote:
> 
> > You really don't want to be looping through the array twice.
> 
> Profiles, please.

Given that the iov array should be cache hot I'd be surprised to
see a huge difference.  


[PATCH 10/11] mm: remove compat_process_vm_{readv,writev}

2020-09-21 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  4 +-
 arch/x86/entry/syscall_x32.c  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl|  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 include/linux/compat.h|  8 ---
 include/uapi/asm-generic/unistd.h |  6 +-
 mm/process_vm_access.c| 71 ---
 tools/include/uapi/asm-generic/unistd.h   |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 17 files changed, 30 insertions(+), 111 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306n32 syncfs  sys_syncfs
 307n32 sendmmsgcompat_sys_sendmmsg
 308n32 setns   sys_setns
-309n32 process_vm_readvcompat_sys_process_vm_readv
-310n32 process_vm_writev   compat_sys_process_vm_writev
+309n32 process_vm_readvsys_process_vm_readv
+310n32 process_vm_writev   sys_process_vm_writev
 311n32 kcmpsys_kcmp
 312n32 finit_modulesys_finit_module
 313n32 sched_setattr   sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342o32 syncfs  sys_syncfs
 343o32 sendmmsgsys_sendmmsg
compat_sys_sendmmsg
 344o32 setns   sys_setns
-345o32 process_vm_readvsys_process_vm_readv
compat_sys_process_vm_readv
-346o32 process_vm_writev   sys_process_vm_writev   
compat_sys_process_vm_writev
+345o32 process_vm_readvsys_process_vm_readv
+346o32 process_vm_writev   sys_process_vm_writev
 347o32 kcmpsys_kcmp
 348o32 finit_modulesys_finit_module
 349o32 sched_setattr   sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327common  syncfs  sys_syncfs
 328common  setns   sys_setns
 329common  sendmmsgsys_sendmmsg
compat_sys_sendmmsg
-330common  process_vm_readvsys_process_vm_readv
compat_sys_process_vm_readv
-331common  process_vm_writev   sys_process_vm_writev   
compat_sys_process_vm_writev
+330common  process_vm_readvsys_process_vm_readv
+331common  process_vm_writev   sys_process_vm_writev
 332common  kcmpsys_kcmp
 333common  finit_modulesys_finit_module
 334common  sched_setattr   sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls

[PATCH 11/11] security/keys: remove compat_keyctl_instantiate_key_iov

2020-09-21 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 security/keys/compat.c   | 36 ++--
 security/keys/internal.h |  5 -
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include 
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-   key_serial_t id,
-   const struct compat_iovec __user *_payload_iov,
-   unsigned ioc,
-   key_serial_t ringid)
-{
-   struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-   struct iov_iter from;
-   long ret;
-
-   if (!_payload_iov)
-   ioc = 0;
-
-   ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-  ioc, ARRAY_SIZE(iovstack), &iov, &from);
-   if (ret < 0)
-   return ret;
-
-   ret = keyctl_instantiate_key_common(id, &from, ringid);
-   kfree(iov);
-   return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
case KEYCTL_INSTANTIATE_IOV:
-   return compat_keyctl_instantiate_key_iov(
-   arg2, compat_ptr(arg3), arg4, arg5);
+   return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+ arg5);
 
case KEYCTL_INVALIDATE:
return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
   const struct iovec __user *,
   unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
- struct iov_iter *,
- key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
const char __user *_type,
const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
   struct iov_iter *from,
   key_serial_t ringid)
 {
-- 
2.28.0



[PATCH 07/11] fs: remove various compat readv/writev helpers

2020-09-21 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs as well, all the duplicated
code in the compat readv/writev helpers is not needed.  Remove them
and switch the compat syscall handlers to use the native helpers.
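
As a rough user-space illustration of what stays in the compat entry points
after this change: the split-offset variants only reassemble the 64-bit file
offset from its two 32-bit halves and then defer to the native helper.  The
names below (sketch_make_pos, sketch_split_pos, sketch_use_current_pos) are
made up for this sketch, and plain <stdint.h> types stand in for loff_t and
u32; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Reassemble a 64-bit offset from the two 32-bit halves a 32-bit caller
 * passes.  Hypothetical helper; the kernel open-codes this expression.
 */
static int64_t sketch_make_pos(uint32_t pos_low, uint32_t pos_high)
{
	uint64_t bits = ((uint64_t)pos_high << 32) | pos_low;

	return (int64_t)bits;
}

/* Split a 64-bit offset into the two halves a 32-bit caller would pass. */
static void sketch_split_pos(int64_t pos, uint32_t *pos_low, uint32_t *pos_high)
{
	assert(pos_low && pos_high);
	*pos_low = (uint32_t)((uint64_t)pos & 0xffffffffu);
	*pos_high = (uint32_t)((uint64_t)pos >> 32);
}

/*
 * preadv2-style calls treat an offset of -1 as "use the current file
 * position", which is why they fall back to the plain readv path.
 */
static bool sketch_use_current_pos(uint32_t pos_low, uint32_t pos_high)
{
	return sketch_make_pos(pos_low, pos_high) == -1;
}
```

Building the value as unsigned and converting once at the end avoids shifting
into the sign bit of a signed type, which is why the sketch differs slightly
in form from the one-line expression in the diff.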

Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c | 179 
 1 file changed, 30 insertions(+), 149 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 0a68037580b455..eab427b7cc0a3f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1068,226 +1068,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const 
struct iovec __user *, vec,
return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Various compat syscalls.  Note that they all pretend to take a native
+ * iovec - import_iovec will properly treat those as compat_iovecs based on
+ * in_compat_syscall().
+ */
 #ifdef CONFIG_COMPAT
-static size_t compat_readv(struct file *file,
-  const struct compat_iovec __user *vec,
-  unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *iov = iovstack;
-   struct iov_iter iter;
-   ssize_t ret;
-
-   ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
-  UIO_FASTIOV, &iov, &iter);
-   if (ret >= 0) {
-   ret = do_iter_read(file, &iter, pos, flags);
-   kfree(iov);
-   }
-   if (ret > 0)
-   add_rchar(current, ret);
-   inc_syscr(current);
-   return ret;
-}
-
-static size_t do_compat_readv(compat_ulong_t fd,
-const struct compat_iovec __user *vec,
-compat_ulong_t vlen, rwf_t flags)
-{
-   struct fd f = fdget_pos(fd);
-   ssize_t ret;
-   loff_t pos;
-
-   if (!f.file)
-   return -EBADF;
-   pos = f.file->f_pos;
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   if (ret >= 0)
-   f.file->f_pos = pos;
-   fdput_pos(f);
-   return ret;
-
-}
-
 COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen)
 {
-   return do_compat_readv(fd, vec, vlen, 0);
-}
-
-static long do_compat_preadv64(unsigned long fd,
- const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos, rwf_t flags)
-{
-   struct fd f;
-   ssize_t ret;
-
-   if (pos < 0)
-   return -EINVAL;
-   f = fdget(fd);
-   if (!f.file)
-   return -EBADF;
-   ret = -ESPIPE;
-   if (f.file->f_mode & FMODE_PREAD)
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   fdput(f);
-   return ret;
+   return do_readv(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos)
 {
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
+   return do_readv(fd, vec, vlen, flags);
+   return do_preadv(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
rwf_t, flags)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
-}
-
-static size_t compat_writev(struct file *file,
-   const struct compat_iovec __user *vec,
-   unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *io

[PATCH 09/11] fs: remove compat_sys_vmsplice

2020-09-21 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  2 +-
 arch/x86/entry/syscall_x32.c  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl|  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 fs/splice.c   | 57 +--
 include/linux/compat.h|  4 --
 include/uapi/asm-generic/unistd.h |  2 +-
 tools/include/uapi/asm-generic/unistd.h   |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, 
compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267n32 splice  sys_splice
 268n32 sync_file_range sys_sync_file_range
 269n32 tee sys_tee
-270n32 vmsplicecompat_sys_vmsplice
+270n32 vmsplicesys_vmsplice
 271n32 move_pages  compat_sys_move_pages
 272n32 set_robust_list compat_sys_set_robust_list
 273n32 get_robust_list compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304o32 splice  sys_splice
 305o32 sync_file_range sys_sync_file_range 
sys32_sync_file_range
 306o32 tee sys_tee
-307o32 vmsplicesys_vmsplice
compat_sys_vmsplice
+307o32 vmsplicesys_vmsplice
 308o32 move_pages  sys_move_pages  
compat_sys_move_pages
 309o32 set_robust_list sys_set_robust_list 
compat_sys_set_robust_list
 310o32 get_robust_list sys_get_robust_list 
compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 29232  sync_file_range parisc_sync_file_range
 29264  sync_file_range sys_sync_file_range
 293common  tee sys_tee
-294common  vmsplicesys_vmsplice
compat_sys_vmsplice
+294common  vmsplicesys_vmsplice
 295common  move_pages  sys_move_pages  
compat_sys_move_pages
 296common  getcpu  sys_getcpu
 297common  epoll_pwait sys_epoll_pwait 
compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282common  unshare sys_unshare
 283common  splice  sys_splice
 284common  tee sys_tee
-285common  vmsplicesys_vmsplice
compat_sys_vmsplice
+285common  vmsplicesys_vmsplice
 286common  openat  sys_openat  
compat_sys_openat
 287common

[PATCH 08/11] fs: remove the compat readv/writev syscalls

2020-09-21 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl  |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl  |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl|  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl   |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl  |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl |  4 ++--
 arch/x86/entry/syscall_x32.c   |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl |  4 ++--
 fs/read_write.c| 14 --
 include/linux/compat.h |  4 
 include/uapi/asm-generic/unistd.h  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h|  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15 n32 ioctl   compat_sys_ioctl
 16 n32 pread64 sys_pread64
 17 n32 pwrite64sys_pwrite64
-18 n32 readv   compat_sys_readv
-19 n32 writev  compat_sys_writev
+18 n32 readv   sys_readv
+19 n32 writev  sys_writev
 20 n32 access  sys_access
 21 n32 pipesysm_pipe
 22 n32 _newselect  compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142o32 _newselect  sys_select  
compat_sys_select
 143o32 flock   sys_flock
 144o32 msync   sys_msync
-145o32 readv   sys_readv   
compat_sys_readv
-146o32 writev  sys_writev  
compat_sys_writev
+145o32 readv   sys_readv
+146o32 writev  sys_writev
 147o32 cacheflush  sys_cacheflush
 148o32 cachectlsys_cachectl
 149o32 sysmips __sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142common  _newselect  sys_select  
compat_sys_select
 143common  flock   sys_flock
 144common  msync   sys_msync
-145common  readv   sys_readv   
compat_sys_readv
-146common  writev  sys_writev  
compat_sys_writev
+145common  readv   sys_readv
+146common  writev  sys_writev
 147common  getsid  sys_getsid
 148common  fdatasync   sys_fdatasync
 149common  _sysctl sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142common  _newselect  sys_select  
compat_sys_select
 143common

let import_iovec deal with compat_iovecs as well v2

2020-09-21 Thread Christoph Hellwig
Hi Al,

this series changes import_iovec to transparently deal with compat iovec
structures, and then cleans up a lot of code duplication.

Changes since v1:
 - improve a commit message
 - drop a pointless unlikely
 - drop the PF_FORCE_COMPAT flag
 - add a few more cleanups (including two from David Laight)

Diffstat:
 arch/arm64/include/asm/unistd32.h  |   10 
 arch/mips/kernel/syscalls/syscall_n32.tbl  |   10 
 arch/mips/kernel/syscalls/syscall_o32.tbl  |   10 
 arch/parisc/kernel/syscalls/syscall.tbl|   10 
 arch/powerpc/kernel/syscalls/syscall.tbl   |   10 
 arch/s390/kernel/syscalls/syscall.tbl  |   10 
 arch/sparc/kernel/syscalls/syscall.tbl |   10 
 arch/x86/entry/syscall_x32.c   |5 
 arch/x86/entry/syscalls/syscall_32.tbl |   10 
 arch/x86/entry/syscalls/syscall_64.tbl |   10 
 block/scsi_ioctl.c |   12 
 drivers/scsi/sg.c  |9 
 fs/aio.c   |   38 --
 fs/io_uring.c  |   20 -
 fs/read_write.c|  362 +
 fs/splice.c|   57 ---
 include/linux/compat.h |   24 -
 include/linux/fs.h |   11 
 include/linux/uio.h|   10 
 include/uapi/asm-generic/unistd.h  |   12 
 lib/iov_iter.c |  161 +++--
 mm/process_vm_access.c |   85 
 net/compat.c   |4 
 security/keys/compat.c |   37 --
 security/keys/internal.h   |5 
 security/keys/keyctl.c |2 
 tools/include/uapi/asm-generic/unistd.h|   12 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |   10 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|   10 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |   10 
 30 files changed, 280 insertions(+), 706 deletions(-)


[PATCH 05/11] iov_iter: merge the compat case into rw_copy_check_uvector

2020-09-21 Thread Christoph Hellwig
Stop duplicating the iovec verify code, and instead add a
__import_iovec helper that does the whole verify and import, but takes
a bool compat to decide on the native or compat layout.  This also
ends up massively simplifying the calling conventions.
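
The core of the idea, one copy routine that takes a flag and widens 32-bit
entries into native ones, can be sketched in user space roughly as follows.
struct sketch_iovec, struct sketch_compat_iovec and sketch_copy_iovecs are
hypothetical stand-ins invented for this illustration, and plain memory
copies replace the user-access primitives, so treat it as a sketch under
those assumptions rather than the helper from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the native and 32-bit compat iovec layouts. */
struct sketch_iovec {
	void *iov_base;
	size_t iov_len;
};

struct sketch_compat_iovec {
	uint32_t iov_base;	/* 32-bit pointer value */
	uint32_t iov_len;	/* 32-bit length */
};

/*
 * Fill nr_segs native entries in dst from src, which holds either native
 * or compat entries depending on the flag.  Returns 0 or a negative errno.
 */
static int sketch_copy_iovecs(struct sketch_iovec *dst, const void *src,
			      size_t nr_segs, bool compat)
{
	const struct sketch_compat_iovec *csrc = src;
	size_t i;

	assert(dst && src);

	if (!compat) {
		/* Same layout on both sides: one bulk copy is enough. */
		memcpy(dst, src, nr_segs * sizeof(*dst));
		return 0;
	}

	for (i = 0; i < nr_segs; i++) {
		/* A length that does not fit a signed 32-bit size is invalid. */
		if (csrc[i].iov_len > INT32_MAX)
			return -EINVAL;
		dst[i].iov_base = (void *)(uintptr_t)csrc[i].iov_base;
		dst[i].iov_len = csrc[i].iov_len;
	}
	return 0;
}
```

Everything after this step (length checks, clamping, iterator setup) only
sees native entries, so it does not need a second copy for the compat case.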

Signed-off-by: Christoph Hellwig 
---
 lib/iov_iter.c | 195 ++---
 1 file changed, 70 insertions(+), 125 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index a64867501a7483..8bfa47b63d39aa 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define PIPE_PARANOIA /* for now */
 
@@ -1650,43 +1651,76 @@ const void *dup_iter(struct iov_iter *new, struct 
iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
-static ssize_t rw_copy_check_uvector(int type,
-   const struct iovec __user *uvector, unsigned long nr_segs,
-   unsigned long fast_segs, struct iovec *fast_pointer,
-   struct iovec **ret_pointer)
+static int compat_copy_iovecs_from_user(struct iovec *iov,
+   const struct iovec __user *uvector, unsigned long nr_segs)
+{
+   const struct compat_iovec __user *uiov =
+   (const struct compat_iovec __user *)uvector;
+   unsigned long i;
+   int ret = -EFAULT;
+
+   if (!user_access_begin(uvector, nr_segs * sizeof(*uvector)))
+   return -EFAULT;
+
+   for (i = 0; i < nr_segs; i++) {
+   compat_uptr_t buf;
+   compat_ssize_t len;
+
+   unsafe_get_user(len, &uiov[i].iov_len, out);
+   unsafe_get_user(buf, &uiov[i].iov_base, out);
+
+   /* check for compat_size_t not fitting in compat_ssize_t .. */
+   if (len < 0) {
+   ret = -EINVAL;
+   goto out;
+   }
+   iov[i].iov_base = compat_ptr(buf);
+   iov[i].iov_len = len;
+   }
+   ret = 0;
+out:
+   user_access_end();
+   return ret;
+}
+
+static ssize_t __import_iovec(int type, const struct iovec __user *uvector,
+   unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+   struct iov_iter *i, bool compat)
 {
+   struct iovec *iov = *iovp;
unsigned long seg;
-   ssize_t ret;
-   struct iovec *iov = fast_pointer;
+   ssize_t ret = 0;
 
/*
 * SuS says "The readv() function *may* fail if the iovcnt argument
 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
 * traditionally returned zero for zero segments, so...
 */
-   if (nr_segs == 0) {
-   ret = 0;
-   goto out;
-   }
+   if (nr_segs == 0)
+   goto done;
 
/*
 * First get the "struct iovec" from user memory and
 * verify all the pointers
 */
-   if (nr_segs > UIO_MAXIOV) {
-   ret = -EINVAL;
-   goto out;
-   }
+   ret = -EINVAL;
+   if (nr_segs > UIO_MAXIOV)
+   goto fail;
if (nr_segs > fast_segs) {
+   ret = -ENOMEM;
iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-   if (iov == NULL) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   if (!iov)
+   goto fail;
}
-   if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
+
+   if (compat) {
+   ret = compat_copy_iovecs_from_user(iov, uvector, nr_segs);
+   if (ret)
+   goto fail;
+   } else {
ret = -EFAULT;
-   goto out;
+   if (copy_from_user(iov, uvector, nr_segs * sizeof(*uvector)))
+   goto fail;
}
 
/*
@@ -1707,11 +1741,11 @@ static ssize_t rw_copy_check_uvector(int type,
 * it's about to overflow ssize_t */
if (len < 0) {
ret = -EINVAL;
-   goto out;
+   goto fail;
}
if (type != CHECK_IOVEC_ONLY && !access_ok(buf, len)) {
ret = -EFAULT;
-   goto out;
+   goto fail;
}
if (len > MAX_RW_COUNT - ret) {
len = MAX_RW_COUNT - ret;
@@ -1719,8 +1753,17 @@ static ssize_t rw_copy_check_uvector(int type,
}
ret += len;
}
-out:
-   *ret_pointer = iov;
+done:
+   iov_iter_init(i, type, iov, nr_segs, ret);
+   if (iov == *iovp)
+   *iovp = NULL;
+   else
+   *iovp = iov;
+   return ret;
+fail:
+   if (iov != *iovp)
+   kfree(iov);
+   *iovp = NULL;
return ret;
 }
 
@@ -1750,116 +1793,18 @@ ssize_t import_iovec(in

[PATCH 01/11] compat.h: fix a spelling error in

2020-09-21 Thread Christoph Hellwig
There is no compat_sys_readv64v2 syscall, only a compat_sys_preadv64v2
one.

Signed-off-by: Christoph Hellwig 
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..654c1ec36671a4 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
const struct compat_iovec __user *vec,
unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0



[PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()

2020-09-21 Thread Christoph Hellwig
From: David Laight 

This is the only direct call of rw_copy_check_uvector().  Removing it
will allow rw_copy_check_uvector() to be inlined into import_iovec(),
while only paying a minor price by setting up an otherwise unused
iov_iter in the process_vm_readv/process_vm_writev syscalls that aren't
in a super hot path.

Signed-off-by: David Laight 
[hch: expanded the commit log, pass CHECK_IOVEC_ONLY instead of 0 for the
  compat case, handle CHECK_IOVEC_ONLY in iov_iter_init]
Signed-off-by: Christoph Hellwig 
---
 lib/iov_iter.c |  2 +-
 mm/process_vm_access.c | 33 ++---
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f1232..db54588406dfae 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -443,7 +443,7 @@ void iov_iter_init(struct iov_iter *i, unsigned int 
direction,
const struct iovec *iov, unsigned long nr_segs,
size_t count)
 {
-   WARN_ON(direction & ~(READ | WRITE));
+   WARN_ON(direction & ~(READ | WRITE | CHECK_IOVEC_ONLY));
direction &= READ | WRITE;
 
/* It will get better.  Eventually... */
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 29c052099affdc..40cd502c337534 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -264,7 +264,7 @@ static ssize_t process_vm_rw(pid_t pid,
struct iovec iovstack_r[UIO_FASTIOV];
struct iovec *iov_l = iovstack_l;
struct iovec *iov_r = iovstack_r;
-   struct iov_iter iter;
+   struct iov_iter iter_l, iter_r;
ssize_t rc;
int dir = vm_write ? WRITE : READ;
 
@@ -272,23 +272,25 @@ static ssize_t process_vm_rw(pid_t pid,
return -EINVAL;
 
/* Check iovecs */
-   rc = import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter);
+   rc = import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter_l);
if (rc < 0)
return rc;
-   if (!iov_iter_count(&iter))
+   if (!iov_iter_count(&iter_l))
goto free_iovecs;
 
-   rc = rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
-  iovstack_r, &iov_r);
+   rc = import_iovec(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV, &iov_r,
+ &iter_r);
if (rc <= 0)
goto free_iovecs;
 
-   rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
+   rc = process_vm_rw_core(pid, &iter_l, iter_r.iov, iter_r.nr_segs,
+   flags, vm_write);
 
 free_iovecs:
if (iov_r != iovstack_r)
kfree(iov_r);
-   kfree(iov_l);
+   if (iov_l != iovstack_l)
+   kfree(iov_l);
 
return rc;
 }
@@ -322,30 +324,31 @@ compat_process_vm_rw(compat_pid_t pid,
struct iovec iovstack_r[UIO_FASTIOV];
struct iovec *iov_l = iovstack_l;
struct iovec *iov_r = iovstack_r;
-   struct iov_iter iter;
+   struct iov_iter iter_l, iter_r;
ssize_t rc = -EFAULT;
int dir = vm_write ? WRITE : READ;
 
if (flags != 0)
return -EINVAL;
 
-   rc = compat_import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, 
&iter);
+   rc = compat_import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, 
&iter_l);
if (rc < 0)
return rc;
-   if (!iov_iter_count(&iter))
+   if (!iov_iter_count(&iter_l))
goto free_iovecs;
-   rc = compat_rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt,
- UIO_FASTIOV, iovstack_r,
- &iov_r);
+   rc = compat_import_iovec(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
+&iov_r, &iter_r);
if (rc <= 0)
goto free_iovecs;
 
-   rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
+   rc = process_vm_rw_core(pid, &iter_l, iter_r.iov, iter_r.nr_segs,
+   flags, vm_write);
 
 free_iovecs:
if (iov_r != iovstack_r)
kfree(iov_r);
-   kfree(iov_l);
+   if (iov_l != iovstack_l)
+   kfree(iov_l);
return rc;
 }
 
-- 
2.28.0



[PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector

2020-09-21 Thread Christoph Hellwig
Explicitly check for the magic value instead of implicitly relying on
its numeric representation.  Also drop the rather pointless unlikely
annotation.
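
To make the difference concrete, here is a small user-space sketch of the two
forms of the test.  The SKETCH_* constants mirror the kernel's READ (0),
WRITE (1) and CHECK_IOVEC_ONLY (-1); the function names are invented for the
illustration and do not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirroring the kernel's READ, WRITE and CHECK_IOVEC_ONLY. */
#define SKETCH_READ		0
#define SKETCH_WRITE		1
#define SKETCH_CHECK_IOVEC_ONLY	(-1)

/* Old form: relies on the magic value happening to be negative. */
static bool sketch_wants_access_ok_implicit(int type)
{
	return type >= 0;
}

/* New form: names the one value that skips the access_ok() check. */
static bool sketch_wants_access_ok_explicit(int type)
{
	return type != SKETCH_CHECK_IOVEC_ONLY;
}

/* The two forms agree for the three values named above. */
static void sketch_check_equivalence(void)
{
	static const int types[] = {
		SKETCH_READ, SKETCH_WRITE, SKETCH_CHECK_IOVEC_ONLY,
	};
	size_t i;

	for (i = 0; i < sizeof(types) / sizeof(types[0]); i++)
		assert(sketch_wants_access_ok_implicit(types[i]) ==
		       sketch_wants_access_ok_explicit(types[i]));
}
```

The forms only diverge for other negative values such as -2, which the old
test silently treated like CHECK_IOVEC_ONLY.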

Signed-off-by: Christoph Hellwig 
---
 lib/iov_iter.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d7e72343c360eb..a64867501a7483 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1709,8 +1709,7 @@ static ssize_t rw_copy_check_uvector(int type,
ret = -EINVAL;
goto out;
}
-   if (type >= 0
-   && unlikely(!access_ok(buf, len))) {
+   if (type != CHECK_IOVEC_ONLY && !access_ok(buf, len)) {
ret = -EFAULT;
goto out;
}
@@ -1824,7 +1823,7 @@ static ssize_t compat_rw_copy_check_uvector(int type,
}
if (len < 0)/* size_t not fitting in compat_ssize_t .. */
goto out;
-   if (type >= 0 &&
+   if (type != CHECK_IOVEC_ONLY &&
!access_ok(compat_ptr(buf), len)) {
ret = -EFAULT;
goto out;
-- 
2.28.0



[PATCH 06/11] iov_iter: handle the compat case in import_iovec

2020-09-21 Thread Christoph Hellwig
Use in_compat_syscall() to import either native or the compat iovecs, and
remove the now superfluous compat_import_iovec, which removes the need for
special compat logic in most callers.  Only io_uring needs special
treatment given that it can call import_iovec from kernel threads acting
on behalf of native or compat syscalls.  Expose the low-level
__import_iovec helper and use it in io_uring to explicitly pick an iovec
layout.

Signed-off-by: Christoph Hellwig 
---
 block/scsi_ioctl.c | 12 ++--
 drivers/scsi/sg.c  |  9 +
 fs/aio.c   | 38 ++
 fs/io_uring.c  | 20 
 fs/read_write.c|  6 --
 fs/splice.c|  2 +-
 include/linux/uio.h| 10 +++---
 lib/iov_iter.c | 17 +++--
 mm/process_vm_access.c |  7 ---
 net/compat.c   |  4 ++--
 security/keys/compat.c |  5 ++---
 11 files changed, 44 insertions(+), 86 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk 
*bd_disk,
struct iov_iter i;
struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   ret = compat_import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, , );
-   else
-#endif
-   ret = import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, , );
+   ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+  hdr->iovec_count, 0, , );
if (ret < 0)
goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
struct iovec *iov = NULL;
struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   res = compat_import_iovec(rw, hp->dxferp, iov_count,
- 0, , );
-   else
-#endif
-   res = import_iovec(rw, hp->dxferp, iov_count,
-  0, , );
+   res = import_iovec(rw, hp->dxferp, iov_count, 0, , );
if (res < 0)
return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..b377f5c2048e18 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1478,8 +1478,7 @@ static int aio_prep_rw(struct kiocb *req, const struct 
iocb *iocb)
 }
 
 static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
-   struct iovec **iovec, bool vectored, bool compat,
-   struct iov_iter *iter)
+   struct iovec **iovec, bool vectored, struct iov_iter *iter)
 {
void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf;
size_t len = iocb->aio_nbytes;
@@ -1489,11 +1488,6 @@ static ssize_t aio_setup_rw(int rw, const struct iocb 
*iocb,
*iovec = NULL;
return ret;
}
-#ifdef CONFIG_COMPAT
-   if (compat)
-   return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-   iter);
-#endif
return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
@@ -1517,8 +1511,7 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t 
ret)
}
 }
 
-static int aio_read(struct kiocb *req, const struct iocb *iocb,
-   bool vectored, bool compat)
+static int aio_read(struct kiocb *req, const struct iocb *iocb, bool vectored)
 {
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
struct iov_iter iter;
@@ -1535,7 +1528,7 @@ static int aio_read(struct kiocb *req, const struct iocb 
*iocb,
if (unlikely(!file->f_op->read_iter))
return -EINVAL;
 
-   ret = aio_setup_rw(READ, iocb, , vectored, compat, );
+   ret = aio_setup_rw(READ, iocb, , vectored, );
if (ret < 0)
return ret;
ret = rw_verify_area(READ, file, >ki_pos, iov_iter_count());
@@ -1545,8 +1538,7 @@ static int aio_read(struct kiocb *req, const struct iocb 
*iocb,
return ret;
 }
 
-static int aio_write(struct kiocb *req, const struct iocb *iocb,
-bool vectored, bool compat)
+static int aio_write(struct kiocb *req, const struct iocb *iocb, bool vectored)
 {
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
struct iov_iter iter;
@@ -1563,7 +1555,7 @@ static int aio_write(struct kiocb *req, const s

[PATCH 03/11] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c and mark it static

2020-09-21 Thread Christoph Hellwig
From: David Laight 

This lets the compiler inline it into import_iovec() generating
much better code.

Signed-off-by: David Laight 
[hch: drop the now pointless kerneldoc for a static function, and update
  a few other comments]
Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c| 179 -
 include/linux/compat.h |   6 --
 include/linux/fs.h |  11 +--
 lib/iov_iter.c | 150 +-
 4 files changed, 151 insertions(+), 195 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..e5e891a88442ef 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,185 +752,6 @@ static ssize_t do_loop_readv_writev(struct file *filp, 
struct iov_iter *iter,
return ret;
 }
 
-/**
- * rw_copy_check_uvector() - Copy an array of  iovec from userspace
- * into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- * either @fast_pointer, a newly allocated kernel array, or NULL,
- * depending on which array was used.
- *
- * This function copies an array of  iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer)
-{
-   unsigned long seg;
-   ssize_t ret;
-   struct iovec *iov = fast_pointer;
-
-   /*
-* SuS says "The readv() function *may* fail if the iovcnt argument
-* was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-* traditionally returned zero for zero segments, so...
-*/
-   if (nr_segs == 0) {
-   ret = 0;
-   goto out;
-   }
-
-   /*
-* First get the "struct iovec" from user memory and
-* verify all the pointers
-*/
-   if (nr_segs > UIO_MAXIOV) {
-   ret = -EINVAL;
-   goto out;
-   }
-   if (nr_segs > fast_segs) {
-   iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-   if (iov == NULL) {
-   ret = -ENOMEM;
-   goto out;
-   }
-   }
-   if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-   ret = -EFAULT;
-   goto out;
-   }
-
-   /*
-* According to the Single Unix Specification we should return EINVAL
-* if an element length is < 0 when cast to ssize_t or if the
-* total length would overflow the ssize_t return value of the
-* system call.
-*
-* Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
-* overflow case.
-*/
-   ret = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   void __user *buf = iov[seg].iov_base;
-   ssize_t len = (ssize_t)iov[seg].iov_len;
-
-   /* see if we we're about to use an invalid len or if
-* it's about to overflow ssize_t */
-   if (len < 0) {
-   ret = -EINVAL;
-   goto out;
-   }
-   if (type >= 0
-   && unlikely(!access_ok(buf, len))) {
-   ret = -EFAULT;
-   goto out;
-   }
-   if (len > MAX_RW_COUNT - ret) {
-   len = MAX_RW_COUNT - ret;
-   iov[seg].iov_len = len;
-   }
-   ret += len;
-   }
-out:
-   *ret_pointer = iov;
-   return ret;
-}
-
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-   const struct compat_iovec __user *uvector, unsigned long 
nr_segs,
-   unsigned long fast_segs, struct iovec *fast_pointer
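
The length validation and capping loop in the removed function can be
sketched in userspace as follows.  This is a simplified illustration,
not the kernel code: it skips the copy from user memory, the allocation
and the access_ok() check, and SKETCH_MAX_RW_COUNT and
sketch_check_iovec() are hypothetical names with an assumed cap value.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical cap standing in for the kernel's MAX_RW_COUNT. */
#define SKETCH_MAX_RW_COUNT	((ssize_t)0x7ffff000)

/*
 * Walk the segments, reject lengths that are negative when viewed as
 * ssize_t, and trim segments so the running total never exceeds the cap.
 * Returns the total byte count or a negative errno value.
 */
static ssize_t sketch_check_iovec(struct iovec *iov, unsigned long nr_segs)
{
	ssize_t total = 0;
	unsigned long seg;

	for (seg = 0; seg < nr_segs; seg++) {
		ssize_t len = (ssize_t)iov[seg].iov_len;

		if (len < 0)
			return -EINVAL;
		if (len > SKETCH_MAX_RW_COUNT - total) {
			/* Trim this segment instead of overflowing. */
			len = SKETCH_MAX_RW_COUNT - total;
			iov[seg].iov_len = (size_t)len;
		}
		total += len;
	}
	return total;
}
```

Note that oversized requests are trimmed rather than rejected, which
matches the comment about capping all read/write calls.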

Re: [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic*

2020-09-21 Thread Christoph Hellwig
> +# ifndef ARCH_NEEDS_KMAP_HIGH_GET
> +static inline void *arch_kmap_temporary_high_get(struct page *page)
> +{
> + return NULL;
> +}
> +# endif

Turn this into a macro and use #ifndef on the symbol name?
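
One possible reading of that suggestion, as a standalone sketch (the
wrapper function sketch_lookup() is hypothetical and exists only to
exercise the fallback):

```c
#include <stddef.h>

struct page;	/* opaque here; only pointers are passed around */

/*
 * An architecture that needs the hook would define the macro (usually
 * mapping it to a real function) before this point.  Everyone else gets
 * the no-op fallback, keyed on the symbol name itself rather than on a
 * separate ARCH_NEEDS_* define.
 */
#ifndef arch_kmap_temporary_high_get
#define arch_kmap_temporary_high_get(page)	NULL
#endif

static void *sketch_lookup(struct page *page)
{
	(void)page;	/* unused when the fallback macro is in effect */
	return arch_kmap_temporary_high_get(page);
}
```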

> +static inline void __kunmap_atomic(void *addr)
> +{
> + kumap_atomic_indexed(addr);
> +}
> +
> +
> +#endif /* CONFIG_KMAP_ATOMIC_GENERIC */

Strange double empty line above the endif.

> -#define kunmap_atomic(addr) \
> -do {\
> - BUILD_BUG_ON(__same_type((addr), struct page *));   \
> - kunmap_atomic_high(addr);  \
> - pagefault_enable(); \
> - preempt_enable();   \
> -} while (0)
> -
> +#define kunmap_atomic(addr)  \
> + do {\
> + BUILD_BUG_ON(__same_type((addr), struct page *));   \
> + __kunmap_atomic(addr);  \
> + preempt_enable();   \
> + } while (0)

Why the strange re-indent to a form that is much less common and less
readable?

> +void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
> +{
> + pagefault_disable();
> + return __kmap_atomic_pfn_prot(pfn, prot);
> +}
> +EXPORT_SYMBOL(kmap_atomic_pfn_prot);

The existing kmap_atomic_pfn & co implementation is EXPORT_SYMBOL_GPL,
and this stuff should preferably stay that way.


Re: [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx()

2020-09-21 Thread Christoph Hellwig
On Sat, Sep 19, 2020 at 11:17:52AM +0200, Thomas Gleixner wrote:
> Nothing in modules can use that.
> 
> Signed-off-by: Thomas Gleixner 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: let import_iovec deal with compat_iovecs as well

2020-09-20 Thread 'Christoph Hellwig'
On Sat, Sep 19, 2020 at 02:24:10PM +, David Laight wrote:
> I thought about that change while writing my import_iovec() => iovec_import()
> patch - and thought that the io_uring code would (as usual) cause grief.
> 
> Christoph - did you see those patches?

No.


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-20 Thread Christoph Hellwig
On Sun, Sep 20, 2020 at 12:14:49PM -0700, Andy Lutomirski wrote:
> I wonder if this is really quite cast in stone.  We could also have
> FMODE_SHITTY_COMPAT and set that when a file like this is *opened* in
> compat mode.  Then that particular struct file would be read and
> written using the compat data format.  The change would be
> user-visible, but the user that would see it would be very strange
> indeed.
> 
> I don't have a strong opinion as to whether that is better or worse
> than denying io_uring access to these things, but at least it moves
> the special case out of io_uring.

open could have happened through an io_uring thread as well, so I don't
see how this would do anything but move the problem to a different
place.

> 
> --Andy
---end quoted text---


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-18 Thread Christoph Hellwig
On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> Said that, why not provide a variant that would take an explicit
> "is it compat" argument and use it there?  And have the normal
> one pass in_compat_syscall() to that...

That would help to not introduce a regression with this series, yes.
But it wouldn't fix existing bugs when io_uring is used to access
read or write methods that use in_compat_syscall().  One example that
I recently ran into is drivers/scsi/sg.c.
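
For reference, the shape of that suggestion can be sketched in
userspace like this.  Everything here is hypothetical (the sketch_
types, the flag standing in for in_compat_syscall(), and both helper
names); it only shows the split between an explicit-layout variant and
a thin wrapper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the native and 32-bit compat iovec layouts. */
struct sketch_iovec { void *base; size_t len; };
struct sketch_compat_iovec { uint32_t base; uint32_t len; };

/* Stand-in for in_compat_syscall(); a plain flag in this sketch. */
static bool sketch_compat_flag;
static bool sketch_in_compat_syscall(void) { return sketch_compat_flag; }

/* Variant with an explicit "is it compat" argument. */
static void sketch_import_explicit(const void *uvec, size_t nr_segs,
				   bool compat, struct sketch_iovec *out)
{
	size_t i;

	if (compat) {
		const struct sketch_compat_iovec *c = uvec;

		/* Widen each 32-bit entry to the native layout. */
		for (i = 0; i < nr_segs; i++) {
			out[i].base = (void *)(uintptr_t)c[i].base;
			out[i].len = c[i].len;
		}
	} else {
		const struct sketch_iovec *n = uvec;

		for (i = 0; i < nr_segs; i++)
			out[i] = n[i];
	}
}

/* The normal entry point just forwards the ambient compat state. */
static void sketch_import(const void *uvec, size_t nr_segs,
			  struct sketch_iovec *out)
{
	sketch_import_explicit(uvec, nr_segs, sketch_in_compat_syscall(), out);
}
```

A caller such as io_uring, which knows the layout from its own context,
would call the explicit variant directly; as noted above, that does not
help code further down that calls in_compat_syscall() on its own.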


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-18 Thread Christoph Hellwig
On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote:
> > /* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
> > -   return pt_regs_trap_type(current_pt_regs()) == 0x110;
> > +   return pt_regs_trap_type(current_pt_regs()) == 0x110 ||
> > +   (current->flags & PF_FORCE_COMPAT);
> 
> Can't say I like that approach ;-/  Reasoning about the behaviour is much
> harder when it's controlled like that - witness set_fs() shite...

I don't particularly like it either.  But do you have a better idea
how to deal with io_uring vs compat tasks?


[PATCH 8/9] mm: remove compat_process_vm_{readv,writev}

2020-09-18 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  4 +-
 arch/x86/entry/syscall_x32.c  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl|  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 include/linux/compat.h|  8 ---
 include/uapi/asm-generic/unistd.h |  6 +-
 mm/process_vm_access.c| 70 ---
 tools/include/uapi/asm-generic/unistd.h   |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 17 files changed, 30 insertions(+), 110 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306n32 syncfs  sys_syncfs
 307n32 sendmmsgcompat_sys_sendmmsg
 308n32 setns   sys_setns
-309n32 process_vm_readvcompat_sys_process_vm_readv
-310n32 process_vm_writev   compat_sys_process_vm_writev
+309n32 process_vm_readvsys_process_vm_readv
+310n32 process_vm_writev   sys_process_vm_writev
 311n32 kcmpsys_kcmp
 312n32 finit_modulesys_finit_module
 313n32 sched_setattr   sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342o32 syncfs  sys_syncfs
 343o32 sendmmsgsys_sendmmsg
compat_sys_sendmmsg
 344o32 setns   sys_setns
-345o32 process_vm_readvsys_process_vm_readv
compat_sys_process_vm_readv
-346o32 process_vm_writev   sys_process_vm_writev   
compat_sys_process_vm_writev
+345o32 process_vm_readvsys_process_vm_readv
+346o32 process_vm_writev   sys_process_vm_writev
 347o32 kcmpsys_kcmp
 348o32 finit_modulesys_finit_module
 349o32 sched_setattr   sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327common  syncfs  sys_syncfs
 328common  setns   sys_setns
 329common  sendmmsgsys_sendmmsg
compat_sys_sendmmsg
-330common  process_vm_readvsys_process_vm_readv
compat_sys_process_vm_readv
-331common  process_vm_writev   sys_process_vm_writev   
compat_sys_process_vm_writev
+330common  process_vm_readvsys_process_vm_readv
+331common  process_vm_writev   sys_process_vm_writev
 332common  kcmpsys_kcmp
 333common  finit_modulesys_finit_module
 334common  sched_setattr   sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls

[PATCH 2/9] compat.h: fix a spelling error in

2020-09-18 Thread Christoph Hellwig
There is no compat_sys_readv64v2 syscall, only a
compat_sys_preadv64v2 one.  This probably worked given that the
syscall was not referenced from anywhere but the x86 syscall table.

Signed-off-by: Christoph Hellwig 
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 685066f7ad325f..69968c124b3cad 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
const struct compat_iovec __user *vec,
unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0



[PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov

2020-09-18 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 security/keys/compat.c   | 36 ++--
 security/keys/internal.h |  5 -
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include 
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-   key_serial_t id,
-   const struct compat_iovec __user *_payload_iov,
-   unsigned ioc,
-   key_serial_t ringid)
-{
-   struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-   struct iov_iter from;
-   long ret;
-
-   if (!_payload_iov)
-   ioc = 0;
-
-   ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-  ioc, ARRAY_SIZE(iovstack), , );
-   if (ret < 0)
-   return ret;
-
-   ret = keyctl_instantiate_key_common(id, , ringid);
-   kfree(iov);
-   return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
case KEYCTL_INSTANTIATE_IOV:
-   return compat_keyctl_instantiate_key_iov(
-   arg2, compat_ptr(arg3), arg4, arg5);
+   return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+ arg5);
 
case KEYCTL_INVALIDATE:
return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
   const struct iovec __user *,
   unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
- struct iov_iter *,
- key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
const char __user *_type,
const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
   struct iov_iter *from,
   key_serial_t ringid)
 {
-- 
2.28.0



[PATCH 3/9] fs: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector

2020-09-18 Thread Christoph Hellwig
Explicitly check for the magic value instead of implicitly relying on
its number representation.

Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..f153116bc5399b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -840,8 +840,7 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec 
__user * uvector,
ret = -EINVAL;
goto out;
}
-   if (type >= 0
-   && unlikely(!access_ok(buf, len))) {
+   if (type != CHECK_IOVEC_ONLY && unlikely(!access_ok(buf, len))) 
{
ret = -EFAULT;
goto out;
}
@@ -911,7 +910,7 @@ ssize_t compat_rw_copy_check_uvector(int type,
}
if (len < 0)/* size_t not fitting in compat_ssize_t .. */
goto out;
-   if (type >= 0 &&
+   if (type != CHECK_IOVEC_ONLY &&
!access_ok(compat_ptr(buf), len)) {
ret = -EFAULT;
goto out;
-- 
2.28.0



let import_iovec deal with compat_iovecs as well

2020-09-18 Thread Christoph Hellwig
Hi Al,

this series changes import_iovec to transparently deal with compat iovec
structures, and then cleans up a lot of code duplication.  But to get
there it first has to fix the pre-existing bug that io_uring compat
contexts don't trigger the in_compat_syscall() check.  This has so far
been relatively harmless as very little code callable from io_uring used
the check, and even that code that could be called usually wasn't.

Diffstat
 arch/arm64/include/asm/unistd32.h  |   10 
 arch/mips/kernel/syscalls/syscall_n32.tbl  |   10 
 arch/mips/kernel/syscalls/syscall_o32.tbl  |   10 
 arch/parisc/kernel/syscalls/syscall.tbl|   10 
 arch/powerpc/kernel/syscalls/syscall.tbl   |   10 
 arch/s390/kernel/syscalls/syscall.tbl  |   10 
 arch/sparc/include/asm/compat.h|3 
 arch/sparc/kernel/syscalls/syscall.tbl |   10 
 arch/x86/entry/syscall_x32.c   |5 
 arch/x86/entry/syscalls/syscall_32.tbl |   10 
 arch/x86/entry/syscalls/syscall_64.tbl |   10 
 arch/x86/include/asm/compat.h  |2 
 block/scsi_ioctl.c |   12 
 drivers/scsi/sg.c  |9 
 fs/aio.c   |   38 --
 fs/io_uring.c  |   21 -
 fs/read_write.c|  307 -
 fs/splice.c|   57 ---
 include/linux/compat.h |   29 -
 include/linux/fs.h |7 
 include/linux/sched.h  |1 
 include/linux/uio.h|7 
 include/uapi/asm-generic/unistd.h  |   12 
 lib/iov_iter.c |   30 --
 mm/process_vm_access.c |   69 
 net/compat.c   |4 
 security/keys/compat.c |   37 --
 security/keys/internal.h   |5 
 security/keys/keyctl.c |2 
 tools/include/uapi/asm-generic/unistd.h|   12 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |   10 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|   10 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |   10 
 33 files changed, 207 insertions(+), 582 deletions(-)


[PATCH 7/9] fs: remove compat_sys_vmsplice

2020-09-18 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  2 +-
 arch/x86/entry/syscall_x32.c  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl|  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 fs/splice.c   | 57 +--
 include/linux/compat.h|  4 --
 include/uapi/asm-generic/unistd.h |  2 +-
 tools/include/uapi/asm-generic/unistd.h   |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, 
compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267n32 splice  sys_splice
 268n32 sync_file_range sys_sync_file_range
 269n32 tee sys_tee
-270n32 vmsplicecompat_sys_vmsplice
+270n32 vmsplicesys_vmsplice
 271n32 move_pages  compat_sys_move_pages
 272n32 set_robust_list compat_sys_set_robust_list
 273n32 get_robust_list compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304o32 splice  sys_splice
 305o32 sync_file_range sys_sync_file_range 
sys32_sync_file_range
 306o32 tee sys_tee
-307o32 vmsplicesys_vmsplice
compat_sys_vmsplice
+307o32 vmsplicesys_vmsplice
 308o32 move_pages  sys_move_pages  
compat_sys_move_pages
 309o32 set_robust_list sys_set_robust_list 
compat_sys_set_robust_list
 310o32 get_robust_list sys_get_robust_list 
compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 29232  sync_file_range parisc_sync_file_range
 29264  sync_file_range sys_sync_file_range
 293common  tee sys_tee
-294common  vmsplicesys_vmsplice
compat_sys_vmsplice
+294common  vmsplicesys_vmsplice
 295common  move_pages  sys_move_pages  
compat_sys_move_pages
 296common  getcpu  sys_getcpu
 297common  epoll_pwait sys_epoll_pwait 
compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282common  unshare sys_unshare
 283common  splice  sys_splice
 284common  tee sys_tee
-285common  vmsplicesys_vmsplice
compat_sys_vmsplice
+285common  vmsplicesys_vmsplice
 286common  openat  sys_openat  
compat_sys_openat
 287common

[PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-18 Thread Christoph Hellwig
Add a flag to force processing a syscall as a compat syscall.  This is
required so that in_compat_syscall() works for I/O submitted by io_uring
helper threads on behalf of compat syscalls.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/include/asm/compat.h | 3 ++-
 arch/x86/include/asm/compat.h   | 2 +-
 fs/io_uring.c   | 9 +
 include/linux/compat.h  | 5 -
 include/linux/sched.h   | 1 +
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h
index 40a267b3bd5208..fee6c51d36e869 100644
--- a/arch/sparc/include/asm/compat.h
+++ b/arch/sparc/include/asm/compat.h
@@ -211,7 +211,8 @@ static inline int is_compat_task(void)
 static inline bool in_compat_syscall(void)
 {
/* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
-   return pt_regs_trap_type(current_pt_regs()) == 0x110;
+   return pt_regs_trap_type(current_pt_regs()) == 0x110 ||
+   (current->flags & PF_FORCE_COMPAT);
 }
 #define in_compat_syscall in_compat_syscall
 #endif
diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index d4edf281fff49d..fbab072d4e5b31 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -208,7 +208,7 @@ static inline bool in_32bit_syscall(void)
 #ifdef CONFIG_COMPAT
 static inline bool in_compat_syscall(void)
 {
-   return in_32bit_syscall();
+   return in_32bit_syscall() || (current->flags & PF_FORCE_COMPAT);
 }
 #define in_compat_syscall in_compat_syscall/* override the generic impl */
 #endif
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3790c7fe9fee22..5755d557c3f7bc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5449,6 +5449,9 @@ static int io_req_defer_prep(struct io_kiocb *req,
if (unlikely(ret))
return ret;
 
+   if (req->ctx->compat)
+   current->flags |= PF_FORCE_COMPAT;
+
switch (req->opcode) {
case IORING_OP_NOP:
break;
@@ -5546,6 +5549,7 @@ static int io_req_defer_prep(struct io_kiocb *req,
break;
}
 
+   current->flags &= ~PF_FORCE_COMPAT;
return ret;
 }
 
@@ -5669,6 +5673,9 @@ static int io_issue_sqe(struct io_kiocb *req, const 
struct io_uring_sqe *sqe,
struct io_ring_ctx *ctx = req->ctx;
int ret;
 
+   if (ctx->compat)
+   current->flags |= PF_FORCE_COMPAT;
+
switch (req->opcode) {
case IORING_OP_NOP:
ret = io_nop(req, cs);
@@ -5898,6 +5905,8 @@ static int io_issue_sqe(struct io_kiocb *req, const 
struct io_uring_sqe *sqe,
break;
}
 
+   current->flags &= ~PF_FORCE_COMPAT;
+
if (ret)
return ret;
 
diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..685066f7ad325f 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -891,7 +891,10 @@ asmlinkage long compat_sys_socketcall(int call, u32 __user 
*args);
  */
 
 #ifndef in_compat_syscall
-static inline bool in_compat_syscall(void) { return is_compat_task(); }
+static inline bool in_compat_syscall(void)
+{
+   return is_compat_task() || (current->flags & PF_FORCE_COMPAT);
+}
 #endif
 
 /**
diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e232935fa..c8b183b5655a1e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1491,6 +1491,7 @@ extern struct pid *cad_pid;
  */
 #define PF_IDLE0x0002  /* I am an IDLE thread 
*/
 #define PF_EXITING 0x0004  /* Getting shut down */
+#define PF_FORCE_COMPAT0x0008  /* acting as compat 
task */
 #define PF_VCPU0x0010  /* I'm a virtual CPU */
 #define PF_WQ_WORKER   0x0020  /* I'm a workqueue worker */
 #define PF_FORKNOEXEC  0x0040  /* Forked but didn't exec */
-- 
2.28.0
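
The set/clear pattern used around the io_uring dispatch can be modelled
in userspace roughly as below.  This is a sketch under assumptions: the
sketch_task structure, the flag value and both helpers are hypothetical
stand-ins, not kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for PF_FORCE_COMPAT. */
#define SKETCH_PF_FORCE_COMPAT	0x00000008u

/* Minimal stand-in for the parts of a task this sketch cares about. */
struct sketch_task {
	unsigned int flags;
	bool is_compat_task;
};

/* Model of in_compat_syscall() after the patch: task type OR forced flag. */
static bool sketch_in_compat_syscall(const struct sketch_task *t)
{
	return t->is_compat_task || (t->flags & SKETCH_PF_FORCE_COMPAT);
}

/*
 * Model of a worker issuing a request on behalf of a ring context.  The
 * flag is set only for the duration of the request, and the return value
 * is what code called during the request would observe.
 */
static bool sketch_issue(struct sketch_task *worker, bool ctx_compat)
{
	bool seen;

	if (ctx_compat)
		worker->flags |= SKETCH_PF_FORCE_COMPAT;

	seen = sketch_in_compat_syscall(worker);

	worker->flags &= ~SKETCH_PF_FORCE_COMPAT;
	return seen;
}
```

A native worker thread thus reports compat mode only while it handles a
request from a compat ring, and its flags are unchanged afterwards.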



[PATCH 5/9] fs: remove various compat readv/writev helpers

2020-09-18 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs as well, all the duplicated
code in the compat readv/writev helpers is not needed.  Remove them
and switch the compat syscall handlers to use the native helpers.

Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c | 179 
 1 file changed, 30 insertions(+), 149 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 2f961c653ce561..9eb63c53da78f2 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1211,226 +1211,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const 
struct iovec __user *, vec,
return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Various compat syscalls.  Note that they all pretend to take a native
+ * iovec - import_iovec will properly treat those as compat_iovecs based on
+ * in_compat_syscall().
+ */
 #ifdef CONFIG_COMPAT
-static size_t compat_readv(struct file *file,
-  const struct compat_iovec __user *vec,
-  unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *iov = iovstack;
-   struct iov_iter iter;
-   ssize_t ret;
-
-   ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
-  UIO_FASTIOV, , );
-   if (ret >= 0) {
-   ret = do_iter_read(file, , pos, flags);
-   kfree(iov);
-   }
-   if (ret > 0)
-   add_rchar(current, ret);
-   inc_syscr(current);
-   return ret;
-}
-
-static size_t do_compat_readv(compat_ulong_t fd,
-const struct compat_iovec __user *vec,
-compat_ulong_t vlen, rwf_t flags)
-{
-   struct fd f = fdget_pos(fd);
-   ssize_t ret;
-   loff_t pos;
-
-   if (!f.file)
-   return -EBADF;
-   pos = f.file->f_pos;
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   if (ret >= 0)
-   f.file->f_pos = pos;
-   fdput_pos(f);
-   return ret;
-
-}
-
 COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen)
 {
-   return do_compat_readv(fd, vec, vlen, 0);
-}
-
-static long do_compat_preadv64(unsigned long fd,
- const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos, rwf_t flags)
-{
-   struct fd f;
-   ssize_t ret;
-
-   if (pos < 0)
-   return -EINVAL;
-   f = fdget(fd);
-   if (!f.file)
-   return -EBADF;
-   ret = -ESPIPE;
-   if (f.file->f_mode & FMODE_PREAD)
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   fdput(f);
-   return ret;
+   return do_readv(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos)
 {
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
+   return do_readv(fd, vec, vlen, flags);
+   return do_preadv(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
rwf_t, flags)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
-}
-
-static size_t compat_writev(struct file *file,
-   const struct compat_iovec __user *vec,
-   unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *io

[PATCH 6/9] fs: remove the compat readv/writev syscalls

2020-09-18 Thread Christoph Hellwig
Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl  |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl  |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl|  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl   |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl  |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl |  4 ++--
 arch/x86/entry/syscall_x32.c   |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl |  4 ++--
 fs/read_write.c| 14 --
 include/linux/compat.h |  4 
 include/uapi/asm-generic/unistd.h  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h|  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15 n32 ioctl       compat_sys_ioctl
 16 n32 pread64     sys_pread64
 17 n32 pwrite64    sys_pwrite64
-18 n32 readv       compat_sys_readv
-19 n32 writev      compat_sys_writev
+18 n32 readv       sys_readv
+19 n32 writev      sys_writev
 20 n32 access      sys_access
 21 n32 pipe        sysm_pipe
 22 n32 _newselect  compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142 o32 _newselect  sys_select      compat_sys_select
 143 o32 flock       sys_flock
 144 o32 msync       sys_msync
-145 o32 readv       sys_readv       compat_sys_readv
-146 o32 writev      sys_writev      compat_sys_writev
+145 o32 readv       sys_readv
+146 o32 writev      sys_writev
 147 o32 cacheflush  sys_cacheflush
 148 o32 cachectl    sys_cachectl
 149 o32 sysmips     __sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142 common  _newselect  sys_select      compat_sys_select
 143 common  flock       sys_flock
 144 common  msync       sys_msync
-145 common  readv       sys_readv       compat_sys_readv
-146 common  writev      sys_writev      compat_sys_writev
+145 common  readv       sys_readv
+146 common  writev      sys_writev
 147 common  getsid      sys_getsid
 148 common  fdatasync   sys_fdatasync
 149 common  _sysctl     sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142 common  _newselect  sys_select      compat_sys_select
 143common

[PATCH 4/9] fs: handle the compat case in import_iovec

2020-09-18 Thread Christoph Hellwig
Use in_compat_syscall() to import either native or compat iovecs, and
remove the now superfluous compat_import_iovec.
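
For readers following the series, the idea can be modelled in user space.
This is only a sketch under stated assumptions: struct compat_iovec_model,
struct iovec_model and import_iovec_model() are hypothetical names, the
is_compat flag stands in for in_compat_syscall(), and nothing here is
taken from the actual lib/iov_iter.c change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct compat_iovec and struct iovec. */
struct compat_iovec_model {
	uint32_t iov_base;	/* 32-bit user pointer */
	uint32_t iov_len;	/* 32-bit length */
};

struct iovec_model {
	uint64_t iov_base;
	uint64_t iov_len;
};

/*
 * Copy nr entries from src into dst, widening each entry when is_compat
 * is set.  is_compat plays the role of in_compat_syscall() in this model,
 * so callers pass the same pointer type in both cases.
 */
static void import_iovec_model(int is_compat, const void *src, size_t nr,
			       struct iovec_model *dst)
{
	size_t i;

	if (is_compat) {
		const struct compat_iovec_model *c = src;

		for (i = 0; i < nr; i++) {
			dst[i].iov_base = c[i].iov_base;
			dst[i].iov_len = c[i].iov_len;
		}
	} else {
		const struct iovec_model *n = src;

		for (i = 0; i < nr; i++)
			dst[i] = n[i];
	}
}
```

Because the layout decision is made inside the helper, callers no longer
need their own #ifdef CONFIG_COMPAT branches.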

Signed-off-by: Christoph Hellwig 
---
 block/scsi_ioctl.c |  12 +---
 drivers/scsi/sg.c  |   9 +--
 fs/aio.c   |  38 +---
 fs/io_uring.c  |  12 +---
 fs/read_write.c| 127 +++--
 fs/splice.c|   2 +-
 include/linux/compat.h |   6 --
 include/linux/fs.h |   7 +--
 include/linux/uio.h|   7 ---
 lib/iov_iter.c |  30 +-
 mm/process_vm_access.c |   9 +--
 net/compat.c   |   4 +-
 security/keys/compat.c |   5 +-
 13 files changed, 83 insertions(+), 185 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
struct iov_iter i;
struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   ret = compat_import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, &iov, &i);
-   else
-#endif
-   ret = import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, &iov, &i);
+   ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+  hdr->iovec_count, 0, &iov, &i);
if (ret < 0)
goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
struct iovec *iov = NULL;
struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   res = compat_import_iovec(rw, hp->dxferp, iov_count,
- 0, &iov, &i);
-   else
-#endif
-   res = import_iovec(rw, hp->dxferp, iov_count,
-  0, &iov, &i);
+   res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
if (res < 0)
return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..b377f5c2048e18 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1478,8 +1478,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
 }
 
 static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
-   struct iovec **iovec, bool vectored, bool compat,
-   struct iov_iter *iter)
+   struct iovec **iovec, bool vectored, struct iov_iter *iter)
 {
void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf;
size_t len = iocb->aio_nbytes;
@@ -1489,11 +1488,6 @@ static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
*iovec = NULL;
return ret;
}
-#ifdef CONFIG_COMPAT
-   if (compat)
-   return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-   iter);
-#endif
return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
@@ -1517,8 +1511,7 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
}
 }
 
-static int aio_read(struct kiocb *req, const struct iocb *iocb,
-   bool vectored, bool compat)
+static int aio_read(struct kiocb *req, const struct iocb *iocb, bool vectored)
 {
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
struct iov_iter iter;
@@ -1535,7 +1528,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb,
if (unlikely(!file->f_op->read_iter))
return -EINVAL;
 
-   ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
+   ret = aio_setup_rw(READ, iocb, &iovec, vectored, &iter);
 if (ret < 0)
 return ret;
 ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
@@ -1545,8 +1538,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb,
return ret;
 }
 
-static int aio_write(struct kiocb *req, const struct iocb *iocb,
-bool vectored, bool compat)
+static int aio_write(struct kiocb *req, const struct iocb *iocb, bool vectored)
 {
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
struct iov_iter iter;
@@ -1563,7 +1555,7 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
if (unlikely(!file->f_op->write_iter))
return -EINVAL;
 
-   ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
+   ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, &iter);
if (ret < 0)
r

Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask

2020-09-15 Thread Christoph Hellwig
On Wed, Sep 09, 2020 at 07:36:04PM +1000, Alexey Kardashevskiy wrote:
> I want dma_get_required_mask() to return the bigger mask always.
> 
> Now it depends on (in dma_alloc_direct()):
> 1. dev->dma_ops_bypass: set via pci_set_(coherent_)dma_mask();
> 2. dev->coherent_dma_mask - the same;
> 3. dev->bus_dma_limit - usually not set at all.
> 
> So until we set the mask, dma_get_required_mask() returns smaller mask.
> So aacraid and the like (which call dma_get_required_mask() before setting
> it) will remain prone to breakage.

Well, the original intent of dma_get_required_mask is to return the
mask that the driver then uses to figure out what to set, so what aacraid
does fits that use case.  Of course that idea is pretty bogus for
PCIe devices.

I suspect the right fix is to just not query dma_get_required_mask for
PCIe devices in aacraid (and other drivers that do something similar).


Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask

2020-09-08 Thread Christoph Hellwig
On Tue, Sep 08, 2020 at 10:06:56PM +1000, Alexey Kardashevskiy wrote:
> On 08/09/2020 15:44, Christoph Hellwig wrote:
>> On Tue, Sep 08, 2020 at 11:51:06AM +1000, Alexey Kardashevskiy wrote:
>>> What is dma_get_required_mask() for anyway? What "requires" what here?
>>
>> Yes, it is a really odd API.  It comes from classic old PCI where
>> 64-bit addressing required an additional bus cycle, and various devices
>> had different addressing schemes, with the smaller addresses being
>> more efficient.  So this allows the driver to request the "required"
>> addressing mode to address all memory.  "preferred" might be a better
>> name as we'll bounce buffer if it isn't met.  I also don't really see
>> why a driver would ever want to use it for a modern PCIe device.
>
>
> a-ha, this makes more sense, thanks. Then I guess we need to revert that 
> one bit from yours f1565c24b596, do not we?

Why?  That was the original intent of the API, but now we also use it
internally to check the addressing capabilities.


Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask

2020-09-07 Thread Christoph Hellwig
On Tue, Sep 08, 2020 at 11:51:06AM +1000, Alexey Kardashevskiy wrote:
> What is dma_get_required_mask() for anyway? What "requires" what here?

Yes, it is a really odd API.  It comes from classic old PCI where
64-bit addressing required an additional bus cycle, and various devices
had different addressing schemes, with the smaller addresses being
more efficient.  So this allows the driver to request the "required"
addressing mode to address all memory.  "preferred" might be a better
name as we'll bounce buffer if it isn't met.  I also don't really see
why a driver would ever want to use it for a modern PCIe device.
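
To make the "preferred" reading concrete, here is a user-space sketch.
It is an illustration only: required_mask_model(), dma_bit_mask_model()
and preferred_dma_bits() are hypothetical helpers, and the assumption is
that the "required" mask is the smallest all-ones mask covering the
highest address the device has to reach.

```c
#include <stdint.h>

/* Smallest all-ones mask that covers max_addr (model of a "required" mask). */
static uint64_t required_mask_model(uint64_t max_addr)
{
	uint64_t mask = 0;

	while (mask < max_addr)
		mask = (mask << 1) | 1;
	return mask;
}

/* All-ones mask of n bits, valid for 1 <= n <= 64. */
static uint64_t dma_bit_mask_model(unsigned int n)
{
	return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Pick the narrower addressing mode when it already covers all memory. */
static unsigned int preferred_dma_bits(uint64_t max_addr)
{
	return required_mask_model(max_addr) <= dma_bit_mask_model(32) ? 32 : 64;
}
```

A driver for a classic PCI device could use such a result to stay with
32-bit addressing when all memory is reachable that way.  For PCIe
devices the choice brings no benefit, which is the point made above.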


Re: [PATCH 12/14] x86: remove address space overrides using set_fs()

2020-09-04 Thread Christoph Hellwig
On Fri, Sep 04, 2020 at 08:38:13AM +0200, Christoph Hellwig wrote:
> > Wait a sec... how is that supposed to build with X86_5LEVEL?  Do you mean
> > 
> > #define LOAD_TASK_SIZE_MINUS_N(n) \
> > ALTERNATIVE __stringify(mov $((1 << 47) - 4096 - (n)),%rdx), \
> > __stringify(mov $((1 << 56) - 4096 - (n)),%rdx), X86_FEATURE_LA57
> > 
> > there?
> 
> Don't ask me about the how, but it builds and works with X86_5LEVEL,
> and the style is copied from elsewhere..

Actually, it doesn't any more.  Looks like the change to pass the n
parameter as suggested by Linus broke the previously working version.
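
One plausible reading of the breakage, shown as a plain C preprocessor
sketch rather than the real getuser.S code: a macro parameter that
appears inside a string literal is not substituted, while the same
parameter passed through a stringify helper is.  STR(), LOAD_QUOTED()
and LOAD_STRINGIFIED() are hypothetical names; STR() only mimics the
two-level expansion that __stringify() relies on.

```c
/* Two-level stringification so that arguments are expanded first. */
#define STR_1(...) #__VA_ARGS__
#define STR(...) STR_1(__VA_ARGS__)

/* (n) sits inside a string literal, so the preprocessor leaves it alone. */
#define LOAD_QUOTED(n) "mov $((1 << 47) - 4096 - (n)),%rdx"

/* (n) is an ordinary token here and is replaced before stringification. */
#define LOAD_STRINGIFIED(n) STR(mov $((1 << 47) - 4096 - (n)),%rdx)

/* Both helpers expand their macro with the argument 8. */
static const char *quoted_insn(void)
{
	return LOAD_QUOTED(8);
}

static const char *stringified_insn(void)
{
	return LOAD_STRINGIFIED(8);
}
```

With the quoted form the assembler would still see a literal "(n)",
which would explain why a version without the n parameter worked and
the parameterized one does not.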


Re: [PATCH 12/14] x86: remove address space overrides using set_fs()

2020-09-04 Thread Christoph Hellwig
On Fri, Sep 04, 2020 at 03:55:10AM +0100, Al Viro wrote:
> On Thu, Sep 03, 2020 at 04:22:40PM +0200, Christoph Hellwig wrote:
> 
> > diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
> > index c8a85b512796e1..94f7be4971ed04 100644
> > --- a/arch/x86/lib/getuser.S
> > +++ b/arch/x86/lib/getuser.S
> > @@ -35,10 +35,19 @@
> >  #include 
> >  #include 
> >  
> > +#ifdef CONFIG_X86_5LEVEL
> > +#define LOAD_TASK_SIZE_MINUS_N(n) \
> > +   ALTERNATIVE "mov $((1 << 47) - 4096 - (n)),%rdx", \
> > +   "mov $((1 << 56) - 4096 - (n)),%rdx", X86_FEATURE_LA57
> > +#else
> > +#define LOAD_TASK_SIZE_MINUS_N(n) \
> > +   mov $(TASK_SIZE_MAX - (n)),%_ASM_DX
> > +#endif
> 
> Wait a sec... how is that supposed to build with X86_5LEVEL?  Do you mean
> 
> #define LOAD_TASK_SIZE_MINUS_N(n) \
>   ALTERNATIVE __stringify(mov $((1 << 47) - 4096 - (n)),%rdx), \
>   __stringify(mov $((1 << 56) - 4096 - (n)),%rdx), X86_FEATURE_LA57
> 
> there?

Don't ask me about the how, but it builds and works with X86_5LEVEL,
and the style is copied from elsewhere..


Re: [PATCH 0/2] dma-mapping: update default segment_boundary_mask

2020-09-03 Thread Christoph Hellwig
Applied with the recommendation from Michael folded in.


Re: [PATCH 1/2] dma-mapping: introduce dma_get_seg_boundary_nr_pages()

2020-09-03 Thread Christoph Hellwig
On Thu, Sep 03, 2020 at 01:57:39PM +0300, Andy Shevchenko wrote:
> > +{
> > +   if (!dev)
> > +   return (U32_MAX >> page_shift) + 1;
> > +   return (dma_get_seg_boundary(dev) >> page_shift) + 1;
> 
> Can it be better to do something like
>   unsigned long boundary = dev ? dma_get_seg_boundary(dev) : U32_MAX;
> 
>   return (boundary >> page_shift) + 1;
> 
> ?

I don't really see what that would buy us.


Re: [PATCH 14/14] powerpc: remove address space overrides using set_fs()

2020-09-03 Thread Christoph Hellwig
On Thu, Sep 03, 2020 at 05:49:09PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 03, 2020 at 05:43:25PM +0200, Christophe Leroy wrote:
> >
> >
> > Le 03/09/2020 à 16:22, Christoph Hellwig a écrit :
> >> Stop providing the possibility to override the address space using
> >> set_fs() now that there is no need for that any more.
> >>
> >> Signed-off-by: Christoph Hellwig 
> >> ---
> >
> >
> >>   -static inline int __access_ok(unsigned long addr, unsigned long size,
> >> -  mm_segment_t seg)
> >> +static inline bool __access_ok(unsigned long addr, unsigned long size)
> >>   {
> >> -  if (addr > seg.seg)
> >> -  return 0;
> >> -  return (size == 0 || size - 1 <= seg.seg - addr);
> >> +  if (addr >= TASK_SIZE_MAX)
> >> +  return false;
> >> +  return size == 0 || size <= TASK_SIZE_MAX - addr;
> >>   }
> >
> > You don't need to test size == 0 anymore. It used to be necessary because 
> > of the 'size - 1', as size is unsigned.
> >
> > Now you can directly do
> >
> > return size <= TASK_SIZE_MAX - addr;
> >
> > If size is 0, this will always be true (because you already know that addr 
> > is not >= TASK_SIZE_MAX
> 
> True.  What do you think of Linus' comment about always using the
> ppc32 version on ppc64 as well with this?

i.e. something like this folded in:

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 5363f7fc6dd06c..be070254e50943 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -11,26 +11,14 @@
 #ifdef __powerpc64__
 /* We use TASK_SIZE_USER64 as TASK_SIZE is not constant */
 #define TASK_SIZE_MAX  TASK_SIZE_USER64
-
-/*
- * This check is sufficient because there is a large enough gap between user
- * addresses and the kernel addresses.
- */
-static inline bool __access_ok(unsigned long addr, unsigned long size)
-{
-   return addr < TASK_SIZE_MAX && size < TASK_SIZE_MAX;
-}
-
 #else
 #define TASK_SIZE_MAX  TASK_SIZE
+#endif
 
 static inline bool __access_ok(unsigned long addr, unsigned long size)
 {
-   if (addr >= TASK_SIZE_MAX)
-   return false;
-   return size == 0 || size <= TASK_SIZE_MAX - addr;
+   return addr < TASK_SIZE_MAX && size <= TASK_SIZE_MAX - addr;
 }
-#endif /* __powerpc64__ */
 
 #define access_ok(addr, size)  \
(__chk_user_ptr(addr),  \
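
The three variants of the check that appear in this thread can be
compared in user space.  This is a sketch, not the kernel header:
TASK_SIZE_MAX_MODEL is an arbitrary made-up limit and the function
names are hypothetical.

```c
#include <stdbool.h>

/* Arbitrary limit for illustration only; not a real TASK_SIZE_MAX. */
#define TASK_SIZE_MAX_MODEL 0x80000000UL

/* Version from the posted patch, with the explicit size == 0 test. */
static bool access_ok_with_zero_test(unsigned long addr, unsigned long size)
{
	if (addr >= TASK_SIZE_MAX_MODEL)
		return false;
	return size == 0 || size <= TASK_SIZE_MAX_MODEL - addr;
}

/*
 * Folded version: once addr < limit is known, limit - addr cannot wrap,
 * and size == 0 trivially satisfies the comparison.
 */
static bool access_ok_folded(unsigned long addr, unsigned long size)
{
	return addr < TASK_SIZE_MAX_MODEL && size <= TASK_SIZE_MAX_MODEL - addr;
}

/* Former 64-bit version: checks addr and size independently. */
static bool access_ok_independent(unsigned long addr, unsigned long size)
{
	return addr < TASK_SIZE_MAX_MODEL && size < TASK_SIZE_MAX_MODEL;
}
```

The first two agree for every input.  The independent check accepts
ranges that run past the limit and depends on the gap between user and
kernel addresses mentioned in the removed comment.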


Re: [PATCH 14/14] powerpc: remove address space overrides using set_fs()

2020-09-03 Thread Christoph Hellwig
On Thu, Sep 03, 2020 at 05:43:25PM +0200, Christophe Leroy wrote:
>
>
> Le 03/09/2020 à 16:22, Christoph Hellwig a écrit :
>> Stop providing the possibility to override the address space using
>> set_fs() now that there is no need for that any more.
>>
>> Signed-off-by: Christoph Hellwig 
>> ---
>
>
>>   -static inline int __access_ok(unsigned long addr, unsigned long size,
>> -mm_segment_t seg)
>> +static inline bool __access_ok(unsigned long addr, unsigned long size)
>>   {
>> -if (addr > seg.seg)
>> -return 0;
>> -return (size == 0 || size - 1 <= seg.seg - addr);
>> +if (addr >= TASK_SIZE_MAX)
>> +return false;
>> +return size == 0 || size <= TASK_SIZE_MAX - addr;
>>   }
>
> You don't need to test size == 0 anymore. It used to be necessary because 
> of the 'size - 1', as size is unsigned.
>
> Now you can directly do
>
>   return size <= TASK_SIZE_MAX - addr;
>
> If size is 0, this will always be true (because you already know that addr 
> is not >= TASK_SIZE_MAX

True.  What do you think of Linus' comment about always using the
ppc32 version on ppc64 as well with this?


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Christoph Hellwig
On Thu, Sep 03, 2020 at 03:36:29PM +0100, Al Viro wrote:
> FWIW, vfs.git#for-next is always a merge of independent branches; I don't
> put stuff directly into #for-next - too easy to lose that way.
> 
> IOW, that would be something like #base.set_fs, included into #for-next
> merge set.  And I've no problem with never-rebased branches...
> 
> Where in the mainline are the most recent prereqs of this series?

I can't think of anything past -rc1, but I haven't actually checked.


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Christoph Hellwig
On Thu, Sep 03, 2020 at 03:28:03PM +0100, Al Viro wrote:
> On Thu, Sep 03, 2020 at 04:22:28PM +0200, Christoph Hellwig wrote:
> 
> > Besides x86 and powerpc I plan to eventually convert all other
> > architectures, although this will be a slow process, starting with the
> > easier ones once the infrastructure is merged.  The process to convert
> > architectures is roughly:
> > 
> >  (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
> >  (2) implement __get_kernel_nofault and __put_kernel_nofault
> >  (3) remove the arch specific address limitation functionality
> 
> The one to really watch out for is sparc; I have something in that
> direction, will resurrect as soon as I'm done with eventpoll analysis...
> 
> I can live with this series; do you want that in vfs.git#for-next?

Either that or a separate tree is fine with me.  It would be good to
eventually have a non-rebased stable tree so that other arch trees
can work from it, though.


[PATCH 14/14] powerpc: remove address space overrides using set_fs()

2020-09-03 Thread Christoph Hellwig
Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/Kconfig   |  1 -
 arch/powerpc/include/asm/processor.h   |  7 
 arch/powerpc/include/asm/thread_info.h |  5 +--
 arch/powerpc/include/asm/uaccess.h | 57 +++---
 arch/powerpc/kernel/signal.c   |  3 --
 arch/powerpc/lib/sstep.c   |  6 +--
 6 files changed, 18 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4d1c18a3b83977..65bed1fdeaad71 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -249,7 +249,6 @@ config PPC
select PCI_SYSCALL  if PCI
select PPC_DAWR if PPC64
select RTC_LIB
-   select SET_FS
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa42..f01e4d650c520a 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -83,10 +83,6 @@ struct task_struct;
 void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);
 void release_thread(struct task_struct *);
 
-typedef struct {
-   unsigned long seg;
-} mm_segment_t;
-
 #define TS_FPR(i) fp_state.fpr[i][TS_FPROFFSET]
 #define TS_CKFPR(i) ckfp_state.fpr[i][TS_FPROFFSET]
 
@@ -148,7 +144,6 @@ struct thread_struct {
unsigned long   ksp_vsid;
 #endif
struct pt_regs  *regs;  /* Pointer to saved register state */
-   mm_segment_t addr_limit; /* for get_fs() validation */
 #ifdef CONFIG_BOOKE
/* BookE base exception scratch space; align on cacheline */
unsigned long   normsave[8] cacheline_aligned;
@@ -295,7 +290,6 @@ struct thread_struct {
 #define INIT_THREAD { \
.ksp = INIT_SP, \
.ksp_limit = INIT_SP_LIMIT, \
-   .addr_limit = KERNEL_DS, \
.pgdir = swapper_pg_dir, \
.fpexc_mode = MSR_FE0 | MSR_FE1, \
SPEFSCR_INIT \
@@ -303,7 +297,6 @@ struct thread_struct {
 #else
 #define INIT_THREAD  { \
.ksp = INIT_SP, \
-   .addr_limit = KERNEL_DS, \
.fpexc_mode = 0, \
 }
 #endif
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index ca6c9702570494..46a210b03d2b80 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -90,7 +90,6 @@ void arch_setup_new_exec(void);
 #define TIF_SYSCALL_TRACE  0   /* syscall trace active */
 #define TIF_SIGPENDING 1   /* signal pending */
 #define TIF_NEED_RESCHED   2   /* rescheduling necessary */
-#define TIF_FSCHECK        3   /* Check FS is USER_DS on return */
 #define TIF_SYSCALL_EMU    4   /* syscall emulation active */
 #define TIF_RESTORE_TM 5   /* need to restore TM FP/VEC/VSX */
 #define TIF_PATCH_PENDING  6   /* pending live patching update */
@@ -130,7 +129,6 @@ void arch_setup_new_exec(void);
 #define _TIF_SYSCALL_TRACEPOINT(1<
 #include 
 
-/*
- * The fs value determines whether argument validity checking should be
- * performed or not.  If get_fs() == USER_DS, checking is performed, with
- * get_fs() == KERNEL_DS, checking is bypassed.
- *
- * For historical reasons, these macros are grossly misnamed.
- *
- * The fs/ds values are now the highest legal address in the "segment".
- * This simplifies the checking in the routines below.
- */
-
-#define MAKE_MM_SEG(s)  ((mm_segment_t) { (s) })
-
-#define KERNEL_DS  MAKE_MM_SEG(~0UL)
 #ifdef __powerpc64__
 /* We use TASK_SIZE_USER64 as TASK_SIZE is not constant */
-#define USER_DS    MAKE_MM_SEG(TASK_SIZE_USER64 - 1)
-#else
-#define USER_DS    MAKE_MM_SEG(TASK_SIZE - 1)
-#endif
-
-#define get_fs()   (current->thread.addr_limit)
-
-static inline void set_fs(mm_segment_t fs)
-{
-   current->thread.addr_limit = fs;
-   /* On user-mode return check addr_limit (fs) is correct */
-   set_thread_flag(TIF_FSCHECK);
-}
-
-#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
-#define user_addr_max()    (get_fs().seg)
+#define TASK_SIZE_MAX  TASK_SIZE_USER64
 
-#ifdef __powerpc64__
 /*
- * This check is sufficient because there is a large enough
- * gap between user addresses and the kernel addresses
+ * This check is sufficient because there is a large enough gap between user
+ * addresses and the kernel addresses.
  */
-#define __access_ok(addr, size, segment)   \
-   (((addr) <= (segment).seg) && ((size) <= (segment).seg))
+static inline bool __access_ok(unsigned long addr, unsigned long size)
+{
+   return addr < TASK_SIZE_MAX && size < TASK_SIZE_MAX;
+}
 
 #else
+#define TASK_SIZE_MAX  TASK_SIZE
 
-s

[PATCH 13/14] powerpc: use non-set_fs based maccess routines

2020-09-03 Thread Christoph Hellwig
Provide __get_kernel_nofault and __put_kernel_nofault routines to
implement the maccess routines without messing with set_fs and without
opening up access to user space.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/uaccess.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 00699903f1efca..7fe3531ad36a77 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -623,4 +623,20 @@ do { \
 __put_user_goto(*(u8*)(_src + _i), (u8 __user *)(_dst + _i), e); \
 } while (0)
 
+#define HAVE_GET_KERNEL_NOFAULT
+
+#define __get_kernel_nofault(dst, src, type, err_label) \
+do {   \
+   int __kr_err;   \
+   \
+   __get_user_size_allowed(*((type *)(dst)), (__force type __user *)(src),\
+   sizeof(type), __kr_err);\
+   if (unlikely(__kr_err)) \
+   goto err_label; \
+} while (0)
+
+#define __put_kernel_nofault(dst, src, type, err_label) \
+   __put_user_size_goto(*((type *)(src)),  \
+   (__force type __user *)(dst), sizeof(type), err_label)
+
 #endif /* _ARCH_POWERPC_UACCESS_H */
-- 
2.28.0



[PATCH 07/14] uaccess: add infrastructure for kernel builds with set_fs()

2020-09-03 Thread Christoph Hellwig
Add a CONFIG_SET_FS option that is selected by architectures that
implement set_fs, which is all of them initially.  If the option is not
set stubs for routines related to overriding the address space are
provided so that architectures can start to opt out of providing set_fs.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
---
 arch/Kconfig|  3 +++
 arch/alpha/Kconfig  |  1 +
 arch/arc/Kconfig|  1 +
 arch/arm/Kconfig|  1 +
 arch/arm64/Kconfig  |  1 +
 arch/c6x/Kconfig|  1 +
 arch/csky/Kconfig   |  1 +
 arch/h8300/Kconfig  |  1 +
 arch/hexagon/Kconfig|  1 +
 arch/ia64/Kconfig   |  1 +
 arch/m68k/Kconfig   |  1 +
 arch/microblaze/Kconfig |  1 +
 arch/mips/Kconfig   |  1 +
 arch/nds32/Kconfig  |  1 +
 arch/nios2/Kconfig  |  1 +
 arch/openrisc/Kconfig   |  1 +
 arch/parisc/Kconfig |  1 +
 arch/powerpc/Kconfig|  1 +
 arch/riscv/Kconfig  |  1 +
 arch/s390/Kconfig   |  1 +
 arch/sh/Kconfig |  1 +
 arch/sparc/Kconfig  |  1 +
 arch/um/Kconfig |  1 +
 arch/x86/Kconfig|  1 +
 arch/xtensa/Kconfig |  1 +
 include/linux/uaccess.h | 18 ++
 26 files changed, 45 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493fc..3fab619a6aa51a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,6 +24,9 @@ config KEXEC_ELF
 config HAVE_IMA_KEXEC
bool
 
+config SET_FS
+   bool
+
 config HOTPLUG_SMT
bool
 
diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 9c5f06e8eb9bc0..d6e9fc7a7b19e2 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -39,6 +39,7 @@ config ALPHA
select OLD_SIGSUSPEND
select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
select MMU_GATHER_NO_RANGE
+   select SET_FS
help
  The Alpha is a 64-bit general-purpose processor designed and
  marketed by the Digital Equipment Corporation of blessed memory,
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index ba00c4e1e1c271..c49f5754a11e40 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -48,6 +48,7 @@ config ARC
select PCI_SYSCALL if PCI
select PERF_USE_VMALLOC if ARC_CACHE_VIPT_ALIASING
select HAVE_ARCH_JUMP_LABEL if ISA_ARCV2 && !CPU_ENDIAN_BE32
+   select SET_FS
 
 config ARCH_HAS_CACHE_LINE_SIZE
def_bool y
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e00d94b1665876..87e1478a42dc4f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -118,6 +118,7 @@ config ARM
select PCI_SYSCALL if PCI
select PERF_USE_VMALLOC
select RTC_LIB
+   select SET_FS
select SYS_SUPPORTS_APM_EMULATION
# Above selects are sorted alphabetically; please add new ones
# according to that.  Thanks.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6d232837cbeee8..fbd9e35bef096f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -192,6 +192,7 @@ config ARM64
select PCI_SYSCALL if PCI
select POWER_RESET
select POWER_SUPPLY
+   select SET_FS
select SPARSE_IRQ
select SWIOTLB
select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index 6444ebfd06a665..48d66bf0465d68 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -22,6 +22,7 @@ config C6X
select GENERIC_CLOCKEVENTS
select MODULES_USE_ELF_RELA
select MMU_GATHER_NO_RANGE if MMU
+   select SET_FS
 
 config MMU
def_bool n
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 3d5afb5f568543..2836f6e76fdb2d 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -78,6 +78,7 @@ config CSKY
select PCI_DOMAINS_GENERIC if PCI
select PCI_SYSCALL if PCI
select PCI_MSI if PCI
+   select SET_FS
 
 config LOCKDEP_SUPPORT
def_bool y
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index d11666d538fea8..7945de067e9fcc 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -25,6 +25,7 @@ config H8300
select HAVE_ARCH_KGDB
select HAVE_ARCH_HASH
select CPU_NO_EFFICIENT_FFS
+   select SET_FS
select UACCESS_MEMCPY
 
 config CPU_BIG_ENDIAN
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 667cfc511cf999..f2afabbadd430e 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -31,6 +31,7 @@ config HEXAGON
select GENERIC_CLOCKEVENTS_BROADCAST
select MODULES_USE_ELF_RELA
select GENERIC_CPU_DEVICES
+   select SET_FS
help
  Qualcomm Hexagon is a processor architecture designed for high
  performance and low power across a wide variety of applications.
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 5b4ec80bf5863a..22a6853840e235 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -56,6 +56,7 @@ config IA64
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
select NUMA if !FLATMEM
+   selec

[PATCH 12/14] x86: remove address space overrides using set_fs()

2020-09-03 Thread Christoph Hellwig
Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more.  To properly
handle the TASK_SIZE_MAX checking for 4 vs 5-level page tables on
x86 a new alternative is introduced, which just like the one in
entry_64.S has to use the hardcoded virtual address bits to escape
the fact that TASK_SIZE_MAX isn't actually a constant when 5-level
page tables are enabled.
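
As a rough illustration of what the uaccess range check reduces to once the limit can no longer be overridden per thread, here is a minimal user-space sketch.  The constant and the helper name are assumptions for the example (the value corresponds to 4-level paging with 4 KiB pages); the real check lives in the x86 uaccess headers and in getuser.S/putuser.S and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixed limit: (1UL << 47) - 4096, i.e. 4-level paging. */
#define EXAMPLE_USER_LIMIT	0x00007ffffffff000ULL

/*
 * Hypothetical stand-in for an access_ok()-style check.  The limit is a
 * compile-time constant rather than per-thread state, so there is
 * nothing left for a set_fs()-like call to change.  The comparison is
 * written so that addr + size cannot wrap around.
 */
bool example_access_ok(uint64_t addr, uint64_t size)
{
	return size <= EXAMPLE_USER_LIMIT &&
	       addr <= EXAMPLE_USER_LIMIT - size;
}
```

With 5-level page tables the limit is not a constant, which is why the patch patches the hardcoded value in via an alternative instead.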

Signed-off-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
---
 arch/x86/Kconfig   |  1 -
 arch/x86/ia32/ia32_aout.c  |  1 -
 arch/x86/include/asm/processor.h   | 11 +--
 arch/x86/include/asm/thread_info.h |  2 --
 arch/x86/include/asm/uaccess.h | 26 +
 arch/x86/kernel/asm-offsets.c  |  3 --
 arch/x86/lib/getuser.S | 47 +++---
 arch/x86/lib/putuser.S | 25 
 8 files changed, 39 insertions(+), 77 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f85c13355732fe..7101ac64bb209d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -237,7 +237,6 @@ config X86
select HAVE_ARCH_KCSAN  if X86_64
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
-   select SET_FS
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
 
 config INSTRUCTION_DECODER
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index ca8a657edf5977..a09fc37ead9d47 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -239,7 +239,6 @@ static int load_aout_binary(struct linux_binprm *bprm)
(regs)->ss = __USER32_DS;
regs->r8 = regs->r9 = regs->r10 = regs->r11 =
regs->r12 = regs->r13 = regs->r14 = regs->r15 = 0;
-   set_fs(USER_DS);
return 0;
 }
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1618eeb08361a9..189573d95c3af6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -482,10 +482,6 @@ extern unsigned int fpu_user_xstate_size;
 
 struct perf_event;
 
-typedef struct {
-   unsigned long   seg;
-} mm_segment_t;
-
 struct thread_struct {
/* Cached TLS descriptors: */
struct desc_struct  tls_array[GDT_ENTRY_TLS_ENTRIES];
@@ -538,8 +534,6 @@ struct thread_struct {
 */
unsigned long   iopl_emul;
 
-   mm_segment_t addr_limit;
-
unsigned int sig_on_uaccess_err:1;
 
/* Floating point and extended processor state */
@@ -785,15 +779,12 @@ static inline void spin_lock_prefetch(const void *x)
 #define INIT_THREAD  {   \
.sp0= TOP_OF_INIT_STACK,  \
.sysenter_cs= __KERNEL_CS,\
-   .addr_limit = KERNEL_DS,  \
 }
 
 #define KSTK_ESP(task) (task_pt_regs(task)->sp)
 
 #else
-#define INIT_THREAD  { \
-   .addr_limit = KERNEL_DS,\
-}
+#define INIT_THREAD { }
 
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86dd..44733a4bfc4294 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -102,7 +102,6 @@ struct thread_info {
 #define TIF_SYSCALL_TRACEPOINT 28  /* syscall tracepoint instrumentation */
 #define TIF_ADDR32 29  /* 32-bit address space on 64 bits */
 #define TIF_X32 30  /* 32-bit native x86-64 binary */
-#define TIF_FSCHECK 31  /* Check FS is USER_DS on return */
 
 #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
@@ -131,7 +130,6 @@ struct thread_info {
 #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
 #define _TIF_ADDR32 (1 << TIF_ADDR32)
 #define _TIF_X32   (1 << TIF_X32)
-#define _TIF_FSCHECK   (1 << TIF_FSCHECK)
 
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW_BASE   \
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index ecefaffd15d4c8..a4ceda0510ea87 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -12,30 +12,6 @@
 #include 
 #include 
 
-/*
- * The fs value determines whether argument validity checking should be
- * performed or not.  If get_fs() == USER_DS, checking is performed, with
- * get_fs() == KERNEL_DS, checking is bypassed.
- *
- * For historical reasons, these macros are grossly misnamed.
- */
-
-#define MAKE_MM_SEG(s) ((mm_segment_t) { (s) })
-
-#define KERNEL_DS  MAKE_MM_SEG(-

[PATCH 11/14] x86: make TASK_SIZE_MAX usable from assembly code

2020-09-03 Thread Christoph Hellwig
For 64-bit the only thing missing was a strategic _AC, and for 32-bit we
need to use __PAGE_OFFSET instead of PAGE_OFFSET in the TASK_SIZE
definition to escape the explicit unsigned long cast.  This just works
because __PAGE_OFFSET is defined using _AC itself and thus never needs
the cast anyway.
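
To make the `_AC` point concrete, the sketch below mirrors the shape of the helper in include/uapi/linux/const.h under different names (`EX_AC`, to avoid clashing with system headers).  In a C build the suffix is pasted onto the number; in an assembly build, where `1UL` would not parse, only the bare number remains.  The shift of 47 and the 4096-byte page size are assumed values for a 4-level x86-64 configuration, the helper function is hypothetical, and the example assumes a 64-bit `unsigned long`.

```c
#include <stdint.h>

/* Same idea as _AC(): assembly sees "1", C sees "(1UL)". */
#ifdef __ASSEMBLY__
#define EX_AC(X, Y)	X
#else
#define EX__AC(X, Y)	(X##Y)
#define EX_AC(X, Y)	EX__AC(X, Y)
#endif

/* Assumed values for illustration only. */
#define EXAMPLE_VIRTUAL_MASK_SHIFT	47
#define EXAMPLE_PAGE_SIZE		EX_AC(4096, UL)

#define EXAMPLE_TASK_SIZE_MAX \
	((EX_AC(1, UL) << EXAMPLE_VIRTUAL_MASK_SHIFT) - EXAMPLE_PAGE_SIZE)

/* Hypothetical helper so the constant can be inspected from C. */
static inline uint64_t example_task_size_max(void)
{
	return EXAMPLE_TASK_SIZE_MAX;
}
```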

Signed-off-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
---
 arch/x86/include/asm/page_32_types.h | 4 ++--
 arch/x86/include/asm/page_64_types.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index 26236925fb2c36..f462895a33e452 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -44,8 +44,8 @@
 /*
  * User space process size: 3GB (default).
  */
-#define IA32_PAGE_OFFSET   PAGE_OFFSET
-#define TASK_SIZE  PAGE_OFFSET
+#define IA32_PAGE_OFFSET   __PAGE_OFFSET
+#define TASK_SIZE  __PAGE_OFFSET
 #define TASK_SIZE_LOW  TASK_SIZE
 #define TASK_SIZE_MAX  TASK_SIZE
 #define DEFAULT_MAP_WINDOW TASK_SIZE
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 996595c9897e0a..838515daf87b36 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -76,7 +76,7 @@
  *
  * With page table isolation enabled, we map the LDT in ... [stay tuned]
  */
-#define TASK_SIZE_MAX  ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
+#define TASK_SIZE_MAX  ((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
 
 #define DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE)
 
-- 
2.28.0



[PATCH 10/14] x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32, 64}_types.h

2020-09-03 Thread Christoph Hellwig
At least for 64-bit this moves them closer to some of the defines
they are based on, and it prepares for using the TASK_SIZE_MAX
definition from assembly.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
---
 arch/x86/include/asm/page_32_types.h | 11 +++
 arch/x86/include/asm/page_64_types.h | 38 +
 arch/x86/include/asm/processor.h | 49 
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index 565ad755c785e2..26236925fb2c36 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -41,6 +41,17 @@
 #define __VIRTUAL_MASK_SHIFT   32
 #endif /* CONFIG_X86_PAE */
 
+/*
+ * User space process size: 3GB (default).
+ */
+#define IA32_PAGE_OFFSET   PAGE_OFFSET
+#define TASK_SIZE  PAGE_OFFSET
+#define TASK_SIZE_LOW  TASK_SIZE
+#define TASK_SIZE_MAX  TASK_SIZE
+#define DEFAULT_MAP_WINDOW TASK_SIZE
+#define STACK_TOP  TASK_SIZE
+#define STACK_TOP_MAX  STACK_TOP
+
 /*
  * Kernel image size is limited to 512 MB (see in arch/x86/kernel/head_32.S)
  */
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 288b065955b729..996595c9897e0a 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -58,6 +58,44 @@
 #define __VIRTUAL_MASK_SHIFT   47
 #endif
 
+/*
+ * User space process size.  This is the first address outside the user range.
+ * There are a few constraints that determine this:
+ *
+ * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
+ * address, then that syscall will enter the kernel with a
+ * non-canonical return address, and SYSRET will explode dangerously.
+ * We avoid this particular problem by preventing anything executable
+ * from being mapped at the maximum canonical address.
+ *
+ * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
+ * CPUs malfunction if they execute code from the highest canonical page.
+ * They'll speculate right off the end of the canonical space, and
+ * bad things happen.  This is worked around in the same way as the
+ * Intel problem.
+ *
+ * With page table isolation enabled, we map the LDT in ... [stay tuned]
+ */
+#define TASK_SIZE_MAX  ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
+
+#define DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE)
+
+/* This decides where the kernel will search for a free chunk of vm
+ * space during mmap's.
+ */
+#define IA32_PAGE_OFFSET   ((current->personality & ADDR_LIMIT_3GB) ? \
+   0xc0000000 : 0xFFFFe000)
+
+#define TASK_SIZE_LOW  (test_thread_flag(TIF_ADDR32) ? \
+   IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW)
+#define TASK_SIZE  (test_thread_flag(TIF_ADDR32) ? \
+   IA32_PAGE_OFFSET : TASK_SIZE_MAX)
+#define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_ADDR32)) ? \
+   IA32_PAGE_OFFSET : TASK_SIZE_MAX)
+
+#define STACK_TOP  TASK_SIZE_LOW
+#define STACK_TOP_MAX  TASK_SIZE_MAX
+
 /*
  * Maximum kernel image size is limited to 1 GiB, due to the fixmap living
  * in the next 1 GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S).
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 97143d87994c24..1618eeb08361a9 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -782,17 +782,6 @@ static inline void spin_lock_prefetch(const void *x)
 })
 
 #ifdef CONFIG_X86_32
-/*
- * User space process size: 3GB (default).
- */
-#define IA32_PAGE_OFFSET   PAGE_OFFSET
-#define TASK_SIZE  PAGE_OFFSET
-#define TASK_SIZE_LOW  TASK_SIZE
-#define TASK_SIZE_MAX  TASK_SIZE
-#define DEFAULT_MAP_WINDOW TASK_SIZE
-#define STACK_TOP  TASK_SIZE
-#define STACK_TOP_MAX  STACK_TOP
-
 #define INIT_THREAD  {   \
.sp0= TOP_OF_INIT_STACK,  \
.sysenter_cs= __KERNEL_CS,\
@@ -802,44 +791,6 @@ static inline void spin_lock_prefetch(const void *x)
 #define KSTK_ESP(task) (task_pt_regs(task)->sp)
 
 #else
-/*
- * User space process size.  This is the first address outside the user range.
- * There are a few constraints that determine this:
- *
- * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
- * address, then that syscall will enter the kernel with a
- * non-canonical return address, and SYSRET will explode dangerously.
- * We avoid this particular problem by preventing anything executable
- * from being mapped at the maximum canonical address.
- *
- * On AMD CPUs in the Ryzen family, there's a nasty bug

[PATCH 09/14] lkdtm: remove set_fs-based tests

2020-09-03 Thread Christoph Hellwig
Once we can't manipulate the address limit, we also can't test what
happens when the manipulation is abused.

Signed-off-by: Christoph Hellwig 
---
 drivers/misc/lkdtm/bugs.c   | 10 --
 drivers/misc/lkdtm/core.c   |  2 --
 drivers/misc/lkdtm/lkdtm.h  |  2 --
 drivers/misc/lkdtm/usercopy.c   | 15 ---
 tools/testing/selftests/lkdtm/tests.txt |  2 --
 5 files changed, 31 deletions(-)

diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
index 4dfbfd51bdf774..a0675d4154d2fd 100644
--- a/drivers/misc/lkdtm/bugs.c
+++ b/drivers/misc/lkdtm/bugs.c
@@ -312,16 +312,6 @@ void lkdtm_CORRUPT_LIST_DEL(void)
pr_err("list_del() corruption not detected!\n");
 }
 
-/* Test if unbalanced set_fs(KERNEL_DS)/set_fs(USER_DS) check exists. */
-void lkdtm_CORRUPT_USER_DS(void)
-{
-   pr_info("setting bad task size limit\n");
-   set_fs(KERNEL_DS);
-
-   /* Make sure we do not keep running with a KERNEL_DS! */
-   force_sig(SIGKILL);
-}
-
 /* Test that VMAP_STACK is actually allocating with a leading guard page */
 void lkdtm_STACK_GUARD_PAGE_LEADING(void)
 {
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df916632..97803f213d9d45 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -112,7 +112,6 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(CORRUPT_STACK_STRONG),
CRASHTYPE(CORRUPT_LIST_ADD),
CRASHTYPE(CORRUPT_LIST_DEL),
-   CRASHTYPE(CORRUPT_USER_DS),
CRASHTYPE(STACK_GUARD_PAGE_LEADING),
CRASHTYPE(STACK_GUARD_PAGE_TRAILING),
CRASHTYPE(UNSET_SMEP),
@@ -172,7 +171,6 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(USERCOPY_STACK_FRAME_FROM),
CRASHTYPE(USERCOPY_STACK_BEYOND),
CRASHTYPE(USERCOPY_KERNEL),
-   CRASHTYPE(USERCOPY_KERNEL_DS),
CRASHTYPE(STACKLEAK_ERASING),
CRASHTYPE(CFI_FORWARD_PROTO),
 #ifdef CONFIG_X86_32
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 8878538b2c1322..6dec4c9b442ff3 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -27,7 +27,6 @@ void lkdtm_OVERFLOW_UNSIGNED(void);
 void lkdtm_ARRAY_BOUNDS(void);
 void lkdtm_CORRUPT_LIST_ADD(void);
 void lkdtm_CORRUPT_LIST_DEL(void);
-void lkdtm_CORRUPT_USER_DS(void);
 void lkdtm_STACK_GUARD_PAGE_LEADING(void);
 void lkdtm_STACK_GUARD_PAGE_TRAILING(void);
 void lkdtm_UNSET_SMEP(void);
@@ -96,7 +95,6 @@ void lkdtm_USERCOPY_STACK_FRAME_TO(void);
 void lkdtm_USERCOPY_STACK_FRAME_FROM(void);
 void lkdtm_USERCOPY_STACK_BEYOND(void);
 void lkdtm_USERCOPY_KERNEL(void);
-void lkdtm_USERCOPY_KERNEL_DS(void);
 
 /* lkdtm_stackleak.c */
 void lkdtm_STACKLEAK_ERASING(void);
diff --git a/drivers/misc/lkdtm/usercopy.c b/drivers/misc/lkdtm/usercopy.c
index b833367a45d053..109e8d4302c113 100644
--- a/drivers/misc/lkdtm/usercopy.c
+++ b/drivers/misc/lkdtm/usercopy.c
@@ -325,21 +325,6 @@ void lkdtm_USERCOPY_KERNEL(void)
vm_munmap(user_addr, PAGE_SIZE);
 }
 
-void lkdtm_USERCOPY_KERNEL_DS(void)
-{
-   char __user *user_ptr =
-   (char __user *)(0xFUL << (sizeof(unsigned long) * 8 - 4));
-   mm_segment_t old_fs = get_fs();
-   char buf[10] = {0};
-
-   pr_info("attempting copy_to_user() to noncanonical address: %px\n",
-   user_ptr);
-   set_fs(KERNEL_DS);
-   if (copy_to_user(user_ptr, buf, sizeof(buf)) == 0)
-   pr_err("copy_to_user() to noncanonical address succeeded!?\n");
-   set_fs(old_fs);
-}
-
 void __init lkdtm_usercopy_init(void)
 {
/* Prepare cache that lacks SLAB_USERCOPY flag. */
diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
index 9d266e79c6a270..74a8d329a72c80 100644
--- a/tools/testing/selftests/lkdtm/tests.txt
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -9,7 +9,6 @@ EXCEPTION
 #CORRUPT_STACK_STRONG Crashes entire system on success
 CORRUPT_LIST_ADD list_add corruption
 CORRUPT_LIST_DEL list_del corruption
-CORRUPT_USER_DS Invalid address limit on user-mode return
 STACK_GUARD_PAGE_LEADING
 STACK_GUARD_PAGE_TRAILING
 UNSET_SMEP CR4 bits went missing
@@ -67,6 +66,5 @@ USERCOPY_STACK_FRAME_TO
 USERCOPY_STACK_FRAME_FROM
 USERCOPY_STACK_BEYOND
 USERCOPY_KERNEL
-USERCOPY_KERNEL_DS
 STACKLEAK_ERASING OK: the rest of the thread stack is properly erased
 CFI_FORWARD_PROTO
-- 
2.28.0



[PATCH 08/14] test_bitmap: remove user bitmap tests

2020-09-03 Thread Christoph Hellwig
We can't run the tests for userspace bitmap parsing if set_fs() doesn't
exist, and it is about to go away for x86 and powerpc, with other major
architectures to follow.

Signed-off-by: Christoph Hellwig 
---
 lib/test_bitmap.c | 91 +++
 1 file changed, 21 insertions(+), 70 deletions(-)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index df903c53952bb9..4425a1dd4ef1c7 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -354,50 +354,37 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
 
 };
 
-static void __init __test_bitmap_parselist(int is_user)
+static void __init test_bitmap_parselist(void)
 {
int i;
int err;
ktime_t time;
DECLARE_BITMAP(bmap, 2048);
-   char *mode = is_user ? "_user"  : "";
 
for (i = 0; i < ARRAY_SIZE(parselist_tests); i++) {
 #define ptest parselist_tests[i]
 
-   if (is_user) {
-   mm_segment_t orig_fs = get_fs();
-   size_t len = strlen(ptest.in);
-
-   set_fs(KERNEL_DS);
-   time = ktime_get();
-   err = bitmap_parselist_user((__force const char __user *)ptest.in, len,
-   bmap, ptest.nbits);
-   time = ktime_get() - time;
-   set_fs(orig_fs);
-   } else {
-   time = ktime_get();
-   err = bitmap_parselist(ptest.in, bmap, ptest.nbits);
-   time = ktime_get() - time;
-   }
+   time = ktime_get();
+   err = bitmap_parselist(ptest.in, bmap, ptest.nbits);
+   time = ktime_get() - time;
 
if (err != ptest.errno) {
-   pr_err("parselist%s: %d: input is %s, errno is %d, expected %d\n",
-   mode, i, ptest.in, err, ptest.errno);
+   pr_err("parselist: %d: input is %s, errno is %d, expected %d\n",
+   i, ptest.in, err, ptest.errno);
continue;
}
 
if (!err && ptest.expected
 && !__bitmap_equal(bmap, ptest.expected, ptest.nbits)) {
-   pr_err("parselist%s: %d: input is %s, result is 0x%lx, expected 0x%lx\n",
-   mode, i, ptest.in, bmap[0],
+   pr_err("parselist: %d: input is %s, result is 0x%lx, expected 0x%lx\n",
+   i, ptest.in, bmap[0],
*ptest.expected);
continue;
}
 
if (ptest.flags & PARSE_TIME)
-   pr_err("parselist%s: %d: input is '%s' OK, Time: %llu\n",
-   mode, i, ptest.in, time);
+   pr_err("parselist: %d: input is '%s' OK, Time: %llu\n",
+   i, ptest.in, time);
 
 #undef ptest
}
@@ -443,75 +430,41 @@ static const struct test_bitmap_parselist parse_tests[] __initconst = {
 #undef step
 };
 
-static void __init __test_bitmap_parse(int is_user)
+static void __init test_bitmap_parse(void)
 {
int i;
int err;
ktime_t time;
DECLARE_BITMAP(bmap, 2048);
-   char *mode = is_user ? "_user"  : "";
 
for (i = 0; i < ARRAY_SIZE(parse_tests); i++) {
struct test_bitmap_parselist test = parse_tests[i];
+   size_t len = test.flags & NO_LEN ? UINT_MAX : strlen(test.in);
 
-   if (is_user) {
-   size_t len = strlen(test.in);
-   mm_segment_t orig_fs = get_fs();
-
-   set_fs(KERNEL_DS);
-   time = ktime_get();
-   err = bitmap_parse_user((__force const char __user *)test.in, len,
-   bmap, test.nbits);
-   time = ktime_get() - time;
-   set_fs(orig_fs);
-   } else {
-   size_t len = test.flags & NO_LEN ?
-   UINT_MAX : strlen(test.in);
-   time = ktime_get();
-   err = bitmap_parse(test.in, len, bmap, test.nbits);
-   time = ktime_get() - time;
-   }
+   time = ktime_get();
+   err = bitmap_parse(test.in, len, bmap, test.nbits);
+   time = ktime_get() - time;
 
if (err != test.errno) {
-   pr_err("parse%s: %d: input is %s, errno is %d, expected %d\n",
-   mode, i, test.in, err, test.errno

[PATCH 06/14] fs: don't allow splice read/write without explicit ops

2020-09-03 Thread Christoph Hellwig
default_file_splice_write is the last piece of generic code that uses
set_fs to make the uaccess routines operate on kernel pointers.  It
implements a "fallback loop" for splicing from files that do not actually
provide a proper splice_read method.  The usual file systems and other
high bandwidth instances all provide a ->splice_read, so this just removes
support for various device drivers and procfs/debugfs files.  If splice
support for any of those turns out to be important it can be added back
by switching them to the iter ops and using generic_file_splice_read.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
---
 fs/read_write.c|   2 +-
 fs/splice.c| 130 +
 include/linux/fs.h |   2 -
 3 files changed, 15 insertions(+), 119 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 702c4301d9eb6b..8c61f67453e3d3 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1077,7 +1077,7 @@ ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
 }
 EXPORT_SYMBOL(vfs_iter_write);
 
-ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
+static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
  unsigned long vlen, loff_t *pos, rwf_t flags)
 {
struct iovec iovstack[UIO_FASTIOV];
diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07ff..412df7b48f9eb7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -342,89 +342,6 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
 };
 EXPORT_SYMBOL(nosteal_pipe_buf_ops);
 
-static ssize_t kernel_readv(struct file *file, const struct kvec *vec,
-   unsigned long vlen, loff_t offset)
-{
-   mm_segment_t old_fs;
-   loff_t pos = offset;
-   ssize_t res;
-
-   old_fs = get_fs();
-   set_fs(KERNEL_DS);
-   /* The cast to a user pointer is valid due to the set_fs() */
-   res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
-   set_fs(old_fs);
-
-   return res;
-}
-
-static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
-struct pipe_inode_info *pipe, size_t len,
-unsigned int flags)
-{
-   struct kvec *vec, __vec[PIPE_DEF_BUFFERS];
-   struct iov_iter to;
-   struct page **pages;
-   unsigned int nr_pages;
-   unsigned int mask;
-   size_t offset, base, copied = 0;
-   ssize_t res;
-   int i;
-
-   if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
-   return -EAGAIN;
-
-   /*
-* Try to keep page boundaries matching to source pagecache ones -
-* it probably won't be much help, but...
-*/
-   offset = *ppos & ~PAGE_MASK;
-
-   iov_iter_pipe(&to, READ, pipe, len + offset);
-
-   res = iov_iter_get_pages_alloc(&to, &pages, len + offset, &base);
-   if (res <= 0)
-   return -ENOMEM;
-
-   nr_pages = DIV_ROUND_UP(res + base, PAGE_SIZE);
-
-   vec = __vec;
-   if (nr_pages > PIPE_DEF_BUFFERS) {
-   vec = kmalloc_array(nr_pages, sizeof(struct kvec), GFP_KERNEL);
-   if (unlikely(!vec)) {
-   res = -ENOMEM;
-   goto out;
-   }
-   }
-
-   mask = pipe->ring_size - 1;
-   pipe->bufs[to.head & mask].offset = offset;
-   pipe->bufs[to.head & mask].len -= offset;
-
-   for (i = 0; i < nr_pages; i++) {
-   size_t this_len = min_t(size_t, len, PAGE_SIZE - offset);
-   vec[i].iov_base = page_address(pages[i]) + offset;
-   vec[i].iov_len = this_len;
-   len -= this_len;
-   offset = 0;
-   }
-
-   res = kernel_readv(in, vec, nr_pages, *ppos);
-   if (res > 0) {
-   copied = res;
-   *ppos += res;
-   }
-
-   if (vec != __vec)
-   kfree(vec);
-out:
-   for (i = 0; i < nr_pages; i++)
-   put_page(pages[i]);
-   kvfree(pages);
-   iov_iter_advance(&to, copied);  /* truncates and discards */
-   return res;
-}
-
 /*
  * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos'
  * using sendpage(). Return the number of bytes sent.
@@ -788,33 +705,6 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 EXPORT_SYMBOL(iter_file_splice_write);
 
-static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
- struct splice_desc *sd)
-{
-   int ret;
-   void *data;
-   loff_t tmp = sd->pos;
-
-   data = kmap(buf->page);
-   ret = __kernel_write(sd->u.file, data + buf->offset, sd->len, &tmp);
-   kunmap(buf->page);
-
-   return ret;
-}
-
-static ssize_t default_file_splice_write(struct pipe_inode_info *pipe,
-struct file *out, 

[PATCH 05/14] fs: don't allow kernel reads and writes without iter ops

2020-09-03 Thread Christoph Hellwig
Don't allow calling ->read or ->write with set_fs as a preparation for
killing off set_fs.  All the instances that we use kernel_read/write on
are using the iter ops already.

If a file has both the regular ->read/->write methods and the iter
variants, those could have different semantics for messed up enough
drivers.  Also fail the kernel access to them in that case.
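
The resulting rule can be sketched in plain C.  The structure and stub functions below are simplified, hypothetical stand-ins rather than the kernel's struct file_operations; only the predicate mirrors the check this patch adds to __kernel_read() (and, with the write methods, to __kernel_write()).

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the two read entry points of a file. */
struct example_fops {
	long (*read)(void *buf, size_t count);
	long (*read_iter)(void *iter);
};

/* Dummy methods so the table slots can be populated in a test. */
long example_read(void *buf, size_t count)
{
	(void)buf;
	(void)count;
	return 0;
}

long example_read_iter(void *iter)
{
	(void)iter;
	return 0;
}

/*
 * Kernel reads are only allowed when the iter method is the sole read
 * implementation: without ->read_iter there is nothing to call that
 * accepts a kernel buffer, and with both methods wired up the two
 * could disagree about semantics.
 */
bool example_kernel_read_supported(const struct example_fops *fops)
{
	return fops->read_iter && !fops->read;
}
```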

Signed-off-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
---
 fs/read_write.c | 67 +++--
 1 file changed, 42 insertions(+), 25 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..702c4301d9eb6b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -419,27 +419,41 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
return ret;
 }
 
+static int warn_unsupported(struct file *file, const char *op)
+{
+   pr_warn_ratelimited(
+   "kernel %s not supported for file %pD4 (pid: %d comm: %.20s)\n",
+   op, file, current->pid, current->comm);
+   return -EINVAL;
+}
+
 ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
-   mm_segment_t old_fs = get_fs();
+   struct kvec iov = {
+   .iov_base   = buf,
+   .iov_len= min_t(size_t, count, MAX_RW_COUNT),
+   };
+   struct kiocb kiocb;
+   struct iov_iter iter;
ssize_t ret;
 
if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
return -EINVAL;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
+   /*
+* Also fail if ->read_iter and ->read are both wired up as that
+* implies very convoluted semantics.
+*/
+   if (unlikely(!file->f_op->read_iter || file->f_op->read))
+   return warn_unsupported(file, "read");
 
-   if (count > MAX_RW_COUNT)
-   count =  MAX_RW_COUNT;
-   set_fs(KERNEL_DS);
-   if (file->f_op->read)
-   ret = file->f_op->read(file, (void __user *)buf, count, pos);
-   else if (file->f_op->read_iter)
-   ret = new_sync_read(file, (void __user *)buf, count, pos);
-   else
-   ret = -EINVAL;
-   set_fs(old_fs);
+   init_sync_kiocb(&kiocb, file);
+   kiocb.ki_pos = *pos;
+   iov_iter_kvec(&iter, READ, &iov, 1, iov.iov_len);
+   ret = file->f_op->read_iter(&kiocb, &iter);
if (ret > 0) {
+   *pos = kiocb.ki_pos;
fsnotify_access(file);
add_rchar(current, ret);
}
@@ -510,28 +524,31 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
 /* caller is responsible for file_start_write/file_end_write */
 ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
 {
-   mm_segment_t old_fs;
-   const char __user *p;
+   struct kvec iov = {
+   .iov_base   = (void *)buf,
+   .iov_len= min_t(size_t, count, MAX_RW_COUNT),
+   };
+   struct kiocb kiocb;
+   struct iov_iter iter;
ssize_t ret;
 
if (WARN_ON_ONCE(!(file->f_mode & FMODE_WRITE)))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
+   /*
+* Also fail if ->write_iter and ->write are both wired up as that
+* implies very convoluted semantics.
+*/
+   if (unlikely(!file->f_op->write_iter || file->f_op->write))
+   return warn_unsupported(file, "write");
 
-   old_fs = get_fs();
-   set_fs(KERNEL_DS);
-   p = (__force const char __user *)buf;
-   if (count > MAX_RW_COUNT)
-   count =  MAX_RW_COUNT;
-   if (file->f_op->write)
-   ret = file->f_op->write(file, p, count, pos);
-   else if (file->f_op->write_iter)
-   ret = new_sync_write(file, p, count, pos);
-   else
-   ret = -EINVAL;
-   set_fs(old_fs);
+   init_sync_kiocb(&kiocb, file);
+   kiocb.ki_pos = *pos;
+   iov_iter_kvec(&iter, WRITE, &iov, 1, iov.iov_len);
+   ret = file->f_op->write_iter(&kiocb, &iter);
if (ret > 0) {
+   *pos = kiocb.ki_pos;
fsnotify_modify(file);
add_wchar(current, ret);
}
-- 
2.28.0



[PATCH 03/14] proc: add a read_iter method to proc proc_ops

2020-09-03 Thread Christoph Hellwig
This will allow proc files to implement iter read semantics.

Signed-off-by: Christoph Hellwig 
---
 fs/proc/inode.c | 53 ++---
 include/linux/proc_fs.h |  1 +
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 93dd2045737504..58c075e2a452d6 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -297,6 +297,21 @@ static loff_t proc_reg_llseek(struct file *file, loff_t offset, int whence)
return rv;
 }
 
+static ssize_t proc_reg_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   struct proc_dir_entry *pde = PDE(file_inode(iocb->ki_filp));
+   ssize_t ret;
+
+   if (pde_is_permanent(pde))
+   return pde->proc_ops->proc_read_iter(iocb, iter);
+
+   if (!use_pde(pde))
+   return -EIO;
+   ret = pde->proc_ops->proc_read_iter(iocb, iter);
+   unuse_pde(pde);
+   return ret;
+}
+
 static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos)
 {
typeof_member(struct proc_ops, proc_read) read;
@@ -578,6 +593,18 @@ static const struct file_operations proc_reg_file_ops = {
.release= proc_reg_release,
 };
 
+static const struct file_operations proc_iter_file_ops = {
+   .llseek = proc_reg_llseek,
+   .read_iter  = proc_reg_read_iter,
+   .write  = proc_reg_write,
+   .poll   = proc_reg_poll,
+   .unlocked_ioctl = proc_reg_unlocked_ioctl,
+   .mmap   = proc_reg_mmap,
+   .get_unmapped_area = proc_reg_get_unmapped_area,
+   .open   = proc_reg_open,
+   .release= proc_reg_release,
+};
+
 #ifdef CONFIG_COMPAT
 static const struct file_operations proc_reg_file_ops_compat = {
.llseek = proc_reg_llseek,
@@ -591,6 +618,19 @@ static const struct file_operations proc_reg_file_ops_compat = {
.open   = proc_reg_open,
.release= proc_reg_release,
 };
+
+static const struct file_operations proc_iter_file_ops_compat = {
+   .llseek = proc_reg_llseek,
+   .read_iter  = proc_reg_read_iter,
+   .write  = proc_reg_write,
+   .poll   = proc_reg_poll,
+   .unlocked_ioctl = proc_reg_unlocked_ioctl,
+   .compat_ioctl   = proc_reg_compat_ioctl,
+   .mmap   = proc_reg_mmap,
+   .get_unmapped_area = proc_reg_get_unmapped_area,
+   .open   = proc_reg_open,
+   .release= proc_reg_release,
+};
 #endif
 
 static void proc_put_link(void *p)
@@ -642,10 +682,17 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 
if (S_ISREG(inode->i_mode)) {
inode->i_op = de->proc_iops;
-   inode->i_fop = &proc_reg_file_ops;
+   if (de->proc_ops->proc_read_iter)
+   inode->i_fop = &proc_iter_file_ops;
+   else
+   inode->i_fop = &proc_reg_file_ops;
 #ifdef CONFIG_COMPAT
-   if (de->proc_ops->proc_compat_ioctl)
-   inode->i_fop = &proc_reg_file_ops_compat;
+   if (de->proc_ops->proc_compat_ioctl) {
+   if (de->proc_ops->proc_read_iter)
+   inode->i_fop = &proc_iter_file_ops_compat;
+   else
+   inode->i_fop = &proc_reg_file_ops_compat;
+   }
 #endif
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = de->proc_iops;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 2df965cd09742d..270cab43ca3dad 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -30,6 +30,7 @@ struct proc_ops {
unsigned int proc_flags;
int (*proc_open)(struct inode *, struct file *);
ssize_t (*proc_read)(struct file *, char __user *, size_t, loff_t *);
+   ssize_t (*proc_read_iter)(struct kiocb *, struct iov_iter *);
ssize_t (*proc_write)(struct file *, const char __user *, size_t, loff_t *);
loff_t  (*proc_lseek)(struct file *, loff_t, int);
int (*proc_release)(struct inode *, struct file *);
-- 
2.28.0



remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-03 Thread Christoph Hellwig
Hi all,

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.

[Note to Linus: I'd like to get this into linux-next sooner rather
than later.  Do you think it is ok to add this tree to linux-next?]

The file system part has been posted a few times, and the read/write side
has been pretty much unchanged.  For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them.  The reason for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it.  All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.

Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged.  The process to convert
architectures is roughly:

 (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
 (2) implement __get_kernel_nofault and __put_kernel_nofault
 (3) remove the arch specific address limitation functionality

Changes since v2:
 - add back the patch to support splice through read_iter/write iter
   on /proc/sys/*
 - entirely remove the tests that depend on set_fs.  Note that for
   lkdtm the maintainer (Kees) disagrees with this request from Linus
 - fix a wrong check in the powerpc access_ok, and drop a few spurious
   cleanups there

Changes since v1:
 - drop the patch to remove the non-iter ops for /dev/zero and
   /dev/null as they caused a performance regression
 - don't enable user access in __get_kernel on powerpc
 - xfail the set_fs() based lkdtm tests

Diffstat:


[PATCH 04/14] sysctl: Convert to iter interfaces

2020-09-03 Thread Christoph Hellwig
From: "Matthew Wilcox (Oracle)" 

Using the read_iter/write_iter interfaces allows for in-kernel users
to set sysctls without using set_fs().  Also, the buffer is a string,
so give it the real type of 'char *', not void *.
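
For readers unfamiliar with the iter-style copy, here is a user-space sketch of the write path's buffer handling: gather exactly `count` bytes from a list of segments into a zeroed buffer of `count + 1` bytes and terminate it, failing on a short copy.  It uses the POSIX struct iovec as a stand-in for the kernel's iov_iter, and the function name is hypothetical; it is an analogy to the kzalloc() plus copy_from_iter_full() sequence in this patch, not the kernel code itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Gather exactly `count` bytes from an iovec array into a freshly
 * allocated, NUL-terminated buffer.  Returns NULL if the allocation
 * fails or the segments hold fewer than `count` bytes.  The caller
 * frees the result.
 */
char *example_gather_string(const struct iovec *iov, int iovcnt,
			    size_t count)
{
	char *kbuf;
	size_t done = 0;
	int i;

	if (count == SIZE_MAX)		/* count + 1 would overflow */
		return NULL;

	kbuf = calloc(count + 1, 1);	/* +1 for the terminator */
	if (!kbuf)
		return NULL;

	for (i = 0; i < iovcnt && done < count; i++) {
		size_t n = iov[i].iov_len;

		if (n > count - done)
			n = count - done;
		memcpy(kbuf + done, iov[i].iov_base, n);
		done += n;
	}

	if (done < count) {		/* short copy: treat as failure */
		free(kbuf);
		return NULL;
	}
	kbuf[count] = '\0';
	return kbuf;
}
```

Because the handler then sees an ordinary `char *`, it no longer matters whether the bytes originally came from a user or a kernel buffer.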

Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Christoph Hellwig 
---
 fs/proc/proc_sysctl.c  | 46 ++
 include/linux/bpf-cgroup.h |  2 +-
 kernel/bpf/cgroup.c|  2 +-
 3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 6c1166ccdaea57..a4a3122f8a584a 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -540,13 +541,14 @@ static struct dentry *proc_sys_lookup(struct inode *dir, 
struct dentry *dentry,
return err;
 }
 
-static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
-   size_t count, loff_t *ppos, int write)
+static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
+   int write)
 {
-   struct inode *inode = file_inode(filp);
+   struct inode *inode = file_inode(iocb->ki_filp);
struct ctl_table_header *head = grab_header(inode);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;
-   void *kbuf;
+   size_t count = iov_iter_count(iter);
+   char *kbuf;
ssize_t error;
 
if (IS_ERR(head))
@@ -569,32 +571,30 @@ static ssize_t proc_sys_call_handler(struct file *filp, 
void __user *ubuf,
error = -ENOMEM;
if (count >= KMALLOC_MAX_SIZE)
goto out;
+   kbuf = kzalloc(count + 1, GFP_KERNEL);
+   if (!kbuf)
+   goto out;
 
if (write) {
-   kbuf = memdup_user_nul(ubuf, count);
-   if (IS_ERR(kbuf)) {
-   error = PTR_ERR(kbuf);
-   goto out;
-   }
-   } else {
-   kbuf = kzalloc(count, GFP_KERNEL);
-   if (!kbuf)
-   goto out;
+   error = -EFAULT;
+   if (!copy_from_iter_full(kbuf, count, iter))
+   goto out_free_buf;
+   kbuf[count] = '\0';
}
 
error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, &kbuf, &count,
-  ppos);
+  &iocb->ki_pos);
if (error)
goto out_free_buf;
 
/* careful: calling conventions are nasty here */
-   error = table->proc_handler(table, write, kbuf, &count, ppos);
+   error = table->proc_handler(table, write, kbuf, &count, &iocb->ki_pos);
if (error)
goto out_free_buf;
 
if (!write) {
error = -EFAULT;
-   if (copy_to_user(ubuf, kbuf, count))
+   if (copy_to_iter(kbuf, count, iter) < count)
goto out_free_buf;
}
 
@@ -607,16 +607,14 @@ static ssize_t proc_sys_call_handler(struct file *filp, 
void __user *ubuf,
return error;
 }
 
-static ssize_t proc_sys_read(struct file *filp, char __user *buf,
-   size_t count, loff_t *ppos)
+static ssize_t proc_sys_read(struct kiocb *iocb, struct iov_iter *iter)
 {
-   return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 0);
+   return proc_sys_call_handler(iocb, iter, 0);
 }
 
-static ssize_t proc_sys_write(struct file *filp, const char __user *buf,
-   size_t count, loff_t *ppos)
+static ssize_t proc_sys_write(struct kiocb *iocb, struct iov_iter *iter)
 {
-   return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 1);
+   return proc_sys_call_handler(iocb, iter, 1);
 }
 
 static int proc_sys_open(struct inode *inode, struct file *filp)
@@ -853,8 +851,8 @@ static int proc_sys_getattr(const struct path *path, struct 
kstat *stat,
 static const struct file_operations proc_sys_file_operations = {
.open   = proc_sys_open,
.poll   = proc_sys_poll,
-   .read   = proc_sys_read,
-   .write  = proc_sys_write,
+   .read_iter  = proc_sys_read,
+   .write_iter = proc_sys_write,
.llseek = default_llseek,
 };
 
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 64f367044e25f4..82b26a1386d85e 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -136,7 +136,7 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 
major, u32 minor,
 
 int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
   struct ctl_table *table, int write,
-  void **buf, size_t *pcount, loff_t *ppos,
+  char **buf, size_t *pcount, loff_t *ppos,
   

[PATCH 01/14] proc: remove a level of indentation in proc_get_inode

2020-09-03 Thread Christoph Hellwig
Just return early on inode allocation failure.

Signed-off-by: Christoph Hellwig 
---
 fs/proc/inode.c | 72 +
 1 file changed, 37 insertions(+), 35 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 28d6105e908e4c..016b1302cbabc0 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -619,42 +619,44 @@ struct inode *proc_get_inode(struct super_block *sb, 
struct proc_dir_entry *de)
 {
struct inode *inode = new_inode(sb);
 
-   if (inode) {
-   inode->i_ino = de->low_ino;
-   inode->i_mtime = inode->i_atime = inode->i_ctime = 
current_time(inode);
-   PROC_I(inode)->pde = de;
-
-   if (is_empty_pde(de)) {
-   make_empty_dir_inode(inode);
-   return inode;
-   }
-   if (de->mode) {
-   inode->i_mode = de->mode;
-   inode->i_uid = de->uid;
-   inode->i_gid = de->gid;
-   }
-   if (de->size)
-   inode->i_size = de->size;
-   if (de->nlink)
-   set_nlink(inode, de->nlink);
-
-   if (S_ISREG(inode->i_mode)) {
-   inode->i_op = de->proc_iops;
-   inode->i_fop = &proc_reg_file_ops;
+   if (!inode) {
+   pde_put(de);
+   return NULL;
+   }
+
+   inode->i_ino = de->low_ino;
+   inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
+   PROC_I(inode)->pde = de;
+   if (is_empty_pde(de)) {
+   make_empty_dir_inode(inode);
+   return inode;
+   }
+
+   if (de->mode) {
+   inode->i_mode = de->mode;
+   inode->i_uid = de->uid;
+   inode->i_gid = de->gid;
+   }
+   if (de->size)
+   inode->i_size = de->size;
+   if (de->nlink)
+   set_nlink(inode, de->nlink);
+
+   if (S_ISREG(inode->i_mode)) {
+   inode->i_op = de->proc_iops;
+   inode->i_fop = &proc_reg_file_ops;
 #ifdef CONFIG_COMPAT
-   if (!de->proc_ops->proc_compat_ioctl) {
-   inode->i_fop = &proc_reg_file_ops_no_compat;
-   }
+   if (!de->proc_ops->proc_compat_ioctl)
+   inode->i_fop = &proc_reg_file_ops_no_compat;
 #endif
-   } else if (S_ISDIR(inode->i_mode)) {
-   inode->i_op = de->proc_iops;
-   inode->i_fop = de->proc_dir_ops;
-   } else if (S_ISLNK(inode->i_mode)) {
-   inode->i_op = de->proc_iops;
-   inode->i_fop = NULL;
-   } else
-   BUG();
-   } else
-  pde_put(de);
+   } else if (S_ISDIR(inode->i_mode)) {
+   inode->i_op = de->proc_iops;
+   inode->i_fop = de->proc_dir_ops;
+   } else if (S_ISLNK(inode->i_mode)) {
+   inode->i_op = de->proc_iops;
+   inode->i_fop = NULL;
+   } else {
+   BUG();
+   }
return inode;
 }
-- 
2.28.0



[PATCH 02/14] proc: cleanup the compat vs no compat file ops

2020-09-03 Thread Christoph Hellwig
Instead of providing a special no-compat version provide a special
compat version for operations with ->compat_ioctl.

Signed-off-by: Christoph Hellwig 
---
 fs/proc/inode.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 016b1302cbabc0..93dd2045737504 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -572,9 +572,6 @@ static const struct file_operations proc_reg_file_ops = {
.write  = proc_reg_write,
.poll   = proc_reg_poll,
.unlocked_ioctl = proc_reg_unlocked_ioctl,
-#ifdef CONFIG_COMPAT
-   .compat_ioctl   = proc_reg_compat_ioctl,
-#endif
.mmap   = proc_reg_mmap,
.get_unmapped_area = proc_reg_get_unmapped_area,
.open   = proc_reg_open,
@@ -582,12 +579,13 @@ static const struct file_operations proc_reg_file_ops = {
 };
 
 #ifdef CONFIG_COMPAT
-static const struct file_operations proc_reg_file_ops_no_compat = {
+static const struct file_operations proc_reg_file_ops_compat = {
.llseek = proc_reg_llseek,
.read   = proc_reg_read,
.write  = proc_reg_write,
.poll   = proc_reg_poll,
.unlocked_ioctl = proc_reg_unlocked_ioctl,
+   .compat_ioctl   = proc_reg_compat_ioctl,
.mmap   = proc_reg_mmap,
.get_unmapped_area = proc_reg_get_unmapped_area,
.open   = proc_reg_open,
@@ -646,8 +644,8 @@ struct inode *proc_get_inode(struct super_block *sb, struct 
proc_dir_entry *de)
inode->i_op = de->proc_iops;
inode->i_fop = &proc_reg_file_ops;
 #ifdef CONFIG_COMPAT
-   if (!de->proc_ops->proc_compat_ioctl)
-   inode->i_fop = &proc_reg_file_ops_no_compat;
+   if (de->proc_ops->proc_compat_ioctl)
+   inode->i_fop = &proc_reg_file_ops_compat;
 #endif
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = de->proc_iops;
-- 
2.28.0



Re: [PATCH 10/10] powerpc: remove address space overrides using set_fs()

2020-09-03 Thread Christoph Hellwig
On Wed, Sep 02, 2020 at 11:02:22AM -0700, Linus Torvalds wrote:
> I don't see why this change would make any difference.

Me neither, but while looking at a different project I did spot places
that actually do an access_ok with len 0, that's why I wanted him to
try.

That being said: Christophe, are these numbers stable?  Do you get
similar numbers with multiple runs?

> And btw, why do the 32-bit and 64-bit checks even differ? It's not
> like the extra (single) instruction should even matter. I think the
> main reason is that the simpler 64-bit case could stay as a macro
> (because it only uses "addr" and "size" once), but honestly, that
> "simplification" doesn't help when you then need to have that #ifdef
> for the 32-bit case and an inline function anyway.

I'll have to leave that to the powerpc folks.  The intent was to not
change the behavior (and I even fucked that up for the size == 0
case).

> However, I suspect a bigger reason for the actual performance
> degradation would be the patch that makes things use "write_iter()"
> for writing, even when a simpler "write()" exists.

Except that we do not actually have such a patch.  For normal user
writes we only use ->write_iter if ->write is not present.  But what
shows up in the profile is that /dev/zero only has a read_iter op and
not a normal read.  I've added a patch below that implements a normal
read which might help a tad with this workload, but should not be part
of a regression.

Also Christophe:  can you bisect which patch starts this?  Is it really
this last patch in the series?

---
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cdebc..1dc99ab158457a 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -726,6 +726,27 @@ static ssize_t read_iter_zero(struct kiocb *iocb, struct 
iov_iter *iter)
return written;
 }
 
+static ssize_t read_zero(struct file *file, char __user *buf,
+size_t count, loff_t *ppos)
+{
+   size_t cleared = 0;
+
+   while (count) {
+   size_t chunk = min_t(size_t, count, PAGE_SIZE);
+
+   if (clear_user(buf + cleared, chunk))
+   return cleared ? cleared : -EFAULT;
+   cleared += chunk;
+   count -= chunk;
+
+   if (signal_pending(current))
+   return cleared ? cleared : -ERESTARTSYS;
+   cond_resched();
+   }
+
+   return cleared;
+}
+
 static int mmap_zero(struct file *file, struct vm_area_struct *vma)
 {
 #ifndef CONFIG_MMU
@@ -921,6 +942,7 @@ static const struct file_operations zero_fops = {
.llseek = zero_lseek,
.write  = write_zero,
.read_iter  = read_iter_zero,
+   .read   = read_zero,
.write_iter = write_iter_zero,
.mmap   = mmap_zero,
.get_unmapped_area = get_unmapped_area_zero,
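
For anyone who wants to poke at the loop outside the kernel, here is a
userspace sketch of the same shape: clear in page-sized chunks, and on a
failure report the bytes already cleared, falling back to an error only if
nothing was done.  sketch_zero_fill() and its fault_at parameter are
invented for this illustration; the fault is simulated, and the signal and
cond_resched() handling from the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

#define SKETCH_CHUNK 4096

/*
 * Zero count bytes at buf in SKETCH_CHUNK sized pieces.
 * fault_at is the offset at which a simulated fault hits; pass
 * (size_t)-1 for "never".  A chunk that would cross fault_at fails
 * as a whole, standing in for a failing clear_user().
 */
static ssize_t sketch_zero_fill(char *buf, size_t count, size_t fault_at)
{
	size_t cleared = 0;

	assert(buf != NULL);
	while (count) {
		size_t chunk = count < SKETCH_CHUNK ? count : SKETCH_CHUNK;

		/* Partial progress wins over the error code. */
		if (cleared + chunk > fault_at)
			return cleared ? (ssize_t)cleared : -EFAULT;

		memset(buf + cleared, 0, chunk);
		cleared += chunk;
		count -= chunk;
	}

	return (ssize_t)cleared;
}
```

With a 10000 byte buffer and a simulated fault at offset 5000 this returns
4096: the first chunk went through, the second one did not.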


Re: [PATCH 10/10] powerpc: remove address space overrides using set_fs()

2020-09-02 Thread Christoph Hellwig
On Wed, Sep 02, 2020 at 08:15:12AM +0200, Christophe Leroy wrote:
>> -return 0;
>> -return (size == 0 || size - 1 <= seg.seg - addr);
>> +if (addr >= TASK_SIZE_MAX)
>> +return false;
>> +if (size == 0)
>> +return false;
>
> __access_ok() was returning true when size == 0 up to now. Any reason to 
> return false now ?

No, this is accidental and broken.  Can you re-run your benchmark with
this fixed?
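
For reference, a minimal userspace sketch of a range check that keeps the
old size == 0 behavior.  sketch_access_ok() and its explicit limit
parameter are assumptions for illustration only, not the powerpc code; the
real check compares against TASK_SIZE_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if [addr, addr + size) lies entirely below limit.
 * A zero-length range is accepted as long as addr itself is below
 * the limit, which matches the behavior before the regression.
 */
static bool sketch_access_ok(uintptr_t addr, size_t size, uintptr_t limit)
{
	/* A zero limit would reject everything; treat it as a caller bug. */
	assert(limit > 0);

	if (addr >= limit)
		return false;

	/*
	 * addr < limit here, so limit - addr cannot wrap, and size == 0
	 * passes without needing a special case.
	 */
	return size <= limit - addr;
}
```

Written this way there is no addr + size sum that could overflow, and the
zero-length case falls out of the comparison instead of needing its own
branch.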


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-02 Thread Christoph Hellwig
On Tue, Sep 01, 2020 at 06:25:12PM +0100, Al Viro wrote:
> On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:
> 
> > 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero
> 
> Interesting...  Could you get an instruction-level profile inside
> iov_iter_zero(), along with the disassembly of that sucker?

So the interesting thing here is that none of these code paths
should have changed at all, and the biggest items on the profile look
the same modulo some minor reordering.


Re: [PATCH 05/10] lkdtm: disable set_fs-based tests for !CONFIG_SET_FS

2020-09-02 Thread Christoph Hellwig
On Tue, Sep 01, 2020 at 11:57:37AM -0700, Kees Cook wrote:
> On Sat, Aug 29, 2020 at 11:24:06AM +0200, Christoph Hellwig wrote:
> > On Thu, Aug 27, 2020 at 11:06:28AM -0700, Linus Torvalds wrote:
> > > On Thu, Aug 27, 2020 at 8:00 AM Christoph Hellwig  wrote:
> > > >
> > > > Once we can't manipulate the address limit, we also can't test what
> > > > happens when the manipulation is abused.
> > > 
> > > Just remove these tests entirely.
> > > 
> > > Once set_fs() doesn't exist on x86, the tests no longer make any sense
> > > what-so-ever, because test coverage will be basically zero.
> > > 
> > > So don't make the code uglier just to maintain a fiction that
> > > something is tested when it isn't really.
> > 
> > Sure fine with me unless Kees screams.
> 
> To clarify: if any of x86, arm64, arm, powerpc, riscv, and s390 are
> using set_fs(), I want to keep this test. "ugly" is fine in lkdtm. :)

And Linus wants them gone entirely, so I'll need a stage fight between
the two of you.  At least for this merge window I'm only planning on
x86 and power, plus maybe riscv if I get the work done in time.  Although
help from the maintainers would be welcome.  s390 has a driver that
still uses set_fs that will need some surgery, although it shouldn't
be too bad, but arm will be a piece of work.  Unless I get help it will
take a while.


Re: [RESEND][PATCH 0/7] Avoid overflow at boundary_size

2020-09-01 Thread Christoph Hellwig
On Tue, Sep 01, 2020 at 12:54:01AM -0700, Nicolin Chen wrote:
> Hi Christoph,
> 
> On Tue, Sep 01, 2020 at 09:36:23AM +0200, Christoph Hellwig wrote:
> > I really don't like all the open coded smarts in the various drivers.
> > What do you think about a helper like the one in the untested patch
> 
> A helper function will be actually better. I was thinking of
> one yet not very sure about the naming and where to put it.
> 
> > below (on top of your series).  Also please include the original
> > segment boundary patch with the next resend so that the series has
> > the full context.
> 
> I will use your change instead and resend with the ULONG_MAX
> change. But in that case, should I make separate changes for
> different files like this series, or just one single change
> like yours?
> 
> Asking this as I was expecting that those changes would get
> applied by different maintainers. But now it feels like you
> will merge it to your tree at once?

I guess one patch is fine.  I can queue it up in the dma-mapping
tree as a prep patch for the default boundary change.


Re: [RESEND][PATCH 0/7] Avoid overflow at boundary_size

2020-09-01 Thread Christoph Hellwig
I really don't like all the open coded smarts in the various drivers.
What do you think about a helper like the one in the untested patch
below (on top of your series).  Also please include the original
segment boundary patch with the next resend so that the series has
the full context.
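
To make the arithmetic behind the open coded shortcut concrete, here is a
small userspace sketch.  It takes the boundary mask directly instead of a
struct device, so the NULL-device fallback to U32_MAX seen in some of the
drivers is not modelled, and both function names are made up for this
illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Number of pages spanned by a segment boundary mask of the form
 * 2^n - 1.  Overflow-free form of ALIGN(mask + 1, 1 << shift) >> shift.
 * Note: this still wraps to 0 for mask == ULONG_MAX with shift == 0.
 */
static unsigned long sketch_seg_boundary_nr_pages(unsigned long boundary_mask,
						  unsigned int page_shift)
{
	assert(page_shift < sizeof(unsigned long) * CHAR_BIT);
	return (boundary_mask >> page_shift) + 1;
}

/*
 * The straightforward form for comparison.  mask + 1 wraps to 0 when
 * mask == ULONG_MAX, so the result is 0 pages in that case.
 */
static unsigned long sketch_naive_nr_pages(unsigned long boundary_mask,
					   unsigned int page_shift)
{
	unsigned long size, align;

	assert(page_shift < sizeof(unsigned long) * CHAR_BIT);
	size = boundary_mask + 1;
	align = 1UL << page_shift;
	return ((size + align - 1) & ~(align - 1)) >> page_shift;
}
```

For a 0xffffffff mask and a shift of 12 the first function gives 0x100000
pages; for a ULONG_MAX mask the naive form collapses to 0 while the
shortcut does not.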

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 1ef2c647bd3ec2..6f7de4f4e191e7 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -141,10 +141,7 @@ iommu_arena_find_pages(struct device *dev, struct 
pci_iommu_arena *arena,
unsigned long boundary_size;
 
base = arena->dma_base >> PAGE_SHIFT;
-
-   boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (boundary_size >> PAGE_SHIFT) + 1;
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, PAGE_SHIFT);
 
/* Search forward for the first mask-aligned sequence of N free ptes */
ptes = arena->ptes;
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 945954903bb0ba..b49b73a95067d2 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -485,8 +485,7 @@ sba_search_bitmap(struct ioc *ioc, struct device *dev,
ASSERT(((unsigned long) ioc->res_hint & (sizeof(unsigned long) - 1UL)) 
== 0);
ASSERT(res_ptr < res_end);
 
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (dma_get_seg_boundary(dev) >> iovp_shift) + 1;
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, iovp_shift);
 
BUG_ON(ioc->ibase & ~iovp_mask);
shift = ioc->ibase >> iovp_shift;
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index c01ccbf8afdd42..cbc2e62db597cf 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -236,11 +236,7 @@ static unsigned long iommu_range_alloc(struct device *dev,
}
}
 
-   /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
-   boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
-
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (boundary_size >> tbl->it_page_shift) + 1;
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, tbl->it_page_shift);
 
n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
 boundary_size, align_mask);
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index ecb067acc6d532..4a37d8f4de9d9d 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -261,13 +261,11 @@ static unsigned long __dma_alloc_iommu(struct device *dev,
   unsigned long start, int size)
 {
struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
-   unsigned long boundary_size;
 
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (dma_get_seg_boundary(dev) >> PAGE_SHIFT) + 1;
return iommu_area_alloc(zdev->iommu_bitmap, zdev->iommu_pages,
start, size, zdev->start_dma >> PAGE_SHIFT,
-   boundary_size, 0);
+   dma_get_seg_boundary_nr_pages(dev, PAGE_SHIFT),
+   0);
 }
 
 static dma_addr_t dma_alloc_address(struct device *dev, int size)
diff --git a/arch/sparc/kernel/iommu-common.c b/arch/sparc/kernel/iommu-common.c
index 843e71894d0482..e6139c99762e11 100644
--- a/arch/sparc/kernel/iommu-common.c
+++ b/arch/sparc/kernel/iommu-common.c
@@ -166,10 +166,6 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
}
}
 
-   boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
-
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (boundary_size >> iommu->table_shift) + 1;
/*
 * if the skip_span_boundary_check had been set during init, we set
 * things up so that iommu_is_span_boundary() merely checks if the
@@ -178,7 +174,11 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
shift = 0;
boundary_size = iommu->poolsize * iommu->nr_pools;
+   } else {
+   boundary_size = dma_get_seg_boundary_nr_pages(dev,
+   iommu->table_shift);
}
+
n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
 boundary_size, align_mask);
if (n == -1) {
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index d981c37305ae31..c3e4e2df26a8b8 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -472,8 +472,7 @@ static int dma_4u_map_sg(struct device *dev, struct 
scatterlist *sglist,
outs->dma_length = 0;
 
max_seg_size = 
