Re: [PATCH v3 -next] binder: simplify the return expression of binder_mmap

2020-10-02 Thread Martijn Coenen
Thanks!

On Tue, Sep 29, 2020 at 3:30 AM Liu Shixin  wrote:
>
> Simplify the return expression.
>
> Signed-off-by: Liu Shixin 

Acked-by: Martijn Coenen 

> ---
> v3: Add the change description.
> v2: Get rid of the "ret" and "failure string" variables.
> v1: Simplify the return expression.
> ---
>  drivers/android/binder.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 37a505c41dec..49c0700816a5 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -5180,9 +5180,7 @@ static const struct vm_operations_struct binder_vm_ops 
> = {
>
>  static int binder_mmap(struct file *filp, struct vm_area_struct *vma)
>  {
> -   int ret;
> struct binder_proc *proc = filp->private_data;
> -   const char *failure_string;
>
> if (proc->tsk != current->group_leader)
> return -EINVAL;
> @@ -5194,9 +5192,9 @@ static int binder_mmap(struct file *filp, struct 
> vm_area_struct *vma)
>  (unsigned long)pgprot_val(vma->vm_page_prot));
>
> if (vma->vm_flags & FORBIDDEN_MMAP_FLAGS) {
> -   ret = -EPERM;
> -   failure_string = "bad vm_flags";
> -   goto err_bad_arg;
> +   pr_err("%s: %d %lx-%lx %s failed %d\n", __func__,
> +  proc->pid, vma->vm_start, vma->vm_end, "bad vm_flags", 
> -EPERM);
> +   return -EPERM;
> }
> vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP;
> vma->vm_flags &= ~VM_MAYWRITE;
> @@ -5204,15 +5202,7 @@ static int binder_mmap(struct file *filp, struct 
> vm_area_struct *vma)
> vma->vm_ops = _vm_ops;
> vma->vm_private_data = proc;
>
> -   ret = binder_alloc_mmap_handler(>alloc, vma);
> -   if (ret)
> -   return ret;
> -   return 0;
> -
> -err_bad_arg:
> -   pr_err("%s: %d %lx-%lx %s failed %d\n", __func__,
> -  proc->pid, vma->vm_start, vma->vm_end, failure_string, ret);
> -   return ret;
> +   return binder_alloc_mmap_handler(>alloc, vma);
>  }
>
>  static int binder_open(struct inode *nodp, struct file *filp)
> --
> 2.25.1
>


Re: LOOP_CONFIGURE ioctl doesn't work if lo_offset/lo_sizelimit are set

2020-08-25 Thread Martijn Coenen
Hi,

I just sent a patch to fix the issue. The loop device would have
respected the configuration, but indeed the size of the underlying
block device was not set correctly, so reading back the size would
give the wrong result.

Thanks,
Martijn

On Mon, Aug 24, 2020 at 8:24 PM Martijn Coenen  wrote:
>
> Hi Lennart,
>
> Thanks for the report, I'll look into it. FWIW, we've been using
> LOOP_CONFIGURE on Android with lo_offset/lo_sizelimit without issues,
> but it may be a particular configuration that's causing issues.
>
> Thanks,
> Martijn
>
> On Mon, Aug 24, 2020 at 5:44 PM Lennart Poettering  
> wrote:
> >
> > Hi!
> >
> > Even with fe6a8fc5ed2f0081f17375ae2005718522c392c6 the LOOP_CONFIGURE
> > ioctl doesn't work correctly. It gets confused if the
> > lo_offset/lo_sizelimit fields are set to non-zero.
> >
> > In a quick test I ran (on Linux 5.8.3) I call LOOP_CONFIGURE with
> > .lo_offset=3221204992 and .lo_sizelimit=50331648 and immediately
> > verify the size of the block device with BLKGETSIZE64. It should of
> > course return 50331648, but actually returns 3271557120. (the precise
> > values have no particular relevance, it's just what I happened to use
> > in my test.) If I instead use LOOP_SET_STATUS64 with the exact same
> > parameters, everything works correctly. In either case, if I use
> > LOOP_GET_STATUS64 insted of BLKGETSIZE64 to verify things, everything
> > looks great.
> >
> > My guess is that the new ioctl simply doesn't properly propagate the
> > size limit into the underlying block device like it should. I didn't
> > have the time to investigate further though.
> >
> > Lennart


Re: LOOP_CONFIGURE ioctl doesn't work if lo_offset/lo_sizelimit are set

2020-08-24 Thread Martijn Coenen
Hi Lennart,

Thanks for the report, I'll look into it. FWIW, we've been using
LOOP_CONFIGURE on Android with lo_offset/lo_sizelimit without issues,
but it may be a particular configuration that's causing issues.

Thanks,
Martijn

On Mon, Aug 24, 2020 at 5:44 PM Lennart Poettering  wrote:
>
> Hi!
>
> Even with fe6a8fc5ed2f0081f17375ae2005718522c392c6 the LOOP_CONFIGURE
> ioctl doesn't work correctly. It gets confused if the
> lo_offset/lo_sizelimit fields are set to non-zero.
>
> In a quick test I ran (on Linux 5.8.3) I call LOOP_CONFIGURE with
> .lo_offset=3221204992 and .lo_sizelimit=50331648 and immediately
> verify the size of the block device with BLKGETSIZE64. It should of
> course return 50331648, but actually returns 3271557120. (the precise
> values have no particular relevance, it's just what I happened to use
> in my test.) If I instead use LOOP_SET_STATUS64 with the exact same
> parameters, everything works correctly. In either case, if I use
> LOOP_GET_STATUS64 insted of BLKGETSIZE64 to verify things, everything
> looks great.
>
> My guess is that the new ioctl simply doesn't properly propagate the
> size limit into the underlying block device like it should. I didn't
> have the time to investigate further though.
>
> Lennart


[PATCH v3] binder: print warnings when detecting oneway spamming.

2020-08-21 Thread Martijn Coenen
The most common cause of the binder transaction buffer filling up is a
client rapidly firing oneway transactions into a process, before it has
a chance to handle them. Yet the root cause of this is often hard to
debug, because either the system or the app will stop, and by that time
binder debug information we dump in bugreports is no longer relevant.

This change warns as soon as a process dips below 80% of its oneway
space (less than 100kB available in the configuration), when any one
process is responsible for either more than 50 transactions, or more
than 50% of the oneway space.

Signed-off-by: Martijn Coenen 
---
v2: fixed call-site in binder_alloc_selftest

v3: include size of struct binder_buffer in calculation, fix comments

 drivers/android/binder.c|  2 +-
 drivers/android/binder_alloc.c  | 55 +++--
 drivers/android/binder_alloc.h  |  5 ++-
 drivers/android/binder_alloc_selftest.c |  2 +-
 4 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index f936530a19b0..946332bc871a 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3136,7 +3136,7 @@ static void binder_transaction(struct binder_proc *proc,
 
t->buffer = binder_alloc_new_buf(_proc->alloc, tr->data_size,
tr->offsets_size, extra_buffers_size,
-   !reply && (t->flags & TF_ONE_WAY));
+   !reply && (t->flags & TF_ONE_WAY), current->tgid);
if (IS_ERR(t->buffer)) {
/*
 * -ESRCH indicates VMA cleared. The target is dying.
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 69609696a843..b5b6c9cf1b0b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -338,12 +338,50 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
return vma;
 }
 
+static void debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
+{
+   /*
+* Find the amount and size of buffers allocated by the current caller;
+* The idea is that once we cross the threshold, whoever is responsible
+* for the low async space is likely to try to send another async txn,
+* and at some point we'll catch them in the act. This is more efficient
+* than keeping a map per pid.
+*/
+   struct rb_node *n = alloc->free_buffers.rb_node;
+   struct binder_buffer *buffer;
+   size_t total_alloc_size = 0;
+   size_t num_buffers = 0;
+
+   for (n = rb_first(>allocated_buffers); n != NULL;
+n = rb_next(n)) {
+   buffer = rb_entry(n, struct binder_buffer, rb_node);
+   if (buffer->pid != pid)
+   continue;
+   if (!buffer->async_transaction)
+   continue;
+   total_alloc_size += binder_alloc_buffer_size(alloc, buffer)
+   + sizeof(struct binder_buffer);
+   num_buffers++;
+   }
+
+   /*
+* Warn if this pid has more than 50 transactions, or more than 50% of
+* async space (which is 25% of total buffer size).
+*/
+   if (num_buffers > 50 || total_alloc_size > alloc->buffer_size / 4) {
+   binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
+"%d: pid %d spamming oneway? %zd buffers allocated 
for a total size of %zd\n",
+ alloc->pid, pid, num_buffers, total_alloc_size);
+   }
+}
+
 static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
size_t data_size,
size_t offsets_size,
size_t extra_buffers_size,
-   int is_async)
+   int is_async,
+   int pid)
 {
struct rb_node *n = alloc->free_buffers.rb_node;
struct binder_buffer *buffer;
@@ -486,11 +524,20 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
buffer->offsets_size = offsets_size;
buffer->async_transaction = is_async;
buffer->extra_buffers_size = extra_buffers_size;
+   buffer->pid = pid;
if (is_async) {
alloc->free_async_space -= size + sizeof(struct binder_buffer);
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC,
 "%d: binder_alloc_buf size %zd async free %zd\n",
  alloc->pid, size, alloc->free_async_space);
+   if (alloc->free_async_space < alloc->buffer_size / 10) {
+   /*
+* Start detecting spammers once we have less th

[PATCH v2] ANDROID: binder: print warnings when detecting oneway spamming.

2020-08-20 Thread Martijn Coenen
The most common cause of the binder transaction buffer filling up is a
client rapidly firing oneway transactions into a process, before it has
a chance to handle them. Yet the root cause of this is often hard to
debug, because either the system or the app will stop, and by that time
binder debug information we dump in bugreports is no longer relevant.

This change warns as soon as a process dips below 80% of its oneway
space (less than 100kB available in the configuration), when any one
process is responsible for either more than 50 transactions, or more
than 50% of the oneway space.

Signed-off-by: Martijn Coenen 
---
v2: fixed call-site in binder_alloc_selftest

 drivers/android/binder.c|  2 +-
 drivers/android/binder_alloc.c  | 49 +++--
 drivers/android/binder_alloc.h  |  5 ++-
 drivers/android/binder_alloc_selftest.c |  2 +-
 4 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index f936530a19b0..946332bc871a 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3136,7 +3136,7 @@ static void binder_transaction(struct binder_proc *proc,
 
t->buffer = binder_alloc_new_buf(_proc->alloc, tr->data_size,
tr->offsets_size, extra_buffers_size,
-   !reply && (t->flags & TF_ONE_WAY));
+   !reply && (t->flags & TF_ONE_WAY), current->tgid);
if (IS_ERR(t->buffer)) {
/*
 * -ESRCH indicates VMA cleared. The target is dying.
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 69609696a843..76e8e633dbd4 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -338,12 +338,48 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
return vma;
 }
 
+static void debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
+{
+   /*
+* Find the amount and size of buffers allocated by the current caller;
+* The idea is that once we cross the threshold, whoever is responsible
+* for the low async space is likely to try to send another async txn,
+* and at some point we'll catch them in the act. This is more efficient
+* than keeping a map per pid.
+*/
+   struct rb_node *n = alloc->free_buffers.rb_node;
+   struct binder_buffer *buffer;
+   size_t buffer_size;
+   size_t total_alloc_size = 0;
+   size_t num_buffers = 0;
+
+   for (n = rb_first(>allocated_buffers); n != NULL;
+n = rb_next(n)) {
+   buffer = rb_entry(n, struct binder_buffer, rb_node);
+   if (buffer->pid != pid)
+   continue;
+   if (!buffer->async_transaction)
+   continue;
+   buffer_size = binder_alloc_buffer_size(alloc, buffer);
+   total_alloc_size += buffer_size;
+   num_buffers++;
+   }
+
+   // Warn if this pid has more than 50% of async space, or more than 50 
txns
+   if (num_buffers > 50 || total_alloc_size > alloc->buffer_size / 4) {
+   binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
+"%d: pid %d spamming oneway? %zd buffers allocated 
for a total size of %zd\n",
+ alloc->pid, pid, num_buffers, total_alloc_size);
+   }
+}
+
 static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
size_t data_size,
size_t offsets_size,
size_t extra_buffers_size,
-   int is_async)
+   int is_async,
+   int pid)
 {
struct rb_node *n = alloc->free_buffers.rb_node;
struct binder_buffer *buffer;
@@ -486,11 +522,16 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
buffer->offsets_size = offsets_size;
buffer->async_transaction = is_async;
buffer->extra_buffers_size = extra_buffers_size;
+   buffer->pid = pid;
if (is_async) {
alloc->free_async_space -= size + sizeof(struct binder_buffer);
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC,
 "%d: binder_alloc_buf size %zd async free %zd\n",
  alloc->pid, size, alloc->free_async_space);
+   if (alloc->free_async_space < alloc->buffer_size / 10) {
+   // Start detecting spammers once we reach 80% of async 
space used
+   debug_low_async_space_locked(alloc, pid);
+   }
}
return buffer;
 
@@ -508,6 +549,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
 

Re: [PATCH] binder: print warnings when detecting oneway spamming.

2020-08-20 Thread Martijn Coenen
On Thu, Aug 20, 2020 at 12:41 PM kernel test robot  wrote:
>
> Hi Martijn,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on staging/staging-testing]
> [also build test ERROR on v5.9-rc1 next-20200820]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:
> https://github.com/0day-ci/linux/commits/Martijn-Coenen/binder-print-warnings-when-detecting-oneway-spamming/20200820-155358
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git 
> bc752d2f345bf55d71b3422a6a24890ea03168dc
> config: s390-randconfig-r002-20200818 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
> 4deda57106f7c9b982a49cb907c33e3966c8de7f)
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install s390 cross compiling tool for clang build
> # apt-get install binutils-s390x-linux-gnu
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
>
> All errors (new ones prefixed by >>):
>
> >> drivers/android/binder_alloc_selftest.c:122:61: error: too few arguments 
> >> to function call, expected 6, have 5
>buffers[i] = binder_alloc_new_buf(alloc, sizes[i], 0, 0, 
> 0);

missed this call-site, will send v2.

Martijn
>  ^
>drivers/android/binder_alloc.h:118:30: note: 'binder_alloc_new_buf' 
> declared here
>extern struct binder_buffer *binder_alloc_new_buf(struct binder_alloc 
> *alloc,
> ^
>1 error generated.
>
> # 
> https://github.com/0day-ci/linux/commit/9d0b269f4468d6793f6fd76a410fdde39dbf6ac2
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review 
> Martijn-Coenen/binder-print-warnings-when-detecting-oneway-spamming/20200820-155358
> git checkout 9d0b269f4468d6793f6fd76a410fdde39dbf6ac2
> vim +122 drivers/android/binder_alloc_selftest.c
>
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  114
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  115  static void 
> binder_selftest_alloc_buf(struct binder_alloc *alloc,
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  116
> struct binder_buffer *buffers[],
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  117
> size_t *sizes, int *seq)
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  118  {
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  119  int i;
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  120
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  121  for (i = 0; i < BUFFER_NUM; 
> i++) {
> 4175e2b46fd4b9 Sherry Yang 2017-08-23 @122  buffers[i] = 
> binder_alloc_new_buf(alloc, sizes[i], 0, 0, 0);
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  123  if 
> (IS_ERR(buffers[i]) ||
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  124  
> !check_buffer_pages_allocated(alloc, buffers[i],
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  125
> sizes[i])) {
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  126  
> pr_err_size_seq(sizes, seq);
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  127  
> binder_selftest_failures++;
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  128  }
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  129  }
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  130  }
> 4175e2b46fd4b9 Sherry Yang 2017-08-23  131
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH] binder: print warnings when detecting oneway spamming.

2020-08-20 Thread Martijn Coenen
The most common cause of the binder transaction buffer filling up is a
client rapidly firing oneway transactions into a process, before it has
a chance to handle them. Yet the root cause of this is often hard to
debug, because either the system or the app will stop, and by that time
binder debug information we dump in bugreports is no longer relevant.

This change warns as soon as a process dips below 80% of its oneway
space (less than 100kB available in the configuration), when any one
process is responsible for either more than 50 transactions, or more
than 50% of the oneway space.

Signed-off-by: Martijn Coenen 
---
 drivers/android/binder.c   |  2 +-
 drivers/android/binder_alloc.c | 49 +++---
 drivers/android/binder_alloc.h |  5 +++-
 3 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index f936530a19b0..946332bc871a 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3136,7 +3136,7 @@ static void binder_transaction(struct binder_proc *proc,
 
t->buffer = binder_alloc_new_buf(_proc->alloc, tr->data_size,
tr->offsets_size, extra_buffers_size,
-   !reply && (t->flags & TF_ONE_WAY));
+   !reply && (t->flags & TF_ONE_WAY), current->tgid);
if (IS_ERR(t->buffer)) {
/*
 * -ESRCH indicates VMA cleared. The target is dying.
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 69609696a843..76e8e633dbd4 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -338,12 +338,48 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
return vma;
 }
 
+static void debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
+{
+   /*
+* Find the amount and size of buffers allocated by the current caller;
+* The idea is that once we cross the threshold, whoever is responsible
+* for the low async space is likely to try to send another async txn,
+* and at some point we'll catch them in the act. This is more efficient
+* than keeping a map per pid.
+*/
+   struct rb_node *n = alloc->free_buffers.rb_node;
+   struct binder_buffer *buffer;
+   size_t buffer_size;
+   size_t total_alloc_size = 0;
+   size_t num_buffers = 0;
+
+   for (n = rb_first(>allocated_buffers); n != NULL;
+n = rb_next(n)) {
+   buffer = rb_entry(n, struct binder_buffer, rb_node);
+   if (buffer->pid != pid)
+   continue;
+   if (!buffer->async_transaction)
+   continue;
+   buffer_size = binder_alloc_buffer_size(alloc, buffer);
+   total_alloc_size += buffer_size;
+   num_buffers++;
+   }
+
+   // Warn if this pid has more than 50% of async space, or more than 50 
txns
+   if (num_buffers > 50 || total_alloc_size > alloc->buffer_size / 4) {
+   binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
+"%d: pid %d spamming oneway? %zd buffers allocated 
for a total size of %zd\n",
+ alloc->pid, pid, num_buffers, total_alloc_size);
+   }
+}
+
 static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
size_t data_size,
size_t offsets_size,
size_t extra_buffers_size,
-   int is_async)
+   int is_async,
+   int pid)
 {
struct rb_node *n = alloc->free_buffers.rb_node;
struct binder_buffer *buffer;
@@ -486,11 +522,16 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
buffer->offsets_size = offsets_size;
buffer->async_transaction = is_async;
buffer->extra_buffers_size = extra_buffers_size;
+   buffer->pid = pid;
if (is_async) {
alloc->free_async_space -= size + sizeof(struct binder_buffer);
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC,
 "%d: binder_alloc_buf size %zd async free %zd\n",
  alloc->pid, size, alloc->free_async_space);
+   if (alloc->free_async_space < alloc->buffer_size / 10) {
+   // Start detecting spammers once we reach 80% of async 
space used
+   debug_low_async_space_locked(alloc, pid);
+   }
}
return buffer;
 
@@ -508,6 +549,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
  * @offsets_size:   user specified buffer offset
  * @extra_buffers_size: size of extra space for meta-d

Re: [PATCH v2] loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE

2020-08-11 Thread Martijn Coenen
On Mon, Aug 10, 2020 at 7:16 PM Lennart Poettering  wrote:
>
> When LOOP_CONFIGURE is used with LO_FLAGS_PARTSCAN we need to propagate
> this into the GENHD_FL_NO_PART_SCAN. LOOP_SETSTATUS does this,
> LOOP_CONFIGURE doesn't so far. Effect is that setting up a loopback
> device with partition scanning doesn't actually work when LOOP_CONFIGURE
> is issued, though it works fine with LOOP_SETSTATUS.
>
> Let's correct that and propagate the flag in LOOP_CONFIGURE too.
>
> Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl")
>
> Signed-off-by: Lennart Poettering 
> Acked-by: Martijn Coenen 

Thanks, still looks good to me.

> ---
>  drivers/block/loop.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index d18160146226..2f137d6ce169 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1171,6 +1171,8 @@ static int loop_configure(struct loop_device *lo, 
> fmode_t mode,
> if (part_shift)
> lo->lo_flags |= LO_FLAGS_PARTSCAN;
> partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
> +   if (partscan)
> +   lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
>
> /* Grab the block_device to prevent its destruction after we
>  * put /dev/loopXX inode. Later in __loop_clr_fd() we bdput(bdev).
> --
> 2.26.2


Re: [PATCH] loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE

2020-08-10 Thread Martijn Coenen
On Fri, Aug 7, 2020 at 5:31 PM Lennart Poettering  wrote:
> Thanks for the review. I'll fix this up and send a v2. Are you OK with
> me adding your Ack to the patch?

Yeah, sure!

> And also should this geta cc for stable?

LOOP_CONFIGURE was just added in v5.8, and stable is v5.7 now, so I
don't think that's needed.

Thanks,
Martijn

>
> Thanks,
>
> Lennart
>
> --
> Lennart Poettering, Berlin


Re: [PATCH] loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE

2020-08-07 Thread Martijn Coenen
Hi Lennart,

Thanks again for the patch, I tested it and it looks good to me. I'll
also add a test case to LTP for this. Two minor nits on the patch:

On Thu, Aug 6, 2020 at 9:32 AM Lennart Poettering  wrote:
> Let's correct that and propagate the flag in LOOP_SETSTATUS too.

Think you meant LOOP_CONFIGURE.

Also, could you add a "Fixes" tag, like:

Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl")


>
> Signed-off-by: Lennart Poettering 
> ---
>  drivers/block/loop.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index d18160146226..2f137d6ce169 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1171,6 +1171,8 @@ static int loop_configure(struct loop_device *lo, 
> fmode_t mode,
> if (part_shift)
> lo->lo_flags |= LO_FLAGS_PARTSCAN;
> partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
> +   if (partscan)
> +   lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
>
> /* Grab the block_device to prevent its destruction after we
>  * put /dev/loopXX inode. Later in __loop_clr_fd() we bdput(bdev).
> --
> 2.26.2


Re: [PATCH v2] binder: Prevent context manager from incrementing ref 0

2020-07-28 Thread Martijn Coenen
Thanks a lot for the detailed explanation, I understood now.

Martijn

On Tue, Jul 28, 2020 at 4:50 PM Jann Horn  wrote:
>
> On Tue, Jul 28, 2020 at 3:50 PM Martijn Coenen  wrote:
> > On Mon, Jul 27, 2020 at 2:04 PM Jann Horn  wrote:
> > >  - task B opens /dev/binder once, creating binder_proc instance P3
> > >  - P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way
> > >transaction)
> > >  - P2 receives the handle and uses it to call P3 (two-way transaction)
> > >  - P3 calls P2 (via magic handle 0) (two-way transaction)
> > >  - P2 calls P2 (via handle 1) (two-way transaction)
> >
> > Why do you need P3 involved at all? Could P2 just straight away make a
> > call on handle 1?
>
> Yes, it could, I think. IIRC these steps are necessary if you want to
> actually get a KASAN splat, instead of just a transaction-to-self with
> no further consequences. It has been a while since I looked at this,
> but I'll try to figure out again what was going on...
>
>
> A UAF occurs in the following code due to the transaction-to-self,
> because the "if (t->to_thread == thread)" is tricked into believing
> that the transaction has already been accepted.
>
> static int binder_thread_release(struct binder_proc *proc,
>  struct binder_thread *thread)
> {
> struct binder_transaction *t;
> struct binder_transaction *send_reply = NULL;
> [...]
> t = thread->transaction_stack;
> if (t) {
> [...]
> if (t->to_thread == thread)
> send_reply = t;
> } else {
> [...]
> }
> [...]
> //NOTE: transaction is freed here because top-of-stack is
> //  wrongly treated as already-accepted incoming transaction)
> if (send_reply)
> binder_send_failed_reply(send_reply, BR_DEAD_REPLY);
> //NOTE pending transaction work is processed here (transaction has not
> // yet been accepted)
> binder_release_work(proc, >todo);
> [...]
> }
>
> An unaccepted transaction will only have a non-NULL ->to_thread if the
> transaction has a specific target thread; for a non-reply transaction,
> that is only the case if it is a two-way transaction that was sent
> while the binder call stack already contained the target task (iow,
> the transaction looks like a synchronous callback invocation).
>
> With the steps:
>
>  - P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way
>transaction)
>  - P2 receives the handle and uses it to call P3 (two-way transaction)
>  - P3 calls P2 (via magic handle 0) (two-way transaction)
>  - P2 calls P2 (via handle 1) (two-way transaction)
>
> the call stack will look like this:
>
> P3 -> P2 -> P3 -> P2 -> P2
>
> The initial call from P3 to P2 was IIRC just to give P2 a handle to
> P3; IIRC the relevant part of the call stack was:
>
> P2 -> P3 -> P2 -> P2
>
> Because P2 already occurs down in the call stack, the final
> transaction "P2 -> P2" is considered to be a callback and is
> thread-directed.
>
>
> The reason why P3 has to be created from task B is the "if
> (target_node && target_proc->pid == proc->pid)" check for transactions
> to reference 0.


Re: [PATCH v2] binder: Prevent context manager from incrementing ref 0

2020-07-28 Thread Martijn Coenen
Thanks Jann, the change LGTM, one question on the repro scenario that
wasn't immediately obvious to me:

On Mon, Jul 27, 2020 at 2:04 PM Jann Horn  wrote:
>  - task B opens /dev/binder once, creating binder_proc instance P3
>  - P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way
>transaction)
>  - P2 receives the handle and uses it to call P3 (two-way transaction)
>  - P3 calls P2 (via magic handle 0) (two-way transaction)
>  - P2 calls P2 (via handle 1) (two-way transaction)

Why do you need P3 involved at all? Could P2 just straight away make a
call on handle 1?

>
> And then, if P2 does *NOT* accept the incoming transaction work, but
> instead closes the binder fd, we get a crash.
>
> Solve it by preventing the context manager from using ACQUIRE on ref 0.
> There shouldn't be any legitimate reason for the context manager to do
> that.
>
> Additionally, print a warning if someone manages to find another way to
> trigger a transaction-to-self bug in the future.
>
> Cc: sta...@vger.kernel.org
> Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
> Acked-by: Todd Kjos 
Reviewed-by: Martijn Coenen 

> Signed-off-by: Jann Horn 
> ---
> fixed that broken binder_user_error() from the first version...
> I sent v1 while I had a dirty tree containing the missing fix. whoops.
>
>  drivers/android/binder.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index f50c5f182bb5..5b310eea9e52 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -2982,6 +2982,12 @@ static void binder_transaction(struct binder_proc 
> *proc,
> goto err_dead_binder;
> }
> e->to_node = target_node->debug_id;
> +   if (WARN_ON(proc == target_proc)) {
> +   return_error = BR_FAILED_REPLY;
> +   return_error_param = -EINVAL;
> +   return_error_line = __LINE__;
> +   goto err_invalid_target_handle;
> +   }
> if (security_binder_transaction(proc->tsk,
> target_proc->tsk) < 0) {
> return_error = BR_FAILED_REPLY;
> @@ -3635,10 +3641,17 @@ static int binder_thread_write(struct binder_proc 
> *proc,
> struct binder_node *ctx_mgr_node;
> mutex_lock(>context_mgr_node_lock);
> ctx_mgr_node = 
> context->binder_context_mgr_node;
> -   if (ctx_mgr_node)
> +   if (ctx_mgr_node) {
> +   if (ctx_mgr_node->proc == proc) {
> +   binder_user_error("%d:%d 
> context manager tried to acquire desc 0\n",
> + proc->pid, 
> thread->pid);
> +   
> mutex_unlock(>context_mgr_node_lock);
> +   return -EINVAL;
> +   }
> ret = binder_inc_ref_for_node(
> proc, ctx_mgr_node,
> strong, NULL, );
> +   }
> mutex_unlock(>context_mgr_node_lock);
> }
> if (ret)
>
> base-commit: 2a89b99f580371b86ae9bafd6cbeccd3bfab524a
> --
> 2.28.0.rc0.142.g3c755180ce-goog
>


Re: [LTP] LTP: syscalls: regression on mainline - ioctl_loop01 mknod07 setns01

2020-06-05 Thread Martijn Coenen
Hi Yang,

On Fri, Jun 5, 2020 at 11:27 AM Yang Xu  wrote:
> since  ~LOOP_SET_STATUS_SETTABLE_FLAGS has been included in
> ~LOOP_SET_STATUS_CLEARABLE_FLAGS, do we still need the previous step?
> What do you think about it?

Yeah, I don't think we need the previous step with the current set of
flags, because clearable is a subset of settable. I will send a
follow-up patch some time next week.

Best,
Martijn

>
> Best Regards
> Yang Xu
>
> > Hey Yang,
> >
> > On Fri, Jun 5, 2020 at 10:59 AM Yang Xu  
> > wrote:
> >>
> >> Hi Martijn
> >>
> >> Sorry for noise. I see your patch in here[1] . I will modify
> >> ioctl_loop01 to test that LO_FLAGS_PARTSCAN can not clear and
> >> LO_FLAGS_AUTOCLEAR can be clear.
> >
> > Thanks, that would indeed be useful.
> >
> >>
> >> ps: Giving the url of patch is better so that other people doesn't need
> >> to investigate it again.
> >> [1]https://patchwork.kernel.org/patch/11588321/
> >
> > Ok, will do next time!
> >
> > Best,
> > Martijn
> >>
> >> Best Regards
> >> Yang Xu
> >>> Hi Martijn
> >>>
> >>>> Hi Naresh,
> >>>>
> >>>> I just sent a patch and cc'd you. I verified all the loop tests pass
> >>>> again with that patch.
> >>> I think you want to say "without".  I verified the ioctl_loop01 fails
> >>> with faf1d25440 ("loop: Clean up LOOP_SET_STATUS lo_flags handling").
> >>>
> >>> This kernel commit breaks old behaviour(if old flag all 0, new flag is
> >>> always 0 regradless your flag setting).
> >>>
> >>> I think we should modify code as below:
> >>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> >>> index 13518ba191f5..c6ba8cf486ce 100644
> >>> --- a/drivers/block/loop.c
> >>> +++ b/drivers/block/loop.c
> >>> @@ -1364,11 +1364,9 @@ loop_set_status(struct loop_device *lo, const
> >>> struct loop_info64 *info)
> >>>   if (err)
> >>>   goto out_unfreeze;
> >>>
> >>> -   /* Mask out flags that can't be set using LOOP_SET_STATUS. */
> >>> -   lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS;
> >>> -   /* For those flags, use the previous values instead */
> >>> -   lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS;
> >>> -   /* For flags that can't be cleared, use previous values too */
> >>> +   /* Mask out flags that can be set using LOOP_SET_STATUS. */
> >>> +   lo->lo_flags &= LOOP_SET_STATUS_SETTABLE_FLAGS;
> >>> +   /* For flags that can't be cleared, use previous values. */
> >>>   lo->lo_flags |= prev_lo_flags &~LOOP_SET_STATUS_CLEARABLE_FLAGS;
> >>>
> >>> Best Regards
> >>> Yang Xu
> >>>>
> >>>> Thanks,
> >>>> Martijn
> >>>>
> >>>>
> >>>> On Thu, Jun 4, 2020 at 9:10 PM Martijn Coenen  wrote:
> >>>>>
> >>>>> Hi Naresh,
> >>>>>
> >>>>> I suspect the loop failures are due to
> >>>>> faf1d25440d6ad06d509dada4b6fe62fea844370 ("loop: Clean up
> >>>>> LOOP_SET_STATUS lo_flags handling"), I will investigate and get back
> >>>>> to you.
> >>>>>
> >>>>> Thanks,
> >>>>> Martijn
> >>>>>
> >>>>> On Thu, Jun 4, 2020 at 7:19 PM Naresh Kamboju
> >>>>>  wrote:
> >>>>>>
> >>>>>> + linux-bl...@vger.kernel.org
> >>>>>>
> >>>>>> On Thu, 4 Jun 2020 at 22:47, Naresh Kamboju
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> Following three test cases reported as regression on Linux mainline
> >>>>>>> kernel
> >>>>>>> on x86_64, arm64, arm and i386
> >>>>>>>
> >>>>>>> ltp-syscalls-tests:
> >>>>>>>   * ioctl_loop01
> >>>>>>>   * mknod07
> >>>>>>>   * setns01
> >>>>>>>
> >>>>>>> git repo:
> >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >>>>>>> git branch: master
>

Re: [LTP] LTP: syscalls: regression on mainline - ioctl_loop01 mknod07 setns01

2020-06-05 Thread Martijn Coenen
Hey Yang,

On Fri, Jun 5, 2020 at 10:59 AM Yang Xu  wrote:
>
> Hi Martijn
>
> Sorry for noise. I see your patch in here[1] . I will modify
> ioctl_loop01 to test that LO_FLAGS_PARTSCAN can not clear and
> LO_FLAGS_AUTOCLEAR can be clear.

Thanks, that would indeed be useful.

>
> ps: Giving the url of patch is better so that other people doesn't need
> to investigate it again.
> [1]https://patchwork.kernel.org/patch/11588321/

Ok, will do next time!

Best,
Martijn
>
> Best Regards
> Yang Xu
> > Hi Martijn
> >
> >> Hi Naresh,
> >>
> >> I just sent a patch and cc'd you. I verified all the loop tests pass
> >> again with that patch.
> > I think you want to say "without".  I verified the ioctl_loop01 fails
> > with faf1d25440 ("loop: Clean up LOOP_SET_STATUS lo_flags handling").
> >
> > This kernel commit breaks old behaviour(if old flag all 0, new flag is
> > always 0 regradless your flag setting).
> >
> > I think we should modify code as below:
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index 13518ba191f5..c6ba8cf486ce 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -1364,11 +1364,9 @@ loop_set_status(struct loop_device *lo, const
> > struct loop_info64 *info)
> >  if (err)
> >  goto out_unfreeze;
> >
> > -   /* Mask out flags that can't be set using LOOP_SET_STATUS. */
> > -   lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS;
> > -   /* For those flags, use the previous values instead */
> > -   lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS;
> > -   /* For flags that can't be cleared, use previous values too */
> > +   /* Mask out flags that can be set using LOOP_SET_STATUS. */
> > +   lo->lo_flags &= LOOP_SET_STATUS_SETTABLE_FLAGS;
> > +   /* For flags that can't be cleared, use previous values. */
> >  lo->lo_flags |= prev_lo_flags &~LOOP_SET_STATUS_CLEARABLE_FLAGS;
> >
> > Best Regards
> > Yang Xu
> >>
> >> Thanks,
> >> Martijn
> >>
> >>
> >> On Thu, Jun 4, 2020 at 9:10 PM Martijn Coenen  wrote:
> >>>
> >>> Hi Naresh,
> >>>
> >>> I suspect the loop failures are due to
> >>> faf1d25440d6ad06d509dada4b6fe62fea844370 ("loop: Clean up
> >>> LOOP_SET_STATUS lo_flags handling"), I will investigate and get back
> >>> to you.
> >>>
> >>> Thanks,
> >>> Martijn
> >>>
> >>> On Thu, Jun 4, 2020 at 7:19 PM Naresh Kamboju
> >>>  wrote:
> >>>>
> >>>> + linux-bl...@vger.kernel.org
> >>>>
> >>>> On Thu, 4 Jun 2020 at 22:47, Naresh Kamboju
> >>>>  wrote:
> >>>>>
> >>>>> Following three test cases reported as regression on Linux mainline
> >>>>> kernel
> >>>>> on x86_64, arm64, arm and i386
> >>>>>
> >>>>>ltp-syscalls-tests:
> >>>>>  * ioctl_loop01
> >>>>>  * mknod07
> >>>>>  * setns01
> >>>>>
> >>>>> git repo:
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >>>>> git branch: master
> >>>>> GOOD:
> >>>>>git commit: b23c4771ff62de8ca9b5e4a2d64491b2fb6f8f69
> >>>>>git describe: v5.7-1230-gb23c4771ff62
> >>>>> BAD:
> >>>>>git commit: 1ee08de1e234d95b5b4f866878b72fceb5372904
> >>>>>git describe: v5.7-3523-g1ee08de1e234
> >>>>>
> >>>>> kernel-config:
> >>>>> https://builds.tuxbuild.com/U3bU0dMA62OVHb4DvZIVuw/kernel.config
> >>>>>
> >>>>> We are investigating these failures.
> >>>>>
> >>>>> tst_test.c:906: CONF: btrfs driver not available
> >>>>> tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s
> >>>>> tst_device.c:88: INFO: Found free device 1 '/dev/loop1'
> >>>>> ioctl_loop01.c:49: PASS: /sys/block/loop1/loop/partscan = 0
> >>>>> [ 1073.639677] loop_set_status: loop1 () has still dirty pages
> >>>>> (nrpages=1)
> >>>>> ioctl_loop01.c:50: PASS: /sys/block/loop1/loop/autoclear = 0
> >>>>> ioctl_loop01.c:51: PASS: /sys/block/loop1/loop/backing_file =
> >>

Re: LTP: syscalls: regression on mainline - ioctl_loop01 mknod07 setns01

2020-06-04 Thread Martijn Coenen
Hi Naresh,

I just sent a patch and cc'd you. I verified all the loop tests pass
again with that patch.

Thanks,
Martijn


On Thu, Jun 4, 2020 at 9:10 PM Martijn Coenen  wrote:
>
> Hi Naresh,
>
> I suspect the loop failures are due to
> faf1d25440d6ad06d509dada4b6fe62fea844370 ("loop: Clean up
> LOOP_SET_STATUS lo_flags handling"), I will investigate and get back
> to you.
>
> Thanks,
> Martijn
>
> On Thu, Jun 4, 2020 at 7:19 PM Naresh Kamboju  
> wrote:
> >
> > + linux-bl...@vger.kernel.org
> >
> > On Thu, 4 Jun 2020 at 22:47, Naresh Kamboju  
> > wrote:
> > >
> > > Following three test cases reported as regression on Linux mainline kernel
> > > on x86_64, arm64, arm and i386
> > >
> > >   ltp-syscalls-tests:
> > > * ioctl_loop01
> > > * mknod07
> > > * setns01
> > >
> > > git repo: 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > git branch: master
> > > GOOD:
> > >   git commit: b23c4771ff62de8ca9b5e4a2d64491b2fb6f8f69
> > >   git describe: v5.7-1230-gb23c4771ff62
> > > BAD:
> > >   git commit: 1ee08de1e234d95b5b4f866878b72fceb5372904
> > >   git describe: v5.7-3523-g1ee08de1e234
> > >
> > > kernel-config: 
> > > https://builds.tuxbuild.com/U3bU0dMA62OVHb4DvZIVuw/kernel.config
> > >
> > > We are investigating these failures.
> > >
> > > tst_test.c:906: CONF: btrfs driver not available
> > > tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s
> > > tst_device.c:88: INFO: Found free device 1 '/dev/loop1'
> > > ioctl_loop01.c:49: PASS: /sys/block/loop1/loop/partscan = 0
> > > [ 1073.639677] loop_set_status: loop1 () has still dirty pages (nrpages=1)
> > > ioctl_loop01.c:50: PASS: /sys/block/loop1/loop/autoclear = 0
> > > ioctl_loop01.c:51: PASS: /sys/block/loop1/loop/backing_file =
> > > '/scratch/ltp-mnIdulzriQ/9cPtLQ/test.img'
> > > ioctl_loop01.c:63: FAIL: expect 12 but got 17
> > > ioctl_loop01.c:67: FAIL: /sys/block/loop1/loop/partscan != 1 got 0
> > > ioctl_loop01.c:68: FAIL: /sys/block/loop1/loop/autoclear != 1 got 0
> > > ioctl_loop01.c:79: FAIL: access /dev/loop1p1 fails
> > > [ 1073.679678] loop_set_status: loop1 () has still dirty pages (nrpages=1)
> > > ioctl_loop01.c:85: FAIL: access /sys/block/loop1/loop1p1 fails
> > >
> > > HINT: You _MAY_ be missing kernel fixes, see:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10c70d95c0f2
> > >
> > > mke2fs 1.43.8 (1-Jan-2018)
> > > [ 1264.711379] EXT4-fs (loop0): mounting ext2 file system using the
> > > ext4 subsystem
> > > [ 1264.716642] EXT4-fs (loop0): mounted filesystem without journal. Opts: 
> > > (null)
> > > mknod07 0  TINFO  :  Using test device LTP_DEV='/dev/loop0'
> > > mknod07 0  TINFO  :  Formatting /dev/loop0 with ext2 opts='' extra 
> > > opts=''
> > > mknod07 1  TPASS  :  mknod failed as expected:
> > > TEST_ERRNO=EACCES(13): Permission denied
> > > mknod07 2  TPASS  :  mknod failed as expected:
> > > TEST_ERRNO=EACCES(13): Permission denied
> > > mknod07 3  TFAIL  :  mknod07.c:155: mknod succeeded unexpectedly
> > > mknod07 4  TPASS  :  mknod failed as expected:
> > > TEST_ERRNO=EPERM(1): Operation not permitted
> > > mknod07 5  TPASS  :  mknod failed as expected:
> > > TEST_ERRNO=EROFS(30): Read-only file system
> > > mknod07 6  TPASS  :  mknod failed as expected:
> > > TEST_ERRNO=ELOOP(40): Too many levels of symbolic links
> > >
> > >
> > > setns01 0  TINFO  :  ns_name=ipc, ns_fds[0]=6, ns_types[0]=0x800
> > > setns01 0  TINFO  :  ns_name=mnt, ns_fds[1]=7, ns_types[1]=0x2
> > > setns01 0  TINFO  :  ns_name=net, ns_fds[2]=8, ns_types[2]=0x4000
> > > setns01 0  TINFO  :  ns_name=pid, ns_fds[3]=9, ns_types[3]=0x2000
> > > setns01 0  TINFO  :  ns_name=uts, ns_fds[4]=10, ns_types[4]=0x400
> > > setns01 0  TINFO  :  setns(-1, 0x800)
> > > setns01 1  TPASS  :  invalid fd exp_errno=9
> > > setns01 0  TINFO  :  setns(-1, 0x2)
> > > setns01 2  TPASS  :  invalid fd exp_errno=9
> > > setns01 0  TINFO  :  setns(-1, 0x4000)
> > > setns01 3  TPASS  :  invalid fd exp_errno=9
> > > setns01 0  TINFO  :  setns(-1, 0x2000)
> > > setns01 4  TPASS  :  invalid fd exp_errno=9
> > > setns01

[PATCH] loop: Fix wrong masking of status flags

2020-06-04 Thread Martijn Coenen
In faf1d25440d6, loop_set_status() now assigns lo_status directly from
the passed in lo_flags, but then fixes it up by masking out flags that
can't be set by LOOP_SET_STATUS; unfortunately the mask was negated.

Re-ran all ltp ioctl_loop tests, and they all passed.

Pass run of the previously failing one:

tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
tst_device.c:88: INFO: Found free device 0 '/dev/loop0'
ioctl_loop01.c:49: PASS: /sys/block/loop0/loop/partscan = 0
ioctl_loop01.c:50: PASS: /sys/block/loop0/loop/autoclear = 0
ioctl_loop01.c:51: PASS: /sys/block/loop0/loop/backing_file =
'/tmp/ZRJ6H4/test.img'
ioctl_loop01.c:65: PASS: get expected lo_flag 12
ioctl_loop01.c:67: PASS: /sys/block/loop0/loop/partscan = 1
ioctl_loop01.c:68: PASS: /sys/block/loop0/loop/autoclear = 1
ioctl_loop01.c:77: PASS: access /dev/loop0p1 succeeds
ioctl_loop01.c:83: PASS: access /sys/block/loop0/loop0p1 succeeds

Summary:
passed   8
failed   0
skipped  0
warnings 0

Fixes: faf1d25440d6 ("loop: Clean up LOOP_SET_STATUS lo_flags handling")
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4212288ab157..ad63e4247868 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1390,7 +1390,7 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
goto out_unfreeze;
 
/* Mask out flags that can't be set using LOOP_SET_STATUS. */
-   lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS;
+   lo->lo_flags &= LOOP_SET_STATUS_SETTABLE_FLAGS;
/* For those flags, use the previous values instead */
lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS;
/* For flags that can't be cleared, use previous values too */
-- 
2.27.0.278.ge193c7cf3a9-goog



Re: LTP: syscalls: regression on mainline - ioctl_loop01 mknod07 setns01

2020-06-04 Thread Martijn Coenen
Hi Naresh,

I suspect the loop failures are due to
faf1d25440d6ad06d509dada4b6fe62fea844370 ("loop: Clean up
LOOP_SET_STATUS lo_flags handling"), I will investigate and get back
to you.

Thanks,
Martijn

On Thu, Jun 4, 2020 at 7:19 PM Naresh Kamboju  wrote:
>
> + linux-bl...@vger.kernel.org
>
> On Thu, 4 Jun 2020 at 22:47, Naresh Kamboju  wrote:
> >
> > Following three test cases reported as regression on Linux mainline kernel
> > on x86_64, arm64, arm and i386
> >
> >   ltp-syscalls-tests:
> > * ioctl_loop01
> > * mknod07
> > * setns01
> >
> > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > git branch: master
> > GOOD:
> >   git commit: b23c4771ff62de8ca9b5e4a2d64491b2fb6f8f69
> >   git describe: v5.7-1230-gb23c4771ff62
> > BAD:
> >   git commit: 1ee08de1e234d95b5b4f866878b72fceb5372904
> >   git describe: v5.7-3523-g1ee08de1e234
> >
> > kernel-config: 
> > https://builds.tuxbuild.com/U3bU0dMA62OVHb4DvZIVuw/kernel.config
> >
> > We are investigating these failures.
> >
> > tst_test.c:906: CONF: btrfs driver not available
> > tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s
> > tst_device.c:88: INFO: Found free device 1 '/dev/loop1'
> > ioctl_loop01.c:49: PASS: /sys/block/loop1/loop/partscan = 0
> > [ 1073.639677] loop_set_status: loop1 () has still dirty pages (nrpages=1)
> > ioctl_loop01.c:50: PASS: /sys/block/loop1/loop/autoclear = 0
> > ioctl_loop01.c:51: PASS: /sys/block/loop1/loop/backing_file =
> > '/scratch/ltp-mnIdulzriQ/9cPtLQ/test.img'
> > ioctl_loop01.c:63: FAIL: expect 12 but got 17
> > ioctl_loop01.c:67: FAIL: /sys/block/loop1/loop/partscan != 1 got 0
> > ioctl_loop01.c:68: FAIL: /sys/block/loop1/loop/autoclear != 1 got 0
> > ioctl_loop01.c:79: FAIL: access /dev/loop1p1 fails
> > [ 1073.679678] loop_set_status: loop1 () has still dirty pages (nrpages=1)
> > ioctl_loop01.c:85: FAIL: access /sys/block/loop1/loop1p1 fails
> >
> > HINT: You _MAY_ be missing kernel fixes, see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10c70d95c0f2
> >
> > mke2fs 1.43.8 (1-Jan-2018)
> > [ 1264.711379] EXT4-fs (loop0): mounting ext2 file system using the
> > ext4 subsystem
> > [ 1264.716642] EXT4-fs (loop0): mounted filesystem without journal. Opts: 
> > (null)
> > mknod07 0  TINFO  :  Using test device LTP_DEV='/dev/loop0'
> > mknod07 0  TINFO  :  Formatting /dev/loop0 with ext2 opts='' extra 
> > opts=''
> > mknod07 1  TPASS  :  mknod failed as expected:
> > TEST_ERRNO=EACCES(13): Permission denied
> > mknod07 2  TPASS  :  mknod failed as expected:
> > TEST_ERRNO=EACCES(13): Permission denied
> > mknod07 3  TFAIL  :  mknod07.c:155: mknod succeeded unexpectedly
> > mknod07 4  TPASS  :  mknod failed as expected:
> > TEST_ERRNO=EPERM(1): Operation not permitted
> > mknod07 5  TPASS  :  mknod failed as expected:
> > TEST_ERRNO=EROFS(30): Read-only file system
> > mknod07 6  TPASS  :  mknod failed as expected:
> > TEST_ERRNO=ELOOP(40): Too many levels of symbolic links
> >
> >
> > setns01 0  TINFO  :  ns_name=ipc, ns_fds[0]=6, ns_types[0]=0x800
> > setns01 0  TINFO  :  ns_name=mnt, ns_fds[1]=7, ns_types[1]=0x2
> > setns01 0  TINFO  :  ns_name=net, ns_fds[2]=8, ns_types[2]=0x4000
> > setns01 0  TINFO  :  ns_name=pid, ns_fds[3]=9, ns_types[3]=0x2000
> > setns01 0  TINFO  :  ns_name=uts, ns_fds[4]=10, ns_types[4]=0x400
> > setns01 0  TINFO  :  setns(-1, 0x800)
> > setns01 1  TPASS  :  invalid fd exp_errno=9
> > setns01 0  TINFO  :  setns(-1, 0x2)
> > setns01 2  TPASS  :  invalid fd exp_errno=9
> > setns01 0  TINFO  :  setns(-1, 0x4000)
> > setns01 3  TPASS  :  invalid fd exp_errno=9
> > setns01 0  TINFO  :  setns(-1, 0x2000)
> > setns01 4  TPASS  :  invalid fd exp_errno=9
> > setns01 0  TINFO  :  setns(-1, 0x400)
> > setns01 5  TPASS  :  invalid fd exp_errno=9
> > setns01 0  TINFO  :  setns(11, 0x800)
> > setns01 6  TFAIL  :  setns01.c:176: regular file fd exp_errno=22:
> > errno=EBADF(9): Bad file descriptor
> > setns01 0  TINFO  :  setns(11, 0x2)
> > setns01 7  TFAIL  :  setns01.c:176: regular file fd exp_errno=22:
> > errno=EBADF(9): Bad file descriptor
> > setns01 0  TINFO  :  setns(11, 0x4000)
> > setns01 8  TFAIL  :  setns01.c:176: regular file fd exp_errno=22:
> > errno=EBADF(9): Bad file descriptor
> > setns01 0  TINFO  :  setns(11, 0x2000)
> > setns01 9  TFAIL  :  setns01.c:176: regular file fd exp_errno=22:
> > errno=EBADF(9): Bad file descriptor
> > setns01 0  TINFO  :  setns(11, 0x400)
> > setns0110  TFAIL  :  setns01.c:176: regular file fd exp_errno=22:
> > errno=EBADF(9): Bad file descriptor
> >
> > Full test log link,
> > https://lkft.validation.linaro.org/scheduler/job/1467931#L8047
> >
> > test results comparison shows this test case started failing from 
> > June-2-2020
> > 

Re: Writeback bug causing writeback stalls

2020-06-02 Thread Martijn Coenen
On Mon, Jun 1, 2020 at 11:09 AM Jan Kara  wrote:
> Thanks for testing! My test run has completed fine so I'll submit patches
> for review. But I'm curious what's causing the dips in throughput in your
> test...

It turned out to be unrelated to your patch. Sorry for the noise! We
have the patch in dogfood on some of our devices, and I will let you
know if we run into any issues. I'll also spend some more time
reviewing your patches and will respond to them later.

Thanks,
Martijn
>
> Honza
>
> --
> Jan Kara 
> SUSE Labs, CR


Re: Writeback bug causing writeback stalls

2020-05-29 Thread Martijn Coenen
Hi Jan,

On Fri, May 29, 2020 at 5:20 PM Jan Kara  wrote:
> I understand. I have written a fix (attached). Currently its under testing
> together with other cleanups. If everything works fine, I plan to submit
> the patches on Monday.

Thanks a lot for the quick fix! I ran my usual way to reproduce the
problem, and did not see it, so that's good! I do observe write speed
dips - eg we usually sustain 180 MB/s on this device, but now it
regularly dips down to 10 MB/s, then jumps back up again. That might
be unrelated to your patch though, I will run more tests over the
weekend and report back!

Best,
Martijn

>
> Honza
> --
> Jan Kara 
> SUSE Labs, CR


Re: Writeback bug causing writeback stalls

2020-05-27 Thread Martijn Coenen
Hi Jan,

On Mon, May 25, 2020 at 9:31 AM Jan Kara  wrote:
> Well, most importantly filesystems like ext4, xfs, btrfs don't hold i_rwsem
> when writing back inode and that's deliberate because of performance. We
> don't want to block writes (or event reads in case of XFS) for the inode
> during writeback.

Thanks for clarifying, that makes sense. By the way, do you have an
ETA for your fix? We are under some time pressure to get this fixed in
our downstream kernels, but I'd much rather take a fix from upstream
from somebody who knows this code well. Alternatively, I can take a
stab at the idea you proposed and send a patch to LKML for review this
week.

Thanks,
Martijn


>
> Honza
> --
> Jan Kara 
> SUSE Labs, CR


Re: Writeback bug causing writeback stalls

2020-05-23 Thread Martijn Coenen
Jaegeuk wondered whether callers of write_inode_now() should hold
i_rwsem, and whether that would also prevent this problem. Some
existing callers of write_inode_now() do, eg ntfs and hfs:

hfs_file_fsync()
inode_lock(inode);

/* sync the inode to buffers */
ret = write_inode_now(inode, 0);

but there are also some that don't (eg fat, fuse, orangefs).

Thanks,
Martijn


On Fri, May 22, 2020 at 5:36 PM Jan Kara  wrote:
>
> On Fri 22-05-20 17:23:30, Martijn Coenen wrote:
> > [ dropped android-storage-c...@google.com from CC: since that list
> > can't receive emails from outside google.com - sorry about that ]
> >
> > Hi Jan,
> >
> > On Fri, May 22, 2020 at 4:41 PM Jan Kara  wrote:
> > > > The easiest way to fix this, I think, is to call requeue_inode() at the 
> > > > end of
> > > > writeback_single_inode(), much like it is called from 
> > > > writeback_sb_inodes().
> > > > However, requeue_inode() has the following ominous warning:
> > > >
> > > > /*
> > > >  * Find proper writeback list for the inode depending on its current 
> > > > state and
> > > >  * possibly also change of its state while we were doing writeback.  
> > > > Here we
> > > >  * handle things such as livelock prevention or fairness of writeback 
> > > > among
> > > >  * inodes. This function can be called only by flusher thread - noone 
> > > > else
> > > >  * processes all inodes in writeback lists and requeueing inodes behind 
> > > > flusher
> > > >  * thread's back can have unexpected consequences.
> > > >  */
> > > >
> > > > Obviously this is very critical code both from a correctness and a 
> > > > performance
> > > > point of view, so I wanted to run this by the maintainers and folks who 
> > > > have
> > > > contributed to this code first.
> > >
> > > Sadly, the fix won't be so easy. The main problem with calling
> > > requeue_inode() from writeback_single_inode() is that if there's parallel
> > > sync(2) call, inode->i_io_list is used to track all inodes that need 
> > > writing
> > > before sync(2) can complete. So requeueing inodes in parallel while 
> > > sync(2)
> > > runs can result in breaking data integrity guarantees of it.
> >
> > Ah, makes sense.
> >
> > > But I agree
> > > we need to find some mechanism to safely move inode to appropriate dirty
> > > list reasonably quickly.
> > >
> > > Probably I'd add an inode state flag telling that inode is queued for
> > > writeback by flush worker and we won't touch dirty lists in that case,
> > > otherwise we are safe to update current writeback list as needed. I'll 
> > > work
> > > on fixing this as when I was reading the code I've noticed there are other
> > > quirks in the code as well. Thanks for the report!
> >
> > Thanks! While looking at the code I also saw some other paths that
> > appeared to be racy, though I haven't worked them out in detail to
> > confirm that - the locking around the inode and writeback lists is
> > tricky. What's the best way to follow up on those? Happy to post them
> > to this same thread after I spend a bit more time looking at the code.
>
> Sure, if you are aware some some other problems, just write them to this
> thread. FWIW stuff that I've found so far:
>
> 1) __I_DIRTY_TIME_EXPIRED setting in move_expired_inodes() can get lost as
> there are other places doing RMW modifications of inode->i_state.
>
> 2) sync(2) is prone to livelocks as when we queue inodes from b_dirty_time
> list, we don't take dirtied_when into account (and that's the only thing
> that makes sure aggressive dirtier cannot livelock sync).
>
> Honza
> --
> Jan Kara 
> SUSE Labs, CR


Re: Writeback bug causing writeback stalls

2020-05-22 Thread Martijn Coenen
[ dropped android-storage-c...@google.com from CC: since that list
can't receive emails from outside google.com - sorry about that ]

Hi Jan,

On Fri, May 22, 2020 at 4:41 PM Jan Kara  wrote:
> > The easiest way to fix this, I think, is to call requeue_inode() at the end 
> > of
> > writeback_single_inode(), much like it is called from writeback_sb_inodes().
> > However, requeue_inode() has the following ominous warning:
> >
> > /*
> >  * Find proper writeback list for the inode depending on its current state 
> > and
> >  * possibly also change of its state while we were doing writeback.  Here we
> >  * handle things such as livelock prevention or fairness of writeback among
> >  * inodes. This function can be called only by flusher thread - noone else
> >  * processes all inodes in writeback lists and requeueing inodes behind 
> > flusher
> >  * thread's back can have unexpected consequences.
> >  */
> >
> > Obviously this is very critical code both from a correctness and a 
> > performance
> > point of view, so I wanted to run this by the maintainers and folks who have
> > contributed to this code first.
>
> Sadly, the fix won't be so easy. The main problem with calling
> requeue_inode() from writeback_single_inode() is that if there's parallel
> sync(2) call, inode->i_io_list is used to track all inodes that need writing
> before sync(2) can complete. So requeueing inodes in parallel while sync(2)
> runs can result in breaking data integrity guarantees of it.

Ah, makes sense.

> But I agree
> we need to find some mechanism to safely move inode to appropriate dirty
> list reasonably quickly.
>
> Probably I'd add an inode state flag telling that inode is queued for
> writeback by flush worker and we won't touch dirty lists in that case,
> otherwise we are safe to update current writeback list as needed. I'll work
> on fixing this as when I was reading the code I've noticed there are other
> quirks in the code as well. Thanks for the report!

Thanks! While looking at the code I also saw some other paths that
appeared to be racy, though I haven't worked them out in detail to
confirm that - the locking around the inode and writeback lists is
tricky. What's the best way to follow up on those? Happy to post them
to this same thread after I spend a bit more time looking at the code.

Thanks,
Martijn


>
> Honza
>
> --
> Jan Kara 
> SUSE Labs, CR


Writeback bug causing writeback stalls

2020-05-22 Thread Martijn Coenen
Hi,

We've been working on implementing a FUSE filesystem in Android, and have
run into what appears to be a bug in the kernel writeback code. The problem
that we observed is that an inode in the filesystem is on the b_dirty_time list,
but its i_state field does have I_DIRTY_PAGES set, which I think means that
the inode is on the wrong list. This condition doesn't appear to fix itself even
as new pages are dirtied, because __mark_inode_dirty() has this check:

if ((inode->i_state & flags) != flags) {

before considering moving the inode to another list. Since the inode already
has I_DIRTY_PAGES set, we're not going to move it to the dirty list. I *think*
the only way the inode gets out of this condition is whenever we handle the
b_dirty_time list, which can take a while.

The reason we even noticed this bug in the first place is that FUSE has a very
low wb max_ratio by default (1), and so at some point processes get stuck in
balance_dirty_pages_ratelimited(), waiting for pages to be written. They hold
the inode's write lock, and when other processes try to acquire it, they get
stuck. We have a watchdog that reboots the device after ~10 mins of a task
being stuck in D state, and this triggered regularly in some tests.

After careful studying of the kernel code, I found a reliable repro scenario
for this condition, which is described in more detail below. But essentially
what I think is happening is this:

__mark_inode_dirty() has an early return condition for when a sync is in
progress, where it updates the inode i_state but not the writeback list:

inode->i_state |= flags;

/*
* If the inode is being synced, just update its dirty state.
* The unlocker will place the inode on the appropriate
* superblock list, based upon its state.
*/
if (inode->i_state & I_SYNC)
goto out_unlock_inode;

now this comment is true for the generic flusher threads, which run
writeback_sb_inodes(), which calls requeue_inode() to move the inode back to the
correct wb list when the sync is done. However, there is another
function that uses
I_SYNC: writeback_single_inode(). This function has some comments saying it
prefers not to touch writeback lists, and in fact only removes it if the inode
is clean:

/*
* Skip inode if it is clean and we have no outstanding writeback in
* WB_SYNC_ALL mode. We don't want to mess with writeback lists in this
* function since flusher thread may be doing for example sync in
* parallel and if we move the inode, it could get skipped. So here we
* make sure inode is on some writeback list and leave it there unless
* we have completely cleaned the inode.
*/

writeback_single_inode() is called from a few functions, in particular
write_inode_now(). write_inode_now() is called by FUSE's flush() f_ops.

So, the sequence of events is something like this. Let's assume the inode is
already on b_dirty_time for valid reasons. Then:

CPU1  CPU2
fuse_flush()
  write_inode_now()
writeback_single_inode()
  sets I_SYNC
__writeback_single_inode()
  writes back data
  clears inode dirty flags
  unlocks inode
  calls mark_inode_dirty_sync()
sets I_DIRTY_SYNC, but doesn't
update wb list because I_SYNC is
still set
  write() // somebody else writes
  mark_inode_dirty(I_DIRTY_PAGES)
  sets I_DIRTY_PAGES on i_state
  doesn't update wb list,
  because I_SYNC set
  locks inode again
  sees inode is still dirty,
  doesn't touch WB list
  clears I_SYNC

So now we have an inode on b_dirty_time with I_DIRTY_PAGES | I_DIRTY_SYNC set,
and subsequent calls to mark_inode_dirty() with either I_DIRTY_PAGES or
I_DIRTY_SYNC will do nothing to change that. The flusher won't touch
the inode either,
because it's not on a b_dirty or b_io list.

The easiest way to fix this, I think, is to call requeue_inode() at the end of
writeback_single_inode(), much like it is called from writeback_sb_inodes().
However, requeue_inode() has the following ominous warning:

/*
 * Find proper writeback list for the inode depending on its current state and
 * possibly also change of its state while we were doing writeback.  Here we
 * handle things such as livelock prevention or fairness of writeback among
 * inodes. This function can be called only by flusher thread - noone else
 * processes all inodes in writeback lists and requeueing inodes behind flusher
 * thread's back can have unexpected consequences.
 */

Obviously this is very critical code both from a correctness and a performance
point of view, so I wanted to run this by the maintainers and folks who have
contributed to this code first.

The way I got to reproduce this 

Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl

2020-05-13 Thread Martijn Coenen
On Wed, May 13, 2020 at 12:22 PM Christoph Hellwig  wrote:
>
> On Wed, May 13, 2020 at 09:07:43AM +0200, Martijn Coenen wrote:
> > On Wed, May 13, 2020 at 4:30 AM Jens Axboe  wrote:
> > > > Looks acceptable to me, but I'm getting a failure applying it to
> > > > for-5.8/drivers on this patch:
> > > >
> > > > Applying: loop: Refactor loop_set_status() size calculation
> > > >
> > > > So you'll probably want to respin on the right branch.
> >
> > This series depends on a separate bugfix I sent to LKML earlier - see
> > https://lkml.org/lkml/2020/3/31/755 . I mentioned it in [00/10] of
> > this series, but perhaps I should have just included that patch.
> >
> > I just verified that patch + this series still applies cleanly on your
> > for-5.8/drivers tree, but if you prefer I send a v5 with that patch
> > going first let me know.
>
> You probably want to resend with the fix includes as the first patch.
> And drop the truncation check now that we figured out that we don't
> actually need it.

Just sent v5, thanks!

Martijn


[PATCH v5 01/11] loop: Call loop_config_discard() only after new config is applied

2020-05-13 Thread Martijn Coenen
loop_set_status() calls loop_config_discard() to configure discard for
the loop device; however, the discard configuration depends on whether
the loop device uses encryption, and when we call it the encryption
configuration has not been updated yet. Move the call down so we apply
the correct discard configuration based on the new configuration.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Bob Liu 
Reviewed-by: Bart Van Assche 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index da693e6a834e..f1754262fc94 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1334,8 +1334,6 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
}
}
 
-   loop_config_discard(lo);
-
memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE);
lo->lo_file_name[LO_NAME_SIZE-1] = 0;
@@ -1359,6 +1357,8 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
lo->lo_key_owner = uid;
}
 
+   loop_config_discard(lo);
+
/* update dio if lo_offset or transfer is changed */
__loop_update_dio(lo, lo->use_dio);
 
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH v5 02/11] loop: Remove sector_t truncation checks

2020-05-13 Thread Martijn Coenen
sector_t is now always u64, so we don't need to check for truncation.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f1754262fc94..00de7fec0ed5 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -228,24 +228,20 @@ static void __loop_update_dio(struct loop_device *lo, 
bool dio)
blk_mq_unfreeze_queue(lo->lo_queue);
 }
 
-static int
+static void
 figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit)
 {
loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
-   sector_t x = (sector_t)size;
struct block_device *bdev = lo->lo_device;
 
-   if (unlikely((loff_t)x != size))
-   return -EFBIG;
if (lo->lo_offset != offset)
lo->lo_offset = offset;
if (lo->lo_sizelimit != sizelimit)
lo->lo_sizelimit = sizelimit;
-   set_capacity(lo->lo_disk, x);
+   set_capacity(lo->lo_disk, size);
bd_set_size(bdev, (loff_t)get_capacity(bdev->bd_disk) << 9);
/* let user-space know about the new size */
kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
-   return 0;
 }
 
 static inline int
@@ -1003,10 +999,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
!file->f_op->write_iter)
lo_flags |= LO_FLAGS_READ_ONLY;
 
-   error = -EFBIG;
size = get_loop_size(lo, file);
-   if ((loff_t)(sector_t)size != size)
-   goto out_unlock;
+
error = loop_prepare_queue(lo);
if (error)
goto out_unlock;
@@ -1328,10 +1322,7 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
lo->lo_device->bd_inode->i_mapping->nrpages);
goto out_unfreeze;
}
-   if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit)) {
-   err = -EFBIG;
-   goto out_unfreeze;
-   }
+   figure_loop_size(lo, info->lo_offset, info->lo_sizelimit);
}
 
memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
@@ -1534,7 +1525,9 @@ static int loop_set_capacity(struct loop_device *lo)
if (unlikely(lo->lo_state != Lo_bound))
return -ENXIO;
 
-   return figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit);
+   figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit);
+
+   return 0;
 }
 
 static int loop_set_dio(struct loop_device *lo, unsigned long arg)
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH v5 11/11] loop: Add LOOP_CONFIGURE ioctl

2020-05-13 Thread Martijn Coenen
This allows userspace to completely setup a loop device with a single
ioctl, removing the in-between state where the device can be partially
configured - eg the loop device has a backing file associated with it,
but is reading from the wrong offset.

Besides removing the intermediate state, another big benefit of this
ioctl is that LOOP_SET_STATUS can be slow; the main reason for this
slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to
freeze the associated queue; this requires waiting for RCU
synchronization, which I've measured can take about 15-20ms on this
device on average.

In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can
also be used to:
- Set the correct block size immediately by setting
  loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE)
- Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO
  in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO)
- Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY
  in loop_config.info.lo_flags

Here's setting up ~70 regular loop devices with an offset on an x86
Android device, using LOOP_SET_FD and LOOP_SET_STATUS:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
0m03.40s real 0m00.02s user 0m00.03s system

Here's configuring ~70 devices in the same way, but using a modified
losetup that uses the new LOOP_CONFIGURE ioctl:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
0m01.94s real 0m00.01s user 0m00.01s system

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c  | 104 --
 include/uapi/linux/loop.h |  21 
 2 files changed, 97 insertions(+), 28 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 13518ba191f5..a565c5aafa52 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -228,6 +228,19 @@ static void __loop_update_dio(struct loop_device *lo, bool 
dio)
blk_mq_unfreeze_queue(lo->lo_queue);
 }
 
+/**
+ * loop_validate_block_size() - validates the passed in block size
+ * @bsize: size to validate
+ */
+static int
+loop_validate_block_size(unsigned short bsize)
+{
+   if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
+   return -EINVAL;
+
+   return 0;
+}
+
 /**
  * loop_set_size() - sets device size and notifies userspace
  * @lo: struct loop_device to set the size for
@@ -1050,23 +1063,24 @@ loop_set_status_from_info(struct loop_device *lo,
return 0;
 }
 
-static int loop_set_fd(struct loop_device *lo, fmode_t mode,
-  struct block_device *bdev, unsigned int arg)
+static int loop_configure(struct loop_device *lo, fmode_t mode,
+ struct block_device *bdev,
+ const struct loop_config *config)
 {
struct file *file;
struct inode*inode;
struct address_space *mapping;
struct block_device *claimed_bdev = NULL;
-   int lo_flags = 0;
int error;
loff_t  size;
boolpartscan;
+   unsigned short  bsize;
 
/* This is safe, since we have a reference from open(). */
__module_get(THIS_MODULE);
 
error = -EBADF;
-   file = fget(arg);
+   file = fget(config->fd);
if (!file)
goto out;
 
@@ -1075,7 +1089,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
 * here to avoid changing device under exclusive owner.
 */
if (!(mode & FMODE_EXCL)) {
-   claimed_bdev = bd_start_claiming(bdev, loop_set_fd);
+   claimed_bdev = bd_start_claiming(bdev, loop_configure);
if (IS_ERR(claimed_bdev)) {
error = PTR_ERR(claimed_bdev);
goto out_putf;
@@ -1097,11 +,26 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
mapping = file->f_mapping;
inode = mapping->host;
 
+   size = get_loop_size(lo, file);
+
+   if ((config->info.lo_flags & ~LOOP_CONFIGURE_SETTABLE_FLAGS) != 0) {
+   error = -EINVAL;
+   goto out_unlock;
+   }
+
+   if (config->block_size) {
+   error = loop_validate_block_size(config->block_size);
+   if (error)
+   goto out_unlock;
+   }
+
+   error = loop_set_status_from_info(lo, >info);
+   if (error)
+   goto out_unlock;
+
if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) ||
!file->f_op->write_iter)
-   lo_flags |= LO_FLAGS_READ_ONLY;
-
-   size = get_loop_size(lo, file);
+   lo->lo_flags |= LO_FLAGS_READ_ONLY;
 
error = loop_prepare_queue(lo);
if (er

[PATCH v5 10/11] loop: Clean up LOOP_SET_STATUS lo_flags handling

2020-05-13 Thread Martijn Coenen
LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in
particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas
LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this
explicit by updating the UAPI to include the flags that can be
set/cleared using this ioctl.

The implementation can then blindly take over the passed in flags,
and use the previous flags for those flags that can't be set / cleared
using LOOP_SET_STATUS.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c  | 19 +--
 include/uapi/linux/loop.h | 10 --
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 31f10da4945e..13518ba191f5 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1036,9 +1036,7 @@ loop_set_status_from_info(struct loop_device *lo,
lo->transfer = xfer->transfer;
lo->ioctl = xfer->ioctl;
 
-   if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
-(info->lo_flags & LO_FLAGS_AUTOCLEAR))
-   lo->lo_flags ^= LO_FLAGS_AUTOCLEAR;
+   lo->lo_flags = info->lo_flags;
 
lo->lo_encrypt_key_size = info->lo_encrypt_key_size;
lo->lo_init[0] = info->lo_init[0];
@@ -1323,6 +1321,7 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
int err;
struct block_device *bdev;
kuid_t uid = current_uid();
+   int prev_lo_flags;
bool partscan = false;
bool size_changed = false;
 
@@ -1359,10 +1358,19 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
goto out_unfreeze;
}
 
+   prev_lo_flags = lo->lo_flags;
+
err = loop_set_status_from_info(lo, info);
if (err)
goto out_unfreeze;
 
+   /* Mask out flags that can't be set using LOOP_SET_STATUS. */
+   lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS;
+   /* For those flags, use the previous values instead */
+   lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS;
+   /* For flags that can't be cleared, use previous values too */
+   lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_CLEARABLE_FLAGS;
+
if (size_changed) {
loff_t new_size = get_size(lo->lo_offset, lo->lo_sizelimit,
   lo->lo_backing_file);
@@ -1377,9 +1385,8 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
 out_unfreeze:
blk_mq_unfreeze_queue(lo->lo_queue);
 
-   if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) &&
-!(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
-   lo->lo_flags |= LO_FLAGS_PARTSCAN;
+   if (!err && (lo->lo_flags & LO_FLAGS_PARTSCAN) &&
+!(prev_lo_flags & LO_FLAGS_PARTSCAN)) {
lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
bdev = lo->lo_device;
partscan = true;
diff --git a/include/uapi/linux/loop.h b/include/uapi/linux/loop.h
index 080a8df134ef..6b32fee80ce0 100644
--- a/include/uapi/linux/loop.h
+++ b/include/uapi/linux/loop.h
@@ -25,6 +25,12 @@ enum {
LO_FLAGS_DIRECT_IO  = 16,
 };
 
+/* LO_FLAGS that can be set using LOOP_SET_STATUS(64) */
+#define LOOP_SET_STATUS_SETTABLE_FLAGS (LO_FLAGS_AUTOCLEAR | LO_FLAGS_PARTSCAN)
+
+/* LO_FLAGS that can be cleared using LOOP_SET_STATUS(64) */
+#define LOOP_SET_STATUS_CLEARABLE_FLAGS (LO_FLAGS_AUTOCLEAR)
+
 #include/* for __kernel_old_dev_t */
 #include/* for __u64 */
 
@@ -37,7 +43,7 @@ struct loop_info {
intlo_offset;
intlo_encrypt_type;
intlo_encrypt_key_size; /* ioctl w/o */
-   intlo_flags;/* ioctl r/o */
+   intlo_flags;
char   lo_name[LO_NAME_SIZE];
unsigned char  lo_encrypt_key[LO_KEY_SIZE]; /* ioctl w/o */
unsigned long  lo_init[2];
@@ -53,7 +59,7 @@ struct loop_info64 {
__u32  lo_number;   /* ioctl r/o */
__u32  lo_encrypt_type;
__u32  lo_encrypt_key_size; /* ioctl w/o */
-   __u32  lo_flags;/* ioctl r/o */
+   __u32  lo_flags;
__u8   lo_file_name[LO_NAME_SIZE];
__u8   lo_crypt_name[LO_NAME_SIZE];
__u8   lo_encrypt_key[LO_KEY_SIZE]; /* ioctl w/o */
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH v5 06/11] loop: Remove figure_loop_size()

2020-05-13 Thread Martijn Coenen
This function was now only used by loop_set_capacity(). Just open code
the remaining code in the caller instead.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index c134b3439483..e281a9f03d96 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -245,14 +245,6 @@ static void loop_set_size(struct loop_device *lo, loff_t 
size)
set_capacity_revalidate_and_notify(lo->lo_disk, size, false);
 }
 
-static void
-figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit)
-{
-   loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
-
-   loop_set_size(lo, size);
-}
-
 static inline int
 lo_do_transfer(struct loop_device *lo, int cmd,
   struct page *rpage, unsigned roffs,
@@ -1534,10 +1526,13 @@ loop_get_status64(struct loop_device *lo, struct 
loop_info64 __user *arg) {
 
 static int loop_set_capacity(struct loop_device *lo)
 {
+   loff_t size;
+
if (unlikely(lo->lo_state != Lo_bound))
return -ENXIO;
 
-   figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit);
+   size = get_loop_size(lo, lo->lo_backing_file);
+   loop_set_size(lo, size);
 
return 0;
 }
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH v5 09/11] loop: Rework lo_ioctl() __user argument casting

2020-05-13 Thread Martijn Coenen
In preparation for a new ioctl that needs to copy_from_user(); makes the
code easier to read as well.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4dc11d954169..31f10da4945e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1634,6 +1634,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t 
mode,
unsigned int cmd, unsigned long arg)
 {
struct loop_device *lo = bdev->bd_disk->private_data;
+   void __user *argp = (void __user *) arg;
int err;
 
switch (cmd) {
@@ -1646,21 +1647,19 @@ static int lo_ioctl(struct block_device *bdev, fmode_t 
mode,
case LOOP_SET_STATUS:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
-   err = loop_set_status_old(lo,
-   (struct loop_info __user *)arg);
+   err = loop_set_status_old(lo, argp);
}
break;
case LOOP_GET_STATUS:
-   return loop_get_status_old(lo, (struct loop_info __user *) arg);
+   return loop_get_status_old(lo, argp);
case LOOP_SET_STATUS64:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
-   err = loop_set_status64(lo,
-   (struct loop_info64 __user *) arg);
+   err = loop_set_status64(lo, argp);
}
break;
case LOOP_GET_STATUS64:
-   return loop_get_status64(lo, (struct loop_info64 __user *) arg);
+   return loop_get_status64(lo, argp);
case LOOP_SET_CAPACITY:
case LOOP_SET_DIRECT_IO:
case LOOP_SET_BLOCK_SIZE:
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH v5 03/11] loop: Factor out setting loop device size

2020-05-13 Thread Martijn Coenen
This code is used repeatedly.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 00de7fec0ed5..e69ff3c19eff 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -228,20 +228,35 @@ static void __loop_update_dio(struct loop_device *lo, 
bool dio)
blk_mq_unfreeze_queue(lo->lo_queue);
 }
 
+/**
+ * loop_set_size() - sets device size and notifies userspace
+ * @lo: struct loop_device to set the size for
+ * @size: new size of the loop device
+ *
+ * Callers must validate that the size passed into this function fits into
+ * a sector_t, eg using loop_validate_size()
+ */
+static void loop_set_size(struct loop_device *lo, loff_t size)
+{
+   struct block_device *bdev = lo->lo_device;
+
+   set_capacity(lo->lo_disk, size);
+   bd_set_size(bdev, size << SECTOR_SHIFT);
+   /* let user-space know about the new size */
+   kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
+}
+
 static void
 figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit)
 {
loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
-   struct block_device *bdev = lo->lo_device;
 
if (lo->lo_offset != offset)
lo->lo_offset = offset;
if (lo->lo_sizelimit != sizelimit)
lo->lo_sizelimit = sizelimit;
-   set_capacity(lo->lo_disk, size);
-   bd_set_size(bdev, (loff_t)get_capacity(bdev->bd_disk) << 9);
-   /* let user-space know about the new size */
-   kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
+
+   loop_set_size(lo, size);
 }
 
 static inline int
@@ -1034,11 +1049,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
 
loop_update_rotational(lo);
loop_update_dio(lo);
-   set_capacity(lo->lo_disk, size);
-   bd_set_size(bdev, size << 9);
loop_sysfs_init(lo);
-   /* let user-space know about the new size */
-   kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
+   loop_set_size(lo, size);
 
set_blocksize(bdev, S_ISBLK(inode->i_mode) ?
  block_size(inode->i_bdev) : PAGE_SIZE);
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH v5 07/11] loop: Factor out configuring loop from status

2020-05-13 Thread Martijn Coenen
Factor out this code into a separate function, so it can be reused by
other code more easily.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 117 +--
 1 file changed, 67 insertions(+), 50 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index e281a9f03d96..6a4c0ba225ca 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1251,75 +1251,43 @@ static int loop_clr_fd(struct loop_device *lo)
return __loop_clr_fd(lo, false);
 }
 
+/**
+ * loop_set_status_from_info - configure device from loop_info
+ * @lo: struct loop_device to configure
+ * @info: struct loop_info64 to configure the device with
+ *
+ * Configures the loop device parameters according to the passed
+ * in loop_info64 configuration.
+ */
 static int
-loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
+loop_set_status_from_info(struct loop_device *lo,
+ const struct loop_info64 *info)
 {
int err;
struct loop_func_table *xfer;
kuid_t uid = current_uid();
-   struct block_device *bdev;
-   bool partscan = false;
-   bool size_changed = false;
-
-   err = mutex_lock_killable(_ctl_mutex);
-   if (err)
-   return err;
-   if (lo->lo_encrypt_key_size &&
-   !uid_eq(lo->lo_key_owner, uid) &&
-   !capable(CAP_SYS_ADMIN)) {
-   err = -EPERM;
-   goto out_unlock;
-   }
-   if (lo->lo_state != Lo_bound) {
-   err = -ENXIO;
-   goto out_unlock;
-   }
-   if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) {
-   err = -EINVAL;
-   goto out_unlock;
-   }
-
-   if (lo->lo_offset != info->lo_offset ||
-   lo->lo_sizelimit != info->lo_sizelimit) {
-   size_changed = true;
-   sync_blockdev(lo->lo_device);
-   kill_bdev(lo->lo_device);
-   }
 
-   /* I/O need to be drained during transfer transition */
-   blk_mq_freeze_queue(lo->lo_queue);
-
-   if (size_changed && lo->lo_device->bd_inode->i_mapping->nrpages) {
-   /* If any pages were dirtied after kill_bdev(), try again */
-   err = -EAGAIN;
-   pr_warn("%s: loop%d (%s) has still dirty pages (nrpages=%lu)\n",
-   __func__, lo->lo_number, lo->lo_file_name,
-   lo->lo_device->bd_inode->i_mapping->nrpages);
-   goto out_unfreeze;
-   }
+   if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE)
+   return -EINVAL;
 
err = loop_release_xfer(lo);
if (err)
-   goto out_unfreeze;
+   return err;
 
if (info->lo_encrypt_type) {
unsigned int type = info->lo_encrypt_type;
 
-   if (type >= MAX_LO_CRYPT) {
-   err = -EINVAL;
-   goto out_unfreeze;
-   }
+   if (type >= MAX_LO_CRYPT)
+   return -EINVAL;
xfer = xfer_funcs[type];
-   if (xfer == NULL) {
-   err = -EINVAL;
-   goto out_unfreeze;
-   }
+   if (xfer == NULL)
+   return -EINVAL;
} else
xfer = NULL;
 
err = loop_init_xfer(lo, xfer, info);
if (err)
-   goto out_unfreeze;
+   return err;
 
lo->lo_offset = info->lo_offset;
lo->lo_sizelimit = info->lo_sizelimit;
@@ -1346,6 +1314,55 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
lo->lo_key_owner = uid;
}
 
+   return 0;
+}
+
+static int
+loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
+{
+   int err;
+   struct block_device *bdev;
+   kuid_t uid = current_uid();
+   bool partscan = false;
+   bool size_changed = false;
+
+   err = mutex_lock_killable(_ctl_mutex);
+   if (err)
+   return err;
+   if (lo->lo_encrypt_key_size &&
+   !uid_eq(lo->lo_key_owner, uid) &&
+   !capable(CAP_SYS_ADMIN)) {
+   err = -EPERM;
+   goto out_unlock;
+   }
+   if (lo->lo_state != Lo_bound) {
+   err = -ENXIO;
+   goto out_unlock;
+   }
+
+   if (lo->lo_offset != info->lo_offset ||
+   lo->lo_sizelimit != info->lo_sizelimit) {
+   size_changed = true;
+   sync_blockdev(lo->lo_device);
+   kill_bdev(lo->lo_device);
+   }
+
+   /* I/O need to be drained during transfer transition */
+   blk_mq_freeze_queue(lo->lo_queue);
+
+   if (size_changed &a

[PATCH v5 04/11] loop: Switch to set_capacity_revalidate_and_notify()

2020-05-13 Thread Martijn Coenen
This was recently added to block/genhd.c, and takes care of both
updating the capacity and notifying userspace of the new size.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index e69ff3c19eff..d9a756f8abd5 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -240,10 +240,9 @@ static void loop_set_size(struct loop_device *lo, loff_t 
size)
 {
struct block_device *bdev = lo->lo_device;
 
-   set_capacity(lo->lo_disk, size);
bd_set_size(bdev, size << SECTOR_SHIFT);
-   /* let user-space know about the new size */
-   kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
+
+   set_capacity_revalidate_and_notify(lo->lo_disk, size, false);
 }
 
 static void
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH v5 08/11] loop: Move loop_set_status_from_info() and friends up

2020-05-13 Thread Martijn Coenen
So we can use it without forward declaration. This is a separate commit
to make it easier to verify that this is just a move, without functional
modifications.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 206 +--
 1 file changed, 103 insertions(+), 103 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 6a4c0ba225ca..4dc11d954169 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -949,6 +949,109 @@ static void loop_update_rotational(struct loop_device *lo)
blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
 }
 
+static int
+loop_release_xfer(struct loop_device *lo)
+{
+   int err = 0;
+   struct loop_func_table *xfer = lo->lo_encryption;
+
+   if (xfer) {
+   if (xfer->release)
+   err = xfer->release(lo);
+   lo->transfer = NULL;
+   lo->lo_encryption = NULL;
+   module_put(xfer->owner);
+   }
+   return err;
+}
+
+static int
+loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer,
+  const struct loop_info64 *i)
+{
+   int err = 0;
+
+   if (xfer) {
+   struct module *owner = xfer->owner;
+
+   if (!try_module_get(owner))
+   return -EINVAL;
+   if (xfer->init)
+   err = xfer->init(lo, i);
+   if (err)
+   module_put(owner);
+   else
+   lo->lo_encryption = xfer;
+   }
+   return err;
+}
+
+/**
+ * loop_set_status_from_info - configure device from loop_info
+ * @lo: struct loop_device to configure
+ * @info: struct loop_info64 to configure the device with
+ *
+ * Configures the loop device parameters according to the passed
+ * in loop_info64 configuration.
+ */
+static int
+loop_set_status_from_info(struct loop_device *lo,
+ const struct loop_info64 *info)
+{
+   int err;
+   struct loop_func_table *xfer;
+   kuid_t uid = current_uid();
+
+   if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE)
+   return -EINVAL;
+
+   err = loop_release_xfer(lo);
+   if (err)
+   return err;
+
+   if (info->lo_encrypt_type) {
+   unsigned int type = info->lo_encrypt_type;
+
+   if (type >= MAX_LO_CRYPT)
+   return -EINVAL;
+   xfer = xfer_funcs[type];
+   if (xfer == NULL)
+   return -EINVAL;
+   } else
+   xfer = NULL;
+
+   err = loop_init_xfer(lo, xfer, info);
+   if (err)
+   return err;
+
+   lo->lo_offset = info->lo_offset;
+   lo->lo_sizelimit = info->lo_sizelimit;
+   memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
+   memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE);
+   lo->lo_file_name[LO_NAME_SIZE-1] = 0;
+   lo->lo_crypt_name[LO_NAME_SIZE-1] = 0;
+
+   if (!xfer)
+   xfer = _funcs;
+   lo->transfer = xfer->transfer;
+   lo->ioctl = xfer->ioctl;
+
+   if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
+(info->lo_flags & LO_FLAGS_AUTOCLEAR))
+   lo->lo_flags ^= LO_FLAGS_AUTOCLEAR;
+
+   lo->lo_encrypt_key_size = info->lo_encrypt_key_size;
+   lo->lo_init[0] = info->lo_init[0];
+   lo->lo_init[1] = info->lo_init[1];
+   if (info->lo_encrypt_key_size) {
+   memcpy(lo->lo_encrypt_key, info->lo_encrypt_key,
+  info->lo_encrypt_key_size);
+   lo->lo_key_owner = uid;
+   }
+
+   return 0;
+}
+
 static int loop_set_fd(struct loop_device *lo, fmode_t mode,
   struct block_device *bdev, unsigned int arg)
 {
@@ -1070,43 +1173,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
return error;
 }
 
-static int
-loop_release_xfer(struct loop_device *lo)
-{
-   int err = 0;
-   struct loop_func_table *xfer = lo->lo_encryption;
-
-   if (xfer) {
-   if (xfer->release)
-   err = xfer->release(lo);
-   lo->transfer = NULL;
-   lo->lo_encryption = NULL;
-   module_put(xfer->owner);
-   }
-   return err;
-}
-
-static int
-loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer,
-  const struct loop_info64 *i)
-{
-   int err = 0;
-
-   if (xfer) {
-   struct module *owner = xfer->owner;
-
-   if (!try_module_get(owner))
-   return -EINVAL;
-   if (xfer->init)
-   err = xfer->init(lo, i);
-   if (err)
-   module_put(owner);
-   

[PATCH v5 00/11] Add a new LOOP_CONFIGURE ioctl

2020-05-13 Thread Martijn Coenen
This series introduces a new ioctl that makes it possible to atomically
configure a loop device. Previously, if you wanted to set parameters
such as the offset on a loop device, this required calling LOOP_SET_FD
to set the backing file, and then LOOP_SET_STATUS to set the offset.
However, in between these two calls, the loop device is available and
would accept requests, which is generally not desirable. Similar issues
exist around setting the block size (LOOP_SET_BLOCK_SIZE) and requesting
direct I/O mode (LOOP_SET_DIRECT_IO).

There are also performance benefits with combining these ioctls into
one, which are described in more detail in the last change in the
series.

---
v5:
  - Added "loop: Call loop_config_discard() ..." as a first patch to
this series, as the rest of the refactoring done depends on it.
  - Removed sector_t truncation checks, as suggested by Ming Lei.

v4:
  - Addressed review comments from Christoph Hellwig:
-- Minor code cleanups
-- Clarified what lo_flags LOOP_SET_STATUS can set and clear, and
   made that more explicit in the code (see [10/11])
-- LOOP_CONFIGURE can now also be used to configure the block size
   and to explicitly request Direct I/O and read-only mode.
-- Explicitly reject lo_flags we don't know about in LOOP_CONFIGURE
-- Renamed LOOP_SET_FD_AND_STATUS to LOOP_CONFIGURE, since the ioctl
   can now do things LOOP_SET_STATUS couldn't do.
v3:
  - Addressed review comments from Christoph Hellwig:
-- Factored out loop_validate_size()
-- Split up the largish first patch in a few smaller ones
-- Use set_capacity_revalidate_and_notify()
  - Fixed a variable wrongly using size_t instead of loff_t
v2:
  - Addressed review comments from Bart van Assche:
-- Use SECTOR_SHIFT constant
-- Renamed loop_set_from_status() to loop_set_status_from_info()
-- Added kerneldoc for loop_set_status_from_info()
-- Removed dots in patch subject lines
  - Addressed review comments from Christoph Hellwig:
-- Added missing padding in struct loop_fd_and_status
-- Cleaned up some __user pointer handling in lo_ioctl
-- Pass in a stack-initialized loop_info64 for the legacy
   LOOP_SET_FD case

Martijn Coenen (11):
  loop: Call loop_config_discard() only after new config is applied
  loop: Remove sector_t truncation checks
  loop: Factor out setting loop device size
  loop: Switch to set_capacity_revalidate_and_notify()
  loop: Refactor loop_set_status() size calculation
  loop: Remove figure_loop_size()
  loop: Factor out configuring loop from status
  loop: Move loop_set_status_from_info() and friends up
  loop: Rework lo_ioctl() __user argument casting
  loop: Clean up LOOP_SET_STATUS lo_flags handling
  loop: Add LOOP_CONFIGURE ioctl

 drivers/block/loop.c  | 381 ++
 include/uapi/linux/loop.h |  31 +++-
 2 files changed, 255 insertions(+), 157 deletions(-)

-- 
2.26.2.645.ge9eca65c58-goog



[PATCH v5 05/11] loop: Refactor loop_set_status() size calculation

2020-05-13 Thread Martijn Coenen
figure_loop_size() calculates the loop size based on the passed in
parameters, but at the same time it updates the offset and sizelimit
parameters in the loop device configuration. That is a somewhat
unexpected side effect of a function with this name, and it is only only
needed by one of the two callers of this function - loop_set_status().

Move the lo_offset and lo_sizelimit assignment back into loop_set_status(),
and use the newly factored out functions to validate and apply the newly
calculated size. This allows us to get rid of figure_loop_size() in a
follow-up commit.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d9a756f8abd5..c134b3439483 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -250,11 +250,6 @@ figure_loop_size(struct loop_device *lo, loff_t offset, 
loff_t sizelimit)
 {
loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
 
-   if (lo->lo_offset != offset)
-   lo->lo_offset = offset;
-   if (lo->lo_sizelimit != sizelimit)
-   lo->lo_sizelimit = sizelimit;
-
loop_set_size(lo, size);
 }
 
@@ -1272,6 +1267,7 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
kuid_t uid = current_uid();
struct block_device *bdev;
bool partscan = false;
+   bool size_changed = false;
 
err = mutex_lock_killable(_ctl_mutex);
if (err)
@@ -1293,6 +1289,7 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
 
if (lo->lo_offset != info->lo_offset ||
lo->lo_sizelimit != info->lo_sizelimit) {
+   size_changed = true;
sync_blockdev(lo->lo_device);
kill_bdev(lo->lo_device);
}
@@ -1300,6 +1297,15 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
/* I/O need to be drained during transfer transition */
blk_mq_freeze_queue(lo->lo_queue);
 
+   if (size_changed && lo->lo_device->bd_inode->i_mapping->nrpages) {
+   /* If any pages were dirtied after kill_bdev(), try again */
+   err = -EAGAIN;
+   pr_warn("%s: loop%d (%s) has still dirty pages (nrpages=%lu)\n",
+   __func__, lo->lo_number, lo->lo_file_name,
+   lo->lo_device->bd_inode->i_mapping->nrpages);
+   goto out_unfreeze;
+   }
+
err = loop_release_xfer(lo);
if (err)
goto out_unfreeze;
@@ -1323,19 +1329,8 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
if (err)
goto out_unfreeze;
 
-   if (lo->lo_offset != info->lo_offset ||
-   lo->lo_sizelimit != info->lo_sizelimit) {
-   /* kill_bdev should have truncated all the pages */
-   if (lo->lo_device->bd_inode->i_mapping->nrpages) {
-   err = -EAGAIN;
-   pr_warn("%s: loop%d (%s) has still dirty pages 
(nrpages=%lu)\n",
-   __func__, lo->lo_number, lo->lo_file_name,
-   lo->lo_device->bd_inode->i_mapping->nrpages);
-   goto out_unfreeze;
-   }
-   figure_loop_size(lo, info->lo_offset, info->lo_sizelimit);
-   }
-
+   lo->lo_offset = info->lo_offset;
+   lo->lo_sizelimit = info->lo_sizelimit;
memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE);
lo->lo_file_name[LO_NAME_SIZE-1] = 0;
@@ -1359,6 +1354,12 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
lo->lo_key_owner = uid;
}
 
+   if (size_changed) {
+   loff_t new_size = get_size(lo->lo_offset, lo->lo_sizelimit,
+  lo->lo_backing_file);
+   loop_set_size(lo, new_size);
+   }
+
loop_config_discard(lo);
 
/* update dio if lo_offset or transfer is changed */
-- 
2.26.2.645.ge9eca65c58-goog



Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl

2020-05-13 Thread Martijn Coenen
On Wed, May 13, 2020 at 4:30 AM Jens Axboe  wrote:
> > Looks acceptable to me, but I'm getting a failure applying it to
> > for-5.8/drivers on this patch:
> >
> > Applying: loop: Refactor loop_set_status() size calculation
> >
> > So you'll probably want to respin on the right branch.

This series depends on a separate bugfix I sent to LKML earlier - see
https://lkml.org/lkml/2020/3/31/755 . I mentioned it in [00/10] of
this series, but perhaps I should have just included that patch.

I just verified that patch + this series still applies cleanly on your
for-5.8/drivers tree, but if you prefer I send a v5 with that patch
going first let me know.

Thanks,
Martijn



>
> Then you can also drop patch #1.
>
> --
> Jens Axboe
>


Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl

2020-05-12 Thread Martijn Coenen
Hi Jens,

What do you think of this series?

Thanks,
Martijn

On Wed, Apr 29, 2020 at 4:03 PM Martijn Coenen  wrote:
>
> This allows userspace to completely setup a loop device with a single
> ioctl, removing the in-between state where the device can be partially
> configured - eg the loop device has a backing file associated with it,
> but is reading from the wrong offset.
>
> Besides removing the intermediate state, another big benefit of this
> ioctl is that LOOP_SET_STATUS can be slow; the main reason for this
> slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to
> freeze the associated queue; this requires waiting for RCU
> synchronization, which I've measured can take about 15-20ms on this
> device on average.
>
> In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can
> also be used to:
> - Set the correct block size immediately by setting
>   loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE)
> - Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO
>   in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO)
> - Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY
>   in loop_config.info.lo_flags
>
> Here's setting up ~70 regular loop devices with an offset on an x86
> Android device, using LOOP_SET_FD and LOOP_SET_STATUS:
>
> vsoc_x86:/system/apex # time for i in `seq 30 100`;
> do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
> 0m03.40s real 0m00.02s user 0m00.03s system
>
> Here's configuring ~70 devices in the same way, but using a modified
> losetup that uses the new LOOP_CONFIGURE ioctl:
>
> vsoc_x86:/system/apex # time for i in `seq 30 100`;
> do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
> 0m01.94s real 0m00.01s user 0m00.01s system
>
> Signed-off-by: Martijn Coenen 
> ---
>  drivers/block/loop.c  | 107 +++---
>  include/uapi/linux/loop.h |  21 
>  2 files changed, 99 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index cfbdd99fdb1a..a353ce55fd18 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -241,6 +241,19 @@ loop_validate_size(loff_t size)
> return 0;
>  }
>
> +/**
> + * loop_validate_block_size() - validates the passed in block size
> + * @bsize: size to validate
> + */
> +static int
> +loop_validate_block_size(unsigned short bsize)
> +{
> +   if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
> +   return -EINVAL;
> +
> +   return 0;
> +}
> +
>  /**
>   * loop_set_size() - sets device size and notifies userspace
>   * @lo: struct loop_device to set the size for
> @@ -1063,23 +1076,24 @@ loop_set_status_from_info(struct loop_device *lo,
> return 0;
>  }
>
> -static int loop_set_fd(struct loop_device *lo, fmode_t mode,
> -  struct block_device *bdev, unsigned int arg)
> +static int loop_configure(struct loop_device *lo, fmode_t mode,
> + struct block_device *bdev,
> + const struct loop_config *config)
>  {
> struct file *file;
> struct inode*inode;
> struct address_space *mapping;
> struct block_device *claimed_bdev = NULL;
> -   int lo_flags = 0;
> int error;
> loff_t  size;
> boolpartscan;
> +   unsigned short  bsize;
>
> /* This is safe, since we have a reference from open(). */
> __module_get(THIS_MODULE);
>
> error = -EBADF;
> -   file = fget(arg);
> +   file = fget(config->fd);
> if (!file)
> goto out;
>
> @@ -1088,7 +1102,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
> mode,
>  * here to avoid changing device under exclusive owner.
>  */
> if (!(mode & FMODE_EXCL)) {
> -   claimed_bdev = bd_start_claiming(bdev, loop_set_fd);
> +   claimed_bdev = bd_start_claiming(bdev, loop_configure);
> if (IS_ERR(claimed_bdev)) {
> error = PTR_ERR(claimed_bdev);
> goto out_putf;
> @@ -1110,44 +1124,58 @@ static int loop_set_fd(struct loop_device *lo, 
> fmode_t mode,
> mapping = file->f_mapping;
> inode = mapping->host;
>
> -   if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) ||
> -   !file->f_op->write_iter)
> -   lo_flags |= LO_FLAGS_READ_ONLY;
> -
> size = get_loop_size(lo, file);
> error = lo

Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl

2020-05-06 Thread Martijn Coenen
On Wed, May 6, 2020 at 11:44 AM Michael Kerrisk (man-pages) >
> Can we have also a patch for the loop.4 manual page please?

Ack, will do when the series lands.

Best,
Martijn

>
> Thanks,
>
> Michael
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/


Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl

2020-05-06 Thread Martijn Coenen
On Wed, May 6, 2020 at 8:09 AM Christoph Hellwig  wrote:
>
> Thanks, this looks very nice!

Thanks! And thanks for all the feedback, it has made this series a lot cleaner.

>
> Reviewed-by: Christoph Hellwig 
>
> Are you also going to submit a patch to util-linux to use the new
> ioctl?

Yeah, I'm happy to - will do it as soon as the series lands.

Best,
Martijn


Re: [PATCH] loop: Call loop_config_discard() only after new config is applied.

2020-05-04 Thread Martijn Coenen
Hi Jens,

Are you ok with this one? One of my later series depends on it, but so
far I've kept it separate because it's a bug fix.

Thanks,
Martijn

On Sat, Apr 18, 2020 at 5:46 PM Bart Van Assche  wrote:
>
> On 2020-03-31 04:41, Martijn Coenen wrote:
> > loop_set_status() calls loop_config_discard() to configure discard for
> > the loop device; however, the discard configuration depends on whether
> > the loop device uses encryption, and when we call it the encryption
> > configuration has not been updated yet. Move the call down so we apply
> > the correct discard configuration based on the new configuration.
>
> Reviewed-by: Bart Van Assche 


Re: [PATCH v4 04/10] loop: Refactor loop_set_status() size calculation

2020-05-01 Thread Martijn Coenen
On Fri, May 1, 2020 at 7:30 PM Christoph Hellwig  wrote:
>
> For some reason this fails to apply for me against both
> Jens' for-5.8/block and Linus' current tree.
>
> What is the baseline for this series?

This was based on Linus' tree from a week or two ago or so, but I
think you're most likely missing this one?

https://lkml.org/lkml/2020/3/31/755

(I mentioned it in the cover letter, but can make it a part of this
series if you prefer).


Re: [PATCH v4 01/10] loop: Factor out loop size validation

2020-05-01 Thread Martijn Coenen
Hi Ming,

On Wed, Apr 29, 2020 at 4:12 PM Ming Lei  wrote:
> Now sector_t has been switched to u64 unconditionally, do we still need such
> validation?

I think you're right; I hadn't seen that change, but truncating
because of sector_t shouldn't be an issue anymore. I wondered if we
could actually have a smaller loff_t, but looks like that is 'long
long', which should always be 8 bytes as well. I might send this as a
separate patch, I don't want to drag this series on for too long.

Thanks,
Martijn

>
>
> Thanks,
> Ming
>
>


Re: [PATCH v3 0/9] Add a new LOOP_SET_FD_AND_STATUS ioctl

2020-04-29 Thread Martijn Coenen
On Tue, Apr 28, 2020 at 4:57 PM Martijn Coenen  wrote:
> and it allows requesting a partition scan. It makes sense to maintain
> that behavior, but what about LO_FLAGS_DIRECT_IO? I think you're
> proposing LOOP_SET_STATUS(64) should keep ignoring that like it used
> to?

I've just sent a v4 which basically implements that and your other suggestions.

Thanks,
Martijn

>
> Thanks,
> Martijn
>
> > and then in the main function reject anything not known.
> >
> > And then maybe add something like 64 bytes of padding to the end of the
> > new structure, so that we can use flags to expand to it.


[PATCH v4 01/10] loop: Factor out loop size validation

2020-04-29 Thread Martijn Coenen
Ensuring we don't truncate loff_t when casting to sector_t is done in
multiple places; factor it out.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f1754262fc94..396b8bd4d75c 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -228,15 +228,30 @@ static void __loop_update_dio(struct loop_device *lo, 
bool dio)
blk_mq_unfreeze_queue(lo->lo_queue);
 }
 
+/**
+ * loop_validate_size() - validates that the passed in size fits in a sector_t
+ * @size: size to validate
+ */
+static int
+loop_validate_size(loff_t size)
+{
+   if ((loff_t)(sector_t)size != size)
+   return -EFBIG;
+
+   return 0;
+}
+
 static int
 figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit)
 {
+   int err;
loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
-   sector_t x = (sector_t)size;
struct block_device *bdev = lo->lo_device;
 
-   if (unlikely((loff_t)x != size))
-   return -EFBIG;
+   err = loop_validate_size(size);
+   if (err)
+   return err;
+
if (lo->lo_offset != offset)
lo->lo_offset = offset;
if (lo->lo_sizelimit != sizelimit)
@@ -1003,9 +1018,9 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
!file->f_op->write_iter)
lo_flags |= LO_FLAGS_READ_ONLY;
 
-   error = -EFBIG;
size = get_loop_size(lo, file);
-   if ((loff_t)(sector_t)size != size)
+   error = loop_validate_size(size);
+   if (error)
goto out_unlock;
error = loop_prepare_queue(lo);
if (error)
-- 
2.26.2.303.gf8c07b1a785-goog



[PATCH v4 06/10] loop: Factor out configuring loop from status

2020-04-29 Thread Martijn Coenen
Factor out this code into a separate function, so it can be reused by
other code more easily.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 117 +--
 1 file changed, 67 insertions(+), 50 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 32755e874326..4a36a3f47503 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1266,13 +1266,78 @@ static int loop_clr_fd(struct loop_device *lo)
return __loop_clr_fd(lo, false);
 }
 
+/**
+ * loop_set_status_from_info - configure device from loop_info
+ * @lo: struct loop_device to configure
+ * @info: struct loop_info64 to configure the device with
+ *
+ * Configures the loop device parameters according to the passed
+ * in loop_info64 configuration.
+ */
 static int
-loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
+loop_set_status_from_info(struct loop_device *lo,
+ const struct loop_info64 *info)
 {
int err;
struct loop_func_table *xfer;
kuid_t uid = current_uid();
+
+   if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE)
+   return -EINVAL;
+
+   err = loop_release_xfer(lo);
+   if (err)
+   return err;
+
+   if (info->lo_encrypt_type) {
+   unsigned int type = info->lo_encrypt_type;
+
+   if (type >= MAX_LO_CRYPT)
+   return -EINVAL;
+   xfer = xfer_funcs[type];
+   if (xfer == NULL)
+   return -EINVAL;
+   } else
+   xfer = NULL;
+
+   err = loop_init_xfer(lo, xfer, info);
+   if (err)
+   return err;
+
+   lo->lo_offset = info->lo_offset;
+   lo->lo_sizelimit = info->lo_sizelimit;
+   memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
+   memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE);
+   lo->lo_file_name[LO_NAME_SIZE-1] = 0;
+   lo->lo_crypt_name[LO_NAME_SIZE-1] = 0;
+
+   if (!xfer)
+   xfer = _funcs;
+   lo->transfer = xfer->transfer;
+   lo->ioctl = xfer->ioctl;
+
+   if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
+(info->lo_flags & LO_FLAGS_AUTOCLEAR))
+   lo->lo_flags ^= LO_FLAGS_AUTOCLEAR;
+
+   lo->lo_encrypt_key_size = info->lo_encrypt_key_size;
+   lo->lo_init[0] = info->lo_init[0];
+   lo->lo_init[1] = info->lo_init[1];
+   if (info->lo_encrypt_key_size) {
+   memcpy(lo->lo_encrypt_key, info->lo_encrypt_key,
+  info->lo_encrypt_key_size);
+   lo->lo_key_owner = uid;
+   }
+
+   return 0;
+}
+
+static int
+loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
+{
+   int err;
struct block_device *bdev;
+   kuid_t uid = current_uid();
bool partscan = false;
bool size_changed = false;
loff_t validated_size;
@@ -1290,10 +1355,6 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
err = -ENXIO;
goto out_unlock;
}
-   if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) {
-   err = -EINVAL;
-   goto out_unlock;
-   }
 
if (lo->lo_offset != info->lo_offset ||
lo->lo_sizelimit != info->lo_sizelimit) {
@@ -1320,54 +1381,10 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
goto out_unfreeze;
}
 
-   err = loop_release_xfer(lo);
+   err = loop_set_status_from_info(lo, info);
if (err)
goto out_unfreeze;
 
-   if (info->lo_encrypt_type) {
-   unsigned int type = info->lo_encrypt_type;
-
-   if (type >= MAX_LO_CRYPT) {
-   err = -EINVAL;
-   goto out_unfreeze;
-   }
-   xfer = xfer_funcs[type];
-   if (xfer == NULL) {
-   err = -EINVAL;
-   goto out_unfreeze;
-   }
-   } else
-   xfer = NULL;
-
-   err = loop_init_xfer(lo, xfer, info);
-   if (err)
-   goto out_unfreeze;
-
-   lo->lo_offset = info->lo_offset;
-   lo->lo_sizelimit = info->lo_sizelimit;
-   memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
-   memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE);
-   lo->lo_file_name[LO_NAME_SIZE-1] = 0;
-   lo->lo_crypt_name[LO_NAME_SIZE-1] = 0;
-
-   if (!xfer)
-   xfer = _funcs;
-   lo->transfer = xfer->transfer;
-   lo->ioctl = xfer->ioctl;
-
-   if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
-(info->lo_flags & LO_FLAGS_AUTOCLEAR))
-   lo->lo_f

[PATCH v4 02/10] loop: Factor out setting loop device size

2020-04-29 Thread Martijn Coenen
This code is used repeatedly.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 396b8bd4d75c..6643e48ad71c 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -241,12 +241,29 @@ loop_validate_size(loff_t size)
return 0;
 }
 
+/**
+ * loop_set_size() - sets device size and notifies userspace
+ * @lo: struct loop_device to set the size for
+ * @size: new size of the loop device
+ *
+ * Callers must validate that the size passed into this function fits into
+ * a sector_t, eg using loop_validate_size()
+ */
+static void loop_set_size(struct loop_device *lo, loff_t size)
+{
+   struct block_device *bdev = lo->lo_device;
+
+   set_capacity(lo->lo_disk, size);
+   bd_set_size(bdev, size << SECTOR_SHIFT);
+   /* let user-space know about the new size */
+   kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
+}
+
 static int
 figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit)
 {
int err;
loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
-   struct block_device *bdev = lo->lo_device;
 
err = loop_validate_size(size);
if (err)
@@ -256,10 +273,9 @@ figure_loop_size(struct loop_device *lo, loff_t offset, 
loff_t sizelimit)
lo->lo_offset = offset;
if (lo->lo_sizelimit != sizelimit)
lo->lo_sizelimit = sizelimit;
-   set_capacity(lo->lo_disk, x);
-   bd_set_size(bdev, (loff_t)get_capacity(bdev->bd_disk) << 9);
-   /* let user-space know about the new size */
-   kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
+
+   loop_set_size(lo, size);
+
return 0;
 }
 
@@ -1055,11 +1071,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
 
loop_update_rotational(lo);
loop_update_dio(lo);
-   set_capacity(lo->lo_disk, size);
-   bd_set_size(bdev, size << 9);
loop_sysfs_init(lo);
-   /* let user-space know about the new size */
-   kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
+   loop_set_size(lo, size);
 
set_blocksize(bdev, S_ISBLK(inode->i_mode) ?
  block_size(inode->i_bdev) : PAGE_SIZE);
-- 
2.26.2.303.gf8c07b1a785-goog



[PATCH v4 05/10] loop: Remove figure_loop_size()

2020-04-29 Thread Martijn Coenen
This function was now only used by loop_set_capacity(). Just open code
the remaining code in the caller instead.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 9f5913879921..32755e874326 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -258,21 +258,6 @@ static void loop_set_size(struct loop_device *lo, loff_t 
size)
set_capacity_revalidate_and_notify(lo->lo_disk, size, false);
 }
 
-static int
-figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit)
-{
-   int err;
-   loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
-
-   err = loop_validate_size(size);
-   if (err)
-   return err;
-
-   loop_set_size(lo, size);
-
-   return 0;
-}
-
 static inline int
 lo_do_transfer(struct loop_device *lo, int cmd,
   struct page *rpage, unsigned roffs,
@@ -1560,10 +1545,21 @@ loop_get_status64(struct loop_device *lo, struct 
loop_info64 __user *arg) {
 
 static int loop_set_capacity(struct loop_device *lo)
 {
+   int err;
+   loff_t size;
+
if (unlikely(lo->lo_state != Lo_bound))
return -ENXIO;
 
-   return figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit);
+   size = get_loop_size(lo, lo->lo_backing_file);
+
+   err = loop_validate_size(size);
+   if (err)
+   return err;
+
+   loop_set_size(lo, size);
+
+   return 0;
 }
 
 static int loop_set_dio(struct loop_device *lo, unsigned long arg)
-- 
2.26.2.303.gf8c07b1a785-goog



[PATCH v4 09/10] loop: Clean up LOOP_SET_STATUS lo_flags handling.

2020-04-29 Thread Martijn Coenen
LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in
particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas
LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this
explicit by updating the UAPI to include the flags that can be
set/cleared using this ioctl.

The implementation can then blindly take over the passed in flags,
and use the previous flags for those flags that can't be set / cleared
using LOOP_SET_STATUS.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c  | 19 +--
 include/uapi/linux/loop.h | 10 --
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f172a64333aa..cfbdd99fdb1a 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1049,9 +1049,7 @@ loop_set_status_from_info(struct loop_device *lo,
lo->transfer = xfer->transfer;
lo->ioctl = xfer->ioctl;
 
-   if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
-(info->lo_flags & LO_FLAGS_AUTOCLEAR))
-   lo->lo_flags ^= LO_FLAGS_AUTOCLEAR;
+   lo->lo_flags = info->lo_flags;
 
lo->lo_encrypt_key_size = info->lo_encrypt_key_size;
lo->lo_init[0] = info->lo_init[0];
@@ -1338,6 +1336,7 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
int err;
struct block_device *bdev;
kuid_t uid = current_uid();
+   int prev_lo_flags;
bool partscan = false;
bool size_changed = false;
loff_t validated_size;
@@ -1381,10 +1380,19 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
goto out_unfreeze;
}
 
+   prev_lo_flags = lo->lo_flags;
+
err = loop_set_status_from_info(lo, info);
if (err)
goto out_unfreeze;
 
+   /* Mask out flags that can't be set using LOOP_SET_STATUS. */
+   lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS;
+   /* For those flags, use the previous values instead */
+   lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS;
+   /* For flags that can't be cleared, use previous values too */
+   lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_CLEARABLE_FLAGS;
+
if (size_changed)
loop_set_size(lo, validated_size);
 
@@ -1396,9 +1404,8 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
 out_unfreeze:
blk_mq_unfreeze_queue(lo->lo_queue);
 
-   if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) &&
-!(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
-   lo->lo_flags |= LO_FLAGS_PARTSCAN;
+   if (!err && (lo->lo_flags & LO_FLAGS_PARTSCAN) &&
+!(prev_lo_flags & LO_FLAGS_PARTSCAN)) {
lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
bdev = lo->lo_device;
partscan = true;
diff --git a/include/uapi/linux/loop.h b/include/uapi/linux/loop.h
index 080a8df134ef..6b32fee80ce0 100644
--- a/include/uapi/linux/loop.h
+++ b/include/uapi/linux/loop.h
@@ -25,6 +25,12 @@ enum {
LO_FLAGS_DIRECT_IO  = 16,
 };
 
+/* LO_FLAGS that can be set using LOOP_SET_STATUS(64) */
+#define LOOP_SET_STATUS_SETTABLE_FLAGS (LO_FLAGS_AUTOCLEAR | LO_FLAGS_PARTSCAN)
+
+/* LO_FLAGS that can be cleared using LOOP_SET_STATUS(64) */
+#define LOOP_SET_STATUS_CLEARABLE_FLAGS (LO_FLAGS_AUTOCLEAR)
+
 #include/* for __kernel_old_dev_t */
 #include/* for __u64 */
 
@@ -37,7 +43,7 @@ struct loop_info {
intlo_offset;
intlo_encrypt_type;
intlo_encrypt_key_size; /* ioctl w/o */
-   intlo_flags;/* ioctl r/o */
+   intlo_flags;
char   lo_name[LO_NAME_SIZE];
unsigned char  lo_encrypt_key[LO_KEY_SIZE]; /* ioctl w/o */
unsigned long  lo_init[2];
@@ -53,7 +59,7 @@ struct loop_info64 {
__u32  lo_number;   /* ioctl r/o */
__u32  lo_encrypt_type;
__u32  lo_encrypt_key_size; /* ioctl w/o */
-   __u32  lo_flags;/* ioctl r/o */
+   __u32  lo_flags;
__u8   lo_file_name[LO_NAME_SIZE];
__u8   lo_crypt_name[LO_NAME_SIZE];
__u8   lo_encrypt_key[LO_KEY_SIZE]; /* ioctl w/o */
-- 
2.26.2.303.gf8c07b1a785-goog



[PATCH v4 08/10] loop: Rework lo_ioctl() __user argument casting

2020-04-29 Thread Martijn Coenen
In preparation for a new ioctl that needs to copy_from_user(); makes the
code easier to read as well.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 60ba1ed95d77..f172a64333aa 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1659,6 +1659,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t 
mode,
unsigned int cmd, unsigned long arg)
 {
struct loop_device *lo = bdev->bd_disk->private_data;
+   void __user *argp = (void __user *) arg;
int err;
 
switch (cmd) {
@@ -1671,21 +1672,19 @@ static int lo_ioctl(struct block_device *bdev, fmode_t 
mode,
case LOOP_SET_STATUS:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
-   err = loop_set_status_old(lo,
-   (struct loop_info __user *)arg);
+   err = loop_set_status_old(lo, argp);
}
break;
case LOOP_GET_STATUS:
-   return loop_get_status_old(lo, (struct loop_info __user *) arg);
+   return loop_get_status_old(lo, argp);
case LOOP_SET_STATUS64:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
-   err = loop_set_status64(lo,
-   (struct loop_info64 __user *) arg);
+   err = loop_set_status64(lo, argp);
}
break;
case LOOP_GET_STATUS64:
-   return loop_get_status64(lo, (struct loop_info64 __user *) arg);
+   return loop_get_status64(lo, argp);
case LOOP_SET_CAPACITY:
case LOOP_SET_DIRECT_IO:
case LOOP_SET_BLOCK_SIZE:
-- 
2.26.2.303.gf8c07b1a785-goog



[PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl

2020-04-29 Thread Martijn Coenen
This allows userspace to completely setup a loop device with a single
ioctl, removing the in-between state where the device can be partially
configured - eg the loop device has a backing file associated with it,
but is reading from the wrong offset.

Besides removing the intermediate state, another big benefit of this
ioctl is that LOOP_SET_STATUS can be slow; the main reason for this
slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to
freeze the associated queue; this requires waiting for RCU
synchronization, which I've measured can take about 15-20ms on this
device on average.

In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can
also be used to:
- Set the correct block size immediately by setting
  loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE)
- Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO
  in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO)
- Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY
  in loop_config.info.lo_flags

Here's setting up ~70 regular loop devices with an offset on an x86
Android device, using LOOP_SET_FD and LOOP_SET_STATUS:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
0m03.40s real 0m00.02s user 0m00.03s system

Here's configuring ~70 devices in the same way, but using a modified
losetup that uses the new LOOP_CONFIGURE ioctl:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
0m01.94s real 0m00.01s user 0m00.01s system

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c  | 107 +++---
 include/uapi/linux/loop.h |  21 
 2 files changed, 99 insertions(+), 29 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index cfbdd99fdb1a..a353ce55fd18 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -241,6 +241,19 @@ loop_validate_size(loff_t size)
return 0;
 }
 
+/**
+ * loop_validate_block_size() - validates the passed in block size
+ * @bsize: size to validate
+ */
+static int
+loop_validate_block_size(unsigned short bsize)
+{
+   if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
+   return -EINVAL;
+
+   return 0;
+}
+
 /**
  * loop_set_size() - sets device size and notifies userspace
  * @lo: struct loop_device to set the size for
@@ -1063,23 +1076,24 @@ loop_set_status_from_info(struct loop_device *lo,
return 0;
 }
 
-static int loop_set_fd(struct loop_device *lo, fmode_t mode,
-  struct block_device *bdev, unsigned int arg)
+static int loop_configure(struct loop_device *lo, fmode_t mode,
+ struct block_device *bdev,
+ const struct loop_config *config)
 {
struct file *file;
struct inode*inode;
struct address_space *mapping;
struct block_device *claimed_bdev = NULL;
-   int lo_flags = 0;
int error;
loff_t  size;
boolpartscan;
+   unsigned short  bsize;
 
/* This is safe, since we have a reference from open(). */
__module_get(THIS_MODULE);
 
error = -EBADF;
-   file = fget(arg);
+   file = fget(config->fd);
if (!file)
goto out;
 
@@ -1088,7 +1102,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
 * here to avoid changing device under exclusive owner.
 */
if (!(mode & FMODE_EXCL)) {
-   claimed_bdev = bd_start_claiming(bdev, loop_set_fd);
+   claimed_bdev = bd_start_claiming(bdev, loop_configure);
if (IS_ERR(claimed_bdev)) {
error = PTR_ERR(claimed_bdev);
goto out_putf;
@@ -1110,44 +1124,58 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
mapping = file->f_mapping;
inode = mapping->host;
 
-   if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) ||
-   !file->f_op->write_iter)
-   lo_flags |= LO_FLAGS_READ_ONLY;
-
size = get_loop_size(lo, file);
error = loop_validate_size(size);
if (error)
goto out_unlock;
+
+   if ((config->info.lo_flags & ~LOOP_CONFIGURE_SETTABLE_FLAGS) != 0) {
+   error = -EINVAL;
+   goto out_unlock;
+   }
+
+   if (config->block_size) {
+   error = loop_validate_block_size(config->block_size);
+   if (error)
+   goto out_unlock;
+   }
+
+   error = loop_set_status_from_info(lo, >info);
+   if (error)
+   goto out_unlock;
+
+   if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) ||
+   !file->f_op->write_iter)
+   lo->

[PATCH v4 07/10] loop: Move loop_set_status_from_info() and friends up

2020-04-29 Thread Martijn Coenen
So we can use it without forward declaration. This is a separate commit
to make it easier to verify that this is just a move, without functional
modifications.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 206 +--
 1 file changed, 103 insertions(+), 103 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4a36a3f47503..60ba1ed95d77 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -962,6 +962,109 @@ static void loop_update_rotational(struct loop_device *lo)
blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
 }
 
+static int
+loop_release_xfer(struct loop_device *lo)
+{
+   int err = 0;
+   struct loop_func_table *xfer = lo->lo_encryption;
+
+   if (xfer) {
+   if (xfer->release)
+   err = xfer->release(lo);
+   lo->transfer = NULL;
+   lo->lo_encryption = NULL;
+   module_put(xfer->owner);
+   }
+   return err;
+}
+
+static int
+loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer,
+  const struct loop_info64 *i)
+{
+   int err = 0;
+
+   if (xfer) {
+   struct module *owner = xfer->owner;
+
+   if (!try_module_get(owner))
+   return -EINVAL;
+   if (xfer->init)
+   err = xfer->init(lo, i);
+   if (err)
+   module_put(owner);
+   else
+   lo->lo_encryption = xfer;
+   }
+   return err;
+}
+
+/**
+ * loop_set_status_from_info - configure device from loop_info
+ * @lo: struct loop_device to configure
+ * @info: struct loop_info64 to configure the device with
+ *
+ * Configures the loop device parameters according to the passed
+ * in loop_info64 configuration.
+ */
+static int
+loop_set_status_from_info(struct loop_device *lo,
+ const struct loop_info64 *info)
+{
+   int err;
+   struct loop_func_table *xfer;
+   kuid_t uid = current_uid();
+
+   if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE)
+   return -EINVAL;
+
+   err = loop_release_xfer(lo);
+   if (err)
+   return err;
+
+   if (info->lo_encrypt_type) {
+   unsigned int type = info->lo_encrypt_type;
+
+   if (type >= MAX_LO_CRYPT)
+   return -EINVAL;
+   xfer = xfer_funcs[type];
+   if (xfer == NULL)
+   return -EINVAL;
+   } else
+   xfer = NULL;
+
+   err = loop_init_xfer(lo, xfer, info);
+   if (err)
+   return err;
+
+   lo->lo_offset = info->lo_offset;
+   lo->lo_sizelimit = info->lo_sizelimit;
+   memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
+   memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE);
+   lo->lo_file_name[LO_NAME_SIZE-1] = 0;
+   lo->lo_crypt_name[LO_NAME_SIZE-1] = 0;
+
+   if (!xfer)
+   xfer = _funcs;
+   lo->transfer = xfer->transfer;
+   lo->ioctl = xfer->ioctl;
+
+   if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
+(info->lo_flags & LO_FLAGS_AUTOCLEAR))
+   lo->lo_flags ^= LO_FLAGS_AUTOCLEAR;
+
+   lo->lo_encrypt_key_size = info->lo_encrypt_key_size;
+   lo->lo_init[0] = info->lo_init[0];
+   lo->lo_init[1] = info->lo_init[1];
+   if (info->lo_encrypt_key_size) {
+   memcpy(lo->lo_encrypt_key, info->lo_encrypt_key,
+  info->lo_encrypt_key_size);
+   lo->lo_key_owner = uid;
+   }
+
+   return 0;
+}
+
 static int loop_set_fd(struct loop_device *lo, fmode_t mode,
   struct block_device *bdev, unsigned int arg)
 {
@@ -1085,43 +1188,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t 
mode,
return error;
 }
 
-static int
-loop_release_xfer(struct loop_device *lo)
-{
-   int err = 0;
-   struct loop_func_table *xfer = lo->lo_encryption;
-
-   if (xfer) {
-   if (xfer->release)
-   err = xfer->release(lo);
-   lo->transfer = NULL;
-   lo->lo_encryption = NULL;
-   module_put(xfer->owner);
-   }
-   return err;
-}
-
-static int
-loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer,
-  const struct loop_info64 *i)
-{
-   int err = 0;
-
-   if (xfer) {
-   struct module *owner = xfer->owner;
-
-   if (!try_module_get(owner))
-   return -EINVAL;
-   if (xfer->init)
-   err = xfer->init(lo, i);
-   if (err)
-   module_put(owner);
-   else
-   lo->lo

[PATCH v4 00/10] Add a new LOOP_CONFIGURE ioctl

2020-04-29 Thread Martijn Coenen
This series introduces a new ioctl that makes it possible to atomically
configure a loop device. Previously, if you wanted to set parameters
such as the offset on a loop device, this required calling LOOP_SET_FD
to set the backing file, and then LOOP_SET_STATUS to set the offset.
However, in between these two calls, the loop device is available and
would accept requests, which is generally not desirable. Similar issues
exist around setting the block size (LOOP_SET_BLOCK_SIZE) and requesting
direct I/O mode (LOOP_SET_DIRECT_IO).

There are also performance benefits with combining these ioctls into
one, which are described in more detail in the last change in the
series.

Note that this series depends on
"loop: Call loop_config_discard() only after new config is applied."
[0], which I sent as a separate patch as it fixes an unrelated bug.

[0]: https://lkml.org/lkml/2020/3/31/755

---
v4:
  - Addressed review comments from Christoph Hellwig:
-- Minor code cleanups
-- Clarified what lo_flags LOOP_SET_STATUS can set and clear, and
   made that more explicit in the code (see [9/10])
-- LOOP_CONFIGURE can now also be used to configure the block size
   and to explicitly request Direct I/O and read-only mode.
-- Explicitly reject lo_flags we don't know about in LOOP_CONFIGURE
-- Renamed LOOP_SET_FD_AND_STATUS to LOOP_CONFIGURE, since the ioctl
   can now do things LOOP_SET_STATUS couldn't do.
v3:
  - Addressed review comments from Christoph Hellwig:
-- Factored out loop_validate_size()
-- Split up the largish first patch in a few smaller ones
-- Use set_capacity_revalidate_and_notify()
  - Fixed a variable wrongly using size_t instead of loff_t
v2:
  - Addressed review comments from Bart van Assche:
-- Use SECTOR_SHIFT constant
-- Renamed loop_set_from_status() to loop_set_status_from_info()
-- Added kerneldoc for loop_set_status_from_info()
-- Removed dots in patch subject lines
  - Addressed review comments from Christoph Hellwig:
-- Added missing padding in struct loop_fd_and_status
-- Cleaned up some __user pointer handling in lo_ioctl
-- Pass in a stack-initialized loop_info64 for the legacy
   LOOP_SET_FD case

Martijn Coenen (10):
  loop: Factor out loop size validation
  loop: Factor out setting loop device size
  loop: Switch to set_capacity_revalidate_and_notify()
  loop: Refactor loop_set_status() size calculation
  loop: Remove figure_loop_size()
  loop: Factor out configuring loop from status
  loop: Move loop_set_status_from_info() and friends up
  loop: Rework lo_ioctl() __user argument casting
  loop: Clean up LOOP_SET_STATUS lo_flags handling.
  loop: Add LOOP_CONFIGURE ioctl

 drivers/block/loop.c  | 407 +++---
 include/uapi/linux/loop.h |  31 ++-
 2 files changed, 281 insertions(+), 157 deletions(-)

-- 
2.26.2.303.gf8c07b1a785-goog



[PATCH v4 04/10] loop: Refactor loop_set_status() size calculation

2020-04-29 Thread Martijn Coenen
figure_loop_size() calculates the loop size based on the passed in
parameters, but at the same time it updates the offset and sizelimit
parameters in the loop device configuration. That is a somewhat
unexpected side effect of a function with this name, and it is only only
needed by one of the two callers of this function - loop_set_status().

Move the lo_offset and lo_sizelimit assignment back into loop_set_status(),
and use the newly factored out functions to validate and apply the newly
calculated size. This allows us to get rid of figure_loop_size() in a
follow-up commit.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 44 +++-
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 8f3194c2b8aa..9f5913879921 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -268,11 +268,6 @@ figure_loop_size(struct loop_device *lo, loff_t offset, 
loff_t sizelimit)
if (err)
return err;
 
-   if (lo->lo_offset != offset)
-   lo->lo_offset = offset;
-   if (lo->lo_sizelimit != sizelimit)
-   lo->lo_sizelimit = sizelimit;
-
loop_set_size(lo, size);
 
return 0;
@@ -1294,6 +1289,8 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
kuid_t uid = current_uid();
struct block_device *bdev;
bool partscan = false;
+   bool size_changed = false;
+   loff_t validated_size;
 
err = mutex_lock_killable(_ctl_mutex);
if (err)
@@ -1315,6 +1312,13 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
 
if (lo->lo_offset != info->lo_offset ||
lo->lo_sizelimit != info->lo_sizelimit) {
+   loff_t size = get_size(info->lo_offset, info->lo_sizelimit,
+  lo->lo_backing_file);
+   err = loop_validate_size(size);
+   if (err)
+   goto out_unlock;
+   size_changed = true;
+   validated_size = size;
sync_blockdev(lo->lo_device);
kill_bdev(lo->lo_device);
}
@@ -1322,6 +1326,15 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
/* I/O need to be drained during transfer transition */
blk_mq_freeze_queue(lo->lo_queue);
 
+   if (size_changed && lo->lo_device->bd_inode->i_mapping->nrpages) {
+   /* If any pages were dirtied after kill_bdev(), try again */
+   err = -EAGAIN;
+   pr_warn("%s: loop%d (%s) has still dirty pages (nrpages=%lu)\n",
+   __func__, lo->lo_number, lo->lo_file_name,
+   lo->lo_device->bd_inode->i_mapping->nrpages);
+   goto out_unfreeze;
+   }
+
err = loop_release_xfer(lo);
if (err)
goto out_unfreeze;
@@ -1345,22 +1358,8 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
if (err)
goto out_unfreeze;
 
-   if (lo->lo_offset != info->lo_offset ||
-   lo->lo_sizelimit != info->lo_sizelimit) {
-   /* kill_bdev should have truncated all the pages */
-   if (lo->lo_device->bd_inode->i_mapping->nrpages) {
-   err = -EAGAIN;
-   pr_warn("%s: loop%d (%s) has still dirty pages 
(nrpages=%lu)\n",
-   __func__, lo->lo_number, lo->lo_file_name,
-   lo->lo_device->bd_inode->i_mapping->nrpages);
-   goto out_unfreeze;
-   }
-   if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit)) {
-   err = -EFBIG;
-   goto out_unfreeze;
-   }
-   }
-
+   lo->lo_offset = info->lo_offset;
+   lo->lo_sizelimit = info->lo_sizelimit;
memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE);
lo->lo_file_name[LO_NAME_SIZE-1] = 0;
@@ -1384,6 +1383,9 @@ loop_set_status(struct loop_device *lo, const struct 
loop_info64 *info)
lo->lo_key_owner = uid;
}
 
+   if (size_changed)
+   loop_set_size(lo, validated_size);
+
loop_config_discard(lo);
 
/* update dio if lo_offset or transfer is changed */
-- 
2.26.2.303.gf8c07b1a785-goog



[PATCH v4 03/10] loop: Switch to set_capacity_revalidate_and_notify()

2020-04-29 Thread Martijn Coenen
This was recently added to block/genhd.c, and takes care of both
updating the capacity and notifying userspace of the new size.

Signed-off-by: Martijn Coenen 
---
 drivers/block/loop.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 6643e48ad71c..8f3194c2b8aa 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -253,10 +253,9 @@ static void loop_set_size(struct loop_device *lo, loff_t 
size)
 {
struct block_device *bdev = lo->lo_device;
 
-   set_capacity(lo->lo_disk, size);
bd_set_size(bdev, size << SECTOR_SHIFT);
-   /* let user-space know about the new size */
-   kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
+
+   set_capacity_revalidate_and_notify(lo->lo_disk, size, false);
 }
 
 static int
-- 
2.26.2.303.gf8c07b1a785-goog



Re: [PATCH v3 0/9] Add a new LOOP_SET_FD_AND_STATUS ioctl

2020-04-28 Thread Martijn Coenen
On Tue, Apr 28, 2020 at 9:02 AM Christoph Hellwig  wrote:
> I think reusing LO_FLAGS_DIRECT_IO makes sense to me - we have 32
> flags in the existing flags field (at least for loop_info64), so
> we might as well use the field and the flags.  Then we need flags
> validation in that we don't accept new flags through the old
> interface, and the new one validates that no unknown flags are passed.
>
> E.g. in the LOOP_SET_STATUS / LOOP_SET_STATUS64 handler do:
>
> lo->lo_flags &= ~(LO_LEGACY_FLAGS);
>

mmm, I thought lo_flags was read-only in LOOP_SET_STATUS(64):

 __u32  lo_flags;/* ioctl r/o */

but it looks LO_FLAGS_AUTOCLEAR is writable:

 if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
   (info->lo_flags & LO_FLAGS_AUTOCLEAR))
  lo->lo_flags ^= LO_FLAGS_AUTOCLEAR;

and it allows requesting a partition scan. It makes sense to maintain
that behavior, but what about LO_FLAGS_DIRECT_IO? I think you're
proposing LOOP_SET_STATUS(64) should keep ignoring that like it used
to?

Thanks,
Martijn

> and then in the main function reject anything not known.
>
> And then maybe add something like 64 bytes of padding to the end of the
> new structure, so that we can use flags to expand to it.


Re: [PATCH 4.9 30/47] ANDROID: binder: remove waitqueue when thread exits.

2019-10-07 Thread Martijn Coenen
On Sun, Oct 6, 2019 at 7:23 PM Greg Kroah-Hartman
 wrote:
>
> From: Martijn Coenen 
>
> commit f5cb779ba16334b45ba8946d6bfa6d9834d1527f upstream.
>
> binder_poll() passes the thread->wait waitqueue that
> can be slept on for work. When a thread that uses
> epoll explicitly exits using BINDER_THREAD_EXIT,
> the waitqueue is freed, but it is never removed
> from the corresponding epoll data structure. When
> the process subsequently exits, the epoll cleanup
> code tries to access the waitlist, which results in
> a use-after-free.
>
> Prevent this by using POLLFREE when the thread exits.
>
> Signed-off-by: Martijn Coenen 
> Reported-by: syzbot 
> Cc: stable  # 4.14
> [backport BINDER_LOOPER_STATE_POLL logic as well]
> Signed-off-by: Mattias Nissler 
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  drivers/android/binder.c |   17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -334,7 +334,8 @@ enum {
> BINDER_LOOPER_STATE_EXITED  = 0x04,
> BINDER_LOOPER_STATE_INVALID = 0x08,
> BINDER_LOOPER_STATE_WAITING = 0x10,
> -   BINDER_LOOPER_STATE_NEED_RETURN = 0x20
> +   BINDER_LOOPER_STATE_NEED_RETURN = 0x20,
> +   BINDER_LOOPER_STATE_POLL= 0x40,
>  };
>
>  struct binder_thread {
> @@ -2628,6 +2629,18 @@ static int binder_free_thread(struct bin
> } else
> BUG();
> }
> +
> +   /*
> +* If this thread used poll, make sure we remove the waitqueue
> +* from any epoll data structures holding it with POLLFREE.
> +* waitqueue_active() is safe to use here because we're holding
> +* the inner lock.

This should be "global lock" in 4.9 and 4.4 :)

Otherwise LGTM, thanks!

Martijn

> +*/
> +   if ((thread->looper & BINDER_LOOPER_STATE_POLL) &&
> +   waitqueue_active(>wait)) {
> +   wake_up_poll(>wait, POLLHUP | POLLFREE);
> +   }
> +
> if (send_reply)
> binder_send_failed_reply(send_reply, BR_DEAD_REPLY);
> binder_release_work(>todo);
> @@ -2651,6 +2664,8 @@ static unsigned int binder_poll(struct f
> return POLLERR;
> }
>
> +   thread->looper |= BINDER_LOOPER_STATE_POLL;
> +
> wait_for_proc_work = thread->transaction_stack == NULL &&
> list_empty(>todo) && thread->return_error == BR_OK;
>
>
>


Re: [PATCH 4.9 30/47] ANDROID: binder: remove waitqueue when thread exits.

2019-10-07 Thread Martijn Coenen
On Mon, Oct 7, 2019 at 8:28 AM Mattias Nissler  wrote:
> Jann's PoC calls the BINDER_THREAD_EXIT ioctl to free the
> binder_thread which will then cause the UAF, and this is cut off by
> the patch. IIUC, you are worried about a similar AUF on the proc->wait
> access. I am not 100% sure, but I think the binder_proc lifetime
> matches the corresponding struct file instance, so it shouldn't be
> possible to get the binder_proc deallocated while still being able to
> access it via filp->private_data.

Yes, I think this is correct; either the binder fd is closed first, in
which case eventpoll_release() removes the waitqueue from the list
before it is freed (before binder's release() is called); instead if
the epoll fd is closed first, it will likewise remove the waitqueue
itself, before binder_proc can be freed.. I don't know the __fput()
code that well, but at first glance it seems these two can't overlap.

The whole problem with BINDER_THREAD_EXIT was that the returned
waitqueue wasn't tied to the lifetime of the underlying file.

Apologies for not spotting this needed a backport BTW - I refactored
the wait code heavily somewhere between 4.9 and 4.14, and somehow
didn't realize the same problem existed in the old code.

Thanks,
Martijn

>
> > >
> > > wait_for_proc_work = thread->transaction_stack == NULL &&
> > > list_empty(>todo) && thread->return_error == 
> > > BR_OK;
> > >
> > > binder_unlock(__func__);
> > >
> > > if (wait_for_proc_work) {
> > > if (binder_has_proc_work(proc, thread))
> > > return POLLIN;
> > > poll_wait(filp, >wait, wait);
> > > if (binder_has_proc_work(proc, thread))
> > > return POLLIN;
> > > } else {
> > > if (binder_has_thread_work(thread))
> > > return POLLIN;
> > > poll_wait(filp, >wait, wait);
> > > if (binder_has_thread_work(thread))
> > > return POLLIN;
> > > }
> > > return 0;
> >
> > I _think_ the backport is correct, and I know someone has verified that
> > the 4.4.y backport works properly and I don't see much difference here
> > from that version.
> >
> > But I will defer to Todd and Martijn here, as they know this code _WAY_
> > better than I do.  The codebase has changed a lot from 4.9.y to 4.14.y
> > so it makes it hard to do equal comparisons simply.
> >
> > Todd and Martijn, thoughts?
> >
> > thanks,
> >
> > greg k-h


[PATCH] binder: Set end of SG buffer area properly.

2019-07-09 Thread Martijn Coenen
In case the target node requests a security context, the
extra_buffers_size is increased with the size of the security context.
But, that size is not available for use by regular scatter-gather
buffers; make sure the ending of that buffer is marked correctly.

Acked-by: Todd Kjos 
Fixes: ec74136ded79 ("binder: create node flag to request sender's
security context")
Signed-off-by: Martijn Coenen 
Cc: sta...@vger.kernel.org # 5.1+
---
 drivers/android/binder.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 38a59a630cd4c..5bde08603fbc2 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3239,7 +3239,8 @@ static void binder_transaction(struct binder_proc *proc,
buffer_offset = off_start_offset;
off_end_offset = off_start_offset + tr->offsets_size;
sg_buf_offset = ALIGN(off_end_offset, sizeof(void *));
-   sg_buf_end_offset = sg_buf_offset + extra_buffers_size;
+   sg_buf_end_offset = sg_buf_offset + extra_buffers_size -
+   ALIGN(secctx_sz, sizeof(u64));
off_min = 0;
for (buffer_offset = off_start_offset; buffer_offset < off_end_offset;
 buffer_offset += sizeof(binder_size_t)) {
-- 
2.22.0.410.gd8fdbe21b5-goog



Re: [PATCH] MAINTAINERS: update Android binder

2018-12-20 Thread Martijn Coenen
Thanks Christian for signing up!

On Thu, Dec 20, 2018 at 12:09 PM Greg KH  wrote:
> Ok, no objection from me, Todd/Martijn/Joel/Arve, can I get an ack from
> someone?

Acked-By: Martijn Coenen 

>
> thanks,
>
> greg k-h


[PATCH v2] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.

2018-09-07 Thread Martijn Coenen
This allows the context manager to retrieve information about nodes
that it holds a reference to, such as the current number of
references to those nodes.

Such information can for example be used to determine whether the
servicemanager is the only process holding a reference to a node.
This information can then be passed on to the process holding the
node, which can in turn decide whether it wants to shut down to
reduce resource usage.

Signed-off-by: Martijn Coenen 
---
v2: made sure reserved fields are aligned, and enforce caller zeroes
all fields except handle, as suggested by Dan Carpenter.

 drivers/android/binder.c| 55 +
 include/uapi/linux/android/binder.h | 10 ++
 2 files changed, 65 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d58763b6b0090..5b25412e15ccf 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4544,6 +4544,42 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp)
return ret;
 }
 
+static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc,
+   struct binder_node_info_for_ref *info)
+{
+   struct binder_node *node;
+   struct binder_context *context = proc->context;
+   __u32 handle = info->handle;
+
+   if (info->strong_count || info->weak_count || info->reserved1 ||
+   info->reserved2 || info->reserved3) {
+   binder_user_error("%d BINDER_GET_NODE_INFO_FOR_REF: only handle 
may be non-zero.",
+ proc->pid);
+   return -EINVAL;
+   }
+
+   /* This ioctl may only be used by the context manager */
+   mutex_lock(>context_mgr_node_lock);
+   if (!context->binder_context_mgr_node ||
+   context->binder_context_mgr_node->proc != proc) {
+   mutex_unlock(>context_mgr_node_lock);
+   return -EPERM;
+   }
+   mutex_unlock(>context_mgr_node_lock);
+
+   node = binder_get_node_from_ref(proc, handle, true, NULL);
+   if (!node)
+   return -EINVAL;
+
+   info->strong_count = node->local_strong_refs +
+   node->internal_strong_refs;
+   info->weak_count = node->local_weak_refs;
+
+   binder_put_node(node);
+
+   return 0;
+}
+
 static int binder_ioctl_get_node_debug_info(struct binder_proc *proc,
struct binder_node_debug_info *info)
 {
@@ -4638,6 +4674,25 @@ static long binder_ioctl(struct file *filp, unsigned int 
cmd, unsigned long arg)
}
break;
}
+   case BINDER_GET_NODE_INFO_FOR_REF: {
+   struct binder_node_info_for_ref info;
+
+   if (copy_from_user(, ubuf, sizeof(info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   ret = binder_ioctl_get_node_info_for_ref(proc, );
+   if (ret < 0)
+   goto err;
+
+   if (copy_to_user(ubuf, , sizeof(info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   break;
+   }
case BINDER_GET_NODE_DEBUG_INFO: {
struct binder_node_debug_info info;
 
diff --git a/include/uapi/linux/android/binder.h 
b/include/uapi/linux/android/binder.h
index bfaec6903b8bc..b9ba520f7e4bb 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -200,6 +200,15 @@ struct binder_node_debug_info {
__u32has_weak_ref;
 };
 
+struct binder_node_info_for_ref {
+   __u32handle;
+   __u32strong_count;
+   __u32weak_count;
+   __u32reserved1;
+   __u32reserved2;
+   __u32reserved3;
+};
+
 #define BINDER_WRITE_READ  _IOWR('b', 1, struct binder_write_read)
 #define BINDER_SET_IDLE_TIMEOUT_IOW('b', 3, __s64)
 #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32)
@@ -208,6 +217,7 @@ struct binder_node_debug_info {
 #define BINDER_THREAD_EXIT _IOW('b', 8, __s32)
 #define BINDER_VERSION _IOWR('b', 9, struct binder_version)
 #define BINDER_GET_NODE_DEBUG_INFO _IOWR('b', 11, struct 
binder_node_debug_info)
+#define BINDER_GET_NODE_INFO_FOR_REF   _IOWR('b', 12, struct 
binder_node_info_for_ref)
 
 /*
  * NOTE: Two special error codes you should check for when calling
-- 
2.19.0.rc2.392.g5ba43deb5a-goog



[PATCH v2] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.

2018-09-07 Thread Martijn Coenen
This allows the context manager to retrieve information about nodes
that it holds a reference to, such as the current number of
references to those nodes.

Such information can for example be used to determine whether the
servicemanager is the only process holding a reference to a node.
This information can then be passed on to the process holding the
node, which can in turn decide whether it wants to shut down to
reduce resource usage.

Signed-off-by: Martijn Coenen 
---
v2: made sure reserved fields are aligned, and enforce caller zeroes
all fields except handle, as suggested by Dan Carpenter.

 drivers/android/binder.c| 55 +
 include/uapi/linux/android/binder.h | 10 ++
 2 files changed, 65 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d58763b6b0090..5b25412e15ccf 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4544,6 +4544,42 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp)
return ret;
 }
 
+static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc,
+   struct binder_node_info_for_ref *info)
+{
+   struct binder_node *node;
+   struct binder_context *context = proc->context;
+   __u32 handle = info->handle;
+
+   if (info->strong_count || info->weak_count || info->reserved1 ||
+   info->reserved2 || info->reserved3) {
+   binder_user_error("%d BINDER_GET_NODE_INFO_FOR_REF: only handle 
may be non-zero.",
+ proc->pid);
+   return -EINVAL;
+   }
+
+   /* This ioctl may only be used by the context manager */
+   mutex_lock(>context_mgr_node_lock);
+   if (!context->binder_context_mgr_node ||
+   context->binder_context_mgr_node->proc != proc) {
+   mutex_unlock(>context_mgr_node_lock);
+   return -EPERM;
+   }
+   mutex_unlock(>context_mgr_node_lock);
+
+   node = binder_get_node_from_ref(proc, handle, true, NULL);
+   if (!node)
+   return -EINVAL;
+
+   info->strong_count = node->local_strong_refs +
+   node->internal_strong_refs;
+   info->weak_count = node->local_weak_refs;
+
+   binder_put_node(node);
+
+   return 0;
+}
+
 static int binder_ioctl_get_node_debug_info(struct binder_proc *proc,
struct binder_node_debug_info *info)
 {
@@ -4638,6 +4674,25 @@ static long binder_ioctl(struct file *filp, unsigned int 
cmd, unsigned long arg)
}
break;
}
+   case BINDER_GET_NODE_INFO_FOR_REF: {
+   struct binder_node_info_for_ref info;
+
+   if (copy_from_user(, ubuf, sizeof(info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   ret = binder_ioctl_get_node_info_for_ref(proc, );
+   if (ret < 0)
+   goto err;
+
+   if (copy_to_user(ubuf, , sizeof(info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   break;
+   }
case BINDER_GET_NODE_DEBUG_INFO: {
struct binder_node_debug_info info;
 
diff --git a/include/uapi/linux/android/binder.h 
b/include/uapi/linux/android/binder.h
index bfaec6903b8bc..b9ba520f7e4bb 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -200,6 +200,15 @@ struct binder_node_debug_info {
__u32has_weak_ref;
 };
 
+struct binder_node_info_for_ref {
+   __u32handle;
+   __u32strong_count;
+   __u32weak_count;
+   __u32reserved1;
+   __u32reserved2;
+   __u32reserved3;
+};
+
 #define BINDER_WRITE_READ  _IOWR('b', 1, struct binder_write_read)
 #define BINDER_SET_IDLE_TIMEOUT_IOW('b', 3, __s64)
 #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32)
@@ -208,6 +217,7 @@ struct binder_node_debug_info {
 #define BINDER_THREAD_EXIT _IOW('b', 8, __s32)
 #define BINDER_VERSION _IOWR('b', 9, struct binder_version)
 #define BINDER_GET_NODE_DEBUG_INFO _IOWR('b', 11, struct 
binder_node_debug_info)
+#define BINDER_GET_NODE_INFO_FOR_REF   _IOWR('b', 12, struct 
binder_node_info_for_ref)
 
 /*
  * NOTE: Two special error codes you should check for when calling
-- 
2.19.0.rc2.392.g5ba43deb5a-goog



Re: [PATCH] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.

2018-09-05 Thread Martijn Coenen
On Wed, Sep 5, 2018 at 11:09 AM, Dan Carpenter  wrote:
> What's the reserved for?  On 64 bit systems there is a 4 byte struct
> hole between weak_count and reserved.

There's many more pieces of information that we hold for a node. While
we don't have a use for most of that now, we may want some of it in
the future, and so I thought it would be wise to reserve some space
here so we don't need a new ioctl when that happens. I'm actually not
sure it's common to do things this way.

> Why not just make reserved a
> __u32 and get rid of the hole?  (Not rhetorical, I have no idea).

Because I thought 8 bytes of reserved space would be nice :-) But you
have a good point re:alignment, I should make it two __u32's then.

>
> Btw, people sometimes complain about that we don't check that user input
> is zeroed in ioctls.  Like for example maybe they're passing random data
> in the the strong_count field and then later we decide that actually
> that field should mean something but we can't make it mean anything
> because we've been letting the user put whatever they want there.  These
> are just random thoughts in my head, not necessarily important.

That's a good point, I will change the code to check for that.

Thanks,
Martijn

>
> regards,
> dan carpenter
>


Re: [PATCH] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.

2018-09-05 Thread Martijn Coenen
On Wed, Sep 5, 2018 at 11:09 AM, Dan Carpenter  wrote:
> What's the reserved for?  On 64 bit systems there is a 4 byte struct
> hole between weak_count and reserved.

There's many more pieces of information that we hold for a node. While
we don't have a use for most of that now, we may want some of it in
the future, and so I thought it would be wise to reserve some space
here so we don't need a new ioctl when that happens. I'm actually not
sure it's common to do things this way.

> Why not just make reserved a
> __u32 and get rid of the hole?  (Not rhetorical, I have no idea).

Because I thought 8 bytes of reserved space would be nice :-) But you
have a good point re:alignment, I should make it two __u32's then.

>
> Btw, people sometimes complain about that we don't check that user input
> is zeroed in ioctls.  Like for example maybe they're passing random data
> in the the strong_count field and then later we decide that actually
> that field should mean something but we can't make it mean anything
> because we've been letting the user put whatever they want there.  These
> are just random thoughts in my head, not necessarily important.

That's a good point, I will change the code to check for that.

Thanks,
Martijn

>
> regards,
> dan carpenter
>


[PATCH] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.

2018-09-05 Thread Martijn Coenen
This allows the context manager to retrieve information about nodes
that it holds a reference to, such as the current number of
references to those nodes.

Such information can for example be used to determine whether the
servicemanager is the only process holding a reference to a node.
This information can then be passed on to the process holding the
node, which can in turn decide whether it wants to shut down to
reduce resource usage.

Signed-off-by: Martijn Coenen 
---
 drivers/android/binder.c| 50 +
 include/uapi/linux/android/binder.h |  8 +
 2 files changed, 58 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d58763b6b0090..1c7e965241fb8 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4544,6 +4544,37 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp)
return ret;
 }
 
+static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc,
+   struct binder_node_info_for_ref *info)
+{
+   struct binder_node *node;
+   struct binder_context *context = proc->context;
+   __u32 handle = info->handle;
+
+   memset(info, 0, sizeof(*info));
+
+   /* This ioctl may only be used by the context manager */
+   mutex_lock(>context_mgr_node_lock);
+   if (!context->binder_context_mgr_node ||
+   context->binder_context_mgr_node->proc != proc) {
+   mutex_unlock(>context_mgr_node_lock);
+   return -EPERM;
+   }
+   mutex_unlock(>context_mgr_node_lock);
+
+   node = binder_get_node_from_ref(proc, handle, true, NULL);
+   if (!node)
+   return -EINVAL;
+
+   info->strong_count = node->local_strong_refs +
+   node->internal_strong_refs;
+   info->weak_count = node->local_weak_refs;
+
+   binder_put_node(node);
+
+   return 0;
+}
+
 static int binder_ioctl_get_node_debug_info(struct binder_proc *proc,
struct binder_node_debug_info *info)
 {
@@ -4638,6 +4669,25 @@ static long binder_ioctl(struct file *filp, unsigned int 
cmd, unsigned long arg)
}
break;
}
+   case BINDER_GET_NODE_INFO_FOR_REF: {
+   struct binder_node_info_for_ref info;
+
+   if (copy_from_user(, ubuf, sizeof(info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   ret = binder_ioctl_get_node_info_for_ref(proc, );
+   if (ret < 0)
+   goto err;
+
+   if (copy_to_user(ubuf, , sizeof(info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   break;
+   }
case BINDER_GET_NODE_DEBUG_INFO: {
struct binder_node_debug_info info;
 
diff --git a/include/uapi/linux/android/binder.h 
b/include/uapi/linux/android/binder.h
index bfaec6903b8bc..a54a680ff2936 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -200,6 +200,13 @@ struct binder_node_debug_info {
__u32has_weak_ref;
 };
 
+struct binder_node_info_for_ref {
+   __u32handle;
+   __u32strong_count;
+   __u32weak_count;
+   __u64reserved;
+};
+
 #define BINDER_WRITE_READ  _IOWR('b', 1, struct binder_write_read)
 #define BINDER_SET_IDLE_TIMEOUT_IOW('b', 3, __s64)
 #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32)
@@ -208,6 +215,7 @@ struct binder_node_debug_info {
 #define BINDER_THREAD_EXIT _IOW('b', 8, __s32)
 #define BINDER_VERSION _IOWR('b', 9, struct binder_version)
 #define BINDER_GET_NODE_DEBUG_INFO _IOWR('b', 11, struct 
binder_node_debug_info)
+#define BINDER_GET_NODE_INFO_FOR_REF   _IOWR('b', 12, struct 
binder_node_info_for_ref)
 
 /*
  * NOTE: Two special error codes you should check for when calling
-- 
2.19.0.rc1.350.ge57e33dbd1-goog



[PATCH] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.

2018-09-05 Thread Martijn Coenen
This allows the context manager to retrieve information about nodes
that it holds a reference to, such as the current number of
references to those nodes.

Such information can for example be used to determine whether the
servicemanager is the only process holding a reference to a node.
This information can then be passed on to the process holding the
node, which can in turn decide whether it wants to shut down to
reduce resource usage.

Signed-off-by: Martijn Coenen 
---
 drivers/android/binder.c| 50 +
 include/uapi/linux/android/binder.h |  8 +
 2 files changed, 58 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d58763b6b0090..1c7e965241fb8 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4544,6 +4544,37 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp)
return ret;
 }
 
+static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc,
+   struct binder_node_info_for_ref *info)
+{
+   struct binder_node *node;
+   struct binder_context *context = proc->context;
+   __u32 handle = info->handle;
+
+   memset(info, 0, sizeof(*info));
+
+   /* This ioctl may only be used by the context manager */
+   mutex_lock(>context_mgr_node_lock);
+   if (!context->binder_context_mgr_node ||
+   context->binder_context_mgr_node->proc != proc) {
+   mutex_unlock(>context_mgr_node_lock);
+   return -EPERM;
+   }
+   mutex_unlock(>context_mgr_node_lock);
+
+   node = binder_get_node_from_ref(proc, handle, true, NULL);
+   if (!node)
+   return -EINVAL;
+
+   info->strong_count = node->local_strong_refs +
+   node->internal_strong_refs;
+   info->weak_count = node->local_weak_refs;
+
+   binder_put_node(node);
+
+   return 0;
+}
+
 static int binder_ioctl_get_node_debug_info(struct binder_proc *proc,
struct binder_node_debug_info *info)
 {
@@ -4638,6 +4669,25 @@ static long binder_ioctl(struct file *filp, unsigned int 
cmd, unsigned long arg)
}
break;
}
+   case BINDER_GET_NODE_INFO_FOR_REF: {
+   struct binder_node_info_for_ref info;
+
+   if (copy_from_user(, ubuf, sizeof(info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   ret = binder_ioctl_get_node_info_for_ref(proc, );
+   if (ret < 0)
+   goto err;
+
+   if (copy_to_user(ubuf, , sizeof(info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   break;
+   }
case BINDER_GET_NODE_DEBUG_INFO: {
struct binder_node_debug_info info;
 
diff --git a/include/uapi/linux/android/binder.h 
b/include/uapi/linux/android/binder.h
index bfaec6903b8bc..a54a680ff2936 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -200,6 +200,13 @@ struct binder_node_debug_info {
__u32has_weak_ref;
 };
 
+struct binder_node_info_for_ref {
+   __u32handle;
+   __u32strong_count;
+   __u32weak_count;
+   __u64reserved;
+};
+
 #define BINDER_WRITE_READ  _IOWR('b', 1, struct binder_write_read)
 #define BINDER_SET_IDLE_TIMEOUT_IOW('b', 3, __s64)
 #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32)
@@ -208,6 +215,7 @@ struct binder_node_debug_info {
 #define BINDER_THREAD_EXIT _IOW('b', 8, __s32)
 #define BINDER_VERSION _IOWR('b', 9, struct binder_version)
 #define BINDER_GET_NODE_DEBUG_INFO _IOWR('b', 11, struct 
binder_node_debug_info)
+#define BINDER_GET_NODE_INFO_FOR_REF   _IOWR('b', 12, struct 
binder_node_info_for_ref)
 
 /*
  * NOTE: Two special error codes you should check for when calling
-- 
2.19.0.rc1.350.ge57e33dbd1-goog



Re: KASAN: null-ptr-deref Write in binder_update_page_range

2018-08-27 Thread Martijn Coenen
Thanks Minchan!

On Thu, Aug 23, 2018 at 7:29 AM, Minchan Kim  wrote:
> Signed-off-by: Todd Kjos 
> Signed-off-by: Minchan Kim 
Reviewed-by: Martijn Coenen 

> ---
>  drivers/android/binder_alloc.c | 43 +++---
>  1 file changed, 35 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 3f3b7b253445..64fd96eada31 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -332,6 +332,35 @@ static int binder_update_page_range(struct binder_alloc 
> *alloc, int allocate,
> return vma ? -ENOMEM : -ESRCH;
>  }
>
> +
> +static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
> +   struct vm_area_struct *vma)
> +{
> +   if (vma)
> +   alloc->vma_vm_mm = vma->vm_mm;
> +   /*
> +* If we see alloc->vma is not NULL, buffer data structures set up
> +* completely. Look at smp_rmb side binder_alloc_get_vma.
> +* We also want to guarantee new alloc->vma_vm_mm is always visible
> +* if alloc->vma is set.
> +*/
> +   smp_wmb();
> +   alloc->vma = vma;
> +}
> +
> +static inline struct vm_area_struct *binder_alloc_get_vma(
> +   struct binder_alloc *alloc)
> +{
> +   struct vm_area_struct *vma = NULL;
> +
> +   if (alloc->vma) {
> +   /* Look at description in binder_alloc_set_vma */
> +   smp_rmb();
> +   vma = alloc->vma;
> +   }
> +   return vma;
> +}
> +
>  static struct binder_buffer *binder_alloc_new_buf_locked(
> struct binder_alloc *alloc,
> size_t data_size,
> @@ -348,7 +377,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
> size_t size, data_offsets_size;
> int ret;
>
> -   if (alloc->vma == NULL) {
> +   if (!binder_alloc_get_vma(alloc)) {
> binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
>"%d: binder_alloc_buf, no vma\n",
>alloc->pid);
> @@ -723,9 +752,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
> buffer->free = 1;
> binder_insert_free_buffer(alloc, buffer);
> alloc->free_async_space = alloc->buffer_size / 2;
> -   barrier();
> -   alloc->vma = vma;
> -   alloc->vma_vm_mm = vma->vm_mm;
> +   binder_alloc_set_vma(alloc, vma);
> mmgrab(alloc->vma_vm_mm);
>
> return 0;
> @@ -754,10 +781,10 @@ void binder_alloc_deferred_release(struct binder_alloc 
> *alloc)
> int buffers, page_count;
> struct binder_buffer *buffer;
>
> -   BUG_ON(alloc->vma);
> -
> buffers = 0;
> mutex_lock(>mutex);
> +   BUG_ON(alloc->vma);
> +
> while ((n = rb_first(>allocated_buffers))) {
> buffer = rb_entry(n, struct binder_buffer, rb_node);
>
> @@ -900,7 +927,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc 
> *alloc)
>   */
>  void binder_alloc_vma_close(struct binder_alloc *alloc)
>  {
> -   WRITE_ONCE(alloc->vma, NULL);
> +   binder_alloc_set_vma(alloc, NULL);
>  }
>
>  /**
> @@ -935,7 +962,7 @@ enum lru_status binder_alloc_free_page(struct list_head 
> *item,
>
> index = page - alloc->pages;
> page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE;
> -   vma = alloc->vma;
> +   vma = binder_alloc_get_vma(alloc);
> if (vma) {
> if (!mmget_not_zero(alloc->vma_vm_mm))
> goto err_mmget;
> --
> 2.18.0.1017.ga543ac7ca45-goog
>
>
>
>


Re: KASAN: null-ptr-deref Write in binder_update_page_range

2018-08-27 Thread Martijn Coenen
Thanks Minchan!

On Thu, Aug 23, 2018 at 7:29 AM, Minchan Kim  wrote:
> Signed-off-by: Todd Kjos 
> Signed-off-by: Minchan Kim 
Reviewed-by: Martijn Coenen 

> ---
>  drivers/android/binder_alloc.c | 43 +++---
>  1 file changed, 35 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 3f3b7b253445..64fd96eada31 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -332,6 +332,35 @@ static int binder_update_page_range(struct binder_alloc 
> *alloc, int allocate,
> return vma ? -ENOMEM : -ESRCH;
>  }
>
> +
> +static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
> +   struct vm_area_struct *vma)
> +{
> +   if (vma)
> +   alloc->vma_vm_mm = vma->vm_mm;
> +   /*
> +* If we see alloc->vma is not NULL, buffer data structures set up
> +* completely. Look at smp_rmb side binder_alloc_get_vma.
> +* We also want to guarantee new alloc->vma_vm_mm is always visible
> +* if alloc->vma is set.
> +*/
> +   smp_wmb();
> +   alloc->vma = vma;
> +}
> +
> +static inline struct vm_area_struct *binder_alloc_get_vma(
> +   struct binder_alloc *alloc)
> +{
> +   struct vm_area_struct *vma = NULL;
> +
> +   if (alloc->vma) {
> +   /* Look at description in binder_alloc_set_vma */
> +   smp_rmb();
> +   vma = alloc->vma;
> +   }
> +   return vma;
> +}
> +
>  static struct binder_buffer *binder_alloc_new_buf_locked(
> struct binder_alloc *alloc,
> size_t data_size,
> @@ -348,7 +377,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
> size_t size, data_offsets_size;
> int ret;
>
> -   if (alloc->vma == NULL) {
> +   if (!binder_alloc_get_vma(alloc)) {
> binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
>"%d: binder_alloc_buf, no vma\n",
>alloc->pid);
> @@ -723,9 +752,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
> buffer->free = 1;
> binder_insert_free_buffer(alloc, buffer);
> alloc->free_async_space = alloc->buffer_size / 2;
> -   barrier();
> -   alloc->vma = vma;
> -   alloc->vma_vm_mm = vma->vm_mm;
> +   binder_alloc_set_vma(alloc, vma);
> mmgrab(alloc->vma_vm_mm);
>
> return 0;
> @@ -754,10 +781,10 @@ void binder_alloc_deferred_release(struct binder_alloc 
> *alloc)
> int buffers, page_count;
> struct binder_buffer *buffer;
>
> -   BUG_ON(alloc->vma);
> -
> buffers = 0;
> mutex_lock(>mutex);
> +   BUG_ON(alloc->vma);
> +
> while ((n = rb_first(>allocated_buffers))) {
> buffer = rb_entry(n, struct binder_buffer, rb_node);
>
> @@ -900,7 +927,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc 
> *alloc)
>   */
>  void binder_alloc_vma_close(struct binder_alloc *alloc)
>  {
> -   WRITE_ONCE(alloc->vma, NULL);
> +   binder_alloc_set_vma(alloc, NULL);
>  }
>
>  /**
> @@ -935,7 +962,7 @@ enum lru_status binder_alloc_free_page(struct list_head 
> *item,
>
> index = page - alloc->pages;
> page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE;
> -   vma = alloc->vma;
> +   vma = binder_alloc_get_vma(alloc);
> if (vma) {
> if (!mmget_not_zero(alloc->vma_vm_mm))
> goto err_mmget;
> --
> 2.18.0.1017.ga543ac7ca45-goog
>
>
>
>


Re: [PATCH] android: binder: no outgoing transaction when thread todo has transaction

2018-08-14 Thread Martijn Coenen
Sherry, this was found by syzkaller, right? In that case, can you add
the  tag so the issue gets closed automatically when this
gets merged?


On Tue, Aug 14, 2018 at 2:28 AM, Sherry Yang  wrote:
> When a process dies, failed reply is sent to the sender of any transaction
> queued on a dead thread's todo list. The sender asserts that the
> received failed reply corresponds to the head of the transaction stack.
> This assert can fail if the dead thread is allowed to send outgoing
> transactions when there is already a transaction on its todo list,
> because this new transaction can end up on the transaction stack of the
> original sender. The following steps illustrate how this assertion can
> fail.
>
> 1. Thread1 sends txn19 to Thread2
>(T1->transaction_stack=txn19, T2->todo+=txn19)
> 2. Without processing todo list, Thread2 sends txn20 to Thread1
>(T1->todo+=txn20, T2->transaction_stack=txn20)
> 3. T1 processes txn20 on its todo list
>(T1->transaction_stack=txn20->txn19, T1->todo=)
> 4. T2 dies, T2->todo cleanup attempts to send failed reply for txn19, but
>T1->transaction_stack points to txn20 -- assertion failes
>
> Step 2. is the incorrect behavior. When there is a transaction on a
> thread's todo list, this thread should not be able to send any outgoing
> synchronous transactions. Only the head of the todo list needs to be
> checked because only threads that are waiting for proc work can directly
> receive work from another thread, and no work is allowed to be queued
> on such a thread without waking up the thread. This patch also enforces
> that a thread is not waiting for proc work when a work is directly
> enqueued to its todo list.
>
> Acked-by: Arve Hjønnevåg 
> Signed-off-by: Sherry Yang 

Reviewed-by: Martijn Coenen 

Thanks,
Martijn

> ---
>  drivers/android/binder.c | 44 +---
>  1 file changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index d58763b6b009..f2081934eb8b 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -822,6 +822,7 @@ static void
>  binder_enqueue_deferred_thread_work_ilocked(struct binder_thread *thread,
> struct binder_work *work)
>  {
> +   WARN_ON(!list_empty(>waiting_thread_node));
> binder_enqueue_work_ilocked(work, >todo);
>  }
>
> @@ -839,6 +840,7 @@ static void
>  binder_enqueue_thread_work_ilocked(struct binder_thread *thread,
>struct binder_work *work)
>  {
> +   WARN_ON(!list_empty(>waiting_thread_node));
> binder_enqueue_work_ilocked(work, >todo);
> thread->process_todo = true;
>  }
> @@ -1270,19 +1272,12 @@ static int binder_inc_node_nilocked(struct 
> binder_node *node, int strong,
> } else
> node->local_strong_refs++;
> if (!node->has_strong_ref && target_list) {
> +   struct binder_thread *thread = 
> container_of(target_list,
> +   struct binder_thread, 
> todo);
> binder_dequeue_work_ilocked(>work);
> -   /*
> -* Note: this function is the only place where we 
> queue
> -* directly to a thread->todo without using the
> -* corresponding binder_enqueue_thread_work() helper
> -* functions; in this case it's ok to not set the
> -* process_todo flag, since we know this node work 
> will
> -* always be followed by other work that starts queue
> -* processing: in case of synchronous transactions, a
> -* BR_REPLY or BR_ERROR; in case of oneway
> -* transactions, a BR_TRANSACTION_COMPLETE.
> -*/
> -   binder_enqueue_work_ilocked(>work, target_list);
> +   BUG_ON(>todo != target_list);
> +   binder_enqueue_deferred_thread_work_ilocked(thread,
> +  
> >work);
> }
> } else {
> if (!internal)
> @@ -2723,6 +2718,7 @@ static void binder_transaction(struct binder_proc *proc,
>  {
> int ret;
> struct binder_transaction *t;
> +   struct binder_work *w;
> struct binder_work *tcomplete;
> binder_size_t *offp, *off_end, *off_start;
> binder_size_t off_min;
> @@ -2864,6 +2860,2

Re: [PATCH] android: binder: no outgoing transaction when thread todo has transaction

2018-08-14 Thread Martijn Coenen
Sherry, this was found by syzkaller, right? In that case, can you add
the  tag so the issue gets closed automatically when this
gets merged?


On Tue, Aug 14, 2018 at 2:28 AM, Sherry Yang  wrote:
> When a process dies, failed reply is sent to the sender of any transaction
> queued on a dead thread's todo list. The sender asserts that the
> received failed reply corresponds to the head of the transaction stack.
> This assert can fail if the dead thread is allowed to send outgoing
> transactions when there is already a transaction on its todo list,
> because this new transaction can end up on the transaction stack of the
> original sender. The following steps illustrate how this assertion can
> fail.
>
> 1. Thread1 sends txn19 to Thread2
>(T1->transaction_stack=txn19, T2->todo+=txn19)
> 2. Without processing todo list, Thread2 sends txn20 to Thread1
>(T1->todo+=txn20, T2->transaction_stack=txn20)
> 3. T1 processes txn20 on its todo list
>(T1->transaction_stack=txn20->txn19, T1->todo=)
> 4. T2 dies, T2->todo cleanup attempts to send failed reply for txn19, but
>T1->transaction_stack points to txn20 -- assertion failes
>
> Step 2. is the incorrect behavior. When there is a transaction on a
> thread's todo list, this thread should not be able to send any outgoing
> synchronous transactions. Only the head of the todo list needs to be
> checked because only threads that are waiting for proc work can directly
> receive work from another thread, and no work is allowed to be queued
> on such a thread without waking up the thread. This patch also enforces
> that a thread is not waiting for proc work when a work is directly
> enqueued to its todo list.
>
> Acked-by: Arve Hjønnevåg 
> Signed-off-by: Sherry Yang 

Reviewed-by: Martijn Coenen 

Thanks,
Martijn

> ---
>  drivers/android/binder.c | 44 +---
>  1 file changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index d58763b6b009..f2081934eb8b 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -822,6 +822,7 @@ static void
>  binder_enqueue_deferred_thread_work_ilocked(struct binder_thread *thread,
> struct binder_work *work)
>  {
> +   WARN_ON(!list_empty(>waiting_thread_node));
> binder_enqueue_work_ilocked(work, >todo);
>  }
>
> @@ -839,6 +840,7 @@ static void
>  binder_enqueue_thread_work_ilocked(struct binder_thread *thread,
>struct binder_work *work)
>  {
> +   WARN_ON(!list_empty(>waiting_thread_node));
> binder_enqueue_work_ilocked(work, >todo);
> thread->process_todo = true;
>  }
> @@ -1270,19 +1272,12 @@ static int binder_inc_node_nilocked(struct 
> binder_node *node, int strong,
> } else
> node->local_strong_refs++;
> if (!node->has_strong_ref && target_list) {
> +   struct binder_thread *thread = 
> container_of(target_list,
> +   struct binder_thread, 
> todo);
> binder_dequeue_work_ilocked(>work);
> -   /*
> -* Note: this function is the only place where we 
> queue
> -* directly to a thread->todo without using the
> -* corresponding binder_enqueue_thread_work() helper
> -* functions; in this case it's ok to not set the
> -* process_todo flag, since we know this node work 
> will
> -* always be followed by other work that starts queue
> -* processing: in case of synchronous transactions, a
> -* BR_REPLY or BR_ERROR; in case of oneway
> -* transactions, a BR_TRANSACTION_COMPLETE.
> -*/
> -   binder_enqueue_work_ilocked(>work, target_list);
> +   BUG_ON(>todo != target_list);
> +   binder_enqueue_deferred_thread_work_ilocked(thread,
> +  
> >work);
> }
> } else {
> if (!internal)
> @@ -2723,6 +2718,7 @@ static void binder_transaction(struct binder_proc *proc,
>  {
> int ret;
> struct binder_transaction *t;
> +   struct binder_work *w;
> struct binder_work *tcomplete;
> binder_size_t *offp, *off_end, *off_start;
> binder_size_t off_min;
> @@ -2864,6 +2860,2

Re: [PATCH 1/4] file: export __alloc_fd()

2018-07-31 Thread Martijn Coenen
On Tue, Jul 31, 2018 at 12:07 PM, Christian Brauner
 wrote:
> Ah, this wasn't brought up in the original thread when discussing to
> turn this into a module. If using internal functions like this is going
> away it makes sense to wait for this work to happen first. Is there a
> time-frame for this?

Not yet, it depends on the solution and who has cycles to work on it.

Martijn

>
> Thanks!
> Christian
>
>>
>> >
>> > But I think the binder user-space API relies on this.  The userspace API
>> > seems to rely on passing fd numbers around ... but I'm having trouble
>> > figuring most of this user API out.  Perhaps Martijn can help here.
>>
>> The UAPI does expect a file descriptor, but we may be able to do the
>> mapping from fd to struct file (and vice-versa) in the kernel driver,
>> so userspace wouldn't notice.
>>
>> Thanks,
>> Martijn


Re: [PATCH 1/4] file: export __alloc_fd()

2018-07-31 Thread Martijn Coenen
On Tue, Jul 31, 2018 at 12:07 PM, Christian Brauner
 wrote:
> Ah, this wasn't brought up in the original thread when discussing to
> turn this into a module. If using internal functions like this is going
> away it makes sense to wait for this work to happen first. Is there a
> time-frame for this?

Not yet, it depends on the solution and who has cycles to work on it.

Martijn

>
> Thanks!
> Christian
>
>>
>> >
>> > But I think the binder user-space API relies on this.  The userspace API
>> > seems to rely on passing fd numbers around ... but I'm having trouble
>> > figuring most of this user API out.  Perhaps Martijn can help here.
>>
>> The UAPI does expect a file descriptor, but we may be able to do the
>> mapping from fd to struct file (and vice-versa) in the kernel driver,
>> so userspace wouldn't notice.
>>
>> Thanks,
>> Martijn


Re: [PATCH 1/4] file: export __alloc_fd()

2018-07-31 Thread Martijn Coenen
On Mon, Jul 30, 2018 at 10:36 PM, Matthew Wilcox  wrote:
> I'm not entirely sure I understand the binder code (... does anyone?)
> but from what I can see, it intends to open a file descriptor in the
> process which is the target of the message being sent.

You're right.

> That strikes
> me as wrong-headed; it should be allocating a struct file and passing
> that file to the other process.  When that process receives the message,
> *it* allocates a file descriptor for itself.ho

We're looking into cleaning this up (historically it was done this way
because VIVT caches made this not very efficient), and this is indeed
a very good candidate for fixing it.

>
> But I think the binder user-space API relies on this.  The userspace API
> seems to rely on passing fd numbers around ... but I'm having trouble
> figuring most of this user API out.  Perhaps Martijn can help here.

The UAPI does expect a file descriptor, but we may be able to do the
mapping from fd to struct file (and vice-versa) in the kernel driver,
so userspace wouldn't notice.

Thanks,
Martijn


Re: [PATCH 1/4] file: export __alloc_fd()

2018-07-31 Thread Martijn Coenen
On Mon, Jul 30, 2018 at 10:36 PM, Matthew Wilcox  wrote:
> I'm not entirely sure I understand the binder code (... does anyone?)
> but from what I can see, it intends to open a file descriptor in the
> process which is the target of the message being sent.

You're right.

> That strikes
> me as wrong-headed; it should be allocating a struct file and passing
> that file to the other process.  When that process receives the message,
> *it* allocates a file descriptor for itself.ho

We're looking into cleaning this up (historically it was done this way
because VIVT caches made this not very efficient), and this is indeed
a very good candidate for fixing it.

>
> But I think the binder user-space API relies on this.  The userspace API
> seems to rely on passing fd numbers around ... but I'm having trouble
> figuring most of this user API out.  Perhaps Martijn can help here.

The UAPI does expect a file descriptor, but we may be able to do the
mapping from fd to struct file (and vice-versa) in the kernel driver,
so userspace wouldn't notice.

Thanks,
Martijn


Re: [PATCH 2/6] module: add support for symbol namespaces.

2018-07-26 Thread Martijn Coenen
On Wed, Jul 25, 2018 at 6:48 PM, Lucas De Marchi
 wrote:
> this might be because you are looking to the wrong project
> (module-init-tools) rather than kmod that replaced it some years ago?

Oh yes indeed, thanks for pointing that out :-)


>> Just scanning the depmod source code, it seems really trivial to just
>> have depmod accommodate the new symbol name format. Adding Lucas (kmod
>> maintainer) to CC.
>
> Yep, that would be easy and allow depmod to be backward compatible
> since we would do nothing if the symbol doesn't have the suffix.
> As for dependency on a new version, this seems trivial enough to be
> backported to previous releases used on distros so even if they are
> not rolling they would get a compatible depmod.

Thanks for the suggestion. I had also considered modifying the script
that generates System.map to strip out the namespace; but that seems a
bit hacky since it will then mismatch with the actual symbol table
that's in the kernel image. The situation with depmod has gotten me a
bit worried that perhaps there are more tools that look at System.map
or the kernel's symbol table that won't be able to understand the new
format. But of course those could be updated, too.

Another alternative I was considering is leaving the namespace out of
the __ksymtab symbols, and instead for the symbols that have a
namespace, have an entry in __ksymtab (without namespace) and an entry
in a new section __ksymtab_ns (with namespace). We can then discard
__ksymtab_ns when building the final executable/modules, since that
information is not needed at runtime (it's encoded in struct
kernel_symbol). What makes this a bit more complex is that modpost
takes the fully linked version of vmlinux which would already have
__ksymtab_ns stripped, so we'd need to use objcopy to copy out that
section so modpost can use it to read vmlinux's namespaced symbols.
Modules wouldn't have that problem, since modpost processes modules
before their final linking step.

This last approach is more complex than just modifying depmod, but it
does have the benefit that we won't be break userspace tools depending
on the kernel __ksymtab section symbols. On the other hand, perhaps
userspace tools would like to have the namespace information at some
point.

For now I'll keep things as is and write up a patch for depmod, but if
others prefer the __ksymtab_ns approach I described above I don't mind
making that work.

Thanks,
Martijn

>
> CC'ing linux-modu...@vger.kernel.org to keep track of this there
>
>
> thanks
> Lucas De Marchi
>
>>
>> Thanks,
>>
>> Jessica
>>
>> >
>> >>
>> >> On x86_64 I saw no difference in binary size (compression), but at
>> >> runtime this will require a word of memory per export to hold the
>> >> namespace. An alternative could be to store namespaced symbols in their
>> >> own section and use a separate 'struct namespaced_kernel_symbol' for
>> >> that section, at the cost of making the module loader more complex.
>> >>
>> >> Signed-off-by: Martijn Coenen 
>> >> ---
>> >>  include/asm-generic/export.h |  2 +-
>> >>  include/linux/export.h   | 83 +++-
>> >>  include/linux/module.h   | 13 ++
>> >>  kernel/module.c  | 79 ++
>> >>  4 files changed, 156 insertions(+), 21 deletions(-)
>> >>
>> >> diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h
>> >> index 68efb950a918..4c3d1afb702f 100644
>> >> --- a/include/asm-generic/export.h
>> >> +++ b/include/asm-generic/export.h
>> >> @@ -29,7 +29,7 @@
>> >> .section ___ksymtab\sec+\name,"a"
>> >> .balign KSYM_ALIGN
>> >>  __ksymtab_\name:
>> >> -   __put \val, __kstrtab_\name
>> >> +   __put \val, __kstrtab_\name, 0
>> >> .previous
>> >> .section __ksymtab_strings,"a"
>> >>  __kstrtab_\name:
>> >> diff --git a/include/linux/export.h b/include/linux/export.h
>> >> index ad6b8e697b27..9f6e70eeb85f 100644
>> >> --- a/include/linux/export.h
>> >> +++ b/include/linux/export.h
>> >> @@ -22,6 +22,11 @@ struct kernel_symbol
>> >>  {
>> >> unsigned long value;
>> >> const char *name;
>> >> +   const char *namespace;
>> >> +};
>> >> +
>> >> +struct namespace_import {
>> >> +   const char *namespace;
>> >>  };
>> >>
>> >>  #ifdef MODULE
>> >> @@ -54,18 +59,28 @

Re: [PATCH 2/6] module: add support for symbol namespaces.

2018-07-26 Thread Martijn Coenen
On Wed, Jul 25, 2018 at 6:48 PM, Lucas De Marchi
 wrote:
> this might be because you are looking to the wrong project
> (module-init-tools) rather than kmod that replaced it some years ago?

Oh yes indeed, thanks for pointing that out :-)


>> Just scanning the depmod source code, it seems really trivial to just
>> have depmod accommodate the new symbol name format. Adding Lucas (kmod
>> maintainer) to CC.
>
> Yep, that would be easy and allow depmod to be backward compatible
> since we would do nothing if the symbol doesn't have the suffix.
> As for dependency on a new version, this seems trivial enough to be
> backported to previous releases used on distros so even if they are
> not rolling they would get a compatible depmod.

Thanks for the suggestion. I had also considered modifying the script
that generates System.map to strip out the namespace; but that seems a
bit hacky since it will then mismatch with the actual symbol table
that's in the kernel image. The situation with depmod has gotten me a
bit worried that perhaps there are more tools that look at System.map
or the kernel's symbol table that won't be able to understand the new
format. But of course those could be updated, too.

Another alternative I was considering is leaving the namespace out of
the __ksymtab symbols, and instead for the symbols that have a
namespace, have an entry in __ksymtab (without namespace) and an entry
in a new section __ksymtab_ns (with namespace). We can then discard
__ksymtab_ns when building the final executable/modules, since that
information is not needed at runtime (it's encoded in struct
kernel_symbol). What makes this a bit more complex is that modpost
takes the fully linked version of vmlinux which would already have
__ksymtab_ns stripped, so we'd need to use objcopy to copy out that
section so modpost can use it to read vmlinux's namespaced symbols.
Modules wouldn't have that problem, since modpost processes modules
before their final linking step.

This last approach is more complex than just modifying depmod, but it
does have the benefit that we won't be break userspace tools depending
on the kernel __ksymtab section symbols. On the other hand, perhaps
userspace tools would like to have the namespace information at some
point.

For now I'll keep things as is and write up a patch for depmod, but if
others prefer the __ksymtab_ns approach I described above I don't mind
making that work.

Thanks,
Martijn

>
> CC'ing linux-modu...@vger.kernel.org to keep track of this there
>
>
> thanks
> Lucas De Marchi
>
>>
>> Thanks,
>>
>> Jessica
>>
>> >
>> >>
>> >> On x86_64 I saw no difference in binary size (compression), but at
>> >> runtime this will require a word of memory per export to hold the
>> >> namespace. An alternative could be to store namespaced symbols in their
>> >> own section and use a separate 'struct namespaced_kernel_symbol' for
>> >> that section, at the cost of making the module loader more complex.
>> >>
>> >> Signed-off-by: Martijn Coenen 
>> >> ---
>> >>  include/asm-generic/export.h |  2 +-
>> >>  include/linux/export.h   | 83 +++-
>> >>  include/linux/module.h   | 13 ++
>> >>  kernel/module.c  | 79 ++
>> >>  4 files changed, 156 insertions(+), 21 deletions(-)
>> >>
>> >> diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h
>> >> index 68efb950a918..4c3d1afb702f 100644
>> >> --- a/include/asm-generic/export.h
>> >> +++ b/include/asm-generic/export.h
>> >> @@ -29,7 +29,7 @@
>> >> .section ___ksymtab\sec+\name,"a"
>> >> .balign KSYM_ALIGN
>> >>  __ksymtab_\name:
>> >> -   __put \val, __kstrtab_\name
>> >> +   __put \val, __kstrtab_\name, 0
>> >> .previous
>> >> .section __ksymtab_strings,"a"
>> >>  __kstrtab_\name:
>> >> diff --git a/include/linux/export.h b/include/linux/export.h
>> >> index ad6b8e697b27..9f6e70eeb85f 100644
>> >> --- a/include/linux/export.h
>> >> +++ b/include/linux/export.h
>> >> @@ -22,6 +22,11 @@ struct kernel_symbol
>> >>  {
>> >> unsigned long value;
>> >> const char *name;
>> >> +   const char *namespace;
>> >> +};
>> >> +
>> >> +struct namespace_import {
>> >> +   const char *namespace;
>> >>  };
>> >>
>> >>  #ifdef MODULE
>> >> @@ -54,18 +59,28 @

Re: drivers/android/.tmp_gl_binder.o:undefined reference to `__user_bad'

2018-07-02 Thread Martijn Coenen
Hi Michal,

See below - this error was introduced because the binder driver uses a
64-bit get_user(), even on a 32-bit architecture. Would it be possible
to implement a 64-bit get_user() for microblaze?

Thanks,
Martijn

On Mon, Jul 2, 2018 at 6:53 AM kbuild test robot  wrote:
>
> Hi Martijn,
>
> FYI, the error/warning still remains.
>
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   021c91791a5e7e85c567452f1be3e4c2c6cb6063
> commit: 1190b4e38f97023154e6b3bef61b251aa5f970d0 ANDROID: binder: remove 
> 32-bit binder interface.
> date:   7 weeks ago
> config: microblaze-allmodconfig (attached as .config)
> compiler: microblaze-linux-gcc (GCC) 8.1.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout 1190b4e38f97023154e6b3bef61b251aa5f970d0
> # save the attached .config to linux build tree
> GCC_VERSION=8.1.0 make.cross ARCH=microblaze
>
> All errors (new ones prefixed by >>):
>
>drivers/android/binder.o: In function `binder_thread_write':
> >> drivers/android/.tmp_gl_binder.o:(.text+0xcbb0): undefined reference to 
> >> `__user_bad'
>drivers/android/.tmp_gl_binder.o:(.text+0xcbdc): undefined reference to 
> `__user_bad'
>drivers/android/.tmp_gl_binder.o:(.text+0xcfc4): undefined reference to 
> `__user_bad'
>drivers/android/.tmp_gl_binder.o:(.text+0xd650): undefined reference to 
> `__user_bad'
>drivers/android/.tmp_gl_binder.o:(.text+0xdbc8): undefined reference to 
> `__user_bad'
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: drivers/android/.tmp_gl_binder.o:undefined reference to `__user_bad'

2018-07-02 Thread Martijn Coenen
Hi Michal,

See below - this error was introduced because the binder driver uses a
64-bit get_user(), even on a 32-bit architecture. Would it be possible
to implement a 64-bit get_user() for microblaze?

Thanks,
Martijn

On Mon, Jul 2, 2018 at 6:53 AM kbuild test robot  wrote:
>
> Hi Martijn,
>
> FYI, the error/warning still remains.
>
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   021c91791a5e7e85c567452f1be3e4c2c6cb6063
> commit: 1190b4e38f97023154e6b3bef61b251aa5f970d0 ANDROID: binder: remove 
> 32-bit binder interface.
> date:   7 weeks ago
> config: microblaze-allmodconfig (attached as .config)
> compiler: microblaze-linux-gcc (GCC) 8.1.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout 1190b4e38f97023154e6b3bef61b251aa5f970d0
> # save the attached .config to linux build tree
> GCC_VERSION=8.1.0 make.cross ARCH=microblaze
>
> All errors (new ones prefixed by >>):
>
>drivers/android/binder.o: In function `binder_thread_write':
> >> drivers/android/.tmp_gl_binder.o:(.text+0xcbb0): undefined reference to 
> >> `__user_bad'
>drivers/android/.tmp_gl_binder.o:(.text+0xcbdc): undefined reference to 
> `__user_bad'
>drivers/android/.tmp_gl_binder.o:(.text+0xcfc4): undefined reference to 
> `__user_bad'
>drivers/android/.tmp_gl_binder.o:(.text+0xd650): undefined reference to 
> `__user_bad'
>drivers/android/.tmp_gl_binder.o:(.text+0xdbc8): undefined reference to 
> `__user_bad'
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH 2/2 v2] staging: android: ashmem: Fix mmap size validation

2018-06-22 Thread Martijn Coenen
On Thu, Jun 21, 2018 at 1:29 AM, Joel Fernandes  wrote:
> Also if you look at the kernel sources, there are dozens of drivers that
> check for correct VMA size in mmap handler and fail if it isn't sized
> correctly.

If that's the case, we should definitely do it this way for ashmem as
well. Since its size is fixed after creation, preventing anyone from
mapping a larger size seems reasonable to me.

Reviewed-by: Martijn Coenen 

>
> thanks!
>
>  - Joel
>


Re: [PATCH 2/2 v2] staging: android: ashmem: Fix mmap size validation

2018-06-22 Thread Martijn Coenen
On Thu, Jun 21, 2018 at 1:29 AM, Joel Fernandes  wrote:
> Also if you look at the kernel sources, there are dozens of drivers that
> check for correct VMA size in mmap handler and fail if it isn't sized
> correctly.

If that's the case, we should definitely do it this way for ashmem as
well. Since its size is fixed after creation, preventing anyone from
mapping a larger size seems reasonable to me.

Reviewed-by: Martijn Coenen 

>
> thanks!
>
>  - Joel
>


Re: [PATCH v6] ANDROID: binder: change down_write to down_read

2018-05-15 Thread Martijn Coenen
On Tue, May 15, 2018 at 8:27 AM, Minchan Kim <minchan.ker...@gmail.com> wrote:
> Hi Joel,
>
> Sorry for the late response. I was off.
>
> On Wed, May 09, 2018 at 04:19:41PM -0700, Joel Fernandes wrote:
>> On Wed, May 09, 2018 at 03:40:23PM +0900, Minchan Kim wrote:
>> > On Tue, May 08, 2018 at 04:08:13PM -0700, Joel Fernandes wrote:
>> > > On Tue, May 08, 2018 at 07:51:01PM +0900, Minchan Kim wrote:
>> > > > On Mon, May 07, 2018 at 10:28:29AM -0700, Joel Fernandes wrote:
>> > > > > On Mon, May 07, 2018 at 11:15:37PM +0900, Minchan Kim wrote:
>> > > > > > binder_update_page_range needs down_write of mmap_sem because
>> > > > > > vm_insert_page need to change vma->vm_flags to VM_MIXEDMAP unless
>> > > > > > it is set. However, when I profile binder working, it seems
>> > > > > > every binder buffers should be mapped in advance by binder_mmap.
>> > > > > > It means we could set VM_MIXEDMAP in binder_mmap time which is
>> > > > > > already hold a mmap_sem as down_write so binder_update_page_range
>> > > > > > doesn't need to hold a mmap_sem as down_write.
>> > > > > > Please use proper API down_read. It would help mmap_sem contention
>> > > > > > problem as well as fixing down_write abuse.
>> > > > > >
>> > > > > > Ganesh Mahendran tested app launching and binder throughput test
>> > > > > > and he said he couldn't find any problem and I did binder latency
>> > > > > > test per Greg KH request(Thanks Martijn to teach me how I can do)
>> > > > > > I cannot find any problem, too.
>> > > > > >
>> > > > > > Cc: Ganesh Mahendran <opensource.gan...@gmail.com>
>> > > > > > Cc: Joe Perches <j...@perches.com>
>> > > > > > Cc: Arve Hjønnevåg <a...@android.com>
>> > > > > > Cc: Todd Kjos <tk...@google.com>
>> > > > > > Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>> > > > > > Reviewed-by: Martijn Coenen <m...@android.com>
>> > > > > > Signed-off-by: Minchan Kim <minc...@kernel.org>
>> > > > > > ---
>> > > > > >  drivers/android/binder.c   | 4 +++-
>> > > > > >  drivers/android/binder_alloc.c | 6 +++---
>> > > > > >  2 files changed, 6 insertions(+), 4 deletions(-)
>> > > > > >
>> > > > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
>> > > > > > index 4eab5be3d00f..7b8e96f60719 100644
>> > > > > > --- a/drivers/android/binder.c
>> > > > > > +++ b/drivers/android/binder.c
>> > > > > > @@ -4730,7 +4730,9 @@ static int binder_mmap(struct file *filp, 
>> > > > > > struct vm_area_struct *vma)
>> > > > > > failure_string = "bad vm_flags";
>> > > > > > goto err_bad_arg;
>> > > > > > }
>> > > > > > -   vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE;
>> > > > > > +   vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP;
>> > > > > > +   vma->vm_flags &= ~VM_MAYWRITE;
>> > > > > > +
>> > > > > > vma->vm_ops = _vm_ops;
>> > > > > > vma->vm_private_data = proc;
>> > > > > >
>> > > > > > diff --git a/drivers/android/binder_alloc.c 
>> > > > > > b/drivers/android/binder_alloc.c
>> > > > > > index 5a426c877dfb..4f382d51def1 100644
>> > > > > > --- a/drivers/android/binder_alloc.c
>> > > > > > +++ b/drivers/android/binder_alloc.c
>> > > > > > @@ -219,7 +219,7 @@ static int binder_update_page_range(struct 
>> > > > > > binder_alloc *alloc, int allocate,
>> > > > > > mm = alloc->vma_vm_mm;
>> > > > > >
>> > > > > > if (mm) {
>> > > > > > -   down_write(>mmap_sem);
>> > > > > > +   down_read(>mmap_sem);
>> > > > >
>> > > > >
>> > > > > Nice. Is there a need to hold the reader-lock at all here? Just 
>> > > > > curious what
>> > > > > else is it pr

Re: [PATCH v6] ANDROID: binder: change down_write to down_read

2018-05-15 Thread Martijn Coenen
On Tue, May 15, 2018 at 8:27 AM, Minchan Kim  wrote:
> Hi Joel,
>
> Sorry for the late response. I was off.
>
> On Wed, May 09, 2018 at 04:19:41PM -0700, Joel Fernandes wrote:
>> On Wed, May 09, 2018 at 03:40:23PM +0900, Minchan Kim wrote:
>> > On Tue, May 08, 2018 at 04:08:13PM -0700, Joel Fernandes wrote:
>> > > On Tue, May 08, 2018 at 07:51:01PM +0900, Minchan Kim wrote:
>> > > > On Mon, May 07, 2018 at 10:28:29AM -0700, Joel Fernandes wrote:
>> > > > > On Mon, May 07, 2018 at 11:15:37PM +0900, Minchan Kim wrote:
>> > > > > > binder_update_page_range needs down_write of mmap_sem because
>> > > > > > vm_insert_page need to change vma->vm_flags to VM_MIXEDMAP unless
>> > > > > > it is set. However, when I profile binder working, it seems
>> > > > > > every binder buffers should be mapped in advance by binder_mmap.
>> > > > > > It means we could set VM_MIXEDMAP in binder_mmap time which is
>> > > > > > already hold a mmap_sem as down_write so binder_update_page_range
>> > > > > > doesn't need to hold a mmap_sem as down_write.
>> > > > > > Please use proper API down_read. It would help mmap_sem contention
>> > > > > > problem as well as fixing down_write abuse.
>> > > > > >
>> > > > > > Ganesh Mahendran tested app launching and binder throughput test
>> > > > > > and he said he couldn't find any problem and I did binder latency
>> > > > > > test per Greg KH request(Thanks Martijn to teach me how I can do)
>> > > > > > I cannot find any problem, too.
>> > > > > >
>> > > > > > Cc: Ganesh Mahendran 
>> > > > > > Cc: Joe Perches 
>> > > > > > Cc: Arve Hjønnevåg 
>> > > > > > Cc: Todd Kjos 
>> > > > > > Cc: Greg Kroah-Hartman 
>> > > > > > Reviewed-by: Martijn Coenen 
>> > > > > > Signed-off-by: Minchan Kim 
>> > > > > > ---
>> > > > > >  drivers/android/binder.c   | 4 +++-
>> > > > > >  drivers/android/binder_alloc.c | 6 +++---
>> > > > > >  2 files changed, 6 insertions(+), 4 deletions(-)
>> > > > > >
>> > > > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
>> > > > > > index 4eab5be3d00f..7b8e96f60719 100644
>> > > > > > --- a/drivers/android/binder.c
>> > > > > > +++ b/drivers/android/binder.c
>> > > > > > @@ -4730,7 +4730,9 @@ static int binder_mmap(struct file *filp, 
>> > > > > > struct vm_area_struct *vma)
>> > > > > > failure_string = "bad vm_flags";
>> > > > > > goto err_bad_arg;
>> > > > > > }
>> > > > > > -   vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE;
>> > > > > > +   vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP;
>> > > > > > +   vma->vm_flags &= ~VM_MAYWRITE;
>> > > > > > +
>> > > > > > vma->vm_ops = _vm_ops;
>> > > > > > vma->vm_private_data = proc;
>> > > > > >
>> > > > > > diff --git a/drivers/android/binder_alloc.c 
>> > > > > > b/drivers/android/binder_alloc.c
>> > > > > > index 5a426c877dfb..4f382d51def1 100644
>> > > > > > --- a/drivers/android/binder_alloc.c
>> > > > > > +++ b/drivers/android/binder_alloc.c
>> > > > > > @@ -219,7 +219,7 @@ static int binder_update_page_range(struct 
>> > > > > > binder_alloc *alloc, int allocate,
>> > > > > > mm = alloc->vma_vm_mm;
>> > > > > >
>> > > > > > if (mm) {
>> > > > > > -   down_write(>mmap_sem);
>> > > > > > +   down_read(>mmap_sem);
>> > > > >
>> > > > >
>> > > > > Nice. Is there a need to hold the reader-lock at all here? Just 
>> > > > > curious what
>> > > > > else is it protecting (here or in vm_insert_page).
>> > > >
>> > > > It should protect vm_area_struct. IOW, when we try insert page into 
>> > > > virtual address area,
>> > >

Re: [PATCH] m68k/uaccess: Revive 64-bit get_user()

2018-05-14 Thread Martijn Coenen
On Mon, May 14, 2018 at 3:57 PM, Geert Uytterhoeven
<ge...@linux-m68k.org> wrote:
> Revive support for 64-bit get_user(), which was disabled in commit
> d94af931af42152e ("[PATCH] m68k: clean up uaccess.h") due to a "broken"
> typeof in (then brand new) gcc-4.1.
>
>   - Keep on using u64 for the temporary, as __typeof__() doesn't drop
> the const qualifier,
>   - Move it into a union (like mips32 does) to get rid of the cast, as
> using get_user() to fetch a __user pointer would cause a "cast to
> pointer from integer of different size" warning otherwise.

Can't vouch for the existing asm, but the changes to it look good to
me otherwise.

>
> Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org>

Acked-by: Martijn Coenen <m...@android.com>

> ---
>  arch/m68k/include/asm/uaccess_mm.h | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/arch/m68k/include/asm/uaccess_mm.h 
> b/arch/m68k/include/asm/uaccess_mm.h
> index 75c172e909acdf18..c4cb889660aa0c35 100644
> --- a/arch/m68k/include/asm/uaccess_mm.h
> +++ b/arch/m68k/include/asm/uaccess_mm.h
> @@ -141,10 +141,12 @@ asm volatile ("\n"  
>   \
> case 4: \
> __get_user_asm(__gu_err, x, ptr, u32, l, r, -EFAULT);   \
> break;  \
> -/* case 8: disabled because gcc-4.1 has a broken typeof\
> -   {   \
> -   const void *__gu_ptr = (ptr);   \
> -   u64 __gu_val;   \
> +   case 8: {   \
> +   const void *__gu_ptr = (ptr);   \
> +   union { \
> +   u64 l;  \
> +   __typeof__(*(ptr)) t;   \
> +   } __gu_val; \
> asm volatile ("\n"  \
> "1: "MOVES".l   (%2)+,%1\n" \
> "2: "MOVES".l   (%2),%R1\n" \
> @@ -162,13 +164,13 @@ asm volatile ("\n"  
>   \
> "   .long   1b,10b\n"   \
> "   .long   2b,10b\n"   \
> "   .previous"  \
> -   : "+d" (__gu_err), "=" (__gu_val),\
> +   : "+d" (__gu_err), "=" (__gu_val.l),  \
>   "+a" (__gu_ptr)   \
> : "i" (-EFAULT) \
> : "memory");\
> -   (x) = (__force typeof(*(ptr)))__gu_val; \
> +   (x) = __gu_val.t;   \
> break;  \
> -   }   */  \
> +   }   \
> default:\
> __gu_err = __get_user_bad();\
> break;  \
> --
> 2.7.4
>


Re: [PATCH] m68k/uaccess: Revive 64-bit get_user()

2018-05-14 Thread Martijn Coenen
On Mon, May 14, 2018 at 3:57 PM, Geert Uytterhoeven
 wrote:
> Revive support for 64-bit get_user(), which was disabled in commit
> d94af931af42152e ("[PATCH] m68k: clean up uaccess.h") due to a "broken"
> typeof in (then brand new) gcc-4.1.
>
>   - Keep on using u64 for the temporary, as __typeof__() doesn't drop
> the const qualifier,
>   - Move it into a union (like mips32 does) to get rid of the cast, as
> using get_user() to fetch a __user pointer would cause a "cast to
> pointer from integer of different size" warning otherwise.

Can't vouch for the existing asm, but the changes to it look good to
me otherwise.

>
> Signed-off-by: Geert Uytterhoeven 

Acked-by: Martijn Coenen 

> ---
>  arch/m68k/include/asm/uaccess_mm.h | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/arch/m68k/include/asm/uaccess_mm.h 
> b/arch/m68k/include/asm/uaccess_mm.h
> index 75c172e909acdf18..c4cb889660aa0c35 100644
> --- a/arch/m68k/include/asm/uaccess_mm.h
> +++ b/arch/m68k/include/asm/uaccess_mm.h
> @@ -141,10 +141,12 @@ asm volatile ("\n"  
>   \
> case 4: \
> __get_user_asm(__gu_err, x, ptr, u32, l, r, -EFAULT);   \
> break;  \
> -/* case 8: disabled because gcc-4.1 has a broken typeof\
> -   {   \
> -   const void *__gu_ptr = (ptr);   \
> -   u64 __gu_val;   \
> +   case 8: {   \
> +   const void *__gu_ptr = (ptr);   \
> +   union { \
> +   u64 l;  \
> +   __typeof__(*(ptr)) t;   \
> +   } __gu_val; \
> asm volatile ("\n"  \
> "1: "MOVES".l   (%2)+,%1\n" \
> "2: "MOVES".l   (%2),%R1\n" \
> @@ -162,13 +164,13 @@ asm volatile ("\n"  
>   \
> "   .long   1b,10b\n"   \
> "   .long   2b,10b\n"   \
> "   .previous"  \
> -   : "+d" (__gu_err), "=" (__gu_val),\
> +   : "+d" (__gu_err), "=" (__gu_val.l),  \
>   "+a" (__gu_ptr)   \
> : "i" (-EFAULT) \
> : "memory");\
> -   (x) = (__force typeof(*(ptr)))__gu_val; \
> +   (x) = __gu_val.t;   \
> break;  \
> -   }   */  \
> +   }   \
> default:\
> __gu_err = __get_user_bad();\
> break;  \
> --
> 2.7.4
>


Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.

2018-05-14 Thread Martijn Coenen
On Mon, May 14, 2018 at 4:00 PM, Geert Uytterhoeven
 wrote:
> Patch sent.

Thanks for the quick turn-around!

>
> BTW, sh also doesn't seem to have 64-bit get_user().
> There may be others.

I checked quickly, nios2 is the only other arch that explicitly
doesn't support it and would result in a build error; some other archs
don't define __get_user, but in that case they just fall back to
raw_copy_from_user().

>
> BTW2, does the Android Binder need to care about endianness when talking
> to userspace?

No, I don't think it should.

Thanks,
Martijn

>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds


Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.

2018-05-14 Thread Martijn Coenen
On Mon, May 14, 2018 at 4:00 PM, Geert Uytterhoeven
 wrote:
> Patch sent.

Thanks for the quick turn-around!

>
> BTW, sh also doesn't seem to have 64-bit get_user().
> There may be others.

I checked quickly, nios2 is the only other arch that explicitly
doesn't support it and would result in a build error; some other archs
don't define __get_user, but in that case they just fall back to
raw_copy_from_user().

>
> BTW2, does the Android Binder need to care about endianness when talking
> to userspace?

No, I don't think it should.

Thanks,
Martijn

>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds


[PATCH v2] ANDROID: binder: remove 32-bit binder interface.

2018-05-11 Thread Martijn Coenen
From: Martijn Coenen <m...@google.com>

New devices launching with Android P need to use the 64-bit
binder interface, even on 32-bit SoCs [0].

This change removes the Kconfig option to select the 32-bit
binder interface. We don't think this will affect existing
userspace for the following reasons:
1) The latest Android common tree is 4.14, so we don't
   believe any Android devices are on kernels >4.14.
2) Android devices launch on an LTS release and stick with
   it, so we wouldn't expect devices running on <= 4.14 now
   to upgrade to 4.17 or later. But even if they did, they'd
   rebuild the world (kernel + userspace) anyway.
3) Other userspaces like 'anbox' are already using the
   64-bit interface.

Note that this change doesn't remove the 32-bit UAPI
itself; the reason for that is that Android userspace
always uses the latest UAPI headers from upstream, and
userspace retains 32-bit support for devices that are
upgrading. This will be removed as well in 2-3 years,
at which point we can remove the code from the UAPI
as well.

Finally, this change introduces build errors on archs where
64-bit get_user/put_user is not supported, so make binder
unavailable on m68k (which wouldn't want it anyway).

[0]: https://android-review.googlesource.com/c/platform/build/+/595193

Signed-off-by: Martijn Coenen <m...@android.com>
---
Changes in v2:
  - Dropped support for m68k to avoid build errors.

 drivers/android/Kconfig  | 15 +--
 drivers/android/binder.c |  4 
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 7dce3795b887..ee4880bfdcdc 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -10,7 +10,7 @@ if ANDROID
 
 config ANDROID_BINDER_IPC
bool "Android Binder IPC Driver"
-   depends on MMU
+   depends on MMU && !M68K
default n
---help---
  Binder is used in Android for both communication between processes,
@@ -32,19 +32,6 @@ config ANDROID_BINDER_DEVICES
  created. Each binder device has its own context manager, and is
  therefore logically separated from the other devices.
 
-config ANDROID_BINDER_IPC_32BIT
-   bool "Use old (Android 4.4 and earlier) 32-bit binder API"
-   depends on !64BIT && ANDROID_BINDER_IPC
-   default y
-   ---help---
- The Binder API has been changed to support both 32 and 64bit
- applications in a mixed environment.
-
- Enable this to support an old 32-bit Android user-space (v4.4 and
- earlier).
-
- Note that enabling this will break newer Android user-space.
-
 config ANDROID_BINDER_IPC_SELFTEST
bool "Android Binder IPC Driver Selftest"
depends on ANDROID_BINDER_IPC
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index e578eee31589..2ee9fb02dfb8 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -72,10 +72,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT
-#define BINDER_IPC_32BIT 1
-#endif
-
 #include 
 #include "binder_alloc.h"
 #include "binder_trace.h"
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH v2] ANDROID: binder: remove 32-bit binder interface.

2018-05-11 Thread Martijn Coenen
From: Martijn Coenen 

New devices launching with Android P need to use the 64-bit
binder interface, even on 32-bit SoCs [0].

This change removes the Kconfig option to select the 32-bit
binder interface. We don't think this will affect existing
userspace for the following reasons:
1) The latest Android common tree is 4.14, so we don't
   believe any Android devices are on kernels >4.14.
2) Android devices launch on an LTS release and stick with
   it, so we wouldn't expect devices running on <= 4.14 now
   to upgrade to 4.17 or later. But even if they did, they'd
   rebuild the world (kernel + userspace) anyway.
3) Other userspaces like 'anbox' are already using the
   64-bit interface.

Note that this change doesn't remove the 32-bit UAPI
itself; the reason for that is that Android userspace
always uses the latest UAPI headers from upstream, and
userspace retains 32-bit support for devices that are
upgrading. This will be removed as well in 2-3 years,
at which point we can remove the code from the UAPI
as well.

Finally, this change introduces build errors on archs where
64-bit get_user/put_user is not supported, so make binder
unavailable on m68k (which wouldn't want it anyway).

[0]: https://android-review.googlesource.com/c/platform/build/+/595193

Signed-off-by: Martijn Coenen 
---
Changes in v2:
  - Dropped support for m68k to avoid build errors.

 drivers/android/Kconfig  | 15 +--
 drivers/android/binder.c |  4 
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 7dce3795b887..ee4880bfdcdc 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -10,7 +10,7 @@ if ANDROID
 
 config ANDROID_BINDER_IPC
bool "Android Binder IPC Driver"
-   depends on MMU
+   depends on MMU && !M68K
default n
---help---
  Binder is used in Android for both communication between processes,
@@ -32,19 +32,6 @@ config ANDROID_BINDER_DEVICES
  created. Each binder device has its own context manager, and is
  therefore logically separated from the other devices.
 
-config ANDROID_BINDER_IPC_32BIT
-   bool "Use old (Android 4.4 and earlier) 32-bit binder API"
-   depends on !64BIT && ANDROID_BINDER_IPC
-   default y
-   ---help---
- The Binder API has been changed to support both 32 and 64bit
- applications in a mixed environment.
-
- Enable this to support an old 32-bit Android user-space (v4.4 and
- earlier).
-
- Note that enabling this will break newer Android user-space.
-
 config ANDROID_BINDER_IPC_SELFTEST
bool "Android Binder IPC Driver Selftest"
depends on ANDROID_BINDER_IPC
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index e578eee31589..2ee9fb02dfb8 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -72,10 +72,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT
-#define BINDER_IPC_32BIT 1
-#endif
-
 #include 
 #include "binder_alloc.h"
 #include "binder_trace.h"
-- 
2.17.0.441.gb46fe60e1d-goog



Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.

2018-05-11 Thread Martijn Coenen
On Fri, May 11, 2018 at 10:08 AM, Greg KH  wrote:
> I think using !CONFIG_M68K is a good start.  We can blacklist any other
> arch that doesn't support this, and that list should be small as I doubt
> any new ones will be added without this support.

Thanks, I will send a v2.

>
> thanks,
>
> greg k-h


Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.

2018-05-11 Thread Martijn Coenen
On Fri, May 11, 2018 at 10:08 AM, Greg KH  wrote:
> I think using !CONFIG_M68K is a good start.  We can blacklist any other
> arch that doesn't support this, and that list should be small as I doubt
> any new ones will be added without this support.

Thanks, I will send a v2.

>
> thanks,
>
> greg k-h


Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.

2018-05-11 Thread Martijn Coenen
On Sat, May 5, 2018 at 2:10 PM, kbuild test robot  wrote:
>drivers/android/binder.o: In function `binder_thread_write':
>>> binder.c:(.text+0x6a16): undefined reference to `__get_user_bad'

Looks like m68k doesn't support 64-bit get_user(). I could just have
binder depend on !CONFIG_M68K, but there may be other architectures
still that don't support this. Another alternative would be to
whitelist the architectures Android supports - eg arm, arm64, x86,
x86_64. But I'm not sure if arch-limited drivers are considered bad
form. Does anybody have suggestions for how to deal with this?

Thanks,
Martijn

>binder.c:(.text+0x6c9a): undefined reference to `__get_user_bad'
>binder.c:(.text+0x701e): undefined reference to `__get_user_bad'
>binder.c:(.text+0x7414): undefined reference to `__get_user_bad'
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.

2018-05-11 Thread Martijn Coenen
On Sat, May 5, 2018 at 2:10 PM, kbuild test robot  wrote:
>drivers/android/binder.o: In function `binder_thread_write':
>>> binder.c:(.text+0x6a16): undefined reference to `__get_user_bad'

Looks like m68k doesn't support 64-bit get_user(). I could just have
binder depend on !CONFIG_M68K, but there may be other architectures
still that don't support this. Another alternative would be to
whitelist the architectures Android supports - eg arm, arm64, x86,
x86_64. But I'm not sure if arch-limited drivers are considered bad
form. Does anybody have suggestions for how to deal with this?

Thanks,
Martijn

>binder.c:(.text+0x6c9a): undefined reference to `__get_user_bad'
>binder.c:(.text+0x701e): undefined reference to `__get_user_bad'
>binder.c:(.text+0x7414): undefined reference to `__get_user_bad'
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH] android: binder: Fix a possible data race in binder_alloc_mmap_handler

2018-05-08 Thread Martijn Coenen
On Tue, May 8, 2018 at 2:06 AM, Jia-Ju Bai  wrote:
> The write operations to "alloc->buffer" are protected by
> the lock on line 679 and 730, but the read operation to
> this data on line 712 is not protected by the lock.
> Thus, there may exist a data race for "alloc->buffer".

It's read by the same thread that just wrote it, there is no data race.

The locks at line 679 and 730 protect against multiple threads calling
mmap() at the same time.

>
> To fix this data race, the read operation to "alloc->buffer"
> should be also protected by the lock.
>
> Signed-off-by: Jia-Ju Bai 
> ---
>  drivers/android/binder_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 5a426c877dfb..596acc3a84e4 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -709,7 +709,9 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
> goto err_alloc_buf_struct_failed;
> }
>
> +   mutex_lock(_alloc_mmap_lock);
> buffer->data = alloc->buffer;
> +   mutex_unlock(_alloc_mmap_lock);
> list_add(>entry, >buffers);
> buffer->free = 1;
> binder_insert_free_buffer(alloc, buffer);
> --
> 2.17.0
>


Re: [PATCH] android: binder: Fix a possible data race in binder_alloc_mmap_handler

2018-05-08 Thread Martijn Coenen
On Tue, May 8, 2018 at 2:06 AM, Jia-Ju Bai  wrote:
> The write operations to "alloc->buffer" are protected by
> the lock on line 679 and 730, but the read operation to
> this data on line 712 is not protected by the lock.
> Thus, there may exist a data race for "alloc->buffer".

It's read by the same thread that just wrote it, there is no data race.

The locks at line 679 and 730 protect against multiple threads calling
mmap() at the same time.

>
> To fix this data race, the read operation to "alloc->buffer"
> should be also protected by the lock.
>
> Signed-off-by: Jia-Ju Bai 
> ---
>  drivers/android/binder_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 5a426c877dfb..596acc3a84e4 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -709,7 +709,9 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
> goto err_alloc_buf_struct_failed;
> }
>
> +   mutex_lock(_alloc_mmap_lock);
> buffer->data = alloc->buffer;
> +   mutex_unlock(_alloc_mmap_lock);
> list_add(>entry, >buffers);
> buffer->free = 1;
> binder_insert_free_buffer(alloc, buffer);
> --
> 2.17.0
>


[PATCH] ANDROID: binder: remove 32-bit binder interface.

2018-05-04 Thread Martijn Coenen
New devices launching with Android P need to use the 64-bit
binder interface, even on 32-bit SoCs [0].

This change removes the Kconfig option to select the 32-bit
binder interface. We don't think this will affect existing
userspace for the following reasons:
1) The latest Android common tree is 4.14, so we don't
   believe any Android devices are on kernels >4.14.
2) Android devices launch on an LTS release and stick with
   it, so we wouldn't expect devices running on <= 4.14 now
   to upgrade to 4.17 or later. But even if they did, they'd
   rebuild the world (kernel + userspace) anyway.
3) Other userspaces like 'anbox' are already using the
   64-bit interface.

Note that this change doesn't remove the 32-bit UAPI
itself; the reason for that is that Android userspace
always uses the latest UAPI headers from upstream, and
userspace retains 32-bit support for devices that are
upgrading. This will be removed as well in 2-3 years,
at which point we can remove the code from the UAPI
as well.

[0]: https://android-review.googlesource.com/c/platform/build/+/595193

Signed-off-by: Martijn Coenen <m...@android.com>
---
 drivers/android/Kconfig  | 13 -
 drivers/android/binder.c |  4 
 2 files changed, 17 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 7dce3795b887..432e9ad77070 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -32,19 +32,6 @@ config ANDROID_BINDER_DEVICES
  created. Each binder device has its own context manager, and is
  therefore logically separated from the other devices.
 
-config ANDROID_BINDER_IPC_32BIT
-   bool "Use old (Android 4.4 and earlier) 32-bit binder API"
-   depends on !64BIT && ANDROID_BINDER_IPC
-   default y
-   ---help---
- The Binder API has been changed to support both 32 and 64bit
- applications in a mixed environment.
-
- Enable this to support an old 32-bit Android user-space (v4.4 and
- earlier).
-
- Note that enabling this will break newer Android user-space.
-
 config ANDROID_BINDER_IPC_SELFTEST
bool "Android Binder IPC Driver Selftest"
depends on ANDROID_BINDER_IPC
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index e578eee31589..2ee9fb02dfb8 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -72,10 +72,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT
-#define BINDER_IPC_32BIT 1
-#endif
-
 #include 
 #include "binder_alloc.h"
 #include "binder_trace.h"
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH] ANDROID: binder: remove 32-bit binder interface.

2018-05-04 Thread Martijn Coenen
New devices launching with Android P need to use the 64-bit
binder interface, even on 32-bit SoCs [0].

This change removes the Kconfig option to select the 32-bit
binder interface. We don't think this will affect existing
userspace for the following reasons:
1) The latest Android common tree is 4.14, so we don't
   believe any Android devices are on kernels >4.14.
2) Android devices launch on an LTS release and stick with
   it, so we wouldn't expect devices running on <= 4.14 now
   to upgrade to 4.17 or later. But even if they did, they'd
   rebuild the world (kernel + userspace) anyway.
3) Other userspaces like 'anbox' are already using the
   64-bit interface.

Note that this change doesn't remove the 32-bit UAPI
itself; the reason for that is that Android userspace
always uses the latest UAPI headers from upstream, and
userspace retains 32-bit support for devices that are
upgrading. This will be removed as well in 2-3 years,
at which point we can remove the code from the UAPI
as well.

[0]: https://android-review.googlesource.com/c/platform/build/+/595193

Signed-off-by: Martijn Coenen 
---
 drivers/android/Kconfig  | 13 -
 drivers/android/binder.c |  4 
 2 files changed, 17 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 7dce3795b887..432e9ad77070 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -32,19 +32,6 @@ config ANDROID_BINDER_DEVICES
  created. Each binder device has its own context manager, and is
  therefore logically separated from the other devices.
 
-config ANDROID_BINDER_IPC_32BIT
-   bool "Use old (Android 4.4 and earlier) 32-bit binder API"
-   depends on !64BIT && ANDROID_BINDER_IPC
-   default y
-   ---help---
- The Binder API has been changed to support both 32 and 64bit
- applications in a mixed environment.
-
- Enable this to support an old 32-bit Android user-space (v4.4 and
- earlier).
-
- Note that enabling this will break newer Android user-space.
-
 config ANDROID_BINDER_IPC_SELFTEST
bool "Android Binder IPC Driver Selftest"
depends on ANDROID_BINDER_IPC
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index e578eee31589..2ee9fb02dfb8 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -72,10 +72,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT
-#define BINDER_IPC_32BIT 1
-#endif
-
 #include 
 #include "binder_alloc.h"
 #include "binder_trace.h"
-- 
2.17.0.441.gb46fe60e1d-goog



Re: [PATCH v3 2/5] efi: Add embedded peripheral firmware support

2018-05-04 Thread Martijn Coenen
On Wed, Apr 25, 2018 at 10:55 AM, Luis R. Rodriguez  wrote:
> Android became the primary user of CONFIG_FW_LOADER_USER_HELPER_FALLBACK.
>
> It would be good for us to hear from Android folks if their current use of
> request_firmware_into_buf() is designed in practice to *never* use the direct
> filesystem firmware loading interface, and always rely instead on the
> fallback mechanism.

It's hard to answer this question for Android in general. As far as I
can tell the reasons we use CONFIG_FW_LOADER_USER_HELPER(_FALLBACK)
are:
1) We have multiple different paths on our devices where firmware can
be located, and the direct loader only supports one custom path
2) Most of those paths are not mounted by the time the corresponding
drivers are loaded, because pretty much all Android kernels today are
built without module support, and therefore drivers are loaded well
before the firmware partition is mounted
3) I think we use _FALLBACK because doing this with uevents is just
the easiest thing to do; our init code has a firmware helper that
deals with this and searches the paths that we care about

2) will change at some point, because Android is moving towards a
model where device-specific peripheral drivers will be loaded as
modules, and since those modules would likely come from the same
partition as the firmware, it's possible that the direct load would
succeed (depending on whether the custom path is configured there or
not). But I don't think we can rely on the direct loader even in those
cases, unless we could configure it with multiple custom paths.

I have no reason to believe request_firmware_into_buf() is special in
this regard; drivers that depend on it may have their corresponding
firmware in different locations, so just depending on the direct
loader would not be good enough.

>
> Is ptr below
>
> ret = request_firmware_into_buf(_fw, fw_name, dev,
> ptr, phdr->p_filesz);
>
> Also part of the DMA buffer allocated earlier via:
>
> ret = qcom_scm_pas_init_image(pas_id, fw->data, fw->size);
>
> Android folks?

I think the Qualcomm folks owning this (Andy, David, Bjorn, already
cc'd here) are better suited to answer that question.

Thanks,
Martijn


Re: [PATCH v3 2/5] efi: Add embedded peripheral firmware support

2018-05-04 Thread Martijn Coenen
On Wed, Apr 25, 2018 at 10:55 AM, Luis R. Rodriguez  wrote:
> Android became the primary user of CONFIG_FW_LOADER_USER_HELPER_FALLBACK.
>
> It would be good for us to hear from Android folks if their current use of
> request_firmware_into_buf() is designed in practice to *never* use the direct
> filesystem firmware loading interface, and always rely instead on the
> fallback mechanism.

It's hard to answer this question for Android in general. As far as I
can tell the reasons we use CONFIG_FW_LOADER_USER_HELPER(_FALLBACK)
are:
1) We have multiple different paths on our devices where firmware can
be located, and the direct loader only supports one custom path
2) Most of those paths are not mounted by the time the corresponding
drivers are loaded, because pretty much all Android kernels today are
built without module support, and therefore drivers are loaded well
before the firmware partition is mounted
3) I think we use _FALLBACK because doing this with uevents is just
the easiest thing to do; our init code has a firmware helper that
deals with this and searches the paths that we care about

2) will change at some point, because Android is moving towards a
model where device-specific peripheral drivers will be loaded as
modules, and since those modules would likely come from the same
partition as the firmware, it's possible that the direct load would
succeed (depending on whether the custom path is configured there or
not). But I don't think we can rely on the direct loader even in those
cases, unless we could configure it with multiple custom paths.

I have no reason to believe request_firmware_into_buf() is special in
this regard; drivers that depend on it may have their corresponding
firmware in different locations, so just depending on the direct
loader would not be good enough.

>
> Is ptr below
>
> ret = request_firmware_into_buf(_fw, fw_name, dev,
> ptr, phdr->p_filesz);
>
> Also part of the DMA buffer allocated earlier via:
>
> ret = qcom_scm_pas_init_image(pas_id, fw->data, fw->size);
>
> Android folks?

I think the Qualcomm folks owning this (Andy, David, Bjorn, already
cc'd here) are better suited to answer that question.

Thanks,
Martijn


Re: [PATCH v3 2/5] efi: Add embedded peripheral firmware support

2018-05-04 Thread Martijn Coenen
On Thu, May 3, 2018 at 5:21 PM, Luis R. Rodriguez  wrote:
> Android folks, poke below. otherwise we'll have no option but to seriously
> consider Mimi's patch to prevent these calls when IMA appraisal is enforced:

Sorry, figuring out who's the right person to answer this, will get
back to you ASAP.

Martijn

>
> http://lkml.kernel.org/r/1525182503-13849-7-git-send-email-zo...@linux.vnet.ibm.com
>
> Please read below
>
> On Wed, Apr 25, 2018 at 05:55:57PM +, Luis R. Rodriguez wrote:
>> On Wed, Apr 25, 2018 at 01:00:09AM -0400, Mimi Zohar wrote:
>> > On Tue, 2018-04-24 at 23:42 +, Luis R. Rodriguez wrote:
>> > > On Tue, Apr 24, 2018 at 12:07:01PM -0400, Mimi Zohar wrote:
>> > > > On Tue, 2018-04-24 at 17:09 +0200, Hans de Goede wrote:
>> > > If its of any help --
>> > >
>> > > drivers/soc/qcom/mdt_loader.c is the only driver currently using
>> > > request_firmware_into_buf() however I'll note qcom_mdt_load() is used in 
>> > > many
>> > > other drivers so they are wrappers around request_firmware_into_buf():
>> > >
>> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c:   * adreno_request_fw() handles 
>> > > this, but qcom_mdt_load() does
>> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c:  ret = qcom_mdt_load(dev, 
>> > > fw, fwname, GPU_PAS_ID,
>> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c:  ret = qcom_mdt_load(dev, 
>> > > fw, newname, GPU_PAS_ID,
>> > > drivers/media/platform/qcom/venus/firmware.c:   ret = qcom_mdt_load(dev, 
>> > > mdt, fwname, VENUS_PAS_ID, mem_va, mem_phys,
>> > > drivers/remoteproc/qcom_adsp_pil.c: return qcom_mdt_load(adsp->dev, 
>> > > fw, rproc->firmware, adsp->pas_id,
>> > > drivers/remoteproc/qcom_wcnss.c:return qcom_mdt_load(wcnss->dev, 
>> > > fw, rproc->firmware, WCNSS_PAS_ID,
>> > >
>> > > > > As such the current IMA code (from v4.17-rc2) actually does
>> > > > > not handle READING_FIRMWARE_PREALLOC_BUFFER at all,
>> > > >
>> > > > Right, it doesn't yet address READING_FIRMWARE_PREALLOC_BUFFER, but
>> > > > should.
>> > > >
>> > > > Depending on whether the device requesting the firmware has access to
>> > > > the DMA memory, before the signature verification,
>> > >
>> > > It would seem from the original patch review about 
>> > > READING_FIRMWARE_PREALLOC_BUFFER
>> > > that this is not a DMA buffer.
>
> To be very clear I believe Stephen implied this was not DMA buffer. Mimi
> asked for READING_FIRMWARE_DMA if it was:
>
> https://patchwork.kernel.org/patch/9074611/
>
>> > The call sequence:
>> > qcom_mdt_load() -> qcom_scm_pas_init_image() -> dma_alloc_coherent()
>> >
>> > If dma_alloc_coherent() isn't allocating a DMA buffer, then the
>> > function name is misleading/confusing.
>>
>> Hah, by *definition* the device *and* processor has immediate access
>> to data written *immediately* when dma_alloc_coherent() is used. From
>> Documentation/DMA-API.txt:
>>
>> ---
>> Part Ia - Using large DMA-coherent buffers
>> --
>>
>> ::
>>
>> void *
>> dma_alloc_coherent(struct device *dev, size_t size,
>>dma_addr_t *dma_handle, gfp_t flag)
>>
>> Consistent memory is memory for which a write by either the device or
>> the processor can immediately be read by the processor or device
>> without having to worry about caching effects.  (You may however need
>> to make sure to flush the processor's write buffers before telling
>> devices to read that memory.)
>> 
>>
>> Is ptr below
>>
>>   ret = request_firmware_into_buf(_fw, fw_name, dev,
>>   ptr, phdr->p_filesz);
>>
>> Also part of the DMA buffer allocated earlier via:
>>
>> ret = qcom_scm_pas_init_image(pas_id, fw->data, fw->size);
>>
>> Android folks?
>
> Android folks?
>
>> > > The device driver should have access to the buffer pointer with write 
>> > > given
>> > > that with request_firmware_into_buf() the driver is giving full write 
>> > > access to
>> > > the memory pointer so that the firmware API can stuff the firmware it 
>> > > finds
>> > > there.
>> > >
>> > > Firmware signature verification would be up to the device hardware to do 
>> > > upon
>> > > load *after* request_firmware_into_buf().
>> >
>> > We're discussing the kernel's signature verification, not the device
>> > hardware's signature verification.  Can the device driver access the
>> > buffer, before IMA-appraisal has verified the firmware's signature?
>>
>> It will depend on the above question.
>
>   Luis


Re: [PATCH v3 2/5] efi: Add embedded peripheral firmware support

2018-05-04 Thread Martijn Coenen
On Thu, May 3, 2018 at 5:21 PM, Luis R. Rodriguez  wrote:
> Android folks, poke below. otherwise we'll have no option but to seriously
> consider Mimi's patch to prevent these calls when IMA appraisal is enforced:

Sorry, figuring out who's the right person to answer this, will get
back to you ASAP.

Martijn

>
> http://lkml.kernel.org/r/1525182503-13849-7-git-send-email-zo...@linux.vnet.ibm.com
>
> Please read below
>
> On Wed, Apr 25, 2018 at 05:55:57PM +, Luis R. Rodriguez wrote:
>> On Wed, Apr 25, 2018 at 01:00:09AM -0400, Mimi Zohar wrote:
>> > On Tue, 2018-04-24 at 23:42 +, Luis R. Rodriguez wrote:
>> > > On Tue, Apr 24, 2018 at 12:07:01PM -0400, Mimi Zohar wrote:
>> > > > On Tue, 2018-04-24 at 17:09 +0200, Hans de Goede wrote:
>> > > If its of any help --
>> > >
>> > > drivers/soc/qcom/mdt_loader.c is the only driver currently using
>> > > request_firmware_into_buf() however I'll note qcom_mdt_load() is used in 
>> > > many
>> > > other drivers so they are wrappers around request_firmware_into_buf():
>> > >
>> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c:   * adreno_request_fw() handles 
>> > > this, but qcom_mdt_load() does
>> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c:  ret = qcom_mdt_load(dev, 
>> > > fw, fwname, GPU_PAS_ID,
>> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c:  ret = qcom_mdt_load(dev, 
>> > > fw, newname, GPU_PAS_ID,
>> > > drivers/media/platform/qcom/venus/firmware.c:   ret = qcom_mdt_load(dev, 
>> > > mdt, fwname, VENUS_PAS_ID, mem_va, mem_phys,
>> > > drivers/remoteproc/qcom_adsp_pil.c: return qcom_mdt_load(adsp->dev, 
>> > > fw, rproc->firmware, adsp->pas_id,
>> > > drivers/remoteproc/qcom_wcnss.c:return qcom_mdt_load(wcnss->dev, 
>> > > fw, rproc->firmware, WCNSS_PAS_ID,
>> > >
>> > > > > As such the current IMA code (from v4.17-rc2) actually does
>> > > > > not handle READING_FIRMWARE_PREALLOC_BUFFER at all,
>> > > >
>> > > > Right, it doesn't yet address READING_FIRMWARE_PREALLOC_BUFFER, but
>> > > > should.
>> > > >
>> > > > Depending on whether the device requesting the firmware has access to
>> > > > the DMA memory, before the signature verification,
>> > >
>> > > It would seem from the original patch review about 
>> > > READING_FIRMWARE_PREALLOC_BUFFER
>> > > that this is not a DMA buffer.
>
> To be very clear I believe Stephen implied this was not DMA buffer. Mimi
> asked for READING_FIRMWARE_DMA if it was:
>
> https://patchwork.kernel.org/patch/9074611/
>
>> > The call sequence:
>> > qcom_mdt_load() -> qcom_scm_pas_init_image() -> dma_alloc_coherent()
>> >
>> > If dma_alloc_coherent() isn't allocating a DMA buffer, then the
>> > function name is misleading/confusing.
>>
>> Hah, by *definition* the device *and* processor has immediate access
>> to data written *immediately* when dma_alloc_coherent() is used. From
>> Documentation/DMA-API.txt:
>>
>> ---
>> Part Ia - Using large DMA-coherent buffers
>> --
>>
>> ::
>>
>> void *
>> dma_alloc_coherent(struct device *dev, size_t size,
>>dma_addr_t *dma_handle, gfp_t flag)
>>
>> Consistent memory is memory for which a write by either the device or
>> the processor can immediately be read by the processor or device
>> without having to worry about caching effects.  (You may however need
>> to make sure to flush the processor's write buffers before telling
>> devices to read that memory.)
>> 
>>
>> Is ptr below
>>
>>   ret = request_firmware_into_buf(_fw, fw_name, dev,
>>   ptr, phdr->p_filesz);
>>
>> Also part of the DMA buffer allocated earlier via:
>>
>> ret = qcom_scm_pas_init_image(pas_id, fw->data, fw->size);
>>
>> Android folks?
>
> Android folks?
>
>> > > The device driver should have access to the buffer pointer with write 
>> > > given
>> > > that with request_firmware_into_buf() the driver is giving full write 
>> > > access to
>> > > the memory pointer so that the firmware API can stuff the firmware it 
>> > > finds
>> > > there.
>> > >
>> > > Firmware signature verification would be up to the device hardware to do 
>> > > upon
>> > > load *after* request_firmware_into_buf().
>> >
>> > We're discussing the kernel's signature verification, not the device
>> > hardware's signature verification.  Can the device driver access the
>> > buffer, before IMA-appraisal has verified the firmware's signature?
>>
>> It will depend on the above question.
>
>   Luis


  1   2   3   4   >