Re: [PATCH v3 -next] binder: simplify the return expression of binder_mmap
Thanks! On Tue, Sep 29, 2020 at 3:30 AM Liu Shixin wrote: > > Simplify the return expression. > > Signed-off-by: Liu Shixin Acked-by: Martijn Coenen > --- > v3: Add the change description. > v2: Get rid of the "ret" and "failure string" variables. > v1: Simplify the return expression. > --- > drivers/android/binder.c | 18 -- > 1 file changed, 4 insertions(+), 14 deletions(-) > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c > index 37a505c41dec..49c0700816a5 100644 > --- a/drivers/android/binder.c > +++ b/drivers/android/binder.c > @@ -5180,9 +5180,7 @@ static const struct vm_operations_struct binder_vm_ops > = { > > static int binder_mmap(struct file *filp, struct vm_area_struct *vma) > { > - int ret; > struct binder_proc *proc = filp->private_data; > - const char *failure_string; > > if (proc->tsk != current->group_leader) > return -EINVAL; > @@ -5194,9 +5192,9 @@ static int binder_mmap(struct file *filp, struct > vm_area_struct *vma) > (unsigned long)pgprot_val(vma->vm_page_prot)); > > if (vma->vm_flags & FORBIDDEN_MMAP_FLAGS) { > - ret = -EPERM; > - failure_string = "bad vm_flags"; > - goto err_bad_arg; > + pr_err("%s: %d %lx-%lx %s failed %d\n", __func__, > + proc->pid, vma->vm_start, vma->vm_end, "bad vm_flags", > -EPERM); > + return -EPERM; > } > vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP; > vma->vm_flags &= ~VM_MAYWRITE; > @@ -5204,15 +5202,7 @@ static int binder_mmap(struct file *filp, struct > vm_area_struct *vma) > vma->vm_ops = _vm_ops; > vma->vm_private_data = proc; > > - ret = binder_alloc_mmap_handler(>alloc, vma); > - if (ret) > - return ret; > - return 0; > - > -err_bad_arg: > - pr_err("%s: %d %lx-%lx %s failed %d\n", __func__, > - proc->pid, vma->vm_start, vma->vm_end, failure_string, ret); > - return ret; > + return binder_alloc_mmap_handler(>alloc, vma); > } > > static int binder_open(struct inode *nodp, struct file *filp) > -- > 2.25.1 >
Re: LOOP_CONFIGURE ioctl doesn't work if lo_offset/lo_sizelimit are set
Hi, I just sent a patch to fix the issue. The loop device would have respected the configuration, but indeed the size of the underlying block device was not set correctly, so reading back the size would give the wrong result. Thanks, Martijn On Mon, Aug 24, 2020 at 8:24 PM Martijn Coenen wrote: > > Hi Lennart, > > Thanks for the report, I'll look into it. FWIW, we've been using > LOOP_CONFIGURE on Android with lo_offset/lo_sizelimit without issues, > but it may be a particular configuration that's causing issues. > > Thanks, > Martijn > > On Mon, Aug 24, 2020 at 5:44 PM Lennart Poettering > wrote: > > > > Hi! > > > > Even with fe6a8fc5ed2f0081f17375ae2005718522c392c6 the LOOP_CONFIGURE > > ioctl doesn't work correctly. It gets confused if the > > lo_offset/lo_sizelimit fields are set to non-zero. > > > > In a quick test I ran (on Linux 5.8.3) I call LOOP_CONFIGURE with > > .lo_offset=3221204992 and .lo_sizelimit=50331648 and immediately > > verify the size of the block device with BLKGETSIZE64. It should of > > course return 50331648, but actually returns 3271557120. (the precise > > values have no particular relevance, it's just what I happened to use > > in my test.) If I instead use LOOP_SET_STATUS64 with the exact same > > parameters, everything works correctly. In either case, if I use > > LOOP_GET_STATUS64 insted of BLKGETSIZE64 to verify things, everything > > looks great. > > > > My guess is that the new ioctl simply doesn't properly propagate the > > size limit into the underlying block device like it should. I didn't > > have the time to investigate further though. > > > > Lennart
Re: LOOP_CONFIGURE ioctl doesn't work if lo_offset/lo_sizelimit are set
Hi Lennart, Thanks for the report, I'll look into it. FWIW, we've been using LOOP_CONFIGURE on Android with lo_offset/lo_sizelimit without issues, but it may be a particular configuration that's causing issues. Thanks, Martijn On Mon, Aug 24, 2020 at 5:44 PM Lennart Poettering wrote: > > Hi! > > Even with fe6a8fc5ed2f0081f17375ae2005718522c392c6 the LOOP_CONFIGURE > ioctl doesn't work correctly. It gets confused if the > lo_offset/lo_sizelimit fields are set to non-zero. > > In a quick test I ran (on Linux 5.8.3) I call LOOP_CONFIGURE with > .lo_offset=3221204992 and .lo_sizelimit=50331648 and immediately > verify the size of the block device with BLKGETSIZE64. It should of > course return 50331648, but actually returns 3271557120. (the precise > values have no particular relevance, it's just what I happened to use > in my test.) If I instead use LOOP_SET_STATUS64 with the exact same > parameters, everything works correctly. In either case, if I use > LOOP_GET_STATUS64 insted of BLKGETSIZE64 to verify things, everything > looks great. > > My guess is that the new ioctl simply doesn't properly propagate the > size limit into the underlying block device like it should. I didn't > have the time to investigate further though. > > Lennart
[PATCH v3] binder: print warnings when detecting oneway spamming.
The most common cause of the binder transaction buffer filling up is a client rapidly firing oneway transactions into a process, before it has a chance to handle them. Yet the root cause of this is often hard to debug, because either the system or the app will stop, and by that time binder debug information we dump in bugreports is no longer relevant. This change warns as soon as a process dips below 80% of its oneway space (less than 100kB available in the configuration), when any one process is responsible for either more than 50 transactions, or more than 50% of the oneway space. Signed-off-by: Martijn Coenen --- v2: fixed call-site in binder_alloc_selftest v3: include size of struct binder_buffer in calculation, fix comments drivers/android/binder.c| 2 +- drivers/android/binder_alloc.c | 55 +++-- drivers/android/binder_alloc.h | 5 ++- drivers/android/binder_alloc_selftest.c | 2 +- 4 files changed, 58 insertions(+), 6 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index f936530a19b0..946332bc871a 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -3136,7 +3136,7 @@ static void binder_transaction(struct binder_proc *proc, t->buffer = binder_alloc_new_buf(_proc->alloc, tr->data_size, tr->offsets_size, extra_buffers_size, - !reply && (t->flags & TF_ONE_WAY)); + !reply && (t->flags & TF_ONE_WAY), current->tgid); if (IS_ERR(t->buffer)) { /* * -ESRCH indicates VMA cleared. The target is dying. diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index 69609696a843..b5b6c9cf1b0b 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -338,12 +338,50 @@ static inline struct vm_area_struct *binder_alloc_get_vma( return vma; } +static void debug_low_async_space_locked(struct binder_alloc *alloc, int pid) +{ + /* +* Find the amount and size of buffers allocated by the current caller; +* The idea is that once we cross the threshold, whoever is responsible +* for the low async space is likely to try to send another async txn, +* and at some point we'll catch them in the act. This is more efficient +* than keeping a map per pid. +*/ + struct rb_node *n = alloc->free_buffers.rb_node; + struct binder_buffer *buffer; + size_t total_alloc_size = 0; + size_t num_buffers = 0; + + for (n = rb_first(>allocated_buffers); n != NULL; +n = rb_next(n)) { + buffer = rb_entry(n, struct binder_buffer, rb_node); + if (buffer->pid != pid) + continue; + if (!buffer->async_transaction) + continue; + total_alloc_size += binder_alloc_buffer_size(alloc, buffer) + + sizeof(struct binder_buffer); + num_buffers++; + } + + /* +* Warn if this pid has more than 50 transactions, or more than 50% of +* async space (which is 25% of total buffer size). +*/ + if (num_buffers > 50 || total_alloc_size > alloc->buffer_size / 4) { + binder_alloc_debug(BINDER_DEBUG_USER_ERROR, +"%d: pid %d spamming oneway? %zd buffers allocated for a total size of %zd\n", + alloc->pid, pid, num_buffers, total_alloc_size); + } +} + static struct binder_buffer *binder_alloc_new_buf_locked( struct binder_alloc *alloc, size_t data_size, size_t offsets_size, size_t extra_buffers_size, - int is_async) + int is_async, + int pid) { struct rb_node *n = alloc->free_buffers.rb_node; struct binder_buffer *buffer; @@ -486,11 +524,20 @@ static struct binder_buffer *binder_alloc_new_buf_locked( buffer->offsets_size = offsets_size; buffer->async_transaction = is_async; buffer->extra_buffers_size = extra_buffers_size; + buffer->pid = pid; if (is_async) { alloc->free_async_space -= size + sizeof(struct binder_buffer); binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC, "%d: binder_alloc_buf size %zd async free %zd\n", alloc->pid, size, alloc->free_async_space); + if (alloc->free_async_space < alloc->buffer_size / 10) { + /* +* Start detecting spammers once we have less th
[PATCH v2] ANDROID: binder: print warnings when detecting oneway spamming.
The most common cause of the binder transaction buffer filling up is a client rapidly firing oneway transactions into a process, before it has a chance to handle them. Yet the root cause of this is often hard to debug, because either the system or the app will stop, and by that time binder debug information we dump in bugreports is no longer relevant. This change warns as soon as a process dips below 80% of its oneway space (less than 100kB available in the configuration), when any one process is responsible for either more than 50 transactions, or more than 50% of the oneway space. Signed-off-by: Martijn Coenen --- v2: fixed call-site in binder_alloc_selftest drivers/android/binder.c| 2 +- drivers/android/binder_alloc.c | 49 +++-- drivers/android/binder_alloc.h | 5 ++- drivers/android/binder_alloc_selftest.c | 2 +- 4 files changed, 52 insertions(+), 6 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index f936530a19b0..946332bc871a 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -3136,7 +3136,7 @@ static void binder_transaction(struct binder_proc *proc, t->buffer = binder_alloc_new_buf(_proc->alloc, tr->data_size, tr->offsets_size, extra_buffers_size, - !reply && (t->flags & TF_ONE_WAY)); + !reply && (t->flags & TF_ONE_WAY), current->tgid); if (IS_ERR(t->buffer)) { /* * -ESRCH indicates VMA cleared. The target is dying. diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index 69609696a843..76e8e633dbd4 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -338,12 +338,48 @@ static inline struct vm_area_struct *binder_alloc_get_vma( return vma; } +static void debug_low_async_space_locked(struct binder_alloc *alloc, int pid) +{ + /* +* Find the amount and size of buffers allocated by the current caller; +* The idea is that once we cross the threshold, whoever is responsible +* for the low async space is likely to try to send another async txn, +* and at some point we'll catch them in the act. This is more efficient +* than keeping a map per pid. +*/ + struct rb_node *n = alloc->free_buffers.rb_node; + struct binder_buffer *buffer; + size_t buffer_size; + size_t total_alloc_size = 0; + size_t num_buffers = 0; + + for (n = rb_first(>allocated_buffers); n != NULL; +n = rb_next(n)) { + buffer = rb_entry(n, struct binder_buffer, rb_node); + if (buffer->pid != pid) + continue; + if (!buffer->async_transaction) + continue; + buffer_size = binder_alloc_buffer_size(alloc, buffer); + total_alloc_size += buffer_size; + num_buffers++; + } + + // Warn if this pid has more than 50% of async space, or more than 50 txns + if (num_buffers > 50 || total_alloc_size > alloc->buffer_size / 4) { + binder_alloc_debug(BINDER_DEBUG_USER_ERROR, +"%d: pid %d spamming oneway? %zd buffers allocated for a total size of %zd\n", + alloc->pid, pid, num_buffers, total_alloc_size); + } +} + static struct binder_buffer *binder_alloc_new_buf_locked( struct binder_alloc *alloc, size_t data_size, size_t offsets_size, size_t extra_buffers_size, - int is_async) + int is_async, + int pid) { struct rb_node *n = alloc->free_buffers.rb_node; struct binder_buffer *buffer; @@ -486,11 +522,16 @@ static struct binder_buffer *binder_alloc_new_buf_locked( buffer->offsets_size = offsets_size; buffer->async_transaction = is_async; buffer->extra_buffers_size = extra_buffers_size; + buffer->pid = pid; if (is_async) { alloc->free_async_space -= size + sizeof(struct binder_buffer); binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC, "%d: binder_alloc_buf size %zd async free %zd\n", alloc->pid, size, alloc->free_async_space); + if (alloc->free_async_space < alloc->buffer_size / 10) { + // Start detecting spammers once we reach 80% of async space used + debug_low_async_space_locked(alloc, pid); + } } return buffer; @@ -508,6 +549,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
Re: [PATCH] binder: print warnings when detecting oneway spamming.
On Thu, Aug 20, 2020 at 12:41 PM kernel test robot wrote: > > Hi Martijn, > > I love your patch! Yet something to improve: > > [auto build test ERROR on staging/staging-testing] > [also build test ERROR on v5.9-rc1 next-20200820] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch] > > url: > https://github.com/0day-ci/linux/commits/Martijn-Coenen/binder-print-warnings-when-detecting-oneway-spamming/20200820-155358 > base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git > bc752d2f345bf55d71b3422a6a24890ea03168dc > config: s390-randconfig-r002-20200818 (attached as .config) > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project > 4deda57106f7c9b982a49cb907c33e3966c8de7f) > reproduce (this is a W=1 build): > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > # install s390 cross compiling tool for clang build > # apt-get install binutils-s390x-linux-gnu > # save the attached .config to linux build tree > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot > > All errors (new ones prefixed by >>): > > >> drivers/android/binder_alloc_selftest.c:122:61: error: too few arguments > >> to function call, expected 6, have 5 >buffers[i] = binder_alloc_new_buf(alloc, sizes[i], 0, 0, > 0); missed this call-site, will send v2. Martijn > ^ >drivers/android/binder_alloc.h:118:30: note: 'binder_alloc_new_buf' > declared here >extern struct binder_buffer *binder_alloc_new_buf(struct binder_alloc > *alloc, > ^ >1 error generated. > > # > https://github.com/0day-ci/linux/commit/9d0b269f4468d6793f6fd76a410fdde39dbf6ac2 > git remote add linux-review https://github.com/0day-ci/linux > git fetch --no-tags linux-review > Martijn-Coenen/binder-print-warnings-when-detecting-oneway-spamming/20200820-155358 > git checkout 9d0b269f4468d6793f6fd76a410fdde39dbf6ac2 > vim +122 drivers/android/binder_alloc_selftest.c > > 4175e2b46fd4b9 Sherry Yang 2017-08-23 114 > 4175e2b46fd4b9 Sherry Yang 2017-08-23 115 static void > binder_selftest_alloc_buf(struct binder_alloc *alloc, > 4175e2b46fd4b9 Sherry Yang 2017-08-23 116 > struct binder_buffer *buffers[], > 4175e2b46fd4b9 Sherry Yang 2017-08-23 117 > size_t *sizes, int *seq) > 4175e2b46fd4b9 Sherry Yang 2017-08-23 118 { > 4175e2b46fd4b9 Sherry Yang 2017-08-23 119 int i; > 4175e2b46fd4b9 Sherry Yang 2017-08-23 120 > 4175e2b46fd4b9 Sherry Yang 2017-08-23 121 for (i = 0; i < BUFFER_NUM; > i++) { > 4175e2b46fd4b9 Sherry Yang 2017-08-23 @122 buffers[i] = > binder_alloc_new_buf(alloc, sizes[i], 0, 0, 0); > 4175e2b46fd4b9 Sherry Yang 2017-08-23 123 if > (IS_ERR(buffers[i]) || > 4175e2b46fd4b9 Sherry Yang 2017-08-23 124 > !check_buffer_pages_allocated(alloc, buffers[i], > 4175e2b46fd4b9 Sherry Yang 2017-08-23 125 > sizes[i])) { > 4175e2b46fd4b9 Sherry Yang 2017-08-23 126 > pr_err_size_seq(sizes, seq); > 4175e2b46fd4b9 Sherry Yang 2017-08-23 127 > binder_selftest_failures++; > 4175e2b46fd4b9 Sherry Yang 2017-08-23 128 } > 4175e2b46fd4b9 Sherry Yang 2017-08-23 129 } > 4175e2b46fd4b9 Sherry Yang 2017-08-23 130 } > 4175e2b46fd4b9 Sherry Yang 2017-08-23 131 > > --- > 0-DAY CI Kernel Test Service, Intel Corporation > https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[PATCH] binder: print warnings when detecting oneway spamming.
The most common cause of the binder transaction buffer filling up is a client rapidly firing oneway transactions into a process, before it has a chance to handle them. Yet the root cause of this is often hard to debug, because either the system or the app will stop, and by that time binder debug information we dump in bugreports is no longer relevant. This change warns as soon as a process dips below 80% of its oneway space (less than 100kB available in the configuration), when any one process is responsible for either more than 50 transactions, or more than 50% of the oneway space. Signed-off-by: Martijn Coenen --- drivers/android/binder.c | 2 +- drivers/android/binder_alloc.c | 49 +++--- drivers/android/binder_alloc.h | 5 +++- 3 files changed, 51 insertions(+), 5 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index f936530a19b0..946332bc871a 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -3136,7 +3136,7 @@ static void binder_transaction(struct binder_proc *proc, t->buffer = binder_alloc_new_buf(_proc->alloc, tr->data_size, tr->offsets_size, extra_buffers_size, - !reply && (t->flags & TF_ONE_WAY)); + !reply && (t->flags & TF_ONE_WAY), current->tgid); if (IS_ERR(t->buffer)) { /* * -ESRCH indicates VMA cleared. The target is dying. diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index 69609696a843..76e8e633dbd4 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -338,12 +338,48 @@ static inline struct vm_area_struct *binder_alloc_get_vma( return vma; } +static void debug_low_async_space_locked(struct binder_alloc *alloc, int pid) +{ + /* +* Find the amount and size of buffers allocated by the current caller; +* The idea is that once we cross the threshold, whoever is responsible +* for the low async space is likely to try to send another async txn, +* and at some point we'll catch them in the act. This is more efficient +* than keeping a map per pid. +*/ + struct rb_node *n = alloc->free_buffers.rb_node; + struct binder_buffer *buffer; + size_t buffer_size; + size_t total_alloc_size = 0; + size_t num_buffers = 0; + + for (n = rb_first(>allocated_buffers); n != NULL; +n = rb_next(n)) { + buffer = rb_entry(n, struct binder_buffer, rb_node); + if (buffer->pid != pid) + continue; + if (!buffer->async_transaction) + continue; + buffer_size = binder_alloc_buffer_size(alloc, buffer); + total_alloc_size += buffer_size; + num_buffers++; + } + + // Warn if this pid has more than 50% of async space, or more than 50 txns + if (num_buffers > 50 || total_alloc_size > alloc->buffer_size / 4) { + binder_alloc_debug(BINDER_DEBUG_USER_ERROR, +"%d: pid %d spamming oneway? %zd buffers allocated for a total size of %zd\n", + alloc->pid, pid, num_buffers, total_alloc_size); + } +} + static struct binder_buffer *binder_alloc_new_buf_locked( struct binder_alloc *alloc, size_t data_size, size_t offsets_size, size_t extra_buffers_size, - int is_async) + int is_async, + int pid) { struct rb_node *n = alloc->free_buffers.rb_node; struct binder_buffer *buffer; @@ -486,11 +522,16 @@ static struct binder_buffer *binder_alloc_new_buf_locked( buffer->offsets_size = offsets_size; buffer->async_transaction = is_async; buffer->extra_buffers_size = extra_buffers_size; + buffer->pid = pid; if (is_async) { alloc->free_async_space -= size + sizeof(struct binder_buffer); binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC, "%d: binder_alloc_buf size %zd async free %zd\n", alloc->pid, size, alloc->free_async_space); + if (alloc->free_async_space < alloc->buffer_size / 10) { + // Start detecting spammers once we reach 80% of async space used + debug_low_async_space_locked(alloc, pid); + } } return buffer; @@ -508,6 +549,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked( * @offsets_size: user specified buffer offset * @extra_buffers_size: size of extra space for meta-d
Re: [PATCH v2] loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
On Mon, Aug 10, 2020 at 7:16 PM Lennart Poettering wrote: > > When LOOP_CONFIGURE is used with LO_FLAGS_PARTSCAN we need to propagate > this into the GENHD_FL_NO_PART_SCAN. LOOP_SETSTATUS does this, > LOOP_CONFIGURE doesn't so far. Effect is that setting up a loopback > device with partition scanning doesn't actually work when LOOP_CONFIGURE > is issued, though it works fine with LOOP_SETSTATUS. > > Let's correct that and propagate the flag in LOOP_CONFIGURE too. > > Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl") > > Signed-off-by: Lennart Poettering > Acked-by: Martijn Coenen Thanks, still looks good to me. > --- > drivers/block/loop.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index d18160146226..2f137d6ce169 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -1171,6 +1171,8 @@ static int loop_configure(struct loop_device *lo, > fmode_t mode, > if (part_shift) > lo->lo_flags |= LO_FLAGS_PARTSCAN; > partscan = lo->lo_flags & LO_FLAGS_PARTSCAN; > + if (partscan) > + lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN; > > /* Grab the block_device to prevent its destruction after we > * put /dev/loopXX inode. Later in __loop_clr_fd() we bdput(bdev). > -- > 2.26.2
Re: [PATCH] loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
On Fri, Aug 7, 2020 at 5:31 PM Lennart Poettering wrote: > Thanks for the review. I'll fix this up and send a v2. Are you OK with > me adding your Ack to the patch? Yeah, sure! > And also should this geta cc for stable? LOOP_CONFIGURE was just added in v5.8, and stable is v5.7 now, so I don't think that's needed. Thanks, Martijn > > Thanks, > > Lennart > > -- > Lennart Poettering, Berlin
Re: [PATCH] loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
Hi Lennart, Thanks again for the patch, I tested it and it looks good to me. I'll also add a test case to LTP for this. Two minor nits on the patch: On Thu, Aug 6, 2020 at 9:32 AM Lennart Poettering wrote: > Let's correct that and propagate the flag in LOOP_SETSTATUS too. Think you meant LOOP_CONFIGURE. Also, could you add a "Fixes" tag, like: Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl") > > Signed-off-by: Lennart Poettering > --- > drivers/block/loop.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index d18160146226..2f137d6ce169 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -1171,6 +1171,8 @@ static int loop_configure(struct loop_device *lo, > fmode_t mode, > if (part_shift) > lo->lo_flags |= LO_FLAGS_PARTSCAN; > partscan = lo->lo_flags & LO_FLAGS_PARTSCAN; > + if (partscan) > + lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN; > > /* Grab the block_device to prevent its destruction after we > * put /dev/loopXX inode. Later in __loop_clr_fd() we bdput(bdev). > -- > 2.26.2
Re: [PATCH v2] binder: Prevent context manager from incrementing ref 0
Thanks a lot for the detailed explanation, I understood now. Martijn On Tue, Jul 28, 2020 at 4:50 PM Jann Horn wrote: > > On Tue, Jul 28, 2020 at 3:50 PM Martijn Coenen wrote: > > On Mon, Jul 27, 2020 at 2:04 PM Jann Horn wrote: > > > - task B opens /dev/binder once, creating binder_proc instance P3 > > > - P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way > > >transaction) > > > - P2 receives the handle and uses it to call P3 (two-way transaction) > > > - P3 calls P2 (via magic handle 0) (two-way transaction) > > > - P2 calls P2 (via handle 1) (two-way transaction) > > > > Why do you need P3 involved at all? Could P2 just straight away make a > > call on handle 1? > > Yes, it could, I think. IIRC these steps are necessary if you want to > actually get a KASAN splat, instead of just a transaction-to-self with > no further consequences. It has been a while since I looked at this, > but I'll try to figure out again what was going on... > > > A UAF occurs in the following code due to the transaction-to-self, > because the "if (t->to_thread == thread)" is tricked into believing > that the transaction has already been accepted. > > static int binder_thread_release(struct binder_proc *proc, > struct binder_thread *thread) > { > struct binder_transaction *t; > struct binder_transaction *send_reply = NULL; > [...] > t = thread->transaction_stack; > if (t) { > [...] > if (t->to_thread == thread) > send_reply = t; > } else { > [...] > } > [...] > //NOTE: transaction is freed here because top-of-stack is > // wrongly treated as already-accepted incoming transaction) > if (send_reply) > binder_send_failed_reply(send_reply, BR_DEAD_REPLY); > //NOTE pending transaction work is processed here (transaction has not > // yet been accepted) > binder_release_work(proc, >todo); > [...] > } > > An unaccepted transaction will only have a non-NULL ->to_thread if the > transaction has a specific target thread; for a non-reply transaction, > that is only the case if it is a two-way transaction that was sent > while the binder call stack already contained the target task (iow, > the transaction looks like a synchronous callback invocation). > > With the steps: > > - P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way >transaction) > - P2 receives the handle and uses it to call P3 (two-way transaction) > - P3 calls P2 (via magic handle 0) (two-way transaction) > - P2 calls P2 (via handle 1) (two-way transaction) > > the call stack will look like this: > > P3 -> P2 -> P3 -> P2 -> P2 > > The initial call from P3 to P2 was IIRC just to give P2 a handle to > P3; IIRC the relevant part of the call stack was: > > P2 -> P3 -> P2 -> P2 > > Because P2 already occurs down in the call stack, the final > transaction "P2 -> P2" is considered to be a callback and is > thread-directed. > > > The reason why P3 has to be created from task B is the "if > (target_node && target_proc->pid == proc->pid)" check for transactions > to reference 0.
Re: [PATCH v2] binder: Prevent context manager from incrementing ref 0
Thanks Jann, the change LGTM, one question on the repro scenario that wasn't immediately obvious to me: On Mon, Jul 27, 2020 at 2:04 PM Jann Horn wrote: > - task B opens /dev/binder once, creating binder_proc instance P3 > - P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way >transaction) > - P2 receives the handle and uses it to call P3 (two-way transaction) > - P3 calls P2 (via magic handle 0) (two-way transaction) > - P2 calls P2 (via handle 1) (two-way transaction) Why do you need P3 involved at all? Could P2 just straight away make a call on handle 1? > > And then, if P2 does *NOT* accept the incoming transaction work, but > instead closes the binder fd, we get a crash. > > Solve it by preventing the context manager from using ACQUIRE on ref 0. > There shouldn't be any legitimate reason for the context manager to do > that. > > Additionally, print a warning if someone manages to find another way to > trigger a transaction-to-self bug in the future. > > Cc: sta...@vger.kernel.org > Fixes: 457b9a6f09f0 ("Staging: android: add binder driver") > Acked-by: Todd Kjos Reviewed-by: Martijn Coenen > Signed-off-by: Jann Horn > --- > fixed that broken binder_user_error() from the first version... > I sent v1 while I had a dirty tree containing the missing fix. whoops. > > drivers/android/binder.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c > index f50c5f182bb5..5b310eea9e52 100644 > --- a/drivers/android/binder.c > +++ b/drivers/android/binder.c > @@ -2982,6 +2982,12 @@ static void binder_transaction(struct binder_proc > *proc, > goto err_dead_binder; > } > e->to_node = target_node->debug_id; > + if (WARN_ON(proc == target_proc)) { > + return_error = BR_FAILED_REPLY; > + return_error_param = -EINVAL; > + return_error_line = __LINE__; > + goto err_invalid_target_handle; > + } > if (security_binder_transaction(proc->tsk, > target_proc->tsk) < 0) { > return_error = BR_FAILED_REPLY; > @@ -3635,10 +3641,17 @@ static int binder_thread_write(struct binder_proc > *proc, > struct binder_node *ctx_mgr_node; > mutex_lock(>context_mgr_node_lock); > ctx_mgr_node = > context->binder_context_mgr_node; > - if (ctx_mgr_node) > + if (ctx_mgr_node) { > + if (ctx_mgr_node->proc == proc) { > + binder_user_error("%d:%d > context manager tried to acquire desc 0\n", > + proc->pid, > thread->pid); > + > mutex_unlock(>context_mgr_node_lock); > + return -EINVAL; > + } > ret = binder_inc_ref_for_node( > proc, ctx_mgr_node, > strong, NULL, ); > + } > mutex_unlock(>context_mgr_node_lock); > } > if (ret) > > base-commit: 2a89b99f580371b86ae9bafd6cbeccd3bfab524a > -- > 2.28.0.rc0.142.g3c755180ce-goog >
Re: [LTP] LTP: syscalls: regression on mainline - ioctl_loop01 mknod07 setns01
Hi Yang, On Fri, Jun 5, 2020 at 11:27 AM Yang Xu wrote: > since ~LOOP_SET_STATUS_SETTABLE_FLAGS has been included in > ~LOOP_SET_STATUS_CLEARABLE_FLAGS, do we still need the previous step? > What do you think about it? Yeah, I don't think we need the previous step with the current set of flags, because clearable is a subset of settable. I will send a follow-up patch some time next week. Best, Martijn > > Best Regards > Yang Xu > > > Hey Yang, > > > > On Fri, Jun 5, 2020 at 10:59 AM Yang Xu > > wrote: > >> > >> Hi Martijn > >> > >> Sorry for noise. I see your patch in here[1] . I will modify > >> ioctl_loop01 to test that LO_FLAGS_PARTSCAN can not clear and > >> LO_FLAGS_AUTOCLEAR can be clear. > > > > Thanks, that would indeed be useful. > > > >> > >> ps: Giving the url of patch is better so that other people doesn't need > >> to investigate it again. > >> [1]https://patchwork.kernel.org/patch/11588321/ > > > > Ok, will do next time! > > > > Best, > > Martijn > >> > >> Best Regards > >> Yang Xu > >>> Hi Martijn > >>> > >>>> Hi Naresh, > >>>> > >>>> I just sent a patch and cc'd you. I verified all the loop tests pass > >>>> again with that patch. > >>> I think you want to say "without". I verified the ioctl_loop01 fails > >>> with faf1d25440 ("loop: Clean up LOOP_SET_STATUS lo_flags handling"). > >>> > >>> This kernel commit breaks old behaviour(if old flag all 0, new flag is > >>> always 0 regradless your flag setting). > >>> > >>> I think we should modify code as below: > >>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c > >>> index 13518ba191f5..c6ba8cf486ce 100644 > >>> --- a/drivers/block/loop.c > >>> +++ b/drivers/block/loop.c > >>> @@ -1364,11 +1364,9 @@ loop_set_status(struct loop_device *lo, const > >>> struct loop_info64 *info) > >>> if (err) > >>> goto out_unfreeze; > >>> > >>> - /* Mask out flags that can't be set using LOOP_SET_STATUS. */ > >>> - lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS; > >>> - /* For those flags, use the previous values instead */ > >>> - lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS; > >>> - /* For flags that can't be cleared, use previous values too */ > >>> + /* Mask out flags that can be set using LOOP_SET_STATUS. */ > >>> + lo->lo_flags &= LOOP_SET_STATUS_SETTABLE_FLAGS; > >>> + /* For flags that can't be cleared, use previous values. */ > >>> lo->lo_flags |= prev_lo_flags &~LOOP_SET_STATUS_CLEARABLE_FLAGS; > >>> > >>> Best Regards > >>> Yang Xu > >>>> > >>>> Thanks, > >>>> Martijn > >>>> > >>>> > >>>> On Thu, Jun 4, 2020 at 9:10 PM Martijn Coenen wrote: > >>>>> > >>>>> Hi Naresh, > >>>>> > >>>>> I suspect the loop failures are due to > >>>>> faf1d25440d6ad06d509dada4b6fe62fea844370 ("loop: Clean up > >>>>> LOOP_SET_STATUS lo_flags handling"), I will investigate and get back > >>>>> to you. > >>>>> > >>>>> Thanks, > >>>>> Martijn > >>>>> > >>>>> On Thu, Jun 4, 2020 at 7:19 PM Naresh Kamboju > >>>>> wrote: > >>>>>> > >>>>>> + linux-bl...@vger.kernel.org > >>>>>> > >>>>>> On Thu, 4 Jun 2020 at 22:47, Naresh Kamboju > >>>>>> wrote: > >>>>>>> > >>>>>>> Following three test cases reported as regression on Linux mainline > >>>>>>> kernel > >>>>>>> on x86_64, arm64, arm and i386 > >>>>>>> > >>>>>>> ltp-syscalls-tests: > >>>>>>> * ioctl_loop01 > >>>>>>> * mknod07 > >>>>>>> * setns01 > >>>>>>> > >>>>>>> git repo: > >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > >>>>>>> git branch: master >
Re: [LTP] LTP: syscalls: regression on mainline - ioctl_loop01 mknod07 setns01
Hey Yang, On Fri, Jun 5, 2020 at 10:59 AM Yang Xu wrote: > > Hi Martijn > > Sorry for noise. I see your patch in here[1] . I will modify > ioctl_loop01 to test that LO_FLAGS_PARTSCAN can not clear and > LO_FLAGS_AUTOCLEAR can be clear. Thanks, that would indeed be useful. > > ps: Giving the url of patch is better so that other people doesn't need > to investigate it again. > [1]https://patchwork.kernel.org/patch/11588321/ Ok, will do next time! Best, Martijn > > Best Regards > Yang Xu > > Hi Martijn > > > >> Hi Naresh, > >> > >> I just sent a patch and cc'd you. I verified all the loop tests pass > >> again with that patch. > > I think you want to say "without". I verified the ioctl_loop01 fails > > with faf1d25440 ("loop: Clean up LOOP_SET_STATUS lo_flags handling"). > > > > This kernel commit breaks old behaviour(if old flag all 0, new flag is > > always 0 regradless your flag setting). > > > > I think we should modify code as below: > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > > index 13518ba191f5..c6ba8cf486ce 100644 > > --- a/drivers/block/loop.c > > +++ b/drivers/block/loop.c > > @@ -1364,11 +1364,9 @@ loop_set_status(struct loop_device *lo, const > > struct loop_info64 *info) > > if (err) > > goto out_unfreeze; > > > > - /* Mask out flags that can't be set using LOOP_SET_STATUS. */ > > - lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS; > > - /* For those flags, use the previous values instead */ > > - lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS; > > - /* For flags that can't be cleared, use previous values too */ > > + /* Mask out flags that can be set using LOOP_SET_STATUS. */ > > + lo->lo_flags &= LOOP_SET_STATUS_SETTABLE_FLAGS; > > + /* For flags that can't be cleared, use previous values. */ > > lo->lo_flags |= prev_lo_flags &~LOOP_SET_STATUS_CLEARABLE_FLAGS; > > > > Best Regards > > Yang Xu > >> > >> Thanks, > >> Martijn > >> > >> > >> On Thu, Jun 4, 2020 at 9:10 PM Martijn Coenen wrote: > >>> > >>> Hi Naresh, > >>> > >>> I suspect the loop failures are due to > >>> faf1d25440d6ad06d509dada4b6fe62fea844370 ("loop: Clean up > >>> LOOP_SET_STATUS lo_flags handling"), I will investigate and get back > >>> to you. > >>> > >>> Thanks, > >>> Martijn > >>> > >>> On Thu, Jun 4, 2020 at 7:19 PM Naresh Kamboju > >>> wrote: > >>>> > >>>> + linux-bl...@vger.kernel.org > >>>> > >>>> On Thu, 4 Jun 2020 at 22:47, Naresh Kamboju > >>>> wrote: > >>>>> > >>>>> Following three test cases reported as regression on Linux mainline > >>>>> kernel > >>>>> on x86_64, arm64, arm and i386 > >>>>> > >>>>>ltp-syscalls-tests: > >>>>> * ioctl_loop01 > >>>>> * mknod07 > >>>>> * setns01 > >>>>> > >>>>> git repo: > >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > >>>>> git branch: master > >>>>> GOOD: > >>>>>git commit: b23c4771ff62de8ca9b5e4a2d64491b2fb6f8f69 > >>>>>git describe: v5.7-1230-gb23c4771ff62 > >>>>> BAD: > >>>>>git commit: 1ee08de1e234d95b5b4f866878b72fceb5372904 > >>>>>git describe: v5.7-3523-g1ee08de1e234 > >>>>> > >>>>> kernel-config: > >>>>> https://builds.tuxbuild.com/U3bU0dMA62OVHb4DvZIVuw/kernel.config > >>>>> > >>>>> We are investigating these failures. > >>>>> > >>>>> tst_test.c:906: CONF: btrfs driver not available > >>>>> tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s > >>>>> tst_device.c:88: INFO: Found free device 1 '/dev/loop1' > >>>>> ioctl_loop01.c:49: PASS: /sys/block/loop1/loop/partscan = 0 > >>>>> [ 1073.639677] loop_set_status: loop1 () has still dirty pages > >>>>> (nrpages=1) > >>>>> ioctl_loop01.c:50: PASS: /sys/block/loop1/loop/autoclear = 0 > >>>>> ioctl_loop01.c:51: PASS: /sys/block/loop1/loop/backing_file = > >>
Re: LTP: syscalls: regression on mainline - ioctl_loop01 mknod07 setns01
Hi Naresh, I just sent a patch and cc'd you. I verified all the loop tests pass again with that patch. Thanks, Martijn On Thu, Jun 4, 2020 at 9:10 PM Martijn Coenen wrote: > > Hi Naresh, > > I suspect the loop failures are due to > faf1d25440d6ad06d509dada4b6fe62fea844370 ("loop: Clean up > LOOP_SET_STATUS lo_flags handling"), I will investigate and get back > to you. > > Thanks, > Martijn > > On Thu, Jun 4, 2020 at 7:19 PM Naresh Kamboju > wrote: > > > > + linux-bl...@vger.kernel.org > > > > On Thu, 4 Jun 2020 at 22:47, Naresh Kamboju > > wrote: > > > > > > Following three test cases reported as regression on Linux mainline kernel > > > on x86_64, arm64, arm and i386 > > > > > > ltp-syscalls-tests: > > > * ioctl_loop01 > > > * mknod07 > > > * setns01 > > > > > > git repo: > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > > > git branch: master > > > GOOD: > > > git commit: b23c4771ff62de8ca9b5e4a2d64491b2fb6f8f69 > > > git describe: v5.7-1230-gb23c4771ff62 > > > BAD: > > > git commit: 1ee08de1e234d95b5b4f866878b72fceb5372904 > > > git describe: v5.7-3523-g1ee08de1e234 > > > > > > kernel-config: > > > https://builds.tuxbuild.com/U3bU0dMA62OVHb4DvZIVuw/kernel.config > > > > > > We are investigating these failures. > > > > > > tst_test.c:906: CONF: btrfs driver not available > > > tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s > > > tst_device.c:88: INFO: Found free device 1 '/dev/loop1' > > > ioctl_loop01.c:49: PASS: /sys/block/loop1/loop/partscan = 0 > > > [ 1073.639677] loop_set_status: loop1 () has still dirty pages (nrpages=1) > > > ioctl_loop01.c:50: PASS: /sys/block/loop1/loop/autoclear = 0 > > > ioctl_loop01.c:51: PASS: /sys/block/loop1/loop/backing_file = > > > '/scratch/ltp-mnIdulzriQ/9cPtLQ/test.img' > > > ioctl_loop01.c:63: FAIL: expect 12 but got 17 > > > ioctl_loop01.c:67: FAIL: /sys/block/loop1/loop/partscan != 1 got 0 > > > ioctl_loop01.c:68: FAIL: /sys/block/loop1/loop/autoclear != 1 got 0 > > > ioctl_loop01.c:79: FAIL: access /dev/loop1p1 fails > > > [ 1073.679678] loop_set_status: loop1 () has still dirty pages (nrpages=1) > > > ioctl_loop01.c:85: FAIL: access /sys/block/loop1/loop1p1 fails > > > > > > HINT: You _MAY_ be missing kernel fixes, see: > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10c70d95c0f2 > > > > > > mke2fs 1.43.8 (1-Jan-2018) > > > [ 1264.711379] EXT4-fs (loop0): mounting ext2 file system using the > > > ext4 subsystem > > > [ 1264.716642] EXT4-fs (loop0): mounted filesystem without journal. Opts: > > > (null) > > > mknod07 0 TINFO : Using test device LTP_DEV='/dev/loop0' > > > mknod07 0 TINFO : Formatting /dev/loop0 with ext2 opts='' extra > > > opts='' > > > mknod07 1 TPASS : mknod failed as expected: > > > TEST_ERRNO=EACCES(13): Permission denied > > > mknod07 2 TPASS : mknod failed as expected: > > > TEST_ERRNO=EACCES(13): Permission denied > > > mknod07 3 TFAIL : mknod07.c:155: mknod succeeded unexpectedly > > > mknod07 4 TPASS : mknod failed as expected: > > > TEST_ERRNO=EPERM(1): Operation not permitted > > > mknod07 5 TPASS : mknod failed as expected: > > > TEST_ERRNO=EROFS(30): Read-only file system > > > mknod07 6 TPASS : mknod failed as expected: > > > TEST_ERRNO=ELOOP(40): Too many levels of symbolic links > > > > > > > > > setns01 0 TINFO : ns_name=ipc, ns_fds[0]=6, ns_types[0]=0x800 > > > setns01 0 TINFO : ns_name=mnt, ns_fds[1]=7, ns_types[1]=0x2 > > > setns01 0 TINFO : ns_name=net, ns_fds[2]=8, ns_types[2]=0x4000 > > > setns01 0 TINFO : ns_name=pid, ns_fds[3]=9, ns_types[3]=0x2000 > > > setns01 0 TINFO : ns_name=uts, ns_fds[4]=10, ns_types[4]=0x400 > > > setns01 0 TINFO : setns(-1, 0x800) > > > setns01 1 TPASS : invalid fd exp_errno=9 > > > setns01 0 TINFO : setns(-1, 0x2) > > > setns01 2 TPASS : invalid fd exp_errno=9 > > > setns01 0 TINFO : setns(-1, 0x4000) > > > setns01 3 TPASS : invalid fd exp_errno=9 > > > setns01 0 TINFO : setns(-1, 0x2000) > > > setns01 4 TPASS : invalid fd exp_errno=9 > > > setns01
[PATCH] loop: Fix wrong masking of status flags
In faf1d25440d6, loop_set_status() now assigns lo_status directly from the passed in lo_flags, but then fixes it up by masking out flags that can't be set by LOOP_SET_STATUS; unfortunately the mask was negated. Re-ran all ltp ioctl_loop tests, and they all passed. Pass run of the previously failing one: tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s tst_device.c:88: INFO: Found free device 0 '/dev/loop0' ioctl_loop01.c:49: PASS: /sys/block/loop0/loop/partscan = 0 ioctl_loop01.c:50: PASS: /sys/block/loop0/loop/autoclear = 0 ioctl_loop01.c:51: PASS: /sys/block/loop0/loop/backing_file = '/tmp/ZRJ6H4/test.img' ioctl_loop01.c:65: PASS: get expected lo_flag 12 ioctl_loop01.c:67: PASS: /sys/block/loop0/loop/partscan = 1 ioctl_loop01.c:68: PASS: /sys/block/loop0/loop/autoclear = 1 ioctl_loop01.c:77: PASS: access /dev/loop0p1 succeeds ioctl_loop01.c:83: PASS: access /sys/block/loop0/loop0p1 succeeds Summary: passed 8 failed 0 skipped 0 warnings 0 Fixes: faf1d25440d6 ("loop: Clean up LOOP_SET_STATUS lo_flags handling") Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4212288ab157..ad63e4247868 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1390,7 +1390,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) goto out_unfreeze; /* Mask out flags that can't be set using LOOP_SET_STATUS. */ - lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS; + lo->lo_flags &= LOOP_SET_STATUS_SETTABLE_FLAGS; /* For those flags, use the previous values instead */ lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS; /* For flags that can't be cleared, use previous values too */ -- 2.27.0.278.ge193c7cf3a9-goog
Re: LTP: syscalls: regression on mainline - ioctl_loop01 mknod07 setns01
Hi Naresh, I suspect the loop failures are due to faf1d25440d6ad06d509dada4b6fe62fea844370 ("loop: Clean up LOOP_SET_STATUS lo_flags handling"), I will investigate and get back to you. Thanks, Martijn On Thu, Jun 4, 2020 at 7:19 PM Naresh Kamboju wrote: > > + linux-bl...@vger.kernel.org > > On Thu, 4 Jun 2020 at 22:47, Naresh Kamboju wrote: > > > > Following three test cases reported as regression on Linux mainline kernel > > on x86_64, arm64, arm and i386 > > > > ltp-syscalls-tests: > > * ioctl_loop01 > > * mknod07 > > * setns01 > > > > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > > git branch: master > > GOOD: > > git commit: b23c4771ff62de8ca9b5e4a2d64491b2fb6f8f69 > > git describe: v5.7-1230-gb23c4771ff62 > > BAD: > > git commit: 1ee08de1e234d95b5b4f866878b72fceb5372904 > > git describe: v5.7-3523-g1ee08de1e234 > > > > kernel-config: > > https://builds.tuxbuild.com/U3bU0dMA62OVHb4DvZIVuw/kernel.config > > > > We are investigating these failures. > > > > tst_test.c:906: CONF: btrfs driver not available > > tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s > > tst_device.c:88: INFO: Found free device 1 '/dev/loop1' > > ioctl_loop01.c:49: PASS: /sys/block/loop1/loop/partscan = 0 > > [ 1073.639677] loop_set_status: loop1 () has still dirty pages (nrpages=1) > > ioctl_loop01.c:50: PASS: /sys/block/loop1/loop/autoclear = 0 > > ioctl_loop01.c:51: PASS: /sys/block/loop1/loop/backing_file = > > '/scratch/ltp-mnIdulzriQ/9cPtLQ/test.img' > > ioctl_loop01.c:63: FAIL: expect 12 but got 17 > > ioctl_loop01.c:67: FAIL: /sys/block/loop1/loop/partscan != 1 got 0 > > ioctl_loop01.c:68: FAIL: /sys/block/loop1/loop/autoclear != 1 got 0 > > ioctl_loop01.c:79: FAIL: access /dev/loop1p1 fails > > [ 1073.679678] loop_set_status: loop1 () has still dirty pages (nrpages=1) > > ioctl_loop01.c:85: FAIL: access /sys/block/loop1/loop1p1 fails > > > > HINT: You _MAY_ be missing kernel fixes, see: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10c70d95c0f2 > > > > mke2fs 1.43.8 (1-Jan-2018) > > [ 1264.711379] EXT4-fs (loop0): mounting ext2 file system using the > > ext4 subsystem > > [ 1264.716642] EXT4-fs (loop0): mounted filesystem without journal. Opts: > > (null) > > mknod07 0 TINFO : Using test device LTP_DEV='/dev/loop0' > > mknod07 0 TINFO : Formatting /dev/loop0 with ext2 opts='' extra > > opts='' > > mknod07 1 TPASS : mknod failed as expected: > > TEST_ERRNO=EACCES(13): Permission denied > > mknod07 2 TPASS : mknod failed as expected: > > TEST_ERRNO=EACCES(13): Permission denied > > mknod07 3 TFAIL : mknod07.c:155: mknod succeeded unexpectedly > > mknod07 4 TPASS : mknod failed as expected: > > TEST_ERRNO=EPERM(1): Operation not permitted > > mknod07 5 TPASS : mknod failed as expected: > > TEST_ERRNO=EROFS(30): Read-only file system > > mknod07 6 TPASS : mknod failed as expected: > > TEST_ERRNO=ELOOP(40): Too many levels of symbolic links > > > > > > setns01 0 TINFO : ns_name=ipc, ns_fds[0]=6, ns_types[0]=0x800 > > setns01 0 TINFO : ns_name=mnt, ns_fds[1]=7, ns_types[1]=0x2 > > setns01 0 TINFO : ns_name=net, ns_fds[2]=8, ns_types[2]=0x4000 > > setns01 0 TINFO : ns_name=pid, ns_fds[3]=9, ns_types[3]=0x2000 > > setns01 0 TINFO : ns_name=uts, ns_fds[4]=10, ns_types[4]=0x400 > > setns01 0 TINFO : setns(-1, 0x800) > > setns01 1 TPASS : invalid fd exp_errno=9 > > setns01 0 TINFO : setns(-1, 0x2) > > setns01 2 TPASS : invalid fd exp_errno=9 > > setns01 0 TINFO : setns(-1, 0x4000) > > setns01 3 TPASS : invalid fd exp_errno=9 > > setns01 0 TINFO : setns(-1, 0x2000) > > setns01 4 TPASS : invalid fd exp_errno=9 > > setns01 0 TINFO : setns(-1, 0x400) > > setns01 5 TPASS : invalid fd exp_errno=9 > > setns01 0 TINFO : setns(11, 0x800) > > setns01 6 TFAIL : setns01.c:176: regular file fd exp_errno=22: > > errno=EBADF(9): Bad file descriptor > > setns01 0 TINFO : setns(11, 0x2) > > setns01 7 TFAIL : setns01.c:176: regular file fd exp_errno=22: > > errno=EBADF(9): Bad file descriptor > > setns01 0 TINFO : setns(11, 0x4000) > > setns01 8 TFAIL : setns01.c:176: regular file fd exp_errno=22: > > errno=EBADF(9): Bad file descriptor > > setns01 0 TINFO : setns(11, 0x2000) > > setns01 9 TFAIL : setns01.c:176: regular file fd exp_errno=22: > > errno=EBADF(9): Bad file descriptor > > setns01 0 TINFO : setns(11, 0x400) > > setns0110 TFAIL : setns01.c:176: regular file fd exp_errno=22: > > errno=EBADF(9): Bad file descriptor > > > > Full test log link, > > https://lkft.validation.linaro.org/scheduler/job/1467931#L8047 > > > > test results comparison shows this test case started failing from > > June-2-2020 > >
Re: Writeback bug causing writeback stalls
On Mon, Jun 1, 2020 at 11:09 AM Jan Kara wrote: > Thanks for testing! My test run has completed fine so I'll submit patches > for review. But I'm curious what's causing the dips in throughput in your > test... It turned out to be unrelated to your patch. Sorry for the noise! We have the patch in dogfood on some of our devices, and I will let you know if we run into any issues. I'll also spend some more time reviewing your patches and will respond to them later. Thanks, Martijn > > Honza > > -- > Jan Kara > SUSE Labs, CR
Re: Writeback bug causing writeback stalls
Hi Jan, On Fri, May 29, 2020 at 5:20 PM Jan Kara wrote: > I understand. I have written a fix (attached). Currently its under testing > together with other cleanups. If everything works fine, I plan to submit > the patches on Monday. Thanks a lot for the quick fix! I ran my usual way to reproduce the problem, and did not see it, so that's good! I do observe write speed dips - eg we usually sustain 180 MB/s on this device, but now it regularly dips down to 10 MB/s, then jumps back up again. That might be unrelated to your patch though, I will run more tests over the weekend and report back! Best, Martijn > > Honza > -- > Jan Kara > SUSE Labs, CR
Re: Writeback bug causing writeback stalls
Hi Jan, On Mon, May 25, 2020 at 9:31 AM Jan Kara wrote: > Well, most importantly filesystems like ext4, xfs, btrfs don't hold i_rwsem > when writing back inode and that's deliberate because of performance. We > don't want to block writes (or event reads in case of XFS) for the inode > during writeback. Thanks for clarifying, that makes sense. By the way, do you have an ETA for your fix? We are under some time pressure to get this fixed in our downstream kernels, but I'd much rather take a fix from upstream from somebody who knows this code well. Alternatively, I can take a stab at the idea you proposed and send a patch to LKML for review this week. Thanks, Martijn > > Honza > -- > Jan Kara > SUSE Labs, CR
Re: Writeback bug causing writeback stalls
Jaegeuk wondered whether callers of write_inode_now() should hold i_rwsem, and whether that would also prevent this problem. Some existing callers of write_inode_now() do, eg ntfs and hfs: hfs_file_fsync() inode_lock(inode); /* sync the inode to buffers */ ret = write_inode_now(inode, 0); but there are also some that don't (eg fat, fuse, orangefs). Thanks, Martijn On Fri, May 22, 2020 at 5:36 PM Jan Kara wrote: > > On Fri 22-05-20 17:23:30, Martijn Coenen wrote: > > [ dropped android-storage-c...@google.com from CC: since that list > > can't receive emails from outside google.com - sorry about that ] > > > > Hi Jan, > > > > On Fri, May 22, 2020 at 4:41 PM Jan Kara wrote: > > > > The easiest way to fix this, I think, is to call requeue_inode() at the > > > > end of > > > > writeback_single_inode(), much like it is called from > > > > writeback_sb_inodes(). > > > > However, requeue_inode() has the following ominous warning: > > > > > > > > /* > > > > * Find proper writeback list for the inode depending on its current > > > > state and > > > > * possibly also change of its state while we were doing writeback. > > > > Here we > > > > * handle things such as livelock prevention or fairness of writeback > > > > among > > > > * inodes. This function can be called only by flusher thread - noone > > > > else > > > > * processes all inodes in writeback lists and requeueing inodes behind > > > > flusher > > > > * thread's back can have unexpected consequences. > > > > */ > > > > > > > > Obviously this is very critical code both from a correctness and a > > > > performance > > > > point of view, so I wanted to run this by the maintainers and folks who > > > > have > > > > contributed to this code first. > > > > > > Sadly, the fix won't be so easy. The main problem with calling > > > requeue_inode() from writeback_single_inode() is that if there's parallel > > > sync(2) call, inode->i_io_list is used to track all inodes that need > > > writing > > > before sync(2) can complete. So requeueing inodes in parallel while > > > sync(2) > > > runs can result in breaking data integrity guarantees of it. > > > > Ah, makes sense. > > > > > But I agree > > > we need to find some mechanism to safely move inode to appropriate dirty > > > list reasonably quickly. > > > > > > Probably I'd add an inode state flag telling that inode is queued for > > > writeback by flush worker and we won't touch dirty lists in that case, > > > otherwise we are safe to update current writeback list as needed. I'll > > > work > > > on fixing this as when I was reading the code I've noticed there are other > > > quirks in the code as well. Thanks for the report! > > > > Thanks! While looking at the code I also saw some other paths that > > appeared to be racy, though I haven't worked them out in detail to > > confirm that - the locking around the inode and writeback lists is > > tricky. What's the best way to follow up on those? Happy to post them > > to this same thread after I spend a bit more time looking at the code. > > Sure, if you are aware some some other problems, just write them to this > thread. FWIW stuff that I've found so far: > > 1) __I_DIRTY_TIME_EXPIRED setting in move_expired_inodes() can get lost as > there are other places doing RMW modifications of inode->i_state. > > 2) sync(2) is prone to livelocks as when we queue inodes from b_dirty_time > list, we don't take dirtied_when into account (and that's the only thing > that makes sure aggressive dirtier cannot livelock sync). > > Honza > -- > Jan Kara > SUSE Labs, CR
Re: Writeback bug causing writeback stalls
[ dropped android-storage-c...@google.com from CC: since that list can't receive emails from outside google.com - sorry about that ] Hi Jan, On Fri, May 22, 2020 at 4:41 PM Jan Kara wrote: > > The easiest way to fix this, I think, is to call requeue_inode() at the end > > of > > writeback_single_inode(), much like it is called from writeback_sb_inodes(). > > However, requeue_inode() has the following ominous warning: > > > > /* > > * Find proper writeback list for the inode depending on its current state > > and > > * possibly also change of its state while we were doing writeback. Here we > > * handle things such as livelock prevention or fairness of writeback among > > * inodes. This function can be called only by flusher thread - noone else > > * processes all inodes in writeback lists and requeueing inodes behind > > flusher > > * thread's back can have unexpected consequences. > > */ > > > > Obviously this is very critical code both from a correctness and a > > performance > > point of view, so I wanted to run this by the maintainers and folks who have > > contributed to this code first. > > Sadly, the fix won't be so easy. The main problem with calling > requeue_inode() from writeback_single_inode() is that if there's parallel > sync(2) call, inode->i_io_list is used to track all inodes that need writing > before sync(2) can complete. So requeueing inodes in parallel while sync(2) > runs can result in breaking data integrity guarantees of it. Ah, makes sense. > But I agree > we need to find some mechanism to safely move inode to appropriate dirty > list reasonably quickly. > > Probably I'd add an inode state flag telling that inode is queued for > writeback by flush worker and we won't touch dirty lists in that case, > otherwise we are safe to update current writeback list as needed. I'll work > on fixing this as when I was reading the code I've noticed there are other > quirks in the code as well. Thanks for the report! Thanks! While looking at the code I also saw some other paths that appeared to be racy, though I haven't worked them out in detail to confirm that - the locking around the inode and writeback lists is tricky. What's the best way to follow up on those? Happy to post them to this same thread after I spend a bit more time looking at the code. Thanks, Martijn > > Honza > > -- > Jan Kara > SUSE Labs, CR
Writeback bug causing writeback stalls
Hi, We've been working on implementing a FUSE filesystem in Android, and have run into what appears to be a bug in the kernel writeback code. The problem that we observed is that an inode in the filesystem is on the b_dirty_time list, but its i_state field does have I_DIRTY_PAGES set, which I think means that the inode is on the wrong list. This condition doesn't appear to fix itself even as new pages are dirtied, because __mark_inode_dirty() has this check: if ((inode->i_state & flags) != flags) { before considering moving the inode to another list. Since the inode already has I_DIRTY_PAGES set, we're not going to move it to the dirty list. I *think* the only way the inode gets out of this condition is whenever we handle the b_dirty_time list, which can take a while. The reason we even noticed this bug in the first place is that FUSE has a very low wb max_ratio by default (1), and so at some point processes get stuck in balance_dirty_pages_ratelimited(), waiting for pages to be written. They hold the inode's write lock, and when other processes try to acquire it, they get stuck. We have a watchdog that reboots the device after ~10 mins of a task being stuck in D state, and this triggered regularly in some tests. After careful studying of the kernel code, I found a reliable repro scenario for this condition, which is described in more detail below. But essentially what I think is happening is this: __mark_inode_dirty() has an early return condition for when a sync is in progress, where it updates the inode i_state but not the writeback list: inode->i_state |= flags; /* * If the inode is being synced, just update its dirty state. * The unlocker will place the inode on the appropriate * superblock list, based upon its state. */ if (inode->i_state & I_SYNC) goto out_unlock_inode; now this comment is true for the generic flusher threads, which run writeback_sb_inodes(), which calls requeue_inode() to move the inode back to the correct wb list when the sync is done. However, there is another function that uses I_SYNC: writeback_single_inode(). This function has some comments saying it prefers not to touch writeback lists, and in fact only removes it if the inode is clean: /* * Skip inode if it is clean and we have no outstanding writeback in * WB_SYNC_ALL mode. We don't want to mess with writeback lists in this * function since flusher thread may be doing for example sync in * parallel and if we move the inode, it could get skipped. So here we * make sure inode is on some writeback list and leave it there unless * we have completely cleaned the inode. */ writeback_single_inode() is called from a few functions, in particular write_inode_now(). write_inode_now() is called by FUSE's flush() f_ops. So, the sequence of events is something like this. Let's assume the inode is already on b_dirty_time for valid reasons. Then: CPU1 CPU2 fuse_flush() write_inode_now() writeback_single_inode() sets I_SYNC __writeback_single_inode() writes back data clears inode dirty flags unlocks inode calls mark_inode_dirty_sync() sets I_DIRTY_SYNC, but doesn't update wb list because I_SYNC is still set write() // somebody else writes mark_inode_dirty(I_DIRTY_PAGES) sets I_DIRTY_PAGES on i_state doesn't update wb list, because I_SYNC set locks inode again sees inode is still dirty, doesn't touch WB list clears I_SYNC So now we have an inode on b_dirty_time with I_DIRTY_PAGES | I_DIRTY_SYNC set, and subsequent calls to mark_inode_dirty() with either I_DIRTY_PAGES or I_DIRTY_SYNC will do nothing to change that. The flusher won't touch the inode either, because it's not on a b_dirty or b_io list. The easiest way to fix this, I think, is to call requeue_inode() at the end of writeback_single_inode(), much like it is called from writeback_sb_inodes(). However, requeue_inode() has the following ominous warning: /* * Find proper writeback list for the inode depending on its current state and * possibly also change of its state while we were doing writeback. Here we * handle things such as livelock prevention or fairness of writeback among * inodes. This function can be called only by flusher thread - noone else * processes all inodes in writeback lists and requeueing inodes behind flusher * thread's back can have unexpected consequences. */ Obviously this is very critical code both from a correctness and a performance point of view, so I wanted to run this by the maintainers and folks who have contributed to this code first. The way I got to reproduce this
Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl
On Wed, May 13, 2020 at 12:22 PM Christoph Hellwig wrote: > > On Wed, May 13, 2020 at 09:07:43AM +0200, Martijn Coenen wrote: > > On Wed, May 13, 2020 at 4:30 AM Jens Axboe wrote: > > > > Looks acceptable to me, but I'm getting a failure applying it to > > > > for-5.8/drivers on this patch: > > > > > > > > Applying: loop: Refactor loop_set_status() size calculation > > > > > > > > So you'll probably want to respin on the right branch. > > > > This series depends on a separate bugfix I sent to LKML earlier - see > > https://lkml.org/lkml/2020/3/31/755 . I mentioned it in [00/10] of > > this series, but perhaps I should have just included that patch. > > > > I just verified that patch + this series still applies cleanly on your > > for-5.8/drivers tree, but if you prefer I send a v5 with that patch > > going first let me know. > > You probably want to resend with the fix includes as the first patch. > And drop the truncation check now that we figured out that we don't > actually need it. Just sent v5, thanks! Martijn
[PATCH v5 01/11] loop: Call loop_config_discard() only after new config is applied
loop_set_status() calls loop_config_discard() to configure discard for the loop device; however, the discard configuration depends on whether the loop device uses encryption, and when we call it the encryption configuration has not been updated yet. Move the call down so we apply the correct discard configuration based on the new configuration. Reviewed-by: Christoph Hellwig Reviewed-by: Bob Liu Reviewed-by: Bart Van Assche Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index da693e6a834e..f1754262fc94 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1334,8 +1334,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) } } - loop_config_discard(lo); - memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE); lo->lo_file_name[LO_NAME_SIZE-1] = 0; @@ -1359,6 +1357,8 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) lo->lo_key_owner = uid; } + loop_config_discard(lo); + /* update dio if lo_offset or transfer is changed */ __loop_update_dio(lo, lo->use_dio); -- 2.26.2.645.ge9eca65c58-goog
[PATCH v5 02/11] loop: Remove sector_t truncation checks
sector_t is now always u64, so we don't need to check for truncation. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 21 +++-- 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index f1754262fc94..00de7fec0ed5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -228,24 +228,20 @@ static void __loop_update_dio(struct loop_device *lo, bool dio) blk_mq_unfreeze_queue(lo->lo_queue); } -static int +static void figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) { loff_t size = get_size(offset, sizelimit, lo->lo_backing_file); - sector_t x = (sector_t)size; struct block_device *bdev = lo->lo_device; - if (unlikely((loff_t)x != size)) - return -EFBIG; if (lo->lo_offset != offset) lo->lo_offset = offset; if (lo->lo_sizelimit != sizelimit) lo->lo_sizelimit = sizelimit; - set_capacity(lo->lo_disk, x); + set_capacity(lo->lo_disk, size); bd_set_size(bdev, (loff_t)get_capacity(bdev->bd_disk) << 9); /* let user-space know about the new size */ kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); - return 0; } static inline int @@ -1003,10 +999,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, !file->f_op->write_iter) lo_flags |= LO_FLAGS_READ_ONLY; - error = -EFBIG; size = get_loop_size(lo, file); - if ((loff_t)(sector_t)size != size) - goto out_unlock; + error = loop_prepare_queue(lo); if (error) goto out_unlock; @@ -1328,10 +1322,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) lo->lo_device->bd_inode->i_mapping->nrpages); goto out_unfreeze; } - if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit)) { - err = -EFBIG; - goto out_unfreeze; - } + figure_loop_size(lo, info->lo_offset, info->lo_sizelimit); } memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); @@ -1534,7 +1525,9 @@ static int loop_set_capacity(struct loop_device *lo) if (unlikely(lo->lo_state != Lo_bound)) return -ENXIO; - return figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit); + figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit); + + return 0; } static int loop_set_dio(struct loop_device *lo, unsigned long arg) -- 2.26.2.645.ge9eca65c58-goog
[PATCH v5 11/11] loop: Add LOOP_CONFIGURE ioctl
This allows userspace to completely setup a loop device with a single ioctl, removing the in-between state where the device can be partially configured - eg the loop device has a backing file associated with it, but is reading from the wrong offset. Besides removing the intermediate state, another big benefit of this ioctl is that LOOP_SET_STATUS can be slow; the main reason for this slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to freeze the associated queue; this requires waiting for RCU synchronization, which I've measured can take about 15-20ms on this device on average. In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can also be used to: - Set the correct block size immediately by setting loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE) - Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO) - Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY in loop_config.info.lo_flags Here's setting up ~70 regular loop devices with an offset on an x86 Android device, using LOOP_SET_FD and LOOP_SET_STATUS: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m03.40s real 0m00.02s user 0m00.03s system Here's configuring ~70 devices in the same way, but using a modified losetup that uses the new LOOP_CONFIGURE ioctl: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m01.94s real 0m00.01s user 0m00.01s system Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 104 -- include/uapi/linux/loop.h | 21 2 files changed, 97 insertions(+), 28 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 13518ba191f5..a565c5aafa52 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -228,6 +228,19 @@ static void __loop_update_dio(struct loop_device *lo, bool dio) blk_mq_unfreeze_queue(lo->lo_queue); } +/** + * loop_validate_block_size() - validates the passed in block size + * @bsize: size to validate + */ +static int +loop_validate_block_size(unsigned short bsize) +{ + if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize)) + return -EINVAL; + + return 0; +} + /** * loop_set_size() - sets device size and notifies userspace * @lo: struct loop_device to set the size for @@ -1050,23 +1063,24 @@ loop_set_status_from_info(struct loop_device *lo, return 0; } -static int loop_set_fd(struct loop_device *lo, fmode_t mode, - struct block_device *bdev, unsigned int arg) +static int loop_configure(struct loop_device *lo, fmode_t mode, + struct block_device *bdev, + const struct loop_config *config) { struct file *file; struct inode*inode; struct address_space *mapping; struct block_device *claimed_bdev = NULL; - int lo_flags = 0; int error; loff_t size; boolpartscan; + unsigned short bsize; /* This is safe, since we have a reference from open(). */ __module_get(THIS_MODULE); error = -EBADF; - file = fget(arg); + file = fget(config->fd); if (!file) goto out; @@ -1075,7 +1089,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, * here to avoid changing device under exclusive owner. */ if (!(mode & FMODE_EXCL)) { - claimed_bdev = bd_start_claiming(bdev, loop_set_fd); + claimed_bdev = bd_start_claiming(bdev, loop_configure); if (IS_ERR(claimed_bdev)) { error = PTR_ERR(claimed_bdev); goto out_putf; @@ -1097,11 +,26 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, mapping = file->f_mapping; inode = mapping->host; + size = get_loop_size(lo, file); + + if ((config->info.lo_flags & ~LOOP_CONFIGURE_SETTABLE_FLAGS) != 0) { + error = -EINVAL; + goto out_unlock; + } + + if (config->block_size) { + error = loop_validate_block_size(config->block_size); + if (error) + goto out_unlock; + } + + error = loop_set_status_from_info(lo, >info); + if (error) + goto out_unlock; + if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) || !file->f_op->write_iter) - lo_flags |= LO_FLAGS_READ_ONLY; - - size = get_loop_size(lo, file); + lo->lo_flags |= LO_FLAGS_READ_ONLY; error = loop_prepare_queue(lo); if (er
[PATCH v5 10/11] loop: Clean up LOOP_SET_STATUS lo_flags handling
LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this explicit by updating the UAPI to include the flags that can be set/cleared using this ioctl. The implementation can then blindly take over the passed in flags, and use the previous flags for those flags that can't be set / cleared using LOOP_SET_STATUS. Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 19 +-- include/uapi/linux/loop.h | 10 -- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 31f10da4945e..13518ba191f5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1036,9 +1036,7 @@ loop_set_status_from_info(struct loop_device *lo, lo->transfer = xfer->transfer; lo->ioctl = xfer->ioctl; - if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) != -(info->lo_flags & LO_FLAGS_AUTOCLEAR)) - lo->lo_flags ^= LO_FLAGS_AUTOCLEAR; + lo->lo_flags = info->lo_flags; lo->lo_encrypt_key_size = info->lo_encrypt_key_size; lo->lo_init[0] = info->lo_init[0]; @@ -1323,6 +1321,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) int err; struct block_device *bdev; kuid_t uid = current_uid(); + int prev_lo_flags; bool partscan = false; bool size_changed = false; @@ -1359,10 +1358,19 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) goto out_unfreeze; } + prev_lo_flags = lo->lo_flags; + err = loop_set_status_from_info(lo, info); if (err) goto out_unfreeze; + /* Mask out flags that can't be set using LOOP_SET_STATUS. */ + lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS; + /* For those flags, use the previous values instead */ + lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS; + /* For flags that can't be cleared, use previous values too */ + lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_CLEARABLE_FLAGS; + if (size_changed) { loff_t new_size = get_size(lo->lo_offset, lo->lo_sizelimit, lo->lo_backing_file); @@ -1377,9 +1385,8 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) out_unfreeze: blk_mq_unfreeze_queue(lo->lo_queue); - if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) && -!(lo->lo_flags & LO_FLAGS_PARTSCAN)) { - lo->lo_flags |= LO_FLAGS_PARTSCAN; + if (!err && (lo->lo_flags & LO_FLAGS_PARTSCAN) && +!(prev_lo_flags & LO_FLAGS_PARTSCAN)) { lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN; bdev = lo->lo_device; partscan = true; diff --git a/include/uapi/linux/loop.h b/include/uapi/linux/loop.h index 080a8df134ef..6b32fee80ce0 100644 --- a/include/uapi/linux/loop.h +++ b/include/uapi/linux/loop.h @@ -25,6 +25,12 @@ enum { LO_FLAGS_DIRECT_IO = 16, }; +/* LO_FLAGS that can be set using LOOP_SET_STATUS(64) */ +#define LOOP_SET_STATUS_SETTABLE_FLAGS (LO_FLAGS_AUTOCLEAR | LO_FLAGS_PARTSCAN) + +/* LO_FLAGS that can be cleared using LOOP_SET_STATUS(64) */ +#define LOOP_SET_STATUS_CLEARABLE_FLAGS (LO_FLAGS_AUTOCLEAR) + #include/* for __kernel_old_dev_t */ #include/* for __u64 */ @@ -37,7 +43,7 @@ struct loop_info { intlo_offset; intlo_encrypt_type; intlo_encrypt_key_size; /* ioctl w/o */ - intlo_flags;/* ioctl r/o */ + intlo_flags; char lo_name[LO_NAME_SIZE]; unsigned char lo_encrypt_key[LO_KEY_SIZE]; /* ioctl w/o */ unsigned long lo_init[2]; @@ -53,7 +59,7 @@ struct loop_info64 { __u32 lo_number; /* ioctl r/o */ __u32 lo_encrypt_type; __u32 lo_encrypt_key_size; /* ioctl w/o */ - __u32 lo_flags;/* ioctl r/o */ + __u32 lo_flags; __u8 lo_file_name[LO_NAME_SIZE]; __u8 lo_crypt_name[LO_NAME_SIZE]; __u8 lo_encrypt_key[LO_KEY_SIZE]; /* ioctl w/o */ -- 2.26.2.645.ge9eca65c58-goog
[PATCH v5 06/11] loop: Remove figure_loop_size()
This function was now only used by loop_set_capacity(). Just open code the remaining code in the caller instead. Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index c134b3439483..e281a9f03d96 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -245,14 +245,6 @@ static void loop_set_size(struct loop_device *lo, loff_t size) set_capacity_revalidate_and_notify(lo->lo_disk, size, false); } -static void -figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) -{ - loff_t size = get_size(offset, sizelimit, lo->lo_backing_file); - - loop_set_size(lo, size); -} - static inline int lo_do_transfer(struct loop_device *lo, int cmd, struct page *rpage, unsigned roffs, @@ -1534,10 +1526,13 @@ loop_get_status64(struct loop_device *lo, struct loop_info64 __user *arg) { static int loop_set_capacity(struct loop_device *lo) { + loff_t size; + if (unlikely(lo->lo_state != Lo_bound)) return -ENXIO; - figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit); + size = get_loop_size(lo, lo->lo_backing_file); + loop_set_size(lo, size); return 0; } -- 2.26.2.645.ge9eca65c58-goog
[PATCH v5 09/11] loop: Rework lo_ioctl() __user argument casting
In preparation for a new ioctl that needs to copy_from_user(); makes the code easier to read as well. Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4dc11d954169..31f10da4945e 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1634,6 +1634,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg) { struct loop_device *lo = bdev->bd_disk->private_data; + void __user *argp = (void __user *) arg; int err; switch (cmd) { @@ -1646,21 +1647,19 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode, case LOOP_SET_STATUS: err = -EPERM; if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) { - err = loop_set_status_old(lo, - (struct loop_info __user *)arg); + err = loop_set_status_old(lo, argp); } break; case LOOP_GET_STATUS: - return loop_get_status_old(lo, (struct loop_info __user *) arg); + return loop_get_status_old(lo, argp); case LOOP_SET_STATUS64: err = -EPERM; if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) { - err = loop_set_status64(lo, - (struct loop_info64 __user *) arg); + err = loop_set_status64(lo, argp); } break; case LOOP_GET_STATUS64: - return loop_get_status64(lo, (struct loop_info64 __user *) arg); + return loop_get_status64(lo, argp); case LOOP_SET_CAPACITY: case LOOP_SET_DIRECT_IO: case LOOP_SET_BLOCK_SIZE: -- 2.26.2.645.ge9eca65c58-goog
[PATCH v5 03/11] loop: Factor out setting loop device size
This code is used repeatedly. Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 30 +- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 00de7fec0ed5..e69ff3c19eff 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -228,20 +228,35 @@ static void __loop_update_dio(struct loop_device *lo, bool dio) blk_mq_unfreeze_queue(lo->lo_queue); } +/** + * loop_set_size() - sets device size and notifies userspace + * @lo: struct loop_device to set the size for + * @size: new size of the loop device + * + * Callers must validate that the size passed into this function fits into + * a sector_t, eg using loop_validate_size() + */ +static void loop_set_size(struct loop_device *lo, loff_t size) +{ + struct block_device *bdev = lo->lo_device; + + set_capacity(lo->lo_disk, size); + bd_set_size(bdev, size << SECTOR_SHIFT); + /* let user-space know about the new size */ + kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); +} + static void figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) { loff_t size = get_size(offset, sizelimit, lo->lo_backing_file); - struct block_device *bdev = lo->lo_device; if (lo->lo_offset != offset) lo->lo_offset = offset; if (lo->lo_sizelimit != sizelimit) lo->lo_sizelimit = sizelimit; - set_capacity(lo->lo_disk, size); - bd_set_size(bdev, (loff_t)get_capacity(bdev->bd_disk) << 9); - /* let user-space know about the new size */ - kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); + + loop_set_size(lo, size); } static inline int @@ -1034,11 +1049,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, loop_update_rotational(lo); loop_update_dio(lo); - set_capacity(lo->lo_disk, size); - bd_set_size(bdev, size << 9); loop_sysfs_init(lo); - /* let user-space know about the new size */ - kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); + loop_set_size(lo, size); set_blocksize(bdev, S_ISBLK(inode->i_mode) ? block_size(inode->i_bdev) : PAGE_SIZE); -- 2.26.2.645.ge9eca65c58-goog
[PATCH v5 07/11] loop: Factor out configuring loop from status
Factor out this code into a separate function, so it can be reused by other code more easily. Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 117 +-- 1 file changed, 67 insertions(+), 50 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index e281a9f03d96..6a4c0ba225ca 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1251,75 +1251,43 @@ static int loop_clr_fd(struct loop_device *lo) return __loop_clr_fd(lo, false); } +/** + * loop_set_status_from_info - configure device from loop_info + * @lo: struct loop_device to configure + * @info: struct loop_info64 to configure the device with + * + * Configures the loop device parameters according to the passed + * in loop_info64 configuration. + */ static int -loop_set_status(struct loop_device *lo, const struct loop_info64 *info) +loop_set_status_from_info(struct loop_device *lo, + const struct loop_info64 *info) { int err; struct loop_func_table *xfer; kuid_t uid = current_uid(); - struct block_device *bdev; - bool partscan = false; - bool size_changed = false; - - err = mutex_lock_killable(_ctl_mutex); - if (err) - return err; - if (lo->lo_encrypt_key_size && - !uid_eq(lo->lo_key_owner, uid) && - !capable(CAP_SYS_ADMIN)) { - err = -EPERM; - goto out_unlock; - } - if (lo->lo_state != Lo_bound) { - err = -ENXIO; - goto out_unlock; - } - if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) { - err = -EINVAL; - goto out_unlock; - } - - if (lo->lo_offset != info->lo_offset || - lo->lo_sizelimit != info->lo_sizelimit) { - size_changed = true; - sync_blockdev(lo->lo_device); - kill_bdev(lo->lo_device); - } - /* I/O need to be drained during transfer transition */ - blk_mq_freeze_queue(lo->lo_queue); - - if (size_changed && lo->lo_device->bd_inode->i_mapping->nrpages) { - /* If any pages were dirtied after kill_bdev(), try again */ - err = -EAGAIN; - pr_warn("%s: loop%d (%s) has still dirty pages (nrpages=%lu)\n", - __func__, lo->lo_number, lo->lo_file_name, - lo->lo_device->bd_inode->i_mapping->nrpages); - goto out_unfreeze; - } + if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) + return -EINVAL; err = loop_release_xfer(lo); if (err) - goto out_unfreeze; + return err; if (info->lo_encrypt_type) { unsigned int type = info->lo_encrypt_type; - if (type >= MAX_LO_CRYPT) { - err = -EINVAL; - goto out_unfreeze; - } + if (type >= MAX_LO_CRYPT) + return -EINVAL; xfer = xfer_funcs[type]; - if (xfer == NULL) { - err = -EINVAL; - goto out_unfreeze; - } + if (xfer == NULL) + return -EINVAL; } else xfer = NULL; err = loop_init_xfer(lo, xfer, info); if (err) - goto out_unfreeze; + return err; lo->lo_offset = info->lo_offset; lo->lo_sizelimit = info->lo_sizelimit; @@ -1346,6 +1314,55 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) lo->lo_key_owner = uid; } + return 0; +} + +static int +loop_set_status(struct loop_device *lo, const struct loop_info64 *info) +{ + int err; + struct block_device *bdev; + kuid_t uid = current_uid(); + bool partscan = false; + bool size_changed = false; + + err = mutex_lock_killable(_ctl_mutex); + if (err) + return err; + if (lo->lo_encrypt_key_size && + !uid_eq(lo->lo_key_owner, uid) && + !capable(CAP_SYS_ADMIN)) { + err = -EPERM; + goto out_unlock; + } + if (lo->lo_state != Lo_bound) { + err = -ENXIO; + goto out_unlock; + } + + if (lo->lo_offset != info->lo_offset || + lo->lo_sizelimit != info->lo_sizelimit) { + size_changed = true; + sync_blockdev(lo->lo_device); + kill_bdev(lo->lo_device); + } + + /* I/O need to be drained during transfer transition */ + blk_mq_freeze_queue(lo->lo_queue); + + if (size_changed &a
[PATCH v5 04/11] loop: Switch to set_capacity_revalidate_and_notify()
This was recently added to block/genhd.c, and takes care of both updating the capacity and notifying userspace of the new size. Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index e69ff3c19eff..d9a756f8abd5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -240,10 +240,9 @@ static void loop_set_size(struct loop_device *lo, loff_t size) { struct block_device *bdev = lo->lo_device; - set_capacity(lo->lo_disk, size); bd_set_size(bdev, size << SECTOR_SHIFT); - /* let user-space know about the new size */ - kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); + + set_capacity_revalidate_and_notify(lo->lo_disk, size, false); } static void -- 2.26.2.645.ge9eca65c58-goog
[PATCH v5 08/11] loop: Move loop_set_status_from_info() and friends up
So we can use it without forward declaration. This is a separate commit to make it easier to verify that this is just a move, without functional modifications. Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 206 +-- 1 file changed, 103 insertions(+), 103 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 6a4c0ba225ca..4dc11d954169 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -949,6 +949,109 @@ static void loop_update_rotational(struct loop_device *lo) blk_queue_flag_clear(QUEUE_FLAG_NONROT, q); } +static int +loop_release_xfer(struct loop_device *lo) +{ + int err = 0; + struct loop_func_table *xfer = lo->lo_encryption; + + if (xfer) { + if (xfer->release) + err = xfer->release(lo); + lo->transfer = NULL; + lo->lo_encryption = NULL; + module_put(xfer->owner); + } + return err; +} + +static int +loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer, + const struct loop_info64 *i) +{ + int err = 0; + + if (xfer) { + struct module *owner = xfer->owner; + + if (!try_module_get(owner)) + return -EINVAL; + if (xfer->init) + err = xfer->init(lo, i); + if (err) + module_put(owner); + else + lo->lo_encryption = xfer; + } + return err; +} + +/** + * loop_set_status_from_info - configure device from loop_info + * @lo: struct loop_device to configure + * @info: struct loop_info64 to configure the device with + * + * Configures the loop device parameters according to the passed + * in loop_info64 configuration. + */ +static int +loop_set_status_from_info(struct loop_device *lo, + const struct loop_info64 *info) +{ + int err; + struct loop_func_table *xfer; + kuid_t uid = current_uid(); + + if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) + return -EINVAL; + + err = loop_release_xfer(lo); + if (err) + return err; + + if (info->lo_encrypt_type) { + unsigned int type = info->lo_encrypt_type; + + if (type >= MAX_LO_CRYPT) + return -EINVAL; + xfer = xfer_funcs[type]; + if (xfer == NULL) + return -EINVAL; + } else + xfer = NULL; + + err = loop_init_xfer(lo, xfer, info); + if (err) + return err; + + lo->lo_offset = info->lo_offset; + lo->lo_sizelimit = info->lo_sizelimit; + memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); + memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE); + lo->lo_file_name[LO_NAME_SIZE-1] = 0; + lo->lo_crypt_name[LO_NAME_SIZE-1] = 0; + + if (!xfer) + xfer = _funcs; + lo->transfer = xfer->transfer; + lo->ioctl = xfer->ioctl; + + if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) != +(info->lo_flags & LO_FLAGS_AUTOCLEAR)) + lo->lo_flags ^= LO_FLAGS_AUTOCLEAR; + + lo->lo_encrypt_key_size = info->lo_encrypt_key_size; + lo->lo_init[0] = info->lo_init[0]; + lo->lo_init[1] = info->lo_init[1]; + if (info->lo_encrypt_key_size) { + memcpy(lo->lo_encrypt_key, info->lo_encrypt_key, + info->lo_encrypt_key_size); + lo->lo_key_owner = uid; + } + + return 0; +} + static int loop_set_fd(struct loop_device *lo, fmode_t mode, struct block_device *bdev, unsigned int arg) { @@ -1070,43 +1173,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, return error; } -static int -loop_release_xfer(struct loop_device *lo) -{ - int err = 0; - struct loop_func_table *xfer = lo->lo_encryption; - - if (xfer) { - if (xfer->release) - err = xfer->release(lo); - lo->transfer = NULL; - lo->lo_encryption = NULL; - module_put(xfer->owner); - } - return err; -} - -static int -loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer, - const struct loop_info64 *i) -{ - int err = 0; - - if (xfer) { - struct module *owner = xfer->owner; - - if (!try_module_get(owner)) - return -EINVAL; - if (xfer->init) - err = xfer->init(lo, i); - if (err) - module_put(owner); -
[PATCH v5 00/11] Add a new LOOP_CONFIGURE ioctl
This series introduces a new ioctl that makes it possible to atomically configure a loop device. Previously, if you wanted to set parameters such as the offset on a loop device, this required calling LOOP_SET_FD to set the backing file, and then LOOP_SET_STATUS to set the offset. However, in between these two calls, the loop device is available and would accept requests, which is generally not desirable. Similar issues exist around setting the block size (LOOP_SET_BLOCK_SIZE) and requesting direct I/O mode (LOOP_SET_DIRECT_IO). There are also performance benefits with combining these ioctls into one, which are described in more detail in the last change in the series. --- v5: - Added "loop: Call loop_config_discard() ..." as a first patch to this series, as the rest of the refactoring done depends on it. - Removed sector_t truncation checks, as suggested by Ming Lei. v4: - Addressed review comments from Christoph Hellwig: -- Minor code cleanups -- Clarified what lo_flags LOOP_SET_STATUS can set and clear, and made that more explicit in the code (see [10/11]) -- LOOP_CONFIGURE can now also be used to configure the block size and to explicitly request Direct I/O and read-only mode. -- Explicitly reject lo_flags we don't know about in LOOP_CONFIGURE -- Renamed LOOP_SET_FD_AND_STATUS to LOOP_CONFIGURE, since the ioctl can now do things LOOP_SET_STATUS couldn't do. v3: - Addressed review comments from Christoph Hellwig: -- Factored out loop_validate_size() -- Split up the largish first patch in a few smaller ones -- Use set_capacity_revalidate_and_notify() - Fixed a variable wrongly using size_t instead of loff_t v2: - Addressed review comments from Bart van Assche: -- Use SECTOR_SHIFT constant -- Renamed loop_set_from_status() to loop_set_status_from_info() -- Added kerneldoc for loop_set_status_from_info() -- Removed dots in patch subject lines - Addressed review comments from Christoph Hellwig: -- Added missing padding in struct loop_fd_and_status -- Cleaned up some __user pointer handling in lo_ioctl -- Pass in a stack-initialized loop_info64 for the legacy LOOP_SET_FD case Martijn Coenen (11): loop: Call loop_config_discard() only after new config is applied loop: Remove sector_t truncation checks loop: Factor out setting loop device size loop: Switch to set_capacity_revalidate_and_notify() loop: Refactor loop_set_status() size calculation loop: Remove figure_loop_size() loop: Factor out configuring loop from status loop: Move loop_set_status_from_info() and friends up loop: Rework lo_ioctl() __user argument casting loop: Clean up LOOP_SET_STATUS lo_flags handling loop: Add LOOP_CONFIGURE ioctl drivers/block/loop.c | 381 ++ include/uapi/linux/loop.h | 31 +++- 2 files changed, 255 insertions(+), 157 deletions(-) -- 2.26.2.645.ge9eca65c58-goog
[PATCH v5 05/11] loop: Refactor loop_set_status() size calculation
figure_loop_size() calculates the loop size based on the passed in parameters, but at the same time it updates the offset and sizelimit parameters in the loop device configuration. That is a somewhat unexpected side effect of a function with this name, and it is only only needed by one of the two callers of this function - loop_set_status(). Move the lo_offset and lo_sizelimit assignment back into loop_set_status(), and use the newly factored out functions to validate and apply the newly calculated size. This allows us to get rid of figure_loop_size() in a follow-up commit. Reviewed-by: Christoph Hellwig Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 37 +++-- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index d9a756f8abd5..c134b3439483 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -250,11 +250,6 @@ figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) { loff_t size = get_size(offset, sizelimit, lo->lo_backing_file); - if (lo->lo_offset != offset) - lo->lo_offset = offset; - if (lo->lo_sizelimit != sizelimit) - lo->lo_sizelimit = sizelimit; - loop_set_size(lo, size); } @@ -1272,6 +1267,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) kuid_t uid = current_uid(); struct block_device *bdev; bool partscan = false; + bool size_changed = false; err = mutex_lock_killable(_ctl_mutex); if (err) @@ -1293,6 +1289,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) if (lo->lo_offset != info->lo_offset || lo->lo_sizelimit != info->lo_sizelimit) { + size_changed = true; sync_blockdev(lo->lo_device); kill_bdev(lo->lo_device); } @@ -1300,6 +1297,15 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) /* I/O need to be drained during transfer transition */ blk_mq_freeze_queue(lo->lo_queue); + if (size_changed && lo->lo_device->bd_inode->i_mapping->nrpages) { + /* If any pages were dirtied after kill_bdev(), try again */ + err = -EAGAIN; + pr_warn("%s: loop%d (%s) has still dirty pages (nrpages=%lu)\n", + __func__, lo->lo_number, lo->lo_file_name, + lo->lo_device->bd_inode->i_mapping->nrpages); + goto out_unfreeze; + } + err = loop_release_xfer(lo); if (err) goto out_unfreeze; @@ -1323,19 +1329,8 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) if (err) goto out_unfreeze; - if (lo->lo_offset != info->lo_offset || - lo->lo_sizelimit != info->lo_sizelimit) { - /* kill_bdev should have truncated all the pages */ - if (lo->lo_device->bd_inode->i_mapping->nrpages) { - err = -EAGAIN; - pr_warn("%s: loop%d (%s) has still dirty pages (nrpages=%lu)\n", - __func__, lo->lo_number, lo->lo_file_name, - lo->lo_device->bd_inode->i_mapping->nrpages); - goto out_unfreeze; - } - figure_loop_size(lo, info->lo_offset, info->lo_sizelimit); - } - + lo->lo_offset = info->lo_offset; + lo->lo_sizelimit = info->lo_sizelimit; memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE); lo->lo_file_name[LO_NAME_SIZE-1] = 0; @@ -1359,6 +1354,12 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) lo->lo_key_owner = uid; } + if (size_changed) { + loff_t new_size = get_size(lo->lo_offset, lo->lo_sizelimit, + lo->lo_backing_file); + loop_set_size(lo, new_size); + } + loop_config_discard(lo); /* update dio if lo_offset or transfer is changed */ -- 2.26.2.645.ge9eca65c58-goog
Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl
On Wed, May 13, 2020 at 4:30 AM Jens Axboe wrote: > > Looks acceptable to me, but I'm getting a failure applying it to > > for-5.8/drivers on this patch: > > > > Applying: loop: Refactor loop_set_status() size calculation > > > > So you'll probably want to respin on the right branch. This series depends on a separate bugfix I sent to LKML earlier - see https://lkml.org/lkml/2020/3/31/755 . I mentioned it in [00/10] of this series, but perhaps I should have just included that patch. I just verified that patch + this series still applies cleanly on your for-5.8/drivers tree, but if you prefer I send a v5 with that patch going first let me know. Thanks, Martijn > > Then you can also drop patch #1. > > -- > Jens Axboe >
Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl
Hi Jens, What do you think of this series? Thanks, Martijn On Wed, Apr 29, 2020 at 4:03 PM Martijn Coenen wrote: > > This allows userspace to completely setup a loop device with a single > ioctl, removing the in-between state where the device can be partially > configured - eg the loop device has a backing file associated with it, > but is reading from the wrong offset. > > Besides removing the intermediate state, another big benefit of this > ioctl is that LOOP_SET_STATUS can be slow; the main reason for this > slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to > freeze the associated queue; this requires waiting for RCU > synchronization, which I've measured can take about 15-20ms on this > device on average. > > In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can > also be used to: > - Set the correct block size immediately by setting > loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE) > - Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO > in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO) > - Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY > in loop_config.info.lo_flags > > Here's setting up ~70 regular loop devices with an offset on an x86 > Android device, using LOOP_SET_FD and LOOP_SET_STATUS: > > vsoc_x86:/system/apex # time for i in `seq 30 100`; > do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done > 0m03.40s real 0m00.02s user 0m00.03s system > > Here's configuring ~70 devices in the same way, but using a modified > losetup that uses the new LOOP_CONFIGURE ioctl: > > vsoc_x86:/system/apex # time for i in `seq 30 100`; > do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done > 0m01.94s real 0m00.01s user 0m00.01s system > > Signed-off-by: Martijn Coenen > --- > drivers/block/loop.c | 107 +++--- > include/uapi/linux/loop.h | 21 > 2 files changed, 99 insertions(+), 29 deletions(-) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index cfbdd99fdb1a..a353ce55fd18 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -241,6 +241,19 @@ loop_validate_size(loff_t size) > return 0; > } > > +/** > + * loop_validate_block_size() - validates the passed in block size > + * @bsize: size to validate > + */ > +static int > +loop_validate_block_size(unsigned short bsize) > +{ > + if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize)) > + return -EINVAL; > + > + return 0; > +} > + > /** > * loop_set_size() - sets device size and notifies userspace > * @lo: struct loop_device to set the size for > @@ -1063,23 +1076,24 @@ loop_set_status_from_info(struct loop_device *lo, > return 0; > } > > -static int loop_set_fd(struct loop_device *lo, fmode_t mode, > - struct block_device *bdev, unsigned int arg) > +static int loop_configure(struct loop_device *lo, fmode_t mode, > + struct block_device *bdev, > + const struct loop_config *config) > { > struct file *file; > struct inode*inode; > struct address_space *mapping; > struct block_device *claimed_bdev = NULL; > - int lo_flags = 0; > int error; > loff_t size; > boolpartscan; > + unsigned short bsize; > > /* This is safe, since we have a reference from open(). */ > __module_get(THIS_MODULE); > > error = -EBADF; > - file = fget(arg); > + file = fget(config->fd); > if (!file) > goto out; > > @@ -1088,7 +1102,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t > mode, > * here to avoid changing device under exclusive owner. > */ > if (!(mode & FMODE_EXCL)) { > - claimed_bdev = bd_start_claiming(bdev, loop_set_fd); > + claimed_bdev = bd_start_claiming(bdev, loop_configure); > if (IS_ERR(claimed_bdev)) { > error = PTR_ERR(claimed_bdev); > goto out_putf; > @@ -1110,44 +1124,58 @@ static int loop_set_fd(struct loop_device *lo, > fmode_t mode, > mapping = file->f_mapping; > inode = mapping->host; > > - if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) || > - !file->f_op->write_iter) > - lo_flags |= LO_FLAGS_READ_ONLY; > - > size = get_loop_size(lo, file); > error = lo
Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl
On Wed, May 6, 2020 at 11:44 AM Michael Kerrisk (man-pages) > > Can we have also a patch for the loop.4 manual page please? Ack, will do when the series lands. Best, Martijn > > Thanks, > > Michael > > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl
On Wed, May 6, 2020 at 8:09 AM Christoph Hellwig wrote: > > Thanks, this looks very nice! Thanks! And thanks for all the feedback, it has made this series a lot cleaner. > > Reviewed-by: Christoph Hellwig > > Are you also going to submit a patch to util-linux to use the new > ioctl? Yeah, I'm happy to - will do it as soon as the series lands. Best, Martijn
Re: [PATCH] loop: Call loop_config_discard() only after new config is applied.
Hi Jens, Are you ok with this one? One of my later series depends on it, but so far I've kept it separate because it's a bug fix. Thanks, Martijn On Sat, Apr 18, 2020 at 5:46 PM Bart Van Assche wrote: > > On 2020-03-31 04:41, Martijn Coenen wrote: > > loop_set_status() calls loop_config_discard() to configure discard for > > the loop device; however, the discard configuration depends on whether > > the loop device uses encryption, and when we call it the encryption > > configuration has not been updated yet. Move the call down so we apply > > the correct discard configuration based on the new configuration. > > Reviewed-by: Bart Van Assche
Re: [PATCH v4 04/10] loop: Refactor loop_set_status() size calculation
On Fri, May 1, 2020 at 7:30 PM Christoph Hellwig wrote: > > For some reason this fails to apply for me against both > Jens' for-5.8/block and Linus' current tree. > > What is the baseline for this series? This was based on Linus' tree from a week or two ago or so, but I think you're most likely missing this one? https://lkml.org/lkml/2020/3/31/755 (I mentioned it in the cover letter, but can make it a part of this series if you prefer).
Re: [PATCH v4 01/10] loop: Factor out loop size validation
Hi Ming, On Wed, Apr 29, 2020 at 4:12 PM Ming Lei wrote: > Now sector_t has been switched to u64 unconditionally, do we still need such > validation? I think you're right; I hadn't seen that change, but truncating because of sector_t shouldn't be an issue anymore. I wondered if we could actually have a smaller loff_t, but looks like that is 'long long', which should always be 8 bytes as well. I might send this as a separate patch, I don't want to drag this series on for too long. Thanks, Martijn > > > Thanks, > Ming > >
Re: [PATCH v3 0/9] Add a new LOOP_SET_FD_AND_STATUS ioctl
On Tue, Apr 28, 2020 at 4:57 PM Martijn Coenen wrote: > and it allows requesting a partition scan. It makes sense to maintain > that behavior, but what about LO_FLAGS_DIRECT_IO? I think you're > proposing LOOP_SET_STATUS(64) should keep ignoring that like it used > to? I've just sent a v4 which basically implements that and your other suggestions. Thanks, Martijn > > Thanks, > Martijn > > > and then in the main function reject anything not known. > > > > And then maybe add something like 64 bytes of padding to the end of the > > new structure, so that we can use flags to expand to it.
[PATCH v4 01/10] loop: Factor out loop size validation
Ensuring we don't truncate loff_t when casting to sector_t is done in multiple places; factor it out. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index f1754262fc94..396b8bd4d75c 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -228,15 +228,30 @@ static void __loop_update_dio(struct loop_device *lo, bool dio) blk_mq_unfreeze_queue(lo->lo_queue); } +/** + * loop_validate_size() - validates that the passed in size fits in a sector_t + * @size: size to validate + */ +static int +loop_validate_size(loff_t size) +{ + if ((loff_t)(sector_t)size != size) + return -EFBIG; + + return 0; +} + static int figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) { + int err; loff_t size = get_size(offset, sizelimit, lo->lo_backing_file); - sector_t x = (sector_t)size; struct block_device *bdev = lo->lo_device; - if (unlikely((loff_t)x != size)) - return -EFBIG; + err = loop_validate_size(size); + if (err) + return err; + if (lo->lo_offset != offset) lo->lo_offset = offset; if (lo->lo_sizelimit != sizelimit) @@ -1003,9 +1018,9 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, !file->f_op->write_iter) lo_flags |= LO_FLAGS_READ_ONLY; - error = -EFBIG; size = get_loop_size(lo, file); - if ((loff_t)(sector_t)size != size) + error = loop_validate_size(size); + if (error) goto out_unlock; error = loop_prepare_queue(lo); if (error) -- 2.26.2.303.gf8c07b1a785-goog
[PATCH v4 06/10] loop: Factor out configuring loop from status
Factor out this code into a separate function, so it can be reused by other code more easily. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 117 +-- 1 file changed, 67 insertions(+), 50 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 32755e874326..4a36a3f47503 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1266,13 +1266,78 @@ static int loop_clr_fd(struct loop_device *lo) return __loop_clr_fd(lo, false); } +/** + * loop_set_status_from_info - configure device from loop_info + * @lo: struct loop_device to configure + * @info: struct loop_info64 to configure the device with + * + * Configures the loop device parameters according to the passed + * in loop_info64 configuration. + */ static int -loop_set_status(struct loop_device *lo, const struct loop_info64 *info) +loop_set_status_from_info(struct loop_device *lo, + const struct loop_info64 *info) { int err; struct loop_func_table *xfer; kuid_t uid = current_uid(); + + if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) + return -EINVAL; + + err = loop_release_xfer(lo); + if (err) + return err; + + if (info->lo_encrypt_type) { + unsigned int type = info->lo_encrypt_type; + + if (type >= MAX_LO_CRYPT) + return -EINVAL; + xfer = xfer_funcs[type]; + if (xfer == NULL) + return -EINVAL; + } else + xfer = NULL; + + err = loop_init_xfer(lo, xfer, info); + if (err) + return err; + + lo->lo_offset = info->lo_offset; + lo->lo_sizelimit = info->lo_sizelimit; + memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); + memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE); + lo->lo_file_name[LO_NAME_SIZE-1] = 0; + lo->lo_crypt_name[LO_NAME_SIZE-1] = 0; + + if (!xfer) + xfer = _funcs; + lo->transfer = xfer->transfer; + lo->ioctl = xfer->ioctl; + + if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) != +(info->lo_flags & LO_FLAGS_AUTOCLEAR)) + lo->lo_flags ^= LO_FLAGS_AUTOCLEAR; + + lo->lo_encrypt_key_size = info->lo_encrypt_key_size; + lo->lo_init[0] = info->lo_init[0]; + lo->lo_init[1] = info->lo_init[1]; + if (info->lo_encrypt_key_size) { + memcpy(lo->lo_encrypt_key, info->lo_encrypt_key, + info->lo_encrypt_key_size); + lo->lo_key_owner = uid; + } + + return 0; +} + +static int +loop_set_status(struct loop_device *lo, const struct loop_info64 *info) +{ + int err; struct block_device *bdev; + kuid_t uid = current_uid(); bool partscan = false; bool size_changed = false; loff_t validated_size; @@ -1290,10 +1355,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) err = -ENXIO; goto out_unlock; } - if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) { - err = -EINVAL; - goto out_unlock; - } if (lo->lo_offset != info->lo_offset || lo->lo_sizelimit != info->lo_sizelimit) { @@ -1320,54 +1381,10 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) goto out_unfreeze; } - err = loop_release_xfer(lo); + err = loop_set_status_from_info(lo, info); if (err) goto out_unfreeze; - if (info->lo_encrypt_type) { - unsigned int type = info->lo_encrypt_type; - - if (type >= MAX_LO_CRYPT) { - err = -EINVAL; - goto out_unfreeze; - } - xfer = xfer_funcs[type]; - if (xfer == NULL) { - err = -EINVAL; - goto out_unfreeze; - } - } else - xfer = NULL; - - err = loop_init_xfer(lo, xfer, info); - if (err) - goto out_unfreeze; - - lo->lo_offset = info->lo_offset; - lo->lo_sizelimit = info->lo_sizelimit; - memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); - memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE); - lo->lo_file_name[LO_NAME_SIZE-1] = 0; - lo->lo_crypt_name[LO_NAME_SIZE-1] = 0; - - if (!xfer) - xfer = _funcs; - lo->transfer = xfer->transfer; - lo->ioctl = xfer->ioctl; - - if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) != -(info->lo_flags & LO_FLAGS_AUTOCLEAR)) - lo->lo_f
[PATCH v4 02/10] loop: Factor out setting loop device size
This code is used repeatedly. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 31 ++- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 396b8bd4d75c..6643e48ad71c 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -241,12 +241,29 @@ loop_validate_size(loff_t size) return 0; } +/** + * loop_set_size() - sets device size and notifies userspace + * @lo: struct loop_device to set the size for + * @size: new size of the loop device + * + * Callers must validate that the size passed into this function fits into + * a sector_t, eg using loop_validate_size() + */ +static void loop_set_size(struct loop_device *lo, loff_t size) +{ + struct block_device *bdev = lo->lo_device; + + set_capacity(lo->lo_disk, size); + bd_set_size(bdev, size << SECTOR_SHIFT); + /* let user-space know about the new size */ + kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); +} + static int figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) { int err; loff_t size = get_size(offset, sizelimit, lo->lo_backing_file); - struct block_device *bdev = lo->lo_device; err = loop_validate_size(size); if (err) @@ -256,10 +273,9 @@ figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) lo->lo_offset = offset; if (lo->lo_sizelimit != sizelimit) lo->lo_sizelimit = sizelimit; - set_capacity(lo->lo_disk, x); - bd_set_size(bdev, (loff_t)get_capacity(bdev->bd_disk) << 9); - /* let user-space know about the new size */ - kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); + + loop_set_size(lo, size); + return 0; } @@ -1055,11 +1071,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, loop_update_rotational(lo); loop_update_dio(lo); - set_capacity(lo->lo_disk, size); - bd_set_size(bdev, size << 9); loop_sysfs_init(lo); - /* let user-space know about the new size */ - kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); + loop_set_size(lo, size); set_blocksize(bdev, S_ISBLK(inode->i_mode) ? block_size(inode->i_bdev) : PAGE_SIZE); -- 2.26.2.303.gf8c07b1a785-goog
[PATCH v4 05/10] loop: Remove figure_loop_size()
This function was now only used by loop_set_capacity(). Just open code the remaining code in the caller instead. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 28 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 9f5913879921..32755e874326 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -258,21 +258,6 @@ static void loop_set_size(struct loop_device *lo, loff_t size) set_capacity_revalidate_and_notify(lo->lo_disk, size, false); } -static int -figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) -{ - int err; - loff_t size = get_size(offset, sizelimit, lo->lo_backing_file); - - err = loop_validate_size(size); - if (err) - return err; - - loop_set_size(lo, size); - - return 0; -} - static inline int lo_do_transfer(struct loop_device *lo, int cmd, struct page *rpage, unsigned roffs, @@ -1560,10 +1545,21 @@ loop_get_status64(struct loop_device *lo, struct loop_info64 __user *arg) { static int loop_set_capacity(struct loop_device *lo) { + int err; + loff_t size; + if (unlikely(lo->lo_state != Lo_bound)) return -ENXIO; - return figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit); + size = get_loop_size(lo, lo->lo_backing_file); + + err = loop_validate_size(size); + if (err) + return err; + + loop_set_size(lo, size); + + return 0; } static int loop_set_dio(struct loop_device *lo, unsigned long arg) -- 2.26.2.303.gf8c07b1a785-goog
[PATCH v4 09/10] loop: Clean up LOOP_SET_STATUS lo_flags handling.
LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this explicit by updating the UAPI to include the flags that can be set/cleared using this ioctl. The implementation can then blindly take over the passed in flags, and use the previous flags for those flags that can't be set / cleared using LOOP_SET_STATUS. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 19 +-- include/uapi/linux/loop.h | 10 -- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index f172a64333aa..cfbdd99fdb1a 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1049,9 +1049,7 @@ loop_set_status_from_info(struct loop_device *lo, lo->transfer = xfer->transfer; lo->ioctl = xfer->ioctl; - if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) != -(info->lo_flags & LO_FLAGS_AUTOCLEAR)) - lo->lo_flags ^= LO_FLAGS_AUTOCLEAR; + lo->lo_flags = info->lo_flags; lo->lo_encrypt_key_size = info->lo_encrypt_key_size; lo->lo_init[0] = info->lo_init[0]; @@ -1338,6 +1336,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) int err; struct block_device *bdev; kuid_t uid = current_uid(); + int prev_lo_flags; bool partscan = false; bool size_changed = false; loff_t validated_size; @@ -1381,10 +1380,19 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) goto out_unfreeze; } + prev_lo_flags = lo->lo_flags; + err = loop_set_status_from_info(lo, info); if (err) goto out_unfreeze; + /* Mask out flags that can't be set using LOOP_SET_STATUS. */ + lo->lo_flags &= ~LOOP_SET_STATUS_SETTABLE_FLAGS; + /* For those flags, use the previous values instead */ + lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS; + /* For flags that can't be cleared, use previous values too */ + lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_CLEARABLE_FLAGS; + if (size_changed) loop_set_size(lo, validated_size); @@ -1396,9 +1404,8 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) out_unfreeze: blk_mq_unfreeze_queue(lo->lo_queue); - if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) && -!(lo->lo_flags & LO_FLAGS_PARTSCAN)) { - lo->lo_flags |= LO_FLAGS_PARTSCAN; + if (!err && (lo->lo_flags & LO_FLAGS_PARTSCAN) && +!(prev_lo_flags & LO_FLAGS_PARTSCAN)) { lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN; bdev = lo->lo_device; partscan = true; diff --git a/include/uapi/linux/loop.h b/include/uapi/linux/loop.h index 080a8df134ef..6b32fee80ce0 100644 --- a/include/uapi/linux/loop.h +++ b/include/uapi/linux/loop.h @@ -25,6 +25,12 @@ enum { LO_FLAGS_DIRECT_IO = 16, }; +/* LO_FLAGS that can be set using LOOP_SET_STATUS(64) */ +#define LOOP_SET_STATUS_SETTABLE_FLAGS (LO_FLAGS_AUTOCLEAR | LO_FLAGS_PARTSCAN) + +/* LO_FLAGS that can be cleared using LOOP_SET_STATUS(64) */ +#define LOOP_SET_STATUS_CLEARABLE_FLAGS (LO_FLAGS_AUTOCLEAR) + #include/* for __kernel_old_dev_t */ #include/* for __u64 */ @@ -37,7 +43,7 @@ struct loop_info { intlo_offset; intlo_encrypt_type; intlo_encrypt_key_size; /* ioctl w/o */ - intlo_flags;/* ioctl r/o */ + intlo_flags; char lo_name[LO_NAME_SIZE]; unsigned char lo_encrypt_key[LO_KEY_SIZE]; /* ioctl w/o */ unsigned long lo_init[2]; @@ -53,7 +59,7 @@ struct loop_info64 { __u32 lo_number; /* ioctl r/o */ __u32 lo_encrypt_type; __u32 lo_encrypt_key_size; /* ioctl w/o */ - __u32 lo_flags;/* ioctl r/o */ + __u32 lo_flags; __u8 lo_file_name[LO_NAME_SIZE]; __u8 lo_crypt_name[LO_NAME_SIZE]; __u8 lo_encrypt_key[LO_KEY_SIZE]; /* ioctl w/o */ -- 2.26.2.303.gf8c07b1a785-goog
[PATCH v4 08/10] loop: Rework lo_ioctl() __user argument casting
In preparation for a new ioctl that needs to copy_from_user(); makes the code easier to read as well. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 60ba1ed95d77..f172a64333aa 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1659,6 +1659,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg) { struct loop_device *lo = bdev->bd_disk->private_data; + void __user *argp = (void __user *) arg; int err; switch (cmd) { @@ -1671,21 +1672,19 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode, case LOOP_SET_STATUS: err = -EPERM; if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) { - err = loop_set_status_old(lo, - (struct loop_info __user *)arg); + err = loop_set_status_old(lo, argp); } break; case LOOP_GET_STATUS: - return loop_get_status_old(lo, (struct loop_info __user *) arg); + return loop_get_status_old(lo, argp); case LOOP_SET_STATUS64: err = -EPERM; if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) { - err = loop_set_status64(lo, - (struct loop_info64 __user *) arg); + err = loop_set_status64(lo, argp); } break; case LOOP_GET_STATUS64: - return loop_get_status64(lo, (struct loop_info64 __user *) arg); + return loop_get_status64(lo, argp); case LOOP_SET_CAPACITY: case LOOP_SET_DIRECT_IO: case LOOP_SET_BLOCK_SIZE: -- 2.26.2.303.gf8c07b1a785-goog
[PATCH v4 10/10] loop: Add LOOP_CONFIGURE ioctl
This allows userspace to completely setup a loop device with a single ioctl, removing the in-between state where the device can be partially configured - eg the loop device has a backing file associated with it, but is reading from the wrong offset. Besides removing the intermediate state, another big benefit of this ioctl is that LOOP_SET_STATUS can be slow; the main reason for this slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to freeze the associated queue; this requires waiting for RCU synchronization, which I've measured can take about 15-20ms on this device on average. In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can also be used to: - Set the correct block size immediately by setting loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE) - Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO) - Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY in loop_config.info.lo_flags Here's setting up ~70 regular loop devices with an offset on an x86 Android device, using LOOP_SET_FD and LOOP_SET_STATUS: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m03.40s real 0m00.02s user 0m00.03s system Here's configuring ~70 devices in the same way, but using a modified losetup that uses the new LOOP_CONFIGURE ioctl: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m01.94s real 0m00.01s user 0m00.01s system Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 107 +++--- include/uapi/linux/loop.h | 21 2 files changed, 99 insertions(+), 29 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index cfbdd99fdb1a..a353ce55fd18 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -241,6 +241,19 @@ loop_validate_size(loff_t size) return 0; } +/** + * loop_validate_block_size() - validates the passed in block size + * @bsize: size to validate + */ +static int +loop_validate_block_size(unsigned short bsize) +{ + if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize)) + return -EINVAL; + + return 0; +} + /** * loop_set_size() - sets device size and notifies userspace * @lo: struct loop_device to set the size for @@ -1063,23 +1076,24 @@ loop_set_status_from_info(struct loop_device *lo, return 0; } -static int loop_set_fd(struct loop_device *lo, fmode_t mode, - struct block_device *bdev, unsigned int arg) +static int loop_configure(struct loop_device *lo, fmode_t mode, + struct block_device *bdev, + const struct loop_config *config) { struct file *file; struct inode*inode; struct address_space *mapping; struct block_device *claimed_bdev = NULL; - int lo_flags = 0; int error; loff_t size; boolpartscan; + unsigned short bsize; /* This is safe, since we have a reference from open(). */ __module_get(THIS_MODULE); error = -EBADF; - file = fget(arg); + file = fget(config->fd); if (!file) goto out; @@ -1088,7 +1102,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, * here to avoid changing device under exclusive owner. */ if (!(mode & FMODE_EXCL)) { - claimed_bdev = bd_start_claiming(bdev, loop_set_fd); + claimed_bdev = bd_start_claiming(bdev, loop_configure); if (IS_ERR(claimed_bdev)) { error = PTR_ERR(claimed_bdev); goto out_putf; @@ -1110,44 +1124,58 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, mapping = file->f_mapping; inode = mapping->host; - if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) || - !file->f_op->write_iter) - lo_flags |= LO_FLAGS_READ_ONLY; - size = get_loop_size(lo, file); error = loop_validate_size(size); if (error) goto out_unlock; + + if ((config->info.lo_flags & ~LOOP_CONFIGURE_SETTABLE_FLAGS) != 0) { + error = -EINVAL; + goto out_unlock; + } + + if (config->block_size) { + error = loop_validate_block_size(config->block_size); + if (error) + goto out_unlock; + } + + error = loop_set_status_from_info(lo, >info); + if (error) + goto out_unlock; + + if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) || + !file->f_op->write_iter) + lo->
[PATCH v4 07/10] loop: Move loop_set_status_from_info() and friends up
So we can use it without forward declaration. This is a separate commit to make it easier to verify that this is just a move, without functional modifications. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 206 +-- 1 file changed, 103 insertions(+), 103 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4a36a3f47503..60ba1ed95d77 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -962,6 +962,109 @@ static void loop_update_rotational(struct loop_device *lo) blk_queue_flag_clear(QUEUE_FLAG_NONROT, q); } +static int +loop_release_xfer(struct loop_device *lo) +{ + int err = 0; + struct loop_func_table *xfer = lo->lo_encryption; + + if (xfer) { + if (xfer->release) + err = xfer->release(lo); + lo->transfer = NULL; + lo->lo_encryption = NULL; + module_put(xfer->owner); + } + return err; +} + +static int +loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer, + const struct loop_info64 *i) +{ + int err = 0; + + if (xfer) { + struct module *owner = xfer->owner; + + if (!try_module_get(owner)) + return -EINVAL; + if (xfer->init) + err = xfer->init(lo, i); + if (err) + module_put(owner); + else + lo->lo_encryption = xfer; + } + return err; +} + +/** + * loop_set_status_from_info - configure device from loop_info + * @lo: struct loop_device to configure + * @info: struct loop_info64 to configure the device with + * + * Configures the loop device parameters according to the passed + * in loop_info64 configuration. + */ +static int +loop_set_status_from_info(struct loop_device *lo, + const struct loop_info64 *info) +{ + int err; + struct loop_func_table *xfer; + kuid_t uid = current_uid(); + + if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) + return -EINVAL; + + err = loop_release_xfer(lo); + if (err) + return err; + + if (info->lo_encrypt_type) { + unsigned int type = info->lo_encrypt_type; + + if (type >= MAX_LO_CRYPT) + return -EINVAL; + xfer = xfer_funcs[type]; + if (xfer == NULL) + return -EINVAL; + } else + xfer = NULL; + + err = loop_init_xfer(lo, xfer, info); + if (err) + return err; + + lo->lo_offset = info->lo_offset; + lo->lo_sizelimit = info->lo_sizelimit; + memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); + memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE); + lo->lo_file_name[LO_NAME_SIZE-1] = 0; + lo->lo_crypt_name[LO_NAME_SIZE-1] = 0; + + if (!xfer) + xfer = _funcs; + lo->transfer = xfer->transfer; + lo->ioctl = xfer->ioctl; + + if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) != +(info->lo_flags & LO_FLAGS_AUTOCLEAR)) + lo->lo_flags ^= LO_FLAGS_AUTOCLEAR; + + lo->lo_encrypt_key_size = info->lo_encrypt_key_size; + lo->lo_init[0] = info->lo_init[0]; + lo->lo_init[1] = info->lo_init[1]; + if (info->lo_encrypt_key_size) { + memcpy(lo->lo_encrypt_key, info->lo_encrypt_key, + info->lo_encrypt_key_size); + lo->lo_key_owner = uid; + } + + return 0; +} + static int loop_set_fd(struct loop_device *lo, fmode_t mode, struct block_device *bdev, unsigned int arg) { @@ -1085,43 +1188,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, return error; } -static int -loop_release_xfer(struct loop_device *lo) -{ - int err = 0; - struct loop_func_table *xfer = lo->lo_encryption; - - if (xfer) { - if (xfer->release) - err = xfer->release(lo); - lo->transfer = NULL; - lo->lo_encryption = NULL; - module_put(xfer->owner); - } - return err; -} - -static int -loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer, - const struct loop_info64 *i) -{ - int err = 0; - - if (xfer) { - struct module *owner = xfer->owner; - - if (!try_module_get(owner)) - return -EINVAL; - if (xfer->init) - err = xfer->init(lo, i); - if (err) - module_put(owner); - else - lo->lo
[PATCH v4 00/10] Add a new LOOP_CONFIGURE ioctl
This series introduces a new ioctl that makes it possible to atomically configure a loop device. Previously, if you wanted to set parameters such as the offset on a loop device, this required calling LOOP_SET_FD to set the backing file, and then LOOP_SET_STATUS to set the offset. However, in between these two calls, the loop device is available and would accept requests, which is generally not desirable. Similar issues exist around setting the block size (LOOP_SET_BLOCK_SIZE) and requesting direct I/O mode (LOOP_SET_DIRECT_IO). There are also performance benefits with combining these ioctls into one, which are described in more detail in the last change in the series. Note that this series depends on "loop: Call loop_config_discard() only after new config is applied." [0], which I sent as a separate patch as it fixes an unrelated bug. [0]: https://lkml.org/lkml/2020/3/31/755 --- v4: - Addressed review comments from Christoph Hellwig: -- Minor code cleanups -- Clarified what lo_flags LOOP_SET_STATUS can set and clear, and made that more explicit in the code (see [9/10]) -- LOOP_CONFIGURE can now also be used to configure the block size and to explicitly request Direct I/O and read-only mode. -- Explicitly reject lo_flags we don't know about in LOOP_CONFIGURE -- Renamed LOOP_SET_FD_AND_STATUS to LOOP_CONFIGURE, since the ioctl can now do things LOOP_SET_STATUS couldn't do. v3: - Addressed review comments from Christoph Hellwig: -- Factored out loop_validate_size() -- Split up the largish first patch in a few smaller ones -- Use set_capacity_revalidate_and_notify() - Fixed a variable wrongly using size_t instead of loff_t v2: - Addressed review comments from Bart van Assche: -- Use SECTOR_SHIFT constant -- Renamed loop_set_from_status() to loop_set_status_from_info() -- Added kerneldoc for loop_set_status_from_info() -- Removed dots in patch subject lines - Addressed review comments from Christoph Hellwig: -- Added missing padding in struct loop_fd_and_status -- Cleaned up some __user pointer handling in lo_ioctl -- Pass in a stack-initialized loop_info64 for the legacy LOOP_SET_FD case Martijn Coenen (10): loop: Factor out loop size validation loop: Factor out setting loop device size loop: Switch to set_capacity_revalidate_and_notify() loop: Refactor loop_set_status() size calculation loop: Remove figure_loop_size() loop: Factor out configuring loop from status loop: Move loop_set_status_from_info() and friends up loop: Rework lo_ioctl() __user argument casting loop: Clean up LOOP_SET_STATUS lo_flags handling. loop: Add LOOP_CONFIGURE ioctl drivers/block/loop.c | 407 +++--- include/uapi/linux/loop.h | 31 ++- 2 files changed, 281 insertions(+), 157 deletions(-) -- 2.26.2.303.gf8c07b1a785-goog
[PATCH v4 04/10] loop: Refactor loop_set_status() size calculation
figure_loop_size() calculates the loop size based on the passed in parameters, but at the same time it updates the offset and sizelimit parameters in the loop device configuration. That is a somewhat unexpected side effect of a function with this name, and it is only only needed by one of the two callers of this function - loop_set_status(). Move the lo_offset and lo_sizelimit assignment back into loop_set_status(), and use the newly factored out functions to validate and apply the newly calculated size. This allows us to get rid of figure_loop_size() in a follow-up commit. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 44 +++- 1 file changed, 23 insertions(+), 21 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 8f3194c2b8aa..9f5913879921 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -268,11 +268,6 @@ figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit) if (err) return err; - if (lo->lo_offset != offset) - lo->lo_offset = offset; - if (lo->lo_sizelimit != sizelimit) - lo->lo_sizelimit = sizelimit; - loop_set_size(lo, size); return 0; @@ -1294,6 +1289,8 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) kuid_t uid = current_uid(); struct block_device *bdev; bool partscan = false; + bool size_changed = false; + loff_t validated_size; err = mutex_lock_killable(_ctl_mutex); if (err) @@ -1315,6 +1312,13 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) if (lo->lo_offset != info->lo_offset || lo->lo_sizelimit != info->lo_sizelimit) { + loff_t size = get_size(info->lo_offset, info->lo_sizelimit, + lo->lo_backing_file); + err = loop_validate_size(size); + if (err) + goto out_unlock; + size_changed = true; + validated_size = size; sync_blockdev(lo->lo_device); kill_bdev(lo->lo_device); } @@ -1322,6 +1326,15 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) /* I/O need to be drained during transfer transition */ blk_mq_freeze_queue(lo->lo_queue); + if (size_changed && lo->lo_device->bd_inode->i_mapping->nrpages) { + /* If any pages were dirtied after kill_bdev(), try again */ + err = -EAGAIN; + pr_warn("%s: loop%d (%s) has still dirty pages (nrpages=%lu)\n", + __func__, lo->lo_number, lo->lo_file_name, + lo->lo_device->bd_inode->i_mapping->nrpages); + goto out_unfreeze; + } + err = loop_release_xfer(lo); if (err) goto out_unfreeze; @@ -1345,22 +1358,8 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) if (err) goto out_unfreeze; - if (lo->lo_offset != info->lo_offset || - lo->lo_sizelimit != info->lo_sizelimit) { - /* kill_bdev should have truncated all the pages */ - if (lo->lo_device->bd_inode->i_mapping->nrpages) { - err = -EAGAIN; - pr_warn("%s: loop%d (%s) has still dirty pages (nrpages=%lu)\n", - __func__, lo->lo_number, lo->lo_file_name, - lo->lo_device->bd_inode->i_mapping->nrpages); - goto out_unfreeze; - } - if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit)) { - err = -EFBIG; - goto out_unfreeze; - } - } - + lo->lo_offset = info->lo_offset; + lo->lo_sizelimit = info->lo_sizelimit; memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE); lo->lo_file_name[LO_NAME_SIZE-1] = 0; @@ -1384,6 +1383,9 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) lo->lo_key_owner = uid; } + if (size_changed) + loop_set_size(lo, validated_size); + loop_config_discard(lo); /* update dio if lo_offset or transfer is changed */ -- 2.26.2.303.gf8c07b1a785-goog
[PATCH v4 03/10] loop: Switch to set_capacity_revalidate_and_notify()
This was recently added to block/genhd.c, and takes care of both updating the capacity and notifying userspace of the new size. Signed-off-by: Martijn Coenen --- drivers/block/loop.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 6643e48ad71c..8f3194c2b8aa 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -253,10 +253,9 @@ static void loop_set_size(struct loop_device *lo, loff_t size) { struct block_device *bdev = lo->lo_device; - set_capacity(lo->lo_disk, size); bd_set_size(bdev, size << SECTOR_SHIFT); - /* let user-space know about the new size */ - kobject_uevent(_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); + + set_capacity_revalidate_and_notify(lo->lo_disk, size, false); } static int -- 2.26.2.303.gf8c07b1a785-goog
Re: [PATCH v3 0/9] Add a new LOOP_SET_FD_AND_STATUS ioctl
On Tue, Apr 28, 2020 at 9:02 AM Christoph Hellwig wrote: > I think reusing LO_FLAGS_DIRECT_IO makes sense to me - we have 32 > flags in the existing flags field (at least for loop_info64), so > we might as well use the field and the flags. Then we need flags > validation in that we don't accept new flags through the old > interface, and the new one validates that no unknown flags are passed. > > E.g. in the LOOP_SET_STATUS / LOOP_SET_STATUS64 handler do: > > lo->lo_flags &= ~(LO_LEGACY_FLAGS); > mmm, I thought lo_flags was read-only in LOOP_SET_STATUS(64): __u32 lo_flags;/* ioctl r/o */ but it looks LO_FLAGS_AUTOCLEAR is writable: if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) != (info->lo_flags & LO_FLAGS_AUTOCLEAR)) lo->lo_flags ^= LO_FLAGS_AUTOCLEAR; and it allows requesting a partition scan. It makes sense to maintain that behavior, but what about LO_FLAGS_DIRECT_IO? I think you're proposing LOOP_SET_STATUS(64) should keep ignoring that like it used to? Thanks, Martijn > and then in the main function reject anything not known. > > And then maybe add something like 64 bytes of padding to the end of the > new structure, so that we can use flags to expand to it.
Re: [PATCH 4.9 30/47] ANDROID: binder: remove waitqueue when thread exits.
On Sun, Oct 6, 2019 at 7:23 PM Greg Kroah-Hartman wrote: > > From: Martijn Coenen > > commit f5cb779ba16334b45ba8946d6bfa6d9834d1527f upstream. > > binder_poll() passes the thread->wait waitqueue that > can be slept on for work. When a thread that uses > epoll explicitly exits using BINDER_THREAD_EXIT, > the waitqueue is freed, but it is never removed > from the corresponding epoll data structure. When > the process subsequently exits, the epoll cleanup > code tries to access the waitlist, which results in > a use-after-free. > > Prevent this by using POLLFREE when the thread exits. > > Signed-off-by: Martijn Coenen > Reported-by: syzbot > Cc: stable # 4.14 > [backport BINDER_LOOPER_STATE_POLL logic as well] > Signed-off-by: Mattias Nissler > Signed-off-by: Greg Kroah-Hartman > --- > drivers/android/binder.c | 17 - > 1 file changed, 16 insertions(+), 1 deletion(-) > > --- a/drivers/android/binder.c > +++ b/drivers/android/binder.c > @@ -334,7 +334,8 @@ enum { > BINDER_LOOPER_STATE_EXITED = 0x04, > BINDER_LOOPER_STATE_INVALID = 0x08, > BINDER_LOOPER_STATE_WAITING = 0x10, > - BINDER_LOOPER_STATE_NEED_RETURN = 0x20 > + BINDER_LOOPER_STATE_NEED_RETURN = 0x20, > + BINDER_LOOPER_STATE_POLL= 0x40, > }; > > struct binder_thread { > @@ -2628,6 +2629,18 @@ static int binder_free_thread(struct bin > } else > BUG(); > } > + > + /* > +* If this thread used poll, make sure we remove the waitqueue > +* from any epoll data structures holding it with POLLFREE. > +* waitqueue_active() is safe to use here because we're holding > +* the inner lock. This should be "global lock" in 4.9 and 4.4 :) Otherwise LGTM, thanks! Martijn > +*/ > + if ((thread->looper & BINDER_LOOPER_STATE_POLL) && > + waitqueue_active(>wait)) { > + wake_up_poll(>wait, POLLHUP | POLLFREE); > + } > + > if (send_reply) > binder_send_failed_reply(send_reply, BR_DEAD_REPLY); > binder_release_work(>todo); > @@ -2651,6 +2664,8 @@ static unsigned int binder_poll(struct f > return POLLERR; > } > > + thread->looper |= BINDER_LOOPER_STATE_POLL; > + > wait_for_proc_work = thread->transaction_stack == NULL && > list_empty(>todo) && thread->return_error == BR_OK; > > >
Re: [PATCH 4.9 30/47] ANDROID: binder: remove waitqueue when thread exits.
On Mon, Oct 7, 2019 at 8:28 AM Mattias Nissler wrote: > Jann's PoC calls the BINDER_THREAD_EXIT ioctl to free the > binder_thread which will then cause the UAF, and this is cut off by > the patch. IIUC, you are worried about a similar AUF on the proc->wait > access. I am not 100% sure, but I think the binder_proc lifetime > matches the corresponding struct file instance, so it shouldn't be > possible to get the binder_proc deallocated while still being able to > access it via filp->private_data. Yes, I think this is correct; either the binder fd is closed first, in which case eventpoll_release() removes the waitqueue from the list before it is freed (before binder's release() is called); instead if the epoll fd is closed first, it will likewise remove the waitqueue itself, before binder_proc can be freed.. I don't know the __fput() code that well, but at first glance it seems these two can't overlap. The whole problem with BINDER_THREAD_EXIT was that the returned waitqueue wasn't tied to the lifetime of the underlying file. Apologies for not spotting this needed a backport BTW - I refactored the wait code heavily somewhere between 4.9 and 4.14, and somehow didn't realize the same problem existed in the old code. Thanks, Martijn > > > > > > > wait_for_proc_work = thread->transaction_stack == NULL && > > > list_empty(>todo) && thread->return_error == > > > BR_OK; > > > > > > binder_unlock(__func__); > > > > > > if (wait_for_proc_work) { > > > if (binder_has_proc_work(proc, thread)) > > > return POLLIN; > > > poll_wait(filp, >wait, wait); > > > if (binder_has_proc_work(proc, thread)) > > > return POLLIN; > > > } else { > > > if (binder_has_thread_work(thread)) > > > return POLLIN; > > > poll_wait(filp, >wait, wait); > > > if (binder_has_thread_work(thread)) > > > return POLLIN; > > > } > > > return 0; > > > > I _think_ the backport is correct, and I know someone has verified that > > the 4.4.y backport works properly and I don't see much difference here > > from that version. > > > > But I will defer to Todd and Martijn here, as they know this code _WAY_ > > better than I do. The codebase has changed a lot from 4.9.y to 4.14.y > > so it makes it hard to do equal comparisons simply. > > > > Todd and Martijn, thoughts? > > > > thanks, > > > > greg k-h
[PATCH] binder: Set end of SG buffer area properly.
In case the target node requests a security context, the extra_buffers_size is increased with the size of the security context. But, that size is not available for use by regular scatter-gather buffers; make sure the ending of that buffer is marked correctly. Acked-by: Todd Kjos Fixes: ec74136ded79 ("binder: create node flag to request sender's security context") Signed-off-by: Martijn Coenen Cc: sta...@vger.kernel.org # 5.1+ --- drivers/android/binder.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 38a59a630cd4c..5bde08603fbc2 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -3239,7 +3239,8 @@ static void binder_transaction(struct binder_proc *proc, buffer_offset = off_start_offset; off_end_offset = off_start_offset + tr->offsets_size; sg_buf_offset = ALIGN(off_end_offset, sizeof(void *)); - sg_buf_end_offset = sg_buf_offset + extra_buffers_size; + sg_buf_end_offset = sg_buf_offset + extra_buffers_size - + ALIGN(secctx_sz, sizeof(u64)); off_min = 0; for (buffer_offset = off_start_offset; buffer_offset < off_end_offset; buffer_offset += sizeof(binder_size_t)) { -- 2.22.0.410.gd8fdbe21b5-goog
Re: [PATCH] MAINTAINERS: update Android binder
Thanks Christian for signing up! On Thu, Dec 20, 2018 at 12:09 PM Greg KH wrote: > Ok, no objection from me, Todd/Martijn/Joel/Arve, can I get an ack from > someone? Acked-By: Martijn Coenen > > thanks, > > greg k-h
[PATCH v2] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
This allows the context manager to retrieve information about nodes that it holds a reference to, such as the current number of references to those nodes. Such information can for example be used to determine whether the servicemanager is the only process holding a reference to a node. This information can then be passed on to the process holding the node, which can in turn decide whether it wants to shut down to reduce resource usage. Signed-off-by: Martijn Coenen --- v2: made sure reserved fields are aligned, and enforce caller zeroes all fields except handle, as suggested by Dan Carpenter. drivers/android/binder.c| 55 + include/uapi/linux/android/binder.h | 10 ++ 2 files changed, 65 insertions(+) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index d58763b6b0090..5b25412e15ccf 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -4544,6 +4544,42 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp) return ret; } +static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc, + struct binder_node_info_for_ref *info) +{ + struct binder_node *node; + struct binder_context *context = proc->context; + __u32 handle = info->handle; + + if (info->strong_count || info->weak_count || info->reserved1 || + info->reserved2 || info->reserved3) { + binder_user_error("%d BINDER_GET_NODE_INFO_FOR_REF: only handle may be non-zero.", + proc->pid); + return -EINVAL; + } + + /* This ioctl may only be used by the context manager */ + mutex_lock(>context_mgr_node_lock); + if (!context->binder_context_mgr_node || + context->binder_context_mgr_node->proc != proc) { + mutex_unlock(>context_mgr_node_lock); + return -EPERM; + } + mutex_unlock(>context_mgr_node_lock); + + node = binder_get_node_from_ref(proc, handle, true, NULL); + if (!node) + return -EINVAL; + + info->strong_count = node->local_strong_refs + + node->internal_strong_refs; + info->weak_count = node->local_weak_refs; + + binder_put_node(node); + + return 0; +} + static int binder_ioctl_get_node_debug_info(struct binder_proc *proc, struct binder_node_debug_info *info) { @@ -4638,6 +4674,25 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } break; } + case BINDER_GET_NODE_INFO_FOR_REF: { + struct binder_node_info_for_ref info; + + if (copy_from_user(, ubuf, sizeof(info))) { + ret = -EFAULT; + goto err; + } + + ret = binder_ioctl_get_node_info_for_ref(proc, ); + if (ret < 0) + goto err; + + if (copy_to_user(ubuf, , sizeof(info))) { + ret = -EFAULT; + goto err; + } + + break; + } case BINDER_GET_NODE_DEBUG_INFO: { struct binder_node_debug_info info; diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h index bfaec6903b8bc..b9ba520f7e4bb 100644 --- a/include/uapi/linux/android/binder.h +++ b/include/uapi/linux/android/binder.h @@ -200,6 +200,15 @@ struct binder_node_debug_info { __u32has_weak_ref; }; +struct binder_node_info_for_ref { + __u32handle; + __u32strong_count; + __u32weak_count; + __u32reserved1; + __u32reserved2; + __u32reserved3; +}; + #define BINDER_WRITE_READ _IOWR('b', 1, struct binder_write_read) #define BINDER_SET_IDLE_TIMEOUT_IOW('b', 3, __s64) #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32) @@ -208,6 +217,7 @@ struct binder_node_debug_info { #define BINDER_THREAD_EXIT _IOW('b', 8, __s32) #define BINDER_VERSION _IOWR('b', 9, struct binder_version) #define BINDER_GET_NODE_DEBUG_INFO _IOWR('b', 11, struct binder_node_debug_info) +#define BINDER_GET_NODE_INFO_FOR_REF _IOWR('b', 12, struct binder_node_info_for_ref) /* * NOTE: Two special error codes you should check for when calling -- 2.19.0.rc2.392.g5ba43deb5a-goog
[PATCH v2] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
This allows the context manager to retrieve information about nodes that it holds a reference to, such as the current number of references to those nodes. Such information can for example be used to determine whether the servicemanager is the only process holding a reference to a node. This information can then be passed on to the process holding the node, which can in turn decide whether it wants to shut down to reduce resource usage. Signed-off-by: Martijn Coenen --- v2: made sure reserved fields are aligned, and enforce caller zeroes all fields except handle, as suggested by Dan Carpenter. drivers/android/binder.c| 55 + include/uapi/linux/android/binder.h | 10 ++ 2 files changed, 65 insertions(+) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index d58763b6b0090..5b25412e15ccf 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -4544,6 +4544,42 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp) return ret; } +static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc, + struct binder_node_info_for_ref *info) +{ + struct binder_node *node; + struct binder_context *context = proc->context; + __u32 handle = info->handle; + + if (info->strong_count || info->weak_count || info->reserved1 || + info->reserved2 || info->reserved3) { + binder_user_error("%d BINDER_GET_NODE_INFO_FOR_REF: only handle may be non-zero.", + proc->pid); + return -EINVAL; + } + + /* This ioctl may only be used by the context manager */ + mutex_lock(>context_mgr_node_lock); + if (!context->binder_context_mgr_node || + context->binder_context_mgr_node->proc != proc) { + mutex_unlock(>context_mgr_node_lock); + return -EPERM; + } + mutex_unlock(>context_mgr_node_lock); + + node = binder_get_node_from_ref(proc, handle, true, NULL); + if (!node) + return -EINVAL; + + info->strong_count = node->local_strong_refs + + node->internal_strong_refs; + info->weak_count = node->local_weak_refs; + + binder_put_node(node); + + return 0; +} + static int binder_ioctl_get_node_debug_info(struct binder_proc *proc, struct binder_node_debug_info *info) { @@ -4638,6 +4674,25 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } break; } + case BINDER_GET_NODE_INFO_FOR_REF: { + struct binder_node_info_for_ref info; + + if (copy_from_user(, ubuf, sizeof(info))) { + ret = -EFAULT; + goto err; + } + + ret = binder_ioctl_get_node_info_for_ref(proc, ); + if (ret < 0) + goto err; + + if (copy_to_user(ubuf, , sizeof(info))) { + ret = -EFAULT; + goto err; + } + + break; + } case BINDER_GET_NODE_DEBUG_INFO: { struct binder_node_debug_info info; diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h index bfaec6903b8bc..b9ba520f7e4bb 100644 --- a/include/uapi/linux/android/binder.h +++ b/include/uapi/linux/android/binder.h @@ -200,6 +200,15 @@ struct binder_node_debug_info { __u32has_weak_ref; }; +struct binder_node_info_for_ref { + __u32handle; + __u32strong_count; + __u32weak_count; + __u32reserved1; + __u32reserved2; + __u32reserved3; +}; + #define BINDER_WRITE_READ _IOWR('b', 1, struct binder_write_read) #define BINDER_SET_IDLE_TIMEOUT_IOW('b', 3, __s64) #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32) @@ -208,6 +217,7 @@ struct binder_node_debug_info { #define BINDER_THREAD_EXIT _IOW('b', 8, __s32) #define BINDER_VERSION _IOWR('b', 9, struct binder_version) #define BINDER_GET_NODE_DEBUG_INFO _IOWR('b', 11, struct binder_node_debug_info) +#define BINDER_GET_NODE_INFO_FOR_REF _IOWR('b', 12, struct binder_node_info_for_ref) /* * NOTE: Two special error codes you should check for when calling -- 2.19.0.rc2.392.g5ba43deb5a-goog
Re: [PATCH] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
On Wed, Sep 5, 2018 at 11:09 AM, Dan Carpenter wrote: > What's the reserved for? On 64 bit systems there is a 4 byte struct > hole between weak_count and reserved. There's many more pieces of information that we hold for a node. While we don't have a use for most of that now, we may want some of it in the future, and so I thought it would be wise to reserve some space here so we don't need a new ioctl when that happens. I'm actually not sure it's common to do things this way. > Why not just make reserved a > __u32 and get rid of the hole? (Not rhetorical, I have no idea). Because I thought 8 bytes of reserved space would be nice :-) But you have a good point re:alignment, I should make it two __u32's then. > > Btw, people sometimes complain about that we don't check that user input > is zeroed in ioctls. Like for example maybe they're passing random data > in the the strong_count field and then later we decide that actually > that field should mean something but we can't make it mean anything > because we've been letting the user put whatever they want there. These > are just random thoughts in my head, not necessarily important. That's a good point, I will change the code to check for that. Thanks, Martijn > > regards, > dan carpenter >
Re: [PATCH] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
On Wed, Sep 5, 2018 at 11:09 AM, Dan Carpenter wrote: > What's the reserved for? On 64 bit systems there is a 4 byte struct > hole between weak_count and reserved. There's many more pieces of information that we hold for a node. While we don't have a use for most of that now, we may want some of it in the future, and so I thought it would be wise to reserve some space here so we don't need a new ioctl when that happens. I'm actually not sure it's common to do things this way. > Why not just make reserved a > __u32 and get rid of the hole? (Not rhetorical, I have no idea). Because I thought 8 bytes of reserved space would be nice :-) But you have a good point re:alignment, I should make it two __u32's then. > > Btw, people sometimes complain about that we don't check that user input > is zeroed in ioctls. Like for example maybe they're passing random data > in the the strong_count field and then later we decide that actually > that field should mean something but we can't make it mean anything > because we've been letting the user put whatever they want there. These > are just random thoughts in my head, not necessarily important. That's a good point, I will change the code to check for that. Thanks, Martijn > > regards, > dan carpenter >
[PATCH] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
This allows the context manager to retrieve information about nodes that it holds a reference to, such as the current number of references to those nodes. Such information can for example be used to determine whether the servicemanager is the only process holding a reference to a node. This information can then be passed on to the process holding the node, which can in turn decide whether it wants to shut down to reduce resource usage. Signed-off-by: Martijn Coenen --- drivers/android/binder.c| 50 + include/uapi/linux/android/binder.h | 8 + 2 files changed, 58 insertions(+) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index d58763b6b0090..1c7e965241fb8 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -4544,6 +4544,37 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp) return ret; } +static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc, + struct binder_node_info_for_ref *info) +{ + struct binder_node *node; + struct binder_context *context = proc->context; + __u32 handle = info->handle; + + memset(info, 0, sizeof(*info)); + + /* This ioctl may only be used by the context manager */ + mutex_lock(>context_mgr_node_lock); + if (!context->binder_context_mgr_node || + context->binder_context_mgr_node->proc != proc) { + mutex_unlock(>context_mgr_node_lock); + return -EPERM; + } + mutex_unlock(>context_mgr_node_lock); + + node = binder_get_node_from_ref(proc, handle, true, NULL); + if (!node) + return -EINVAL; + + info->strong_count = node->local_strong_refs + + node->internal_strong_refs; + info->weak_count = node->local_weak_refs; + + binder_put_node(node); + + return 0; +} + static int binder_ioctl_get_node_debug_info(struct binder_proc *proc, struct binder_node_debug_info *info) { @@ -4638,6 +4669,25 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } break; } + case BINDER_GET_NODE_INFO_FOR_REF: { + struct binder_node_info_for_ref info; + + if (copy_from_user(, ubuf, sizeof(info))) { + ret = -EFAULT; + goto err; + } + + ret = binder_ioctl_get_node_info_for_ref(proc, ); + if (ret < 0) + goto err; + + if (copy_to_user(ubuf, , sizeof(info))) { + ret = -EFAULT; + goto err; + } + + break; + } case BINDER_GET_NODE_DEBUG_INFO: { struct binder_node_debug_info info; diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h index bfaec6903b8bc..a54a680ff2936 100644 --- a/include/uapi/linux/android/binder.h +++ b/include/uapi/linux/android/binder.h @@ -200,6 +200,13 @@ struct binder_node_debug_info { __u32has_weak_ref; }; +struct binder_node_info_for_ref { + __u32handle; + __u32strong_count; + __u32weak_count; + __u64reserved; +}; + #define BINDER_WRITE_READ _IOWR('b', 1, struct binder_write_read) #define BINDER_SET_IDLE_TIMEOUT_IOW('b', 3, __s64) #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32) @@ -208,6 +215,7 @@ struct binder_node_debug_info { #define BINDER_THREAD_EXIT _IOW('b', 8, __s32) #define BINDER_VERSION _IOWR('b', 9, struct binder_version) #define BINDER_GET_NODE_DEBUG_INFO _IOWR('b', 11, struct binder_node_debug_info) +#define BINDER_GET_NODE_INFO_FOR_REF _IOWR('b', 12, struct binder_node_info_for_ref) /* * NOTE: Two special error codes you should check for when calling -- 2.19.0.rc1.350.ge57e33dbd1-goog
[PATCH] ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
This allows the context manager to retrieve information about nodes that it holds a reference to, such as the current number of references to those nodes. Such information can for example be used to determine whether the servicemanager is the only process holding a reference to a node. This information can then be passed on to the process holding the node, which can in turn decide whether it wants to shut down to reduce resource usage. Signed-off-by: Martijn Coenen --- drivers/android/binder.c| 50 + include/uapi/linux/android/binder.h | 8 + 2 files changed, 58 insertions(+) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index d58763b6b0090..1c7e965241fb8 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -4544,6 +4544,37 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp) return ret; } +static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc, + struct binder_node_info_for_ref *info) +{ + struct binder_node *node; + struct binder_context *context = proc->context; + __u32 handle = info->handle; + + memset(info, 0, sizeof(*info)); + + /* This ioctl may only be used by the context manager */ + mutex_lock(>context_mgr_node_lock); + if (!context->binder_context_mgr_node || + context->binder_context_mgr_node->proc != proc) { + mutex_unlock(>context_mgr_node_lock); + return -EPERM; + } + mutex_unlock(>context_mgr_node_lock); + + node = binder_get_node_from_ref(proc, handle, true, NULL); + if (!node) + return -EINVAL; + + info->strong_count = node->local_strong_refs + + node->internal_strong_refs; + info->weak_count = node->local_weak_refs; + + binder_put_node(node); + + return 0; +} + static int binder_ioctl_get_node_debug_info(struct binder_proc *proc, struct binder_node_debug_info *info) { @@ -4638,6 +4669,25 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } break; } + case BINDER_GET_NODE_INFO_FOR_REF: { + struct binder_node_info_for_ref info; + + if (copy_from_user(, ubuf, sizeof(info))) { + ret = -EFAULT; + goto err; + } + + ret = binder_ioctl_get_node_info_for_ref(proc, ); + if (ret < 0) + goto err; + + if (copy_to_user(ubuf, , sizeof(info))) { + ret = -EFAULT; + goto err; + } + + break; + } case BINDER_GET_NODE_DEBUG_INFO: { struct binder_node_debug_info info; diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h index bfaec6903b8bc..a54a680ff2936 100644 --- a/include/uapi/linux/android/binder.h +++ b/include/uapi/linux/android/binder.h @@ -200,6 +200,13 @@ struct binder_node_debug_info { __u32has_weak_ref; }; +struct binder_node_info_for_ref { + __u32handle; + __u32strong_count; + __u32weak_count; + __u64reserved; +}; + #define BINDER_WRITE_READ _IOWR('b', 1, struct binder_write_read) #define BINDER_SET_IDLE_TIMEOUT_IOW('b', 3, __s64) #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32) @@ -208,6 +215,7 @@ struct binder_node_debug_info { #define BINDER_THREAD_EXIT _IOW('b', 8, __s32) #define BINDER_VERSION _IOWR('b', 9, struct binder_version) #define BINDER_GET_NODE_DEBUG_INFO _IOWR('b', 11, struct binder_node_debug_info) +#define BINDER_GET_NODE_INFO_FOR_REF _IOWR('b', 12, struct binder_node_info_for_ref) /* * NOTE: Two special error codes you should check for when calling -- 2.19.0.rc1.350.ge57e33dbd1-goog
Re: KASAN: null-ptr-deref Write in binder_update_page_range
Thanks Minchan! On Thu, Aug 23, 2018 at 7:29 AM, Minchan Kim wrote: > Signed-off-by: Todd Kjos > Signed-off-by: Minchan Kim Reviewed-by: Martijn Coenen > --- > drivers/android/binder_alloc.c | 43 +++--- > 1 file changed, 35 insertions(+), 8 deletions(-) > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > index 3f3b7b253445..64fd96eada31 100644 > --- a/drivers/android/binder_alloc.c > +++ b/drivers/android/binder_alloc.c > @@ -332,6 +332,35 @@ static int binder_update_page_range(struct binder_alloc > *alloc, int allocate, > return vma ? -ENOMEM : -ESRCH; > } > > + > +static inline void binder_alloc_set_vma(struct binder_alloc *alloc, > + struct vm_area_struct *vma) > +{ > + if (vma) > + alloc->vma_vm_mm = vma->vm_mm; > + /* > +* If we see alloc->vma is not NULL, buffer data structures set up > +* completely. Look at smp_rmb side binder_alloc_get_vma. > +* We also want to guarantee new alloc->vma_vm_mm is always visible > +* if alloc->vma is set. > +*/ > + smp_wmb(); > + alloc->vma = vma; > +} > + > +static inline struct vm_area_struct *binder_alloc_get_vma( > + struct binder_alloc *alloc) > +{ > + struct vm_area_struct *vma = NULL; > + > + if (alloc->vma) { > + /* Look at description in binder_alloc_set_vma */ > + smp_rmb(); > + vma = alloc->vma; > + } > + return vma; > +} > + > static struct binder_buffer *binder_alloc_new_buf_locked( > struct binder_alloc *alloc, > size_t data_size, > @@ -348,7 +377,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked( > size_t size, data_offsets_size; > int ret; > > - if (alloc->vma == NULL) { > + if (!binder_alloc_get_vma(alloc)) { > binder_alloc_debug(BINDER_DEBUG_USER_ERROR, >"%d: binder_alloc_buf, no vma\n", >alloc->pid); > @@ -723,9 +752,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc, > buffer->free = 1; > binder_insert_free_buffer(alloc, buffer); > alloc->free_async_space = alloc->buffer_size / 2; > - barrier(); > - alloc->vma = vma; > - alloc->vma_vm_mm = vma->vm_mm; > + binder_alloc_set_vma(alloc, vma); > mmgrab(alloc->vma_vm_mm); > > return 0; > @@ -754,10 +781,10 @@ void binder_alloc_deferred_release(struct binder_alloc > *alloc) > int buffers, page_count; > struct binder_buffer *buffer; > > - BUG_ON(alloc->vma); > - > buffers = 0; > mutex_lock(>mutex); > + BUG_ON(alloc->vma); > + > while ((n = rb_first(>allocated_buffers))) { > buffer = rb_entry(n, struct binder_buffer, rb_node); > > @@ -900,7 +927,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc > *alloc) > */ > void binder_alloc_vma_close(struct binder_alloc *alloc) > { > - WRITE_ONCE(alloc->vma, NULL); > + binder_alloc_set_vma(alloc, NULL); > } > > /** > @@ -935,7 +962,7 @@ enum lru_status binder_alloc_free_page(struct list_head > *item, > > index = page - alloc->pages; > page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE; > - vma = alloc->vma; > + vma = binder_alloc_get_vma(alloc); > if (vma) { > if (!mmget_not_zero(alloc->vma_vm_mm)) > goto err_mmget; > -- > 2.18.0.1017.ga543ac7ca45-goog > > > >
Re: KASAN: null-ptr-deref Write in binder_update_page_range
Thanks Minchan! On Thu, Aug 23, 2018 at 7:29 AM, Minchan Kim wrote: > Signed-off-by: Todd Kjos > Signed-off-by: Minchan Kim Reviewed-by: Martijn Coenen > --- > drivers/android/binder_alloc.c | 43 +++--- > 1 file changed, 35 insertions(+), 8 deletions(-) > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > index 3f3b7b253445..64fd96eada31 100644 > --- a/drivers/android/binder_alloc.c > +++ b/drivers/android/binder_alloc.c > @@ -332,6 +332,35 @@ static int binder_update_page_range(struct binder_alloc > *alloc, int allocate, > return vma ? -ENOMEM : -ESRCH; > } > > + > +static inline void binder_alloc_set_vma(struct binder_alloc *alloc, > + struct vm_area_struct *vma) > +{ > + if (vma) > + alloc->vma_vm_mm = vma->vm_mm; > + /* > +* If we see alloc->vma is not NULL, buffer data structures set up > +* completely. Look at smp_rmb side binder_alloc_get_vma. > +* We also want to guarantee new alloc->vma_vm_mm is always visible > +* if alloc->vma is set. > +*/ > + smp_wmb(); > + alloc->vma = vma; > +} > + > +static inline struct vm_area_struct *binder_alloc_get_vma( > + struct binder_alloc *alloc) > +{ > + struct vm_area_struct *vma = NULL; > + > + if (alloc->vma) { > + /* Look at description in binder_alloc_set_vma */ > + smp_rmb(); > + vma = alloc->vma; > + } > + return vma; > +} > + > static struct binder_buffer *binder_alloc_new_buf_locked( > struct binder_alloc *alloc, > size_t data_size, > @@ -348,7 +377,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked( > size_t size, data_offsets_size; > int ret; > > - if (alloc->vma == NULL) { > + if (!binder_alloc_get_vma(alloc)) { > binder_alloc_debug(BINDER_DEBUG_USER_ERROR, >"%d: binder_alloc_buf, no vma\n", >alloc->pid); > @@ -723,9 +752,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc, > buffer->free = 1; > binder_insert_free_buffer(alloc, buffer); > alloc->free_async_space = alloc->buffer_size / 2; > - barrier(); > - alloc->vma = vma; > - alloc->vma_vm_mm = vma->vm_mm; > + binder_alloc_set_vma(alloc, vma); > mmgrab(alloc->vma_vm_mm); > > return 0; > @@ -754,10 +781,10 @@ void binder_alloc_deferred_release(struct binder_alloc > *alloc) > int buffers, page_count; > struct binder_buffer *buffer; > > - BUG_ON(alloc->vma); > - > buffers = 0; > mutex_lock(>mutex); > + BUG_ON(alloc->vma); > + > while ((n = rb_first(>allocated_buffers))) { > buffer = rb_entry(n, struct binder_buffer, rb_node); > > @@ -900,7 +927,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc > *alloc) > */ > void binder_alloc_vma_close(struct binder_alloc *alloc) > { > - WRITE_ONCE(alloc->vma, NULL); > + binder_alloc_set_vma(alloc, NULL); > } > > /** > @@ -935,7 +962,7 @@ enum lru_status binder_alloc_free_page(struct list_head > *item, > > index = page - alloc->pages; > page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE; > - vma = alloc->vma; > + vma = binder_alloc_get_vma(alloc); > if (vma) { > if (!mmget_not_zero(alloc->vma_vm_mm)) > goto err_mmget; > -- > 2.18.0.1017.ga543ac7ca45-goog > > > >
Re: [PATCH] android: binder: no outgoing transaction when thread todo has transaction
Sherry, this was found by syzkaller, right? In that case, can you add the tag so the issue gets closed automatically when this gets merged? On Tue, Aug 14, 2018 at 2:28 AM, Sherry Yang wrote: > When a process dies, failed reply is sent to the sender of any transaction > queued on a dead thread's todo list. The sender asserts that the > received failed reply corresponds to the head of the transaction stack. > This assert can fail if the dead thread is allowed to send outgoing > transactions when there is already a transaction on its todo list, > because this new transaction can end up on the transaction stack of the > original sender. The following steps illustrate how this assertion can > fail. > > 1. Thread1 sends txn19 to Thread2 >(T1->transaction_stack=txn19, T2->todo+=txn19) > 2. Without processing todo list, Thread2 sends txn20 to Thread1 >(T1->todo+=txn20, T2->transaction_stack=txn20) > 3. T1 processes txn20 on its todo list >(T1->transaction_stack=txn20->txn19, T1->todo=) > 4. T2 dies, T2->todo cleanup attempts to send failed reply for txn19, but >T1->transaction_stack points to txn20 -- assertion failes > > Step 2. is the incorrect behavior. When there is a transaction on a > thread's todo list, this thread should not be able to send any outgoing > synchronous transactions. Only the head of the todo list needs to be > checked because only threads that are waiting for proc work can directly > receive work from another thread, and no work is allowed to be queued > on such a thread without waking up the thread. This patch also enforces > that a thread is not waiting for proc work when a work is directly > enqueued to its todo list. > > Acked-by: Arve Hjønnevåg > Signed-off-by: Sherry Yang Reviewed-by: Martijn Coenen Thanks, Martijn > --- > drivers/android/binder.c | 44 +--- > 1 file changed, 32 insertions(+), 12 deletions(-) > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c > index d58763b6b009..f2081934eb8b 100644 > --- a/drivers/android/binder.c > +++ b/drivers/android/binder.c > @@ -822,6 +822,7 @@ static void > binder_enqueue_deferred_thread_work_ilocked(struct binder_thread *thread, > struct binder_work *work) > { > + WARN_ON(!list_empty(>waiting_thread_node)); > binder_enqueue_work_ilocked(work, >todo); > } > > @@ -839,6 +840,7 @@ static void > binder_enqueue_thread_work_ilocked(struct binder_thread *thread, >struct binder_work *work) > { > + WARN_ON(!list_empty(>waiting_thread_node)); > binder_enqueue_work_ilocked(work, >todo); > thread->process_todo = true; > } > @@ -1270,19 +1272,12 @@ static int binder_inc_node_nilocked(struct > binder_node *node, int strong, > } else > node->local_strong_refs++; > if (!node->has_strong_ref && target_list) { > + struct binder_thread *thread = > container_of(target_list, > + struct binder_thread, > todo); > binder_dequeue_work_ilocked(>work); > - /* > -* Note: this function is the only place where we > queue > -* directly to a thread->todo without using the > -* corresponding binder_enqueue_thread_work() helper > -* functions; in this case it's ok to not set the > -* process_todo flag, since we know this node work > will > -* always be followed by other work that starts queue > -* processing: in case of synchronous transactions, a > -* BR_REPLY or BR_ERROR; in case of oneway > -* transactions, a BR_TRANSACTION_COMPLETE. > -*/ > - binder_enqueue_work_ilocked(>work, target_list); > + BUG_ON(>todo != target_list); > + binder_enqueue_deferred_thread_work_ilocked(thread, > + > >work); > } > } else { > if (!internal) > @@ -2723,6 +2718,7 @@ static void binder_transaction(struct binder_proc *proc, > { > int ret; > struct binder_transaction *t; > + struct binder_work *w; > struct binder_work *tcomplete; > binder_size_t *offp, *off_end, *off_start; > binder_size_t off_min; > @@ -2864,6 +2860,2
Re: [PATCH] android: binder: no outgoing transaction when thread todo has transaction
Sherry, this was found by syzkaller, right? In that case, can you add the tag so the issue gets closed automatically when this gets merged? On Tue, Aug 14, 2018 at 2:28 AM, Sherry Yang wrote: > When a process dies, failed reply is sent to the sender of any transaction > queued on a dead thread's todo list. The sender asserts that the > received failed reply corresponds to the head of the transaction stack. > This assert can fail if the dead thread is allowed to send outgoing > transactions when there is already a transaction on its todo list, > because this new transaction can end up on the transaction stack of the > original sender. The following steps illustrate how this assertion can > fail. > > 1. Thread1 sends txn19 to Thread2 >(T1->transaction_stack=txn19, T2->todo+=txn19) > 2. Without processing todo list, Thread2 sends txn20 to Thread1 >(T1->todo+=txn20, T2->transaction_stack=txn20) > 3. T1 processes txn20 on its todo list >(T1->transaction_stack=txn20->txn19, T1->todo=) > 4. T2 dies, T2->todo cleanup attempts to send failed reply for txn19, but >T1->transaction_stack points to txn20 -- assertion failes > > Step 2. is the incorrect behavior. When there is a transaction on a > thread's todo list, this thread should not be able to send any outgoing > synchronous transactions. Only the head of the todo list needs to be > checked because only threads that are waiting for proc work can directly > receive work from another thread, and no work is allowed to be queued > on such a thread without waking up the thread. This patch also enforces > that a thread is not waiting for proc work when a work is directly > enqueued to its todo list. > > Acked-by: Arve Hjønnevåg > Signed-off-by: Sherry Yang Reviewed-by: Martijn Coenen Thanks, Martijn > --- > drivers/android/binder.c | 44 +--- > 1 file changed, 32 insertions(+), 12 deletions(-) > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c > index d58763b6b009..f2081934eb8b 100644 > --- a/drivers/android/binder.c > +++ b/drivers/android/binder.c > @@ -822,6 +822,7 @@ static void > binder_enqueue_deferred_thread_work_ilocked(struct binder_thread *thread, > struct binder_work *work) > { > + WARN_ON(!list_empty(>waiting_thread_node)); > binder_enqueue_work_ilocked(work, >todo); > } > > @@ -839,6 +840,7 @@ static void > binder_enqueue_thread_work_ilocked(struct binder_thread *thread, >struct binder_work *work) > { > + WARN_ON(!list_empty(>waiting_thread_node)); > binder_enqueue_work_ilocked(work, >todo); > thread->process_todo = true; > } > @@ -1270,19 +1272,12 @@ static int binder_inc_node_nilocked(struct > binder_node *node, int strong, > } else > node->local_strong_refs++; > if (!node->has_strong_ref && target_list) { > + struct binder_thread *thread = > container_of(target_list, > + struct binder_thread, > todo); > binder_dequeue_work_ilocked(>work); > - /* > -* Note: this function is the only place where we > queue > -* directly to a thread->todo without using the > -* corresponding binder_enqueue_thread_work() helper > -* functions; in this case it's ok to not set the > -* process_todo flag, since we know this node work > will > -* always be followed by other work that starts queue > -* processing: in case of synchronous transactions, a > -* BR_REPLY or BR_ERROR; in case of oneway > -* transactions, a BR_TRANSACTION_COMPLETE. > -*/ > - binder_enqueue_work_ilocked(>work, target_list); > + BUG_ON(>todo != target_list); > + binder_enqueue_deferred_thread_work_ilocked(thread, > + > >work); > } > } else { > if (!internal) > @@ -2723,6 +2718,7 @@ static void binder_transaction(struct binder_proc *proc, > { > int ret; > struct binder_transaction *t; > + struct binder_work *w; > struct binder_work *tcomplete; > binder_size_t *offp, *off_end, *off_start; > binder_size_t off_min; > @@ -2864,6 +2860,2
Re: [PATCH 1/4] file: export __alloc_fd()
On Tue, Jul 31, 2018 at 12:07 PM, Christian Brauner wrote: > Ah, this wasn't brought up in the original thread when discussing to > turn this into a module. If using internal functions like this is going > away it makes sense to wait for this work to happen first. Is there a > time-frame for this? Not yet, it depends on the solution and who has cycles to work on it. Martijn > > Thanks! > Christian > >> >> > >> > But I think the binder user-space API relies on this. The userspace API >> > seems to rely on passing fd numbers around ... but I'm having trouble >> > figuring most of this user API out. Perhaps Martijn can help here. >> >> The UAPI does expect a file descriptor, but we may be able to do the >> mapping from fd to struct file (and vice-versa) in the kernel driver, >> so userspace wouldn't notice. >> >> Thanks, >> Martijn
Re: [PATCH 1/4] file: export __alloc_fd()
On Tue, Jul 31, 2018 at 12:07 PM, Christian Brauner wrote: > Ah, this wasn't brought up in the original thread when discussing to > turn this into a module. If using internal functions like this is going > away it makes sense to wait for this work to happen first. Is there a > time-frame for this? Not yet, it depends on the solution and who has cycles to work on it. Martijn > > Thanks! > Christian > >> >> > >> > But I think the binder user-space API relies on this. The userspace API >> > seems to rely on passing fd numbers around ... but I'm having trouble >> > figuring most of this user API out. Perhaps Martijn can help here. >> >> The UAPI does expect a file descriptor, but we may be able to do the >> mapping from fd to struct file (and vice-versa) in the kernel driver, >> so userspace wouldn't notice. >> >> Thanks, >> Martijn
Re: [PATCH 1/4] file: export __alloc_fd()
On Mon, Jul 30, 2018 at 10:36 PM, Matthew Wilcox wrote: > I'm not entirely sure I understand the binder code (... does anyone?) > but from what I can see, it intends to open a file descriptor in the > process which is the target of the message being sent. You're right. > That strikes > me as wrong-headed; it should be allocating a struct file and passing > that file to the other process. When that process receives the message, > *it* allocates a file descriptor for itself.ho We're looking into cleaning this up (historically it was done this way because VIVT caches made this not very efficient), and this is indeed a very good candidate for fixing it. > > But I think the binder user-space API relies on this. The userspace API > seems to rely on passing fd numbers around ... but I'm having trouble > figuring most of this user API out. Perhaps Martijn can help here. The UAPI does expect a file descriptor, but we may be able to do the mapping from fd to struct file (and vice-versa) in the kernel driver, so userspace wouldn't notice. Thanks, Martijn
Re: [PATCH 1/4] file: export __alloc_fd()
On Mon, Jul 30, 2018 at 10:36 PM, Matthew Wilcox wrote: > I'm not entirely sure I understand the binder code (... does anyone?) > but from what I can see, it intends to open a file descriptor in the > process which is the target of the message being sent. You're right. > That strikes > me as wrong-headed; it should be allocating a struct file and passing > that file to the other process. When that process receives the message, > *it* allocates a file descriptor for itself.ho We're looking into cleaning this up (historically it was done this way because VIVT caches made this not very efficient), and this is indeed a very good candidate for fixing it. > > But I think the binder user-space API relies on this. The userspace API > seems to rely on passing fd numbers around ... but I'm having trouble > figuring most of this user API out. Perhaps Martijn can help here. The UAPI does expect a file descriptor, but we may be able to do the mapping from fd to struct file (and vice-versa) in the kernel driver, so userspace wouldn't notice. Thanks, Martijn
Re: [PATCH 2/6] module: add support for symbol namespaces.
On Wed, Jul 25, 2018 at 6:48 PM, Lucas De Marchi wrote: > this might be because you are looking to the wrong project > (module-init-tools) rather than kmod that replaced it some years ago? Oh yes indeed, thanks for pointing that out :-) >> Just scanning the depmod source code, it seems really trivial to just >> have depmod accommodate the new symbol name format. Adding Lucas (kmod >> maintainer) to CC. > > Yep, that would be easy and allow depmod to be backward compatible > since we would do nothing if the symbol doesn't have the suffix. > As for dependency on a new version, this seems trivial enough to be > backported to previous releases used on distros so even if they are > not rolling they would get a compatible depmod. Thanks for the suggestion. I had also considered modifying the script that generates System.map to strip out the namespace; but that seems a bit hacky since it will then mismatch with the actual symbol table that's in the kernel image. The situation with depmod has gotten me a bit worried that perhaps there are more tools that look at System.map or the kernel's symbol table that won't be able to understand the new format. But of course those could be updated, too. Another alternative I was considering is leaving the namespace out of the __ksymtab symbols, and instead for the symbols that have a namespace, have an entry in __ksymtab (without namespace) and an entry in a new section __ksymtab_ns (with namespace). We can then discard __ksymtab_ns when building the final executable/modules, since that information is not needed at runtime (it's encoded in struct kernel_symbol). What makes this a bit more complex is that modpost takes the fully linked version of vmlinux which would already have __ksymtab_ns stripped, so we'd need to use objcopy to copy out that section so modpost can use it to read vmlinux's namespaced symbols. Modules wouldn't have that problem, since modpost processes modules before their final linking step. This last approach is more complex than just modifying depmod, but it does have the benefit that we won't be break userspace tools depending on the kernel __ksymtab section symbols. On the other hand, perhaps userspace tools would like to have the namespace information at some point. For now I'll keep things as is and write up a patch for depmod, but if others prefer the __ksymtab_ns approach I described above I don't mind making that work. Thanks, Martijn > > CC'ing linux-modu...@vger.kernel.org to keep track of this there > > > thanks > Lucas De Marchi > >> >> Thanks, >> >> Jessica >> >> > >> >> >> >> On x86_64 I saw no difference in binary size (compression), but at >> >> runtime this will require a word of memory per export to hold the >> >> namespace. An alternative could be to store namespaced symbols in their >> >> own section and use a separate 'struct namespaced_kernel_symbol' for >> >> that section, at the cost of making the module loader more complex. >> >> >> >> Signed-off-by: Martijn Coenen >> >> --- >> >> include/asm-generic/export.h | 2 +- >> >> include/linux/export.h | 83 +++- >> >> include/linux/module.h | 13 ++ >> >> kernel/module.c | 79 ++ >> >> 4 files changed, 156 insertions(+), 21 deletions(-) >> >> >> >> diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h >> >> index 68efb950a918..4c3d1afb702f 100644 >> >> --- a/include/asm-generic/export.h >> >> +++ b/include/asm-generic/export.h >> >> @@ -29,7 +29,7 @@ >> >> .section ___ksymtab\sec+\name,"a" >> >> .balign KSYM_ALIGN >> >> __ksymtab_\name: >> >> - __put \val, __kstrtab_\name >> >> + __put \val, __kstrtab_\name, 0 >> >> .previous >> >> .section __ksymtab_strings,"a" >> >> __kstrtab_\name: >> >> diff --git a/include/linux/export.h b/include/linux/export.h >> >> index ad6b8e697b27..9f6e70eeb85f 100644 >> >> --- a/include/linux/export.h >> >> +++ b/include/linux/export.h >> >> @@ -22,6 +22,11 @@ struct kernel_symbol >> >> { >> >> unsigned long value; >> >> const char *name; >> >> + const char *namespace; >> >> +}; >> >> + >> >> +struct namespace_import { >> >> + const char *namespace; >> >> }; >> >> >> >> #ifdef MODULE >> >> @@ -54,18 +59,28 @
Re: [PATCH 2/6] module: add support for symbol namespaces.
On Wed, Jul 25, 2018 at 6:48 PM, Lucas De Marchi wrote: > this might be because you are looking to the wrong project > (module-init-tools) rather than kmod that replaced it some years ago? Oh yes indeed, thanks for pointing that out :-) >> Just scanning the depmod source code, it seems really trivial to just >> have depmod accommodate the new symbol name format. Adding Lucas (kmod >> maintainer) to CC. > > Yep, that would be easy and allow depmod to be backward compatible > since we would do nothing if the symbol doesn't have the suffix. > As for dependency on a new version, this seems trivial enough to be > backported to previous releases used on distros so even if they are > not rolling they would get a compatible depmod. Thanks for the suggestion. I had also considered modifying the script that generates System.map to strip out the namespace; but that seems a bit hacky since it will then mismatch with the actual symbol table that's in the kernel image. The situation with depmod has gotten me a bit worried that perhaps there are more tools that look at System.map or the kernel's symbol table that won't be able to understand the new format. But of course those could be updated, too. Another alternative I was considering is leaving the namespace out of the __ksymtab symbols, and instead for the symbols that have a namespace, have an entry in __ksymtab (without namespace) and an entry in a new section __ksymtab_ns (with namespace). We can then discard __ksymtab_ns when building the final executable/modules, since that information is not needed at runtime (it's encoded in struct kernel_symbol). What makes this a bit more complex is that modpost takes the fully linked version of vmlinux which would already have __ksymtab_ns stripped, so we'd need to use objcopy to copy out that section so modpost can use it to read vmlinux's namespaced symbols. Modules wouldn't have that problem, since modpost processes modules before their final linking step. This last approach is more complex than just modifying depmod, but it does have the benefit that we won't be break userspace tools depending on the kernel __ksymtab section symbols. On the other hand, perhaps userspace tools would like to have the namespace information at some point. For now I'll keep things as is and write up a patch for depmod, but if others prefer the __ksymtab_ns approach I described above I don't mind making that work. Thanks, Martijn > > CC'ing linux-modu...@vger.kernel.org to keep track of this there > > > thanks > Lucas De Marchi > >> >> Thanks, >> >> Jessica >> >> > >> >> >> >> On x86_64 I saw no difference in binary size (compression), but at >> >> runtime this will require a word of memory per export to hold the >> >> namespace. An alternative could be to store namespaced symbols in their >> >> own section and use a separate 'struct namespaced_kernel_symbol' for >> >> that section, at the cost of making the module loader more complex. >> >> >> >> Signed-off-by: Martijn Coenen >> >> --- >> >> include/asm-generic/export.h | 2 +- >> >> include/linux/export.h | 83 +++- >> >> include/linux/module.h | 13 ++ >> >> kernel/module.c | 79 ++ >> >> 4 files changed, 156 insertions(+), 21 deletions(-) >> >> >> >> diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h >> >> index 68efb950a918..4c3d1afb702f 100644 >> >> --- a/include/asm-generic/export.h >> >> +++ b/include/asm-generic/export.h >> >> @@ -29,7 +29,7 @@ >> >> .section ___ksymtab\sec+\name,"a" >> >> .balign KSYM_ALIGN >> >> __ksymtab_\name: >> >> - __put \val, __kstrtab_\name >> >> + __put \val, __kstrtab_\name, 0 >> >> .previous >> >> .section __ksymtab_strings,"a" >> >> __kstrtab_\name: >> >> diff --git a/include/linux/export.h b/include/linux/export.h >> >> index ad6b8e697b27..9f6e70eeb85f 100644 >> >> --- a/include/linux/export.h >> >> +++ b/include/linux/export.h >> >> @@ -22,6 +22,11 @@ struct kernel_symbol >> >> { >> >> unsigned long value; >> >> const char *name; >> >> + const char *namespace; >> >> +}; >> >> + >> >> +struct namespace_import { >> >> + const char *namespace; >> >> }; >> >> >> >> #ifdef MODULE >> >> @@ -54,18 +59,28 @
Re: drivers/android/.tmp_gl_binder.o:undefined reference to `__user_bad'
Hi Michal, See below - this error was introduced because the binder driver uses a 64-bit get_user(), even on a 32-bit architecture. Would it be possible to implement a 64-bit get_user() for microblaze? Thanks, Martijn On Mon, Jul 2, 2018 at 6:53 AM kbuild test robot wrote: > > Hi Martijn, > > FYI, the error/warning still remains. > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > master > head: 021c91791a5e7e85c567452f1be3e4c2c6cb6063 > commit: 1190b4e38f97023154e6b3bef61b251aa5f970d0 ANDROID: binder: remove > 32-bit binder interface. > date: 7 weeks ago > config: microblaze-allmodconfig (attached as .config) > compiler: microblaze-linux-gcc (GCC) 8.1.0 > reproduce: > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > git checkout 1190b4e38f97023154e6b3bef61b251aa5f970d0 > # save the attached .config to linux build tree > GCC_VERSION=8.1.0 make.cross ARCH=microblaze > > All errors (new ones prefixed by >>): > >drivers/android/binder.o: In function `binder_thread_write': > >> drivers/android/.tmp_gl_binder.o:(.text+0xcbb0): undefined reference to > >> `__user_bad' >drivers/android/.tmp_gl_binder.o:(.text+0xcbdc): undefined reference to > `__user_bad' >drivers/android/.tmp_gl_binder.o:(.text+0xcfc4): undefined reference to > `__user_bad' >drivers/android/.tmp_gl_binder.o:(.text+0xd650): undefined reference to > `__user_bad' >drivers/android/.tmp_gl_binder.o:(.text+0xdbc8): undefined reference to > `__user_bad' > > --- > 0-DAY kernel test infrastructureOpen Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: drivers/android/.tmp_gl_binder.o:undefined reference to `__user_bad'
Hi Michal, See below - this error was introduced because the binder driver uses a 64-bit get_user(), even on a 32-bit architecture. Would it be possible to implement a 64-bit get_user() for microblaze? Thanks, Martijn On Mon, Jul 2, 2018 at 6:53 AM kbuild test robot wrote: > > Hi Martijn, > > FYI, the error/warning still remains. > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > master > head: 021c91791a5e7e85c567452f1be3e4c2c6cb6063 > commit: 1190b4e38f97023154e6b3bef61b251aa5f970d0 ANDROID: binder: remove > 32-bit binder interface. > date: 7 weeks ago > config: microblaze-allmodconfig (attached as .config) > compiler: microblaze-linux-gcc (GCC) 8.1.0 > reproduce: > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > git checkout 1190b4e38f97023154e6b3bef61b251aa5f970d0 > # save the attached .config to linux build tree > GCC_VERSION=8.1.0 make.cross ARCH=microblaze > > All errors (new ones prefixed by >>): > >drivers/android/binder.o: In function `binder_thread_write': > >> drivers/android/.tmp_gl_binder.o:(.text+0xcbb0): undefined reference to > >> `__user_bad' >drivers/android/.tmp_gl_binder.o:(.text+0xcbdc): undefined reference to > `__user_bad' >drivers/android/.tmp_gl_binder.o:(.text+0xcfc4): undefined reference to > `__user_bad' >drivers/android/.tmp_gl_binder.o:(.text+0xd650): undefined reference to > `__user_bad' >drivers/android/.tmp_gl_binder.o:(.text+0xdbc8): undefined reference to > `__user_bad' > > --- > 0-DAY kernel test infrastructureOpen Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH 2/2 v2] staging: android: ashmem: Fix mmap size validation
On Thu, Jun 21, 2018 at 1:29 AM, Joel Fernandes wrote: > Also if you look at the kernel sources, there are dozens of drivers that > check for correct VMA size in mmap handler and fail if it isn't sized > correctly. If that's the case, we should definitely do it this way for ashmem as well. Since its size is fixed after creation, preventing anyone from mapping a larger size seems reasonable to me. Reviewed-by: Martijn Coenen > > thanks! > > - Joel >
Re: [PATCH 2/2 v2] staging: android: ashmem: Fix mmap size validation
On Thu, Jun 21, 2018 at 1:29 AM, Joel Fernandes wrote: > Also if you look at the kernel sources, there are dozens of drivers that > check for correct VMA size in mmap handler and fail if it isn't sized > correctly. If that's the case, we should definitely do it this way for ashmem as well. Since its size is fixed after creation, preventing anyone from mapping a larger size seems reasonable to me. Reviewed-by: Martijn Coenen > > thanks! > > - Joel >
Re: [PATCH v6] ANDROID: binder: change down_write to down_read
On Tue, May 15, 2018 at 8:27 AM, Minchan Kim <minchan.ker...@gmail.com> wrote: > Hi Joel, > > Sorry for the late response. I was off. > > On Wed, May 09, 2018 at 04:19:41PM -0700, Joel Fernandes wrote: >> On Wed, May 09, 2018 at 03:40:23PM +0900, Minchan Kim wrote: >> > On Tue, May 08, 2018 at 04:08:13PM -0700, Joel Fernandes wrote: >> > > On Tue, May 08, 2018 at 07:51:01PM +0900, Minchan Kim wrote: >> > > > On Mon, May 07, 2018 at 10:28:29AM -0700, Joel Fernandes wrote: >> > > > > On Mon, May 07, 2018 at 11:15:37PM +0900, Minchan Kim wrote: >> > > > > > binder_update_page_range needs down_write of mmap_sem because >> > > > > > vm_insert_page need to change vma->vm_flags to VM_MIXEDMAP unless >> > > > > > it is set. However, when I profile binder working, it seems >> > > > > > every binder buffers should be mapped in advance by binder_mmap. >> > > > > > It means we could set VM_MIXEDMAP in binder_mmap time which is >> > > > > > already hold a mmap_sem as down_write so binder_update_page_range >> > > > > > doesn't need to hold a mmap_sem as down_write. >> > > > > > Please use proper API down_read. It would help mmap_sem contention >> > > > > > problem as well as fixing down_write abuse. >> > > > > > >> > > > > > Ganesh Mahendran tested app launching and binder throughput test >> > > > > > and he said he couldn't find any problem and I did binder latency >> > > > > > test per Greg KH request(Thanks Martijn to teach me how I can do) >> > > > > > I cannot find any problem, too. >> > > > > > >> > > > > > Cc: Ganesh Mahendran <opensource.gan...@gmail.com> >> > > > > > Cc: Joe Perches <j...@perches.com> >> > > > > > Cc: Arve Hjønnevåg <a...@android.com> >> > > > > > Cc: Todd Kjos <tk...@google.com> >> > > > > > Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org> >> > > > > > Reviewed-by: Martijn Coenen <m...@android.com> >> > > > > > Signed-off-by: Minchan Kim <minc...@kernel.org> >> > > > > > --- >> > > > > > drivers/android/binder.c | 4 +++- >> > > > > > drivers/android/binder_alloc.c | 6 +++--- >> > > > > > 2 files changed, 6 insertions(+), 4 deletions(-) >> > > > > > >> > > > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c >> > > > > > index 4eab5be3d00f..7b8e96f60719 100644 >> > > > > > --- a/drivers/android/binder.c >> > > > > > +++ b/drivers/android/binder.c >> > > > > > @@ -4730,7 +4730,9 @@ static int binder_mmap(struct file *filp, >> > > > > > struct vm_area_struct *vma) >> > > > > > failure_string = "bad vm_flags"; >> > > > > > goto err_bad_arg; >> > > > > > } >> > > > > > - vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE; >> > > > > > + vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP; >> > > > > > + vma->vm_flags &= ~VM_MAYWRITE; >> > > > > > + >> > > > > > vma->vm_ops = _vm_ops; >> > > > > > vma->vm_private_data = proc; >> > > > > > >> > > > > > diff --git a/drivers/android/binder_alloc.c >> > > > > > b/drivers/android/binder_alloc.c >> > > > > > index 5a426c877dfb..4f382d51def1 100644 >> > > > > > --- a/drivers/android/binder_alloc.c >> > > > > > +++ b/drivers/android/binder_alloc.c >> > > > > > @@ -219,7 +219,7 @@ static int binder_update_page_range(struct >> > > > > > binder_alloc *alloc, int allocate, >> > > > > > mm = alloc->vma_vm_mm; >> > > > > > >> > > > > > if (mm) { >> > > > > > - down_write(>mmap_sem); >> > > > > > + down_read(>mmap_sem); >> > > > > >> > > > > >> > > > > Nice. Is there a need to hold the reader-lock at all here? Just >> > > > > curious what >> > > > > else is it pr
Re: [PATCH v6] ANDROID: binder: change down_write to down_read
On Tue, May 15, 2018 at 8:27 AM, Minchan Kim wrote: > Hi Joel, > > Sorry for the late response. I was off. > > On Wed, May 09, 2018 at 04:19:41PM -0700, Joel Fernandes wrote: >> On Wed, May 09, 2018 at 03:40:23PM +0900, Minchan Kim wrote: >> > On Tue, May 08, 2018 at 04:08:13PM -0700, Joel Fernandes wrote: >> > > On Tue, May 08, 2018 at 07:51:01PM +0900, Minchan Kim wrote: >> > > > On Mon, May 07, 2018 at 10:28:29AM -0700, Joel Fernandes wrote: >> > > > > On Mon, May 07, 2018 at 11:15:37PM +0900, Minchan Kim wrote: >> > > > > > binder_update_page_range needs down_write of mmap_sem because >> > > > > > vm_insert_page need to change vma->vm_flags to VM_MIXEDMAP unless >> > > > > > it is set. However, when I profile binder working, it seems >> > > > > > every binder buffers should be mapped in advance by binder_mmap. >> > > > > > It means we could set VM_MIXEDMAP in binder_mmap time which is >> > > > > > already hold a mmap_sem as down_write so binder_update_page_range >> > > > > > doesn't need to hold a mmap_sem as down_write. >> > > > > > Please use proper API down_read. It would help mmap_sem contention >> > > > > > problem as well as fixing down_write abuse. >> > > > > > >> > > > > > Ganesh Mahendran tested app launching and binder throughput test >> > > > > > and he said he couldn't find any problem and I did binder latency >> > > > > > test per Greg KH request(Thanks Martijn to teach me how I can do) >> > > > > > I cannot find any problem, too. >> > > > > > >> > > > > > Cc: Ganesh Mahendran >> > > > > > Cc: Joe Perches >> > > > > > Cc: Arve Hjønnevåg >> > > > > > Cc: Todd Kjos >> > > > > > Cc: Greg Kroah-Hartman >> > > > > > Reviewed-by: Martijn Coenen >> > > > > > Signed-off-by: Minchan Kim >> > > > > > --- >> > > > > > drivers/android/binder.c | 4 +++- >> > > > > > drivers/android/binder_alloc.c | 6 +++--- >> > > > > > 2 files changed, 6 insertions(+), 4 deletions(-) >> > > > > > >> > > > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c >> > > > > > index 4eab5be3d00f..7b8e96f60719 100644 >> > > > > > --- a/drivers/android/binder.c >> > > > > > +++ b/drivers/android/binder.c >> > > > > > @@ -4730,7 +4730,9 @@ static int binder_mmap(struct file *filp, >> > > > > > struct vm_area_struct *vma) >> > > > > > failure_string = "bad vm_flags"; >> > > > > > goto err_bad_arg; >> > > > > > } >> > > > > > - vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE; >> > > > > > + vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP; >> > > > > > + vma->vm_flags &= ~VM_MAYWRITE; >> > > > > > + >> > > > > > vma->vm_ops = _vm_ops; >> > > > > > vma->vm_private_data = proc; >> > > > > > >> > > > > > diff --git a/drivers/android/binder_alloc.c >> > > > > > b/drivers/android/binder_alloc.c >> > > > > > index 5a426c877dfb..4f382d51def1 100644 >> > > > > > --- a/drivers/android/binder_alloc.c >> > > > > > +++ b/drivers/android/binder_alloc.c >> > > > > > @@ -219,7 +219,7 @@ static int binder_update_page_range(struct >> > > > > > binder_alloc *alloc, int allocate, >> > > > > > mm = alloc->vma_vm_mm; >> > > > > > >> > > > > > if (mm) { >> > > > > > - down_write(>mmap_sem); >> > > > > > + down_read(>mmap_sem); >> > > > > >> > > > > >> > > > > Nice. Is there a need to hold the reader-lock at all here? Just >> > > > > curious what >> > > > > else is it protecting (here or in vm_insert_page). >> > > > >> > > > It should protect vm_area_struct. IOW, when we try insert page into >> > > > virtual address area, >> > >
Re: [PATCH] m68k/uaccess: Revive 64-bit get_user()
On Mon, May 14, 2018 at 3:57 PM, Geert Uytterhoeven <ge...@linux-m68k.org> wrote: > Revive support for 64-bit get_user(), which was disabled in commit > d94af931af42152e ("[PATCH] m68k: clean up uaccess.h") due to a "broken" > typeof in (then brand new) gcc-4.1. > > - Keep on using u64 for the temporary, as __typeof__() doesn't drop > the const qualifier, > - Move it into a union (like mips32 does) to get rid of the cast, as > using get_user() to fetch a __user pointer would cause a "cast to > pointer from integer of different size" warning otherwise. Can't vouch for the existing asm, but the changes to it look good to me otherwise. > > Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org> Acked-by: Martijn Coenen <m...@android.com> > --- > arch/m68k/include/asm/uaccess_mm.h | 16 +--- > 1 file changed, 9 insertions(+), 7 deletions(-) > > diff --git a/arch/m68k/include/asm/uaccess_mm.h > b/arch/m68k/include/asm/uaccess_mm.h > index 75c172e909acdf18..c4cb889660aa0c35 100644 > --- a/arch/m68k/include/asm/uaccess_mm.h > +++ b/arch/m68k/include/asm/uaccess_mm.h > @@ -141,10 +141,12 @@ asm volatile ("\n" > \ > case 4: \ > __get_user_asm(__gu_err, x, ptr, u32, l, r, -EFAULT); \ > break; \ > -/* case 8: disabled because gcc-4.1 has a broken typeof\ > - { \ > - const void *__gu_ptr = (ptr); \ > - u64 __gu_val; \ > + case 8: { \ > + const void *__gu_ptr = (ptr); \ > + union { \ > + u64 l; \ > + __typeof__(*(ptr)) t; \ > + } __gu_val; \ > asm volatile ("\n" \ > "1: "MOVES".l (%2)+,%1\n" \ > "2: "MOVES".l (%2),%R1\n" \ > @@ -162,13 +164,13 @@ asm volatile ("\n" > \ > " .long 1b,10b\n" \ > " .long 2b,10b\n" \ > " .previous" \ > - : "+d" (__gu_err), "=" (__gu_val),\ > + : "+d" (__gu_err), "=" (__gu_val.l), \ > "+a" (__gu_ptr) \ > : "i" (-EFAULT) \ > : "memory");\ > - (x) = (__force typeof(*(ptr)))__gu_val; \ > + (x) = __gu_val.t; \ > break; \ > - } */ \ > + } \ > default:\ > __gu_err = __get_user_bad();\ > break; \ > -- > 2.7.4 >
Re: [PATCH] m68k/uaccess: Revive 64-bit get_user()
On Mon, May 14, 2018 at 3:57 PM, Geert Uytterhoeven wrote: > Revive support for 64-bit get_user(), which was disabled in commit > d94af931af42152e ("[PATCH] m68k: clean up uaccess.h") due to a "broken" > typeof in (then brand new) gcc-4.1. > > - Keep on using u64 for the temporary, as __typeof__() doesn't drop > the const qualifier, > - Move it into a union (like mips32 does) to get rid of the cast, as > using get_user() to fetch a __user pointer would cause a "cast to > pointer from integer of different size" warning otherwise. Can't vouch for the existing asm, but the changes to it look good to me otherwise. > > Signed-off-by: Geert Uytterhoeven Acked-by: Martijn Coenen > --- > arch/m68k/include/asm/uaccess_mm.h | 16 +--- > 1 file changed, 9 insertions(+), 7 deletions(-) > > diff --git a/arch/m68k/include/asm/uaccess_mm.h > b/arch/m68k/include/asm/uaccess_mm.h > index 75c172e909acdf18..c4cb889660aa0c35 100644 > --- a/arch/m68k/include/asm/uaccess_mm.h > +++ b/arch/m68k/include/asm/uaccess_mm.h > @@ -141,10 +141,12 @@ asm volatile ("\n" > \ > case 4: \ > __get_user_asm(__gu_err, x, ptr, u32, l, r, -EFAULT); \ > break; \ > -/* case 8: disabled because gcc-4.1 has a broken typeof\ > - { \ > - const void *__gu_ptr = (ptr); \ > - u64 __gu_val; \ > + case 8: { \ > + const void *__gu_ptr = (ptr); \ > + union { \ > + u64 l; \ > + __typeof__(*(ptr)) t; \ > + } __gu_val; \ > asm volatile ("\n" \ > "1: "MOVES".l (%2)+,%1\n" \ > "2: "MOVES".l (%2),%R1\n" \ > @@ -162,13 +164,13 @@ asm volatile ("\n" > \ > " .long 1b,10b\n" \ > " .long 2b,10b\n" \ > " .previous" \ > - : "+d" (__gu_err), "=" (__gu_val),\ > + : "+d" (__gu_err), "=" (__gu_val.l), \ > "+a" (__gu_ptr) \ > : "i" (-EFAULT) \ > : "memory");\ > - (x) = (__force typeof(*(ptr)))__gu_val; \ > + (x) = __gu_val.t; \ > break; \ > - } */ \ > + } \ > default:\ > __gu_err = __get_user_bad();\ > break; \ > -- > 2.7.4 >
Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.
On Mon, May 14, 2018 at 4:00 PM, Geert Uytterhoevenwrote: > Patch sent. Thanks for the quick turn-around! > > BTW, sh also doesn't seem to have 64-bit get_user(). > There may be others. I checked quickly, nios2 is the only other arch that explicitly doesn't support it and would result in a build error; some other archs don't define __get_user, but in that case they just fall back to raw_copy_from_user(). > > BTW2, does the Android Binder need to care about endianness when talking > to userspace? No, I don't think it should. Thanks, Martijn > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- > ge...@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like > that. > -- Linus Torvalds
Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.
On Mon, May 14, 2018 at 4:00 PM, Geert Uytterhoeven wrote: > Patch sent. Thanks for the quick turn-around! > > BTW, sh also doesn't seem to have 64-bit get_user(). > There may be others. I checked quickly, nios2 is the only other arch that explicitly doesn't support it and would result in a build error; some other archs don't define __get_user, but in that case they just fall back to raw_copy_from_user(). > > BTW2, does the Android Binder need to care about endianness when talking > to userspace? No, I don't think it should. Thanks, Martijn > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- > ge...@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like > that. > -- Linus Torvalds
[PATCH v2] ANDROID: binder: remove 32-bit binder interface.
From: Martijn Coenen <m...@google.com> New devices launching with Android P need to use the 64-bit binder interface, even on 32-bit SoCs [0]. This change removes the Kconfig option to select the 32-bit binder interface. We don't think this will affect existing userspace for the following reasons: 1) The latest Android common tree is 4.14, so we don't believe any Android devices are on kernels >4.14. 2) Android devices launch on an LTS release and stick with it, so we wouldn't expect devices running on <= 4.14 now to upgrade to 4.17 or later. But even if they did, they'd rebuild the world (kernel + userspace) anyway. 3) Other userspaces like 'anbox' are already using the 64-bit interface. Note that this change doesn't remove the 32-bit UAPI itself; the reason for that is that Android userspace always uses the latest UAPI headers from upstream, and userspace retains 32-bit support for devices that are upgrading. This will be removed as well in 2-3 years, at which point we can remove the code from the UAPI as well. Finally, this change introduces build errors on archs where 64-bit get_user/put_user is not supported, so make binder unavailable on m68k (which wouldn't want it anyway). [0]: https://android-review.googlesource.com/c/platform/build/+/595193 Signed-off-by: Martijn Coenen <m...@android.com> --- Changes in v2: - Dropped support for m68k to avoid build errors. drivers/android/Kconfig | 15 +-- drivers/android/binder.c | 4 2 files changed, 1 insertion(+), 18 deletions(-) diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig index 7dce3795b887..ee4880bfdcdc 100644 --- a/drivers/android/Kconfig +++ b/drivers/android/Kconfig @@ -10,7 +10,7 @@ if ANDROID config ANDROID_BINDER_IPC bool "Android Binder IPC Driver" - depends on MMU + depends on MMU && !M68K default n ---help--- Binder is used in Android for both communication between processes, @@ -32,19 +32,6 @@ config ANDROID_BINDER_DEVICES created. Each binder device has its own context manager, and is therefore logically separated from the other devices. -config ANDROID_BINDER_IPC_32BIT - bool "Use old (Android 4.4 and earlier) 32-bit binder API" - depends on !64BIT && ANDROID_BINDER_IPC - default y - ---help--- - The Binder API has been changed to support both 32 and 64bit - applications in a mixed environment. - - Enable this to support an old 32-bit Android user-space (v4.4 and - earlier). - - Note that enabling this will break newer Android user-space. - config ANDROID_BINDER_IPC_SELFTEST bool "Android Binder IPC Driver Selftest" depends on ANDROID_BINDER_IPC diff --git a/drivers/android/binder.c b/drivers/android/binder.c index e578eee31589..2ee9fb02dfb8 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -72,10 +72,6 @@ #include #include -#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT -#define BINDER_IPC_32BIT 1 -#endif - #include #include "binder_alloc.h" #include "binder_trace.h" -- 2.17.0.441.gb46fe60e1d-goog
[PATCH v2] ANDROID: binder: remove 32-bit binder interface.
From: Martijn Coenen New devices launching with Android P need to use the 64-bit binder interface, even on 32-bit SoCs [0]. This change removes the Kconfig option to select the 32-bit binder interface. We don't think this will affect existing userspace for the following reasons: 1) The latest Android common tree is 4.14, so we don't believe any Android devices are on kernels >4.14. 2) Android devices launch on an LTS release and stick with it, so we wouldn't expect devices running on <= 4.14 now to upgrade to 4.17 or later. But even if they did, they'd rebuild the world (kernel + userspace) anyway. 3) Other userspaces like 'anbox' are already using the 64-bit interface. Note that this change doesn't remove the 32-bit UAPI itself; the reason for that is that Android userspace always uses the latest UAPI headers from upstream, and userspace retains 32-bit support for devices that are upgrading. This will be removed as well in 2-3 years, at which point we can remove the code from the UAPI as well. Finally, this change introduces build errors on archs where 64-bit get_user/put_user is not supported, so make binder unavailable on m68k (which wouldn't want it anyway). [0]: https://android-review.googlesource.com/c/platform/build/+/595193 Signed-off-by: Martijn Coenen --- Changes in v2: - Dropped support for m68k to avoid build errors. drivers/android/Kconfig | 15 +-- drivers/android/binder.c | 4 2 files changed, 1 insertion(+), 18 deletions(-) diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig index 7dce3795b887..ee4880bfdcdc 100644 --- a/drivers/android/Kconfig +++ b/drivers/android/Kconfig @@ -10,7 +10,7 @@ if ANDROID config ANDROID_BINDER_IPC bool "Android Binder IPC Driver" - depends on MMU + depends on MMU && !M68K default n ---help--- Binder is used in Android for both communication between processes, @@ -32,19 +32,6 @@ config ANDROID_BINDER_DEVICES created. Each binder device has its own context manager, and is therefore logically separated from the other devices. -config ANDROID_BINDER_IPC_32BIT - bool "Use old (Android 4.4 and earlier) 32-bit binder API" - depends on !64BIT && ANDROID_BINDER_IPC - default y - ---help--- - The Binder API has been changed to support both 32 and 64bit - applications in a mixed environment. - - Enable this to support an old 32-bit Android user-space (v4.4 and - earlier). - - Note that enabling this will break newer Android user-space. - config ANDROID_BINDER_IPC_SELFTEST bool "Android Binder IPC Driver Selftest" depends on ANDROID_BINDER_IPC diff --git a/drivers/android/binder.c b/drivers/android/binder.c index e578eee31589..2ee9fb02dfb8 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -72,10 +72,6 @@ #include #include -#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT -#define BINDER_IPC_32BIT 1 -#endif - #include #include "binder_alloc.h" #include "binder_trace.h" -- 2.17.0.441.gb46fe60e1d-goog
Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.
On Fri, May 11, 2018 at 10:08 AM, Greg KHwrote: > I think using !CONFIG_M68K is a good start. We can blacklist any other > arch that doesn't support this, and that list should be small as I doubt > any new ones will be added without this support. Thanks, I will send a v2. > > thanks, > > greg k-h
Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.
On Fri, May 11, 2018 at 10:08 AM, Greg KH wrote: > I think using !CONFIG_M68K is a good start. We can blacklist any other > arch that doesn't support this, and that list should be small as I doubt > any new ones will be added without this support. Thanks, I will send a v2. > > thanks, > > greg k-h
Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.
On Sat, May 5, 2018 at 2:10 PM, kbuild test robotwrote: >drivers/android/binder.o: In function `binder_thread_write': >>> binder.c:(.text+0x6a16): undefined reference to `__get_user_bad' Looks like m68k doesn't support 64-bit get_user(). I could just have binder depend on !CONFIG_M68K, but there may be other architectures still that don't support this. Another alternative would be to whitelist the architectures Android supports - eg arm, arm64, x86, x86_64. But I'm not sure if arch-limited drivers are considered bad form. Does anybody have suggestions for how to deal with this? Thanks, Martijn >binder.c:(.text+0x6c9a): undefined reference to `__get_user_bad' >binder.c:(.text+0x701e): undefined reference to `__get_user_bad' >binder.c:(.text+0x7414): undefined reference to `__get_user_bad' > > --- > 0-DAY kernel test infrastructureOpen Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH] ANDROID: binder: remove 32-bit binder interface.
On Sat, May 5, 2018 at 2:10 PM, kbuild test robot wrote: >drivers/android/binder.o: In function `binder_thread_write': >>> binder.c:(.text+0x6a16): undefined reference to `__get_user_bad' Looks like m68k doesn't support 64-bit get_user(). I could just have binder depend on !CONFIG_M68K, but there may be other architectures still that don't support this. Another alternative would be to whitelist the architectures Android supports - eg arm, arm64, x86, x86_64. But I'm not sure if arch-limited drivers are considered bad form. Does anybody have suggestions for how to deal with this? Thanks, Martijn >binder.c:(.text+0x6c9a): undefined reference to `__get_user_bad' >binder.c:(.text+0x701e): undefined reference to `__get_user_bad' >binder.c:(.text+0x7414): undefined reference to `__get_user_bad' > > --- > 0-DAY kernel test infrastructureOpen Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH] android: binder: Fix a possible data race in binder_alloc_mmap_handler
On Tue, May 8, 2018 at 2:06 AM, Jia-Ju Baiwrote: > The write operations to "alloc->buffer" are protected by > the lock on line 679 and 730, but the read operation to > this data on line 712 is not protected by the lock. > Thus, there may exist a data race for "alloc->buffer". It's read by the same thread that just wrote it, there is no data race. The locks at line 679 and 730 protect against multiple threads calling mmap() at the same time. > > To fix this data race, the read operation to "alloc->buffer" > should be also protected by the lock. > > Signed-off-by: Jia-Ju Bai > --- > drivers/android/binder_alloc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > index 5a426c877dfb..596acc3a84e4 100644 > --- a/drivers/android/binder_alloc.c > +++ b/drivers/android/binder_alloc.c > @@ -709,7 +709,9 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc, > goto err_alloc_buf_struct_failed; > } > > + mutex_lock(_alloc_mmap_lock); > buffer->data = alloc->buffer; > + mutex_unlock(_alloc_mmap_lock); > list_add(>entry, >buffers); > buffer->free = 1; > binder_insert_free_buffer(alloc, buffer); > -- > 2.17.0 >
Re: [PATCH] android: binder: Fix a possible data race in binder_alloc_mmap_handler
On Tue, May 8, 2018 at 2:06 AM, Jia-Ju Bai wrote: > The write operations to "alloc->buffer" are protected by > the lock on line 679 and 730, but the read operation to > this data on line 712 is not protected by the lock. > Thus, there may exist a data race for "alloc->buffer". It's read by the same thread that just wrote it, there is no data race. The locks at line 679 and 730 protect against multiple threads calling mmap() at the same time. > > To fix this data race, the read operation to "alloc->buffer" > should be also protected by the lock. > > Signed-off-by: Jia-Ju Bai > --- > drivers/android/binder_alloc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > index 5a426c877dfb..596acc3a84e4 100644 > --- a/drivers/android/binder_alloc.c > +++ b/drivers/android/binder_alloc.c > @@ -709,7 +709,9 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc, > goto err_alloc_buf_struct_failed; > } > > + mutex_lock(_alloc_mmap_lock); > buffer->data = alloc->buffer; > + mutex_unlock(_alloc_mmap_lock); > list_add(>entry, >buffers); > buffer->free = 1; > binder_insert_free_buffer(alloc, buffer); > -- > 2.17.0 >
[PATCH] ANDROID: binder: remove 32-bit binder interface.
New devices launching with Android P need to use the 64-bit binder interface, even on 32-bit SoCs [0]. This change removes the Kconfig option to select the 32-bit binder interface. We don't think this will affect existing userspace for the following reasons: 1) The latest Android common tree is 4.14, so we don't believe any Android devices are on kernels >4.14. 2) Android devices launch on an LTS release and stick with it, so we wouldn't expect devices running on <= 4.14 now to upgrade to 4.17 or later. But even if they did, they'd rebuild the world (kernel + userspace) anyway. 3) Other userspaces like 'anbox' are already using the 64-bit interface. Note that this change doesn't remove the 32-bit UAPI itself; the reason for that is that Android userspace always uses the latest UAPI headers from upstream, and userspace retains 32-bit support for devices that are upgrading. This will be removed as well in 2-3 years, at which point we can remove the code from the UAPI as well. [0]: https://android-review.googlesource.com/c/platform/build/+/595193 Signed-off-by: Martijn Coenen <m...@android.com> --- drivers/android/Kconfig | 13 - drivers/android/binder.c | 4 2 files changed, 17 deletions(-) diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig index 7dce3795b887..432e9ad77070 100644 --- a/drivers/android/Kconfig +++ b/drivers/android/Kconfig @@ -32,19 +32,6 @@ config ANDROID_BINDER_DEVICES created. Each binder device has its own context manager, and is therefore logically separated from the other devices. -config ANDROID_BINDER_IPC_32BIT - bool "Use old (Android 4.4 and earlier) 32-bit binder API" - depends on !64BIT && ANDROID_BINDER_IPC - default y - ---help--- - The Binder API has been changed to support both 32 and 64bit - applications in a mixed environment. - - Enable this to support an old 32-bit Android user-space (v4.4 and - earlier). - - Note that enabling this will break newer Android user-space. - config ANDROID_BINDER_IPC_SELFTEST bool "Android Binder IPC Driver Selftest" depends on ANDROID_BINDER_IPC diff --git a/drivers/android/binder.c b/drivers/android/binder.c index e578eee31589..2ee9fb02dfb8 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -72,10 +72,6 @@ #include #include -#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT -#define BINDER_IPC_32BIT 1 -#endif - #include #include "binder_alloc.h" #include "binder_trace.h" -- 2.17.0.441.gb46fe60e1d-goog
[PATCH] ANDROID: binder: remove 32-bit binder interface.
New devices launching with Android P need to use the 64-bit binder interface, even on 32-bit SoCs [0]. This change removes the Kconfig option to select the 32-bit binder interface. We don't think this will affect existing userspace for the following reasons: 1) The latest Android common tree is 4.14, so we don't believe any Android devices are on kernels >4.14. 2) Android devices launch on an LTS release and stick with it, so we wouldn't expect devices running on <= 4.14 now to upgrade to 4.17 or later. But even if they did, they'd rebuild the world (kernel + userspace) anyway. 3) Other userspaces like 'anbox' are already using the 64-bit interface. Note that this change doesn't remove the 32-bit UAPI itself; the reason for that is that Android userspace always uses the latest UAPI headers from upstream, and userspace retains 32-bit support for devices that are upgrading. This will be removed as well in 2-3 years, at which point we can remove the code from the UAPI as well. [0]: https://android-review.googlesource.com/c/platform/build/+/595193 Signed-off-by: Martijn Coenen --- drivers/android/Kconfig | 13 - drivers/android/binder.c | 4 2 files changed, 17 deletions(-) diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig index 7dce3795b887..432e9ad77070 100644 --- a/drivers/android/Kconfig +++ b/drivers/android/Kconfig @@ -32,19 +32,6 @@ config ANDROID_BINDER_DEVICES created. Each binder device has its own context manager, and is therefore logically separated from the other devices. -config ANDROID_BINDER_IPC_32BIT - bool "Use old (Android 4.4 and earlier) 32-bit binder API" - depends on !64BIT && ANDROID_BINDER_IPC - default y - ---help--- - The Binder API has been changed to support both 32 and 64bit - applications in a mixed environment. - - Enable this to support an old 32-bit Android user-space (v4.4 and - earlier). - - Note that enabling this will break newer Android user-space. - config ANDROID_BINDER_IPC_SELFTEST bool "Android Binder IPC Driver Selftest" depends on ANDROID_BINDER_IPC diff --git a/drivers/android/binder.c b/drivers/android/binder.c index e578eee31589..2ee9fb02dfb8 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -72,10 +72,6 @@ #include #include -#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT -#define BINDER_IPC_32BIT 1 -#endif - #include #include "binder_alloc.h" #include "binder_trace.h" -- 2.17.0.441.gb46fe60e1d-goog
Re: [PATCH v3 2/5] efi: Add embedded peripheral firmware support
On Wed, Apr 25, 2018 at 10:55 AM, Luis R. Rodriguezwrote: > Android became the primary user of CONFIG_FW_LOADER_USER_HELPER_FALLBACK. > > It would be good for us to hear from Android folks if their current use of > request_firmware_into_buf() is designed in practice to *never* use the direct > filesystem firmware loading interface, and always rely instead on the > fallback mechanism. It's hard to answer this question for Android in general. As far as I can tell the reasons we use CONFIG_FW_LOADER_USER_HELPER(_FALLBACK) are: 1) We have multiple different paths on our devices where firmware can be located, and the direct loader only supports one custom path 2) Most of those paths are not mounted by the time the corresponding drivers are loaded, because pretty much all Android kernels today are built without module support, and therefore drivers are loaded well before the firmware partition is mounted 3) I think we use _FALLBACK because doing this with uevents is just the easiest thing to do; our init code has a firmware helper that deals with this and searches the paths that we care about 2) will change at some point, because Android is moving towards a model where device-specific peripheral drivers will be loaded as modules, and since those modules would likely come from the same partition as the firmware, it's possible that the direct load would succeed (depending on whether the custom path is configured there or not). But I don't think we can rely on the direct loader even in those cases, unless we could configure it with multiple custom paths. I have no reason to believe request_firmware_into_buf() is special in this regard; drivers that depend on it may have their corresponding firmware in different locations, so just depending on the direct loader would not be good enough. > > Is ptr below > > ret = request_firmware_into_buf(_fw, fw_name, dev, > ptr, phdr->p_filesz); > > Also part of the DMA buffer allocated earlier via: > > ret = qcom_scm_pas_init_image(pas_id, fw->data, fw->size); > > Android folks? I think the Qualcomm folks owning this (Andy, David, Bjorn, already cc'd here) are better suited to answer that question. Thanks, Martijn
Re: [PATCH v3 2/5] efi: Add embedded peripheral firmware support
On Wed, Apr 25, 2018 at 10:55 AM, Luis R. Rodriguez wrote: > Android became the primary user of CONFIG_FW_LOADER_USER_HELPER_FALLBACK. > > It would be good for us to hear from Android folks if their current use of > request_firmware_into_buf() is designed in practice to *never* use the direct > filesystem firmware loading interface, and always rely instead on the > fallback mechanism. It's hard to answer this question for Android in general. As far as I can tell the reasons we use CONFIG_FW_LOADER_USER_HELPER(_FALLBACK) are: 1) We have multiple different paths on our devices where firmware can be located, and the direct loader only supports one custom path 2) Most of those paths are not mounted by the time the corresponding drivers are loaded, because pretty much all Android kernels today are built without module support, and therefore drivers are loaded well before the firmware partition is mounted 3) I think we use _FALLBACK because doing this with uevents is just the easiest thing to do; our init code has a firmware helper that deals with this and searches the paths that we care about 2) will change at some point, because Android is moving towards a model where device-specific peripheral drivers will be loaded as modules, and since those modules would likely come from the same partition as the firmware, it's possible that the direct load would succeed (depending on whether the custom path is configured there or not). But I don't think we can rely on the direct loader even in those cases, unless we could configure it with multiple custom paths. I have no reason to believe request_firmware_into_buf() is special in this regard; drivers that depend on it may have their corresponding firmware in different locations, so just depending on the direct loader would not be good enough. > > Is ptr below > > ret = request_firmware_into_buf(_fw, fw_name, dev, > ptr, phdr->p_filesz); > > Also part of the DMA buffer allocated earlier via: > > ret = qcom_scm_pas_init_image(pas_id, fw->data, fw->size); > > Android folks? I think the Qualcomm folks owning this (Andy, David, Bjorn, already cc'd here) are better suited to answer that question. Thanks, Martijn
Re: [PATCH v3 2/5] efi: Add embedded peripheral firmware support
On Thu, May 3, 2018 at 5:21 PM, Luis R. Rodriguezwrote: > Android folks, poke below. otherwise we'll have no option but to seriously > consider Mimi's patch to prevent these calls when IMA appraisal is enforced: Sorry, figuring out who's the right person to answer this, will get back to you ASAP. Martijn > > http://lkml.kernel.org/r/1525182503-13849-7-git-send-email-zo...@linux.vnet.ibm.com > > Please read below > > On Wed, Apr 25, 2018 at 05:55:57PM +, Luis R. Rodriguez wrote: >> On Wed, Apr 25, 2018 at 01:00:09AM -0400, Mimi Zohar wrote: >> > On Tue, 2018-04-24 at 23:42 +, Luis R. Rodriguez wrote: >> > > On Tue, Apr 24, 2018 at 12:07:01PM -0400, Mimi Zohar wrote: >> > > > On Tue, 2018-04-24 at 17:09 +0200, Hans de Goede wrote: >> > > If its of any help -- >> > > >> > > drivers/soc/qcom/mdt_loader.c is the only driver currently using >> > > request_firmware_into_buf() however I'll note qcom_mdt_load() is used in >> > > many >> > > other drivers so they are wrappers around request_firmware_into_buf(): >> > > >> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c: * adreno_request_fw() handles >> > > this, but qcom_mdt_load() does >> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c: ret = qcom_mdt_load(dev, >> > > fw, fwname, GPU_PAS_ID, >> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c: ret = qcom_mdt_load(dev, >> > > fw, newname, GPU_PAS_ID, >> > > drivers/media/platform/qcom/venus/firmware.c: ret = qcom_mdt_load(dev, >> > > mdt, fwname, VENUS_PAS_ID, mem_va, mem_phys, >> > > drivers/remoteproc/qcom_adsp_pil.c: return qcom_mdt_load(adsp->dev, >> > > fw, rproc->firmware, adsp->pas_id, >> > > drivers/remoteproc/qcom_wcnss.c:return qcom_mdt_load(wcnss->dev, >> > > fw, rproc->firmware, WCNSS_PAS_ID, >> > > >> > > > > As such the current IMA code (from v4.17-rc2) actually does >> > > > > not handle READING_FIRMWARE_PREALLOC_BUFFER at all, >> > > > >> > > > Right, it doesn't yet address READING_FIRMWARE_PREALLOC_BUFFER, but >> > > > should. >> > > > >> > > > Depending on whether the device requesting the firmware has access to >> > > > the DMA memory, before the signature verification, >> > > >> > > It would seem from the original patch review about >> > > READING_FIRMWARE_PREALLOC_BUFFER >> > > that this is not a DMA buffer. > > To be very clear I believe Stephen implied this was not DMA buffer. Mimi > asked for READING_FIRMWARE_DMA if it was: > > https://patchwork.kernel.org/patch/9074611/ > >> > The call sequence: >> > qcom_mdt_load() -> qcom_scm_pas_init_image() -> dma_alloc_coherent() >> > >> > If dma_alloc_coherent() isn't allocating a DMA buffer, then the >> > function name is misleading/confusing. >> >> Hah, by *definition* the device *and* processor has immediate access >> to data written *immediately* when dma_alloc_coherent() is used. From >> Documentation/DMA-API.txt: >> >> --- >> Part Ia - Using large DMA-coherent buffers >> -- >> >> :: >> >> void * >> dma_alloc_coherent(struct device *dev, size_t size, >>dma_addr_t *dma_handle, gfp_t flag) >> >> Consistent memory is memory for which a write by either the device or >> the processor can immediately be read by the processor or device >> without having to worry about caching effects. (You may however need >> to make sure to flush the processor's write buffers before telling >> devices to read that memory.) >> >> >> Is ptr below >> >> ret = request_firmware_into_buf(_fw, fw_name, dev, >> ptr, phdr->p_filesz); >> >> Also part of the DMA buffer allocated earlier via: >> >> ret = qcom_scm_pas_init_image(pas_id, fw->data, fw->size); >> >> Android folks? > > Android folks? > >> > > The device driver should have access to the buffer pointer with write >> > > given >> > > that with request_firmware_into_buf() the driver is giving full write >> > > access to >> > > the memory pointer so that the firmware API can stuff the firmware it >> > > finds >> > > there. >> > > >> > > Firmware signature verification would be up to the device hardware to do >> > > upon >> > > load *after* request_firmware_into_buf(). >> > >> > We're discussing the kernel's signature verification, not the device >> > hardware's signature verification. Can the device driver access the >> > buffer, before IMA-appraisal has verified the firmware's signature? >> >> It will depend on the above question. > > Luis
Re: [PATCH v3 2/5] efi: Add embedded peripheral firmware support
On Thu, May 3, 2018 at 5:21 PM, Luis R. Rodriguez wrote: > Android folks, poke below. otherwise we'll have no option but to seriously > consider Mimi's patch to prevent these calls when IMA appraisal is enforced: Sorry, figuring out who's the right person to answer this, will get back to you ASAP. Martijn > > http://lkml.kernel.org/r/1525182503-13849-7-git-send-email-zo...@linux.vnet.ibm.com > > Please read below > > On Wed, Apr 25, 2018 at 05:55:57PM +, Luis R. Rodriguez wrote: >> On Wed, Apr 25, 2018 at 01:00:09AM -0400, Mimi Zohar wrote: >> > On Tue, 2018-04-24 at 23:42 +, Luis R. Rodriguez wrote: >> > > On Tue, Apr 24, 2018 at 12:07:01PM -0400, Mimi Zohar wrote: >> > > > On Tue, 2018-04-24 at 17:09 +0200, Hans de Goede wrote: >> > > If its of any help -- >> > > >> > > drivers/soc/qcom/mdt_loader.c is the only driver currently using >> > > request_firmware_into_buf() however I'll note qcom_mdt_load() is used in >> > > many >> > > other drivers so they are wrappers around request_firmware_into_buf(): >> > > >> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c: * adreno_request_fw() handles >> > > this, but qcom_mdt_load() does >> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c: ret = qcom_mdt_load(dev, >> > > fw, fwname, GPU_PAS_ID, >> > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c: ret = qcom_mdt_load(dev, >> > > fw, newname, GPU_PAS_ID, >> > > drivers/media/platform/qcom/venus/firmware.c: ret = qcom_mdt_load(dev, >> > > mdt, fwname, VENUS_PAS_ID, mem_va, mem_phys, >> > > drivers/remoteproc/qcom_adsp_pil.c: return qcom_mdt_load(adsp->dev, >> > > fw, rproc->firmware, adsp->pas_id, >> > > drivers/remoteproc/qcom_wcnss.c:return qcom_mdt_load(wcnss->dev, >> > > fw, rproc->firmware, WCNSS_PAS_ID, >> > > >> > > > > As such the current IMA code (from v4.17-rc2) actually does >> > > > > not handle READING_FIRMWARE_PREALLOC_BUFFER at all, >> > > > >> > > > Right, it doesn't yet address READING_FIRMWARE_PREALLOC_BUFFER, but >> > > > should. >> > > > >> > > > Depending on whether the device requesting the firmware has access to >> > > > the DMA memory, before the signature verification, >> > > >> > > It would seem from the original patch review about >> > > READING_FIRMWARE_PREALLOC_BUFFER >> > > that this is not a DMA buffer. > > To be very clear I believe Stephen implied this was not DMA buffer. Mimi > asked for READING_FIRMWARE_DMA if it was: > > https://patchwork.kernel.org/patch/9074611/ > >> > The call sequence: >> > qcom_mdt_load() -> qcom_scm_pas_init_image() -> dma_alloc_coherent() >> > >> > If dma_alloc_coherent() isn't allocating a DMA buffer, then the >> > function name is misleading/confusing. >> >> Hah, by *definition* the device *and* processor has immediate access >> to data written *immediately* when dma_alloc_coherent() is used. From >> Documentation/DMA-API.txt: >> >> --- >> Part Ia - Using large DMA-coherent buffers >> -- >> >> :: >> >> void * >> dma_alloc_coherent(struct device *dev, size_t size, >>dma_addr_t *dma_handle, gfp_t flag) >> >> Consistent memory is memory for which a write by either the device or >> the processor can immediately be read by the processor or device >> without having to worry about caching effects. (You may however need >> to make sure to flush the processor's write buffers before telling >> devices to read that memory.) >> >> >> Is ptr below >> >> ret = request_firmware_into_buf(_fw, fw_name, dev, >> ptr, phdr->p_filesz); >> >> Also part of the DMA buffer allocated earlier via: >> >> ret = qcom_scm_pas_init_image(pas_id, fw->data, fw->size); >> >> Android folks? > > Android folks? > >> > > The device driver should have access to the buffer pointer with write >> > > given >> > > that with request_firmware_into_buf() the driver is giving full write >> > > access to >> > > the memory pointer so that the firmware API can stuff the firmware it >> > > finds >> > > there. >> > > >> > > Firmware signature verification would be up to the device hardware to do >> > > upon >> > > load *after* request_firmware_into_buf(). >> > >> > We're discussing the kernel's signature verification, not the device >> > hardware's signature verification. Can the device driver access the >> > buffer, before IMA-appraisal has verified the firmware's signature? >> >> It will depend on the above question. > > Luis