Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-24 Thread Vitaly Mayatskikh
On Sun, 24 Sep 2017 10:27:39 -0400,
Al Viro wrote:

> BTW, there's something fishy in bio_copy_user_iov().  If the area we'd asked 
> for
> had been too large for a single bio, we are going to create a bio and have
>  bio_add_pc_page() eventually fill it up to limit.  Then we return into
> __blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again.
> Fine, but... now we might have non-zero iter->iov_offset.  And this
> bmd->is_our_pages = map_data ? 0 : 1;
> memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs);
> iov_iter_init(>iter, iter->type, bmd->iov,
> iter->nr_segs, iter->count);
> does not even look at iter->iov_offset.  As the result, when it gets to
> bio_uncopy_user(), we copy the data from each bio into the *beginning* of
> the user area, overwriting that from the other bio.

Yeah, something is wrong with bio_copy_user_iov. Our datapath hangs when IO 
flows through unmodified SG (it forces bio_copy if iov_count is set). I did not 
look at details, but same IO pattern and memory layout work well with bio_map 
(module refcount problem).

> Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and
> force-pushed the result.

I'll give it a try, thanks!
-- 
wbr, Vitaly


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-24 Thread Vitaly Mayatskikh
On Sun, 24 Sep 2017 10:27:39 -0400,
Al Viro wrote:

> BTW, there's something fishy in bio_copy_user_iov().  If the area we'd asked 
> for
> had been too large for a single bio, we are going to create a bio and have
>  bio_add_pc_page() eventually fill it up to limit.  Then we return into
> __blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again.
> Fine, but... now we might have non-zero iter->iov_offset.  And this
> bmd->is_our_pages = map_data ? 0 : 1;
> memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs);
> iov_iter_init(>iter, iter->type, bmd->iov,
> iter->nr_segs, iter->count);
> does not even look at iter->iov_offset.  As the result, when it gets to
> bio_uncopy_user(), we copy the data from each bio into the *beginning* of
> the user area, overwriting that from the other bio.

Yeah, something is wrong with bio_copy_user_iov. Our datapath hangs when IO 
flows through unmodified SG (it forces bio_copy if iov_count is set). I did not 
look at details, but same IO pattern and memory layout work well with bio_map 
(module refcount problem).

> Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and
> force-pushed the result.

I'll give it a try, thanks!
-- 
wbr, Vitaly


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-24 Thread Al Viro
On Sun, Sep 24, 2017 at 03:27:39PM +0100, Al Viro wrote:

> At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov;
> instead of that iov_iter_init() in there.  I'm not sure how far back does
> it go; looks like "block: support large requests in blk_rq_map_user_iov"
> is the earliest possible point, but it might need more digging to make
> sure.  v4.5+, if that's when the problems began...
> 
> Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and
> force-pushed the result.

While we are at it, calculation of nr_pages in bio_copy_user_iov() is bloody
odd - why, in the name of everything unholy, does it care about the iovec
boundaries in there?  We are copying data anyway; why does allocation of bio
care about the fragmentation of the other end of copying?  Shouldn't it be
simply max(DIV_ROUND_UP(offset + len, PAGE_SIZE), BIO_MAX_PAGES)?


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-24 Thread Al Viro
On Sun, Sep 24, 2017 at 03:27:39PM +0100, Al Viro wrote:

> At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov;
> instead of that iov_iter_init() in there.  I'm not sure how far back does
> it go; looks like "block: support large requests in blk_rq_map_user_iov"
> is the earliest possible point, but it might need more digging to make
> sure.  v4.5+, if that's when the problems began...
> 
> Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and
> force-pushed the result.

While we are at it, calculation of nr_pages in bio_copy_user_iov() is bloody
odd - why, in the name of everything unholy, does it care about the iovec
boundaries in there?  We are copying data anyway; why does allocation of bio
care about the fragmentation of the other end of copying?  Shouldn't it be
simply max(DIV_ROUND_UP(offset + len, PAGE_SIZE), BIO_MAX_PAGES)?


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-24 Thread Al Viro
On Sat, Sep 23, 2017 at 09:33:23PM +0100, Al Viro wrote:
> On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote:
> > On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote:
> > 
> > > IOW, the loop on failure exit should go through the bio, like 
> > > __bio_unmap_user()
> > > does.  We *also* need to put everything left unused in pages[], but only 
> > > from the
> > > last iteration through iov_for_each().
> > > 
> > > Frankly, I would prefer to reuse the pages[], rather than append to it on 
> > > each
> > > iteration.  Used iov_iter_get_pages_alloc(), actually.
> > 
> > Something like completely untested diff below, perhaps...
> 
> > +   unsigned n = PAGE_SIZE - offs;
> > +   unsigned prev_bi_vcnt = bio->bi_vcnt;
> 
> Sorry, that should've been followed by
>   if (n > bytes)
>   n = bytes;
> 
> Anyway, a carved-up variant is in vfs.git#work.iov_iter.  It still needs
> review and testing; the patch Vitaly has posted in this thread plus 6
> followups, hopefully more readable than aggregate diff.
> 
> Comments?

BTW, there's something fishy in bio_copy_user_iov().  If the area we'd asked for
had been too large for a single bio, we are going to create a bio and have
 bio_add_pc_page() eventually fill it up to limit.  Then we return into
__blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again.
Fine, but... now we might have non-zero iter->iov_offset.  And this
bmd->is_our_pages = map_data ? 0 : 1;
memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs);
iov_iter_init(>iter, iter->type, bmd->iov,
iter->nr_segs, iter->count);
does not even look at iter->iov_offset.  As the result, when it gets to
bio_uncopy_user(), we copy the data from each bio into the *beginning* of
the user area, overwriting that from the other bio.

At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov;
instead of that iov_iter_init() in there.  I'm not sure how far back does
it go; looks like "block: support large requests in blk_rq_map_user_iov"
is the earliest possible point, but it might need more digging to make
sure.  v4.5+, if that's when the problems began...

Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and
force-pushed the result.


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-24 Thread Al Viro
On Sat, Sep 23, 2017 at 09:33:23PM +0100, Al Viro wrote:
> On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote:
> > On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote:
> > 
> > > IOW, the loop on failure exit should go through the bio, like 
> > > __bio_unmap_user()
> > > does.  We *also* need to put everything left unused in pages[], but only 
> > > from the
> > > last iteration through iov_for_each().
> > > 
> > > Frankly, I would prefer to reuse the pages[], rather than append to it on 
> > > each
> > > iteration.  Used iov_iter_get_pages_alloc(), actually.
> > 
> > Something like completely untested diff below, perhaps...
> 
> > +   unsigned n = PAGE_SIZE - offs;
> > +   unsigned prev_bi_vcnt = bio->bi_vcnt;
> 
> Sorry, that should've been followed by
>   if (n > bytes)
>   n = bytes;
> 
> Anyway, a carved-up variant is in vfs.git#work.iov_iter.  It still needs
> review and testing; the patch Vitaly has posted in this thread plus 6
> followups, hopefully more readable than aggregate diff.
> 
> Comments?

BTW, there's something fishy in bio_copy_user_iov().  If the area we'd asked for
had been too large for a single bio, we are going to create a bio and have
 bio_add_pc_page() eventually fill it up to limit.  Then we return into
__blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again.
Fine, but... now we might have non-zero iter->iov_offset.  And this
bmd->is_our_pages = map_data ? 0 : 1;
memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs);
iov_iter_init(>iter, iter->type, bmd->iov,
iter->nr_segs, iter->count);
does not even look at iter->iov_offset.  As the result, when it gets to
bio_uncopy_user(), we copy the data from each bio into the *beginning* of
the user area, overwriting that from the other bio.

At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov;
instead of that iov_iter_init() in there.  I'm not sure how far back does
it go; looks like "block: support large requests in blk_rq_map_user_iov"
is the earliest possible point, but it might need more digging to make
sure.  v4.5+, if that's when the problems began...

Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and
force-pushed the result.


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-23 Thread Al Viro
On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote:
> On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote:
> 
> > IOW, the loop on failure exit should go through the bio, like 
> > __bio_unmap_user()
> > does.  We *also* need to put everything left unused in pages[], but only 
> > from the
> > last iteration through iov_for_each().
> > 
> > Frankly, I would prefer to reuse the pages[], rather than append to it on 
> > each
> > iteration.  Used iov_iter_get_pages_alloc(), actually.
> 
> Something like completely untested diff below, perhaps...

> + unsigned n = PAGE_SIZE - offs;
> + unsigned prev_bi_vcnt = bio->bi_vcnt;

Sorry, that should've been followed by
if (n > bytes)
n = bytes;

Anyway, a carved-up variant is in vfs.git#work.iov_iter.  It still needs
review and testing; the patch Vitaly has posted in this thread plus 6
followups, hopefully more readable than aggregate diff.

Comments?


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-23 Thread Al Viro
On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote:
> On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote:
> 
> > IOW, the loop on failure exit should go through the bio, like 
> > __bio_unmap_user()
> > does.  We *also* need to put everything left unused in pages[], but only 
> > from the
> > last iteration through iov_for_each().
> > 
> > Frankly, I would prefer to reuse the pages[], rather than append to it on 
> > each
> > iteration.  Used iov_iter_get_pages_alloc(), actually.
> 
> Something like completely untested diff below, perhaps...

> + unsigned n = PAGE_SIZE - offs;
> + unsigned prev_bi_vcnt = bio->bi_vcnt;

Sorry, that should've been followed by
if (n > bytes)
n = bytes;

Anyway, a carved-up variant is in vfs.git#work.iov_iter.  It still needs
review and testing; the patch Vitaly has posted in this thread plus 6
followups, hopefully more readable than aggregate diff.

Comments?


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-23 Thread Al Viro
On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote:

> IOW, the loop on failure exit should go through the bio, like 
> __bio_unmap_user()
> does.  We *also* need to put everything left unused in pages[], but only from 
> the
> last iteration through iov_for_each().
> 
> Frankly, I would prefer to reuse the pages[], rather than append to it on each
> iteration.  Used iov_iter_get_pages_alloc(), actually.

Something like completely untested diff below, perhaps...

diff --git a/block/bio.c b/block/bio.c
index b38e962fa83e..b5fe23597b41 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1323,94 +1323,60 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 const struct iov_iter *iter,
 gfp_t gfp_mask)
 {
-   int j;
-   int nr_pages = 0;
-   struct page **pages;
struct bio *bio;
-   int cur_page = 0;
-   int ret, offset;
+   struct bio_vec *bvec;
struct iov_iter i;
-   struct iovec iov;
-
-   iov_for_each(iov, i, *iter) {
-   unsigned long uaddr = (unsigned long) iov.iov_base;
-   unsigned long len = iov.iov_len;
-   unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   unsigned long start = uaddr >> PAGE_SHIFT;
-
-   /*
-* Overflow, abort
-*/
-   if (end < start)
-   return ERR_PTR(-EINVAL);
-
-   nr_pages += end - start;
-   /*
-* buffer must be aligned to at least logical block size for now
-*/
-   if (uaddr & queue_dma_alignment(q))
-   return ERR_PTR(-EINVAL);
-   }
+   int ret, j;
 
-   if (!nr_pages)
+   if (!iov_iter_count(iter))
return ERR_PTR(-EINVAL);
 
-   bio = bio_kmalloc(gfp_mask, nr_pages);
+   bio = bio_kmalloc(gfp_mask, iov_iter_npages(iter, BIO_MAX_PAGES));
if (!bio)
return ERR_PTR(-ENOMEM);
 
-   ret = -ENOMEM;
-   pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
-   if (!pages)
-   goto out;
+   i = *iter;
+   while (iov_iter_count()) {
+   struct page **pages;
+   size_t offs;
+   ssize_t bytes;
+   int npages, j;
 
-   iov_for_each(iov, i, *iter) {
-   unsigned long uaddr = (unsigned long) iov.iov_base;
-   unsigned long len = iov.iov_len;
-   unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   unsigned long start = uaddr >> PAGE_SHIFT;
-   const int local_nr_pages = end - start;
-   const int page_limit = cur_page + local_nr_pages;
-
-   ret = get_user_pages_fast(uaddr, local_nr_pages,
-   (iter->type & WRITE) != WRITE,
-   [cur_page]);
-   if (ret < local_nr_pages) {
-   ret = -EFAULT;
+   bytes = iov_iter_get_pages_alloc(, , LONG_MAX, );
+   if (bytes <= 0) {
+   ret = bytes ? bytes : -EFAULT;
goto out_unmap;
}
 
-   offset = offset_in_page(uaddr);
-   for (j = cur_page; j < page_limit; j++) {
-   unsigned int bytes = PAGE_SIZE - offset;
+   npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE);
 
-   if (len <= 0)
-   break;
+   if (unlikely(offs & queue_dma_alignment(q))) {
+   ret = -EINVAL;
+   j = 0;
+   } else {
+   for (j = 0; j < npages; j++) {
+   struct page *page = pages[j];
+   unsigned n = PAGE_SIZE - offs;
+   unsigned prev_bi_vcnt = bio->bi_vcnt;
+
+   if (!bio_add_pc_page(q, bio, page, n, offs)) {
+   iov_iter_truncate(, 0);
+   break;
+   }
+
+   if (bio->bi_vcnt == prev_bi_vcnt)
+   put_page(page);

-   if (bytes > len)
-   bytes = len;
-
-   /*
-* sorry...
-*/
-   if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
-   bytes)
-   break;
-
-   len -= bytes;
-   offset = 0;
+   iov_iter_advance(, n);
+   bytes -= n;
+   offs = 0;
+   }
}
-
-   cur_page = j;
- 

Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-23 Thread Al Viro
On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote:

> IOW, the loop on failure exit should go through the bio, like 
> __bio_unmap_user()
> does.  We *also* need to put everything left unused in pages[], but only from 
> the
> last iteration through iov_for_each().
> 
> Frankly, I would prefer to reuse the pages[], rather than append to it on each
> iteration.  Used iov_iter_get_pages_alloc(), actually.

Something like completely untested diff below, perhaps...

diff --git a/block/bio.c b/block/bio.c
index b38e962fa83e..b5fe23597b41 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1323,94 +1323,60 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 const struct iov_iter *iter,
 gfp_t gfp_mask)
 {
-   int j;
-   int nr_pages = 0;
-   struct page **pages;
struct bio *bio;
-   int cur_page = 0;
-   int ret, offset;
+   struct bio_vec *bvec;
struct iov_iter i;
-   struct iovec iov;
-
-   iov_for_each(iov, i, *iter) {
-   unsigned long uaddr = (unsigned long) iov.iov_base;
-   unsigned long len = iov.iov_len;
-   unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   unsigned long start = uaddr >> PAGE_SHIFT;
-
-   /*
-* Overflow, abort
-*/
-   if (end < start)
-   return ERR_PTR(-EINVAL);
-
-   nr_pages += end - start;
-   /*
-* buffer must be aligned to at least logical block size for now
-*/
-   if (uaddr & queue_dma_alignment(q))
-   return ERR_PTR(-EINVAL);
-   }
+   int ret, j;
 
-   if (!nr_pages)
+   if (!iov_iter_count(iter))
return ERR_PTR(-EINVAL);
 
-   bio = bio_kmalloc(gfp_mask, nr_pages);
+   bio = bio_kmalloc(gfp_mask, iov_iter_npages(iter, BIO_MAX_PAGES));
if (!bio)
return ERR_PTR(-ENOMEM);
 
-   ret = -ENOMEM;
-   pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
-   if (!pages)
-   goto out;
+   i = *iter;
+   while (iov_iter_count()) {
+   struct page **pages;
+   size_t offs;
+   ssize_t bytes;
+   int npages, j;
 
-   iov_for_each(iov, i, *iter) {
-   unsigned long uaddr = (unsigned long) iov.iov_base;
-   unsigned long len = iov.iov_len;
-   unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   unsigned long start = uaddr >> PAGE_SHIFT;
-   const int local_nr_pages = end - start;
-   const int page_limit = cur_page + local_nr_pages;
-
-   ret = get_user_pages_fast(uaddr, local_nr_pages,
-   (iter->type & WRITE) != WRITE,
-   [cur_page]);
-   if (ret < local_nr_pages) {
-   ret = -EFAULT;
+   bytes = iov_iter_get_pages_alloc(, , LONG_MAX, );
+   if (bytes <= 0) {
+   ret = bytes ? bytes : -EFAULT;
goto out_unmap;
}
 
-   offset = offset_in_page(uaddr);
-   for (j = cur_page; j < page_limit; j++) {
-   unsigned int bytes = PAGE_SIZE - offset;
+   npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE);
 
-   if (len <= 0)
-   break;
+   if (unlikely(offs & queue_dma_alignment(q))) {
+   ret = -EINVAL;
+   j = 0;
+   } else {
+   for (j = 0; j < npages; j++) {
+   struct page *page = pages[j];
+   unsigned n = PAGE_SIZE - offs;
+   unsigned prev_bi_vcnt = bio->bi_vcnt;
+
+   if (!bio_add_pc_page(q, bio, page, n, offs)) {
+   iov_iter_truncate(, 0);
+   break;
+   }
+
+   if (bio->bi_vcnt == prev_bi_vcnt)
+   put_page(page);

-   if (bytes > len)
-   bytes = len;
-
-   /*
-* sorry...
-*/
-   if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
-   bytes)
-   break;
-
-   len -= bytes;
-   offset = 0;
+   iov_iter_advance(, n);
+   bytes -= n;
+   offs = 0;
+   }
}
-
-   cur_page = j;
- 

Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-23 Thread Al Viro
On Sat, Sep 23, 2017 at 05:39:28PM +0100, Al Viro wrote:
> On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote:
> > bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
> > IO vector has small consecutive buffers belonging to the same page.
> > bio_add_pc_page merges them into one, but the page reference is never
> > dropped.
> > 
> > Signed-off-by: Vitaly Mayatskikh 
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index b38e962fa83e..10cd3b6bed27 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
> > offset = offset_in_page(uaddr);
> > for (j = cur_page; j < page_limit; j++) {
> > unsigned int bytes = PAGE_SIZE - offset;
> > +   unsigned short prev_bi_vcnt = bio->bi_vcnt;
> >  
> > if (len <= 0)
> > break;
> > @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q,
> > bytes)
> > break;
> >  
> > +   /*
> > +* check if vector was merged with previous
> > +* drop page reference if needed
> > +*/
> > +   if (bio->bi_vcnt == prev_bi_vcnt)
> > +   put_page(pages[j]);
> > +
> 
> Except that now you've got double-puts on failure exits ;-/

IOW, the loop on failure exit should go through the bio, like __bio_unmap_user()
does.  We *also* need to put everything left unused in pages[], but only from 
the
last iteration through iov_for_each().

Frankly, I would prefer to reuse the pages[], rather than append to it on each
iteration.  Used iov_iter_get_pages_alloc(), actually.


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-23 Thread Al Viro
On Sat, Sep 23, 2017 at 05:39:28PM +0100, Al Viro wrote:
> On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote:
> > bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
> > IO vector has small consecutive buffers belonging to the same page.
> > bio_add_pc_page merges them into one, but the page reference is never
> > dropped.
> > 
> > Signed-off-by: Vitaly Mayatskikh 
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index b38e962fa83e..10cd3b6bed27 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
> > offset = offset_in_page(uaddr);
> > for (j = cur_page; j < page_limit; j++) {
> > unsigned int bytes = PAGE_SIZE - offset;
> > +   unsigned short prev_bi_vcnt = bio->bi_vcnt;
> >  
> > if (len <= 0)
> > break;
> > @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q,
> > bytes)
> > break;
> >  
> > +   /*
> > +* check if vector was merged with previous
> > +* drop page reference if needed
> > +*/
> > +   if (bio->bi_vcnt == prev_bi_vcnt)
> > +   put_page(pages[j]);
> > +
> 
> Except that now you've got double-puts on failure exits ;-/

IOW, the loop on failure exit should go through the bio, like __bio_unmap_user()
does.  We *also* need to put everything left unused in pages[], but only from 
the
last iteration through iov_for_each().

Frankly, I would prefer to reuse the pages[], rather than append to it on each
iteration.  Used iov_iter_get_pages_alloc(), actually.


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-23 Thread Al Viro
On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote:
> bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
> IO vector has small consecutive buffers belonging to the same page.
> bio_add_pc_page merges them into one, but the page reference is never
> dropped.
> 
> Signed-off-by: Vitaly Mayatskikh 
> 
> diff --git a/block/bio.c b/block/bio.c
> index b38e962fa83e..10cd3b6bed27 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
>   offset = offset_in_page(uaddr);
>   for (j = cur_page; j < page_limit; j++) {
>   unsigned int bytes = PAGE_SIZE - offset;
> + unsigned short prev_bi_vcnt = bio->bi_vcnt;
>  
>   if (len <= 0)
>   break;
> @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q,
>   bytes)
>   break;
>  
> + /*
> +  * check if vector was merged with previous
> +  * drop page reference if needed
> +  */
> + if (bio->bi_vcnt == prev_bi_vcnt)
> + put_page(pages[j]);
> +

Except that now you've got double-puts on failure exits ;-/


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-23 Thread Al Viro
On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote:
> bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
> IO vector has small consecutive buffers belonging to the same page.
> bio_add_pc_page merges them into one, but the page reference is never
> dropped.
> 
> Signed-off-by: Vitaly Mayatskikh 
> 
> diff --git a/block/bio.c b/block/bio.c
> index b38e962fa83e..10cd3b6bed27 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
>   offset = offset_in_page(uaddr);
>   for (j = cur_page; j < page_limit; j++) {
>   unsigned int bytes = PAGE_SIZE - offset;
> + unsigned short prev_bi_vcnt = bio->bi_vcnt;
>  
>   if (len <= 0)
>   break;
> @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q,
>   bytes)
>   break;
>  
> + /*
> +  * check if vector was merged with previous
> +  * drop page reference if needed
> +  */
> + if (bio->bi_vcnt == prev_bi_vcnt)
> + put_page(pages[j]);
> +

Except that now you've got double-puts on failure exits ;-/


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-21 Thread Vitaly Mayatskikh
Reproducer (needs SCSI disk):

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
 
#define NR_IOS 1
#define NR_IOVECS 8
#define SG_IO 0x2285
 
int main(int argc, char *argv[])
{
int fd, i, j;
unsigned char *buf, *ptr, cdb[10];
sg_io_hdr_t io_hdr;
sg_iovec_t iovec[NR_IOVECS];
 
if (argc < 2) {
printf("Run: %s \n", argv[0]);
exit(1);
}
 
buf = ptr = memalign(4096, NR_IOS * NR_IOVECS * 512);
if (!buf) {
printf("can't alloc memory\n");
exit(1);
}
 
fd = open(argv[1], 0);
if (fd < 0) {
printf("open %s failed: %d (%s)\n", argv[1], errno, 
strerror(errno));
exit(1);
}
 
io_hdr.interface_id = 'S';
io_hdr.cmd_len = sizeof(cdb);
io_hdr.cmdp = cdb;
io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
io_hdr.dxfer_len = 512 * NR_IOVECS;
io_hdr.dxferp = iovec;
io_hdr.iovec_count = NR_IOVECS;
 
cdb[0] = 0x28;  // READ10
cdb[8] = NR_IOVECS; // sectors
 
for (j = 0; j < NR_IOS; j++, ptr += 512) {
for (i = 0; i < NR_IOVECS; i++) {
iovec[i].iov_base = ptr;
iovec[i].iov_len = 512;
}
if (ioctl(fd, SG_IO, _hdr)) {
printf("IOCTL failed: %d (%s)\n", errno, 
strerror(errno));
exit(1);
}
}
 
free(buf);
close(fd);
return 0;
}


# free -m
  totalusedfree  shared  buff/cache   available
Mem:   3827  463601   0 1783568
Swap: 0   0   0
# ./sgio-leak /dev/sdd
# free -m
  totalusedfree  shared  buff/cache   available
Mem:   3827  853562   0 1783529
Swap: 0   0   0
[root@node-A ~]# free -m
  totalusedfree  shared  buff/cache   available
Mem:   3827  853628   0 1133561
Swap: 0   0   0
# ./sgio-leak /dev/sdd
# free -m
  totalusedfree  shared  buff/cache   available
Mem:   3827 1243589   0 1133523
Swap: 0   0   0

-- 
wbr, Vitaly


Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-21 Thread Vitaly Mayatskikh
Reproducer (needs SCSI disk):

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
 
#define NR_IOS 1
#define NR_IOVECS 8
#define SG_IO 0x2285
 
int main(int argc, char *argv[])
{
int fd, i, j;
unsigned char *buf, *ptr, cdb[10];
sg_io_hdr_t io_hdr;
sg_iovec_t iovec[NR_IOVECS];
 
if (argc < 2) {
printf("Run: %s \n", argv[0]);
exit(1);
}
 
buf = ptr = memalign(4096, NR_IOS * NR_IOVECS * 512);
if (!buf) {
printf("can't alloc memory\n");
exit(1);
}
 
fd = open(argv[1], 0);
if (fd < 0) {
printf("open %s failed: %d (%s)\n", argv[1], errno, 
strerror(errno));
exit(1);
}
 
io_hdr.interface_id = 'S';
io_hdr.cmd_len = sizeof(cdb);
io_hdr.cmdp = cdb;
io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
io_hdr.dxfer_len = 512 * NR_IOVECS;
io_hdr.dxferp = iovec;
io_hdr.iovec_count = NR_IOVECS;
 
cdb[0] = 0x28;  // READ10
cdb[8] = NR_IOVECS; // sectors
 
for (j = 0; j < NR_IOS; j++, ptr += 512) {
for (i = 0; i < NR_IOVECS; i++) {
iovec[i].iov_base = ptr;
iovec[i].iov_len = 512;
}
if (ioctl(fd, SG_IO, _hdr)) {
printf("IOCTL failed: %d (%s)\n", errno, 
strerror(errno));
exit(1);
}
}
 
free(buf);
close(fd);
return 0;
}


# free -m
  totalusedfree  shared  buff/cache   available
Mem:   3827  463601   0 1783568
Swap: 0   0   0
# ./sgio-leak /dev/sdd
# free -m
  totalusedfree  shared  buff/cache   available
Mem:   3827  853562   0 1783529
Swap: 0   0   0
[root@node-A ~]# free -m
  totalusedfree  shared  buff/cache   available
Mem:   3827  853628   0 1133561
Swap: 0   0   0
# ./sgio-leak /dev/sdd
# free -m
  totalusedfree  shared  buff/cache   available
Mem:   3827 1243589   0 1133523
Swap: 0   0   0

-- 
wbr, Vitaly


[PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-21 Thread Vitaly Mayatskikh
bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
IO vector has small consecutive buffers belonging to the same page.
bio_add_pc_page merges them into one, but the page reference is never
dropped.

Signed-off-by: Vitaly Mayatskikh 

diff --git a/block/bio.c b/block/bio.c
index b38e962fa83e..10cd3b6bed27 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
offset = offset_in_page(uaddr);
for (j = cur_page; j < page_limit; j++) {
unsigned int bytes = PAGE_SIZE - offset;
+   unsigned short prev_bi_vcnt = bio->bi_vcnt;
 
if (len <= 0)
break;
@@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q,
bytes)
break;
 
+   /*
+* check if vector was merged with previous
+* drop page reference if needed
+*/
+   if (bio->bi_vcnt == prev_bi_vcnt)
+   put_page(pages[j]);
+
len -= bytes;
offset = 0;
}

-- 
wbr, Vitaly


[PATCH] fix unbalanced page refcounting in bio_map_user_iov

2017-09-21 Thread Vitaly Mayatskikh
bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
IO vector has small consecutive buffers belonging to the same page.
bio_add_pc_page merges them into one, but the page reference is never
dropped.

Signed-off-by: Vitaly Mayatskikh 

diff --git a/block/bio.c b/block/bio.c
index b38e962fa83e..10cd3b6bed27 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
offset = offset_in_page(uaddr);
for (j = cur_page; j < page_limit; j++) {
unsigned int bytes = PAGE_SIZE - offset;
+   unsigned short prev_bi_vcnt = bio->bi_vcnt;
 
if (len <= 0)
break;
@@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q,
bytes)
break;
 
+   /*
+* check if vector was merged with previous
+* drop page reference if needed
+*/
+   if (bio->bi_vcnt == prev_bi_vcnt)
+   put_page(pages[j]);
+
len -= bytes;
offset = 0;
}

-- 
wbr, Vitaly