Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sun, 24 Sep 2017 10:27:39 -0400, Al Viro wrote: > BTW, there's something fishy in bio_copy_user_iov(). If the area we'd asked > for > had been too large for a single bio, we are going to create a bio and have > bio_add_pc_page() eventually fill it up to limit. Then we return into > __blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again. > Fine, but... now we might have non-zero iter->iov_offset. And this > bmd->is_our_pages = map_data ? 0 : 1; > memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs); > iov_iter_init(>iter, iter->type, bmd->iov, > iter->nr_segs, iter->count); > does not even look at iter->iov_offset. As the result, when it gets to > bio_uncopy_user(), we copy the data from each bio into the *beginning* of > the user area, overwriting that from the other bio. Yeah, something is wrong with bio_copy_user_iov. Our datapath hangs when IO flows through unmodified SG (it forces bio_copy if iov_count is set). I did not look at details, but same IO pattern and memory layout work well with bio_map (module refcount problem). > Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and > force-pushed the result. I'll give it a try, thanks! -- wbr, Vitaly
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sun, 24 Sep 2017 10:27:39 -0400, Al Viro wrote: > BTW, there's something fishy in bio_copy_user_iov(). If the area we'd asked > for > had been too large for a single bio, we are going to create a bio and have > bio_add_pc_page() eventually fill it up to limit. Then we return into > __blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again. > Fine, but... now we might have non-zero iter->iov_offset. And this > bmd->is_our_pages = map_data ? 0 : 1; > memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs); > iov_iter_init(>iter, iter->type, bmd->iov, > iter->nr_segs, iter->count); > does not even look at iter->iov_offset. As the result, when it gets to > bio_uncopy_user(), we copy the data from each bio into the *beginning* of > the user area, overwriting that from the other bio. Yeah, something is wrong with bio_copy_user_iov. Our datapath hangs when IO flows through unmodified SG (it forces bio_copy if iov_count is set). I did not look at details, but same IO pattern and memory layout work well with bio_map (module refcount problem). > Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and > force-pushed the result. I'll give it a try, thanks! -- wbr, Vitaly
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sun, Sep 24, 2017 at 03:27:39PM +0100, Al Viro wrote: > At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov; > instead of that iov_iter_init() in there. I'm not sure how far back does > it go; looks like "block: support large requests in blk_rq_map_user_iov" > is the earliest possible point, but it might need more digging to make > sure. v4.5+, if that's when the problems began... > > Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and > force-pushed the result. While we are at it, calculation of nr_pages in bio_copy_user_iov() is bloody odd - why, in the name of everything unholy, does it care about the iovec boundaries in there? We are copying data anyway; why does allocation of bio care about the fragmentation of the other end of copying? Shouldn't it be simply max(DIV_ROUND_UP(offset + len, PAGE_SIZE), BIO_MAX_PAGES)?
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sun, Sep 24, 2017 at 03:27:39PM +0100, Al Viro wrote: > At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov; > instead of that iov_iter_init() in there. I'm not sure how far back does > it go; looks like "block: support large requests in blk_rq_map_user_iov" > is the earliest possible point, but it might need more digging to make > sure. v4.5+, if that's when the problems began... > > Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and > force-pushed the result. While we are at it, calculation of nr_pages in bio_copy_user_iov() is bloody odd - why, in the name of everything unholy, does it care about the iovec boundaries in there? We are copying data anyway; why does allocation of bio care about the fragmentation of the other end of copying? Shouldn't it be simply max(DIV_ROUND_UP(offset + len, PAGE_SIZE), BIO_MAX_PAGES)?
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sat, Sep 23, 2017 at 09:33:23PM +0100, Al Viro wrote: > On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote: > > On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > > > > > IOW, the loop on failure exit should go through the bio, like > > > __bio_unmap_user() > > > does. We *also* need to put everything left unused in pages[], but only > > > from the > > > last iteration through iov_for_each(). > > > > > > Frankly, I would prefer to reuse the pages[], rather than append to it on > > > each > > > iteration. Used iov_iter_get_pages_alloc(), actually. > > > > Something like completely untested diff below, perhaps... > > > + unsigned n = PAGE_SIZE - offs; > > + unsigned prev_bi_vcnt = bio->bi_vcnt; > > Sorry, that should've been followed by > if (n > bytes) > n = bytes; > > Anyway, a carved-up variant is in vfs.git#work.iov_iter. It still needs > review and testing; the patch Vitaly has posted in this thread plus 6 > followups, hopefully more readable than aggregate diff. > > Comments? BTW, there's something fishy in bio_copy_user_iov(). If the area we'd asked for had been too large for a single bio, we are going to create a bio and have bio_add_pc_page() eventually fill it up to limit. Then we return into __blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again. Fine, but... now we might have non-zero iter->iov_offset. And this bmd->is_our_pages = map_data ? 0 : 1; memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs); iov_iter_init(>iter, iter->type, bmd->iov, iter->nr_segs, iter->count); does not even look at iter->iov_offset. As the result, when it gets to bio_uncopy_user(), we copy the data from each bio into the *beginning* of the user area, overwriting that from the other bio. At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov; instead of that iov_iter_init() in there. I'm not sure how far back does it go; looks like "block: support large requests in blk_rq_map_user_iov" is the earliest possible point, but it might need more digging to make sure. v4.5+, if that's when the problems began... Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and force-pushed the result.
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sat, Sep 23, 2017 at 09:33:23PM +0100, Al Viro wrote: > On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote: > > On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > > > > > IOW, the loop on failure exit should go through the bio, like > > > __bio_unmap_user() > > > does. We *also* need to put everything left unused in pages[], but only > > > from the > > > last iteration through iov_for_each(). > > > > > > Frankly, I would prefer to reuse the pages[], rather than append to it on > > > each > > > iteration. Used iov_iter_get_pages_alloc(), actually. > > > > Something like completely untested diff below, perhaps... > > > + unsigned n = PAGE_SIZE - offs; > > + unsigned prev_bi_vcnt = bio->bi_vcnt; > > Sorry, that should've been followed by > if (n > bytes) > n = bytes; > > Anyway, a carved-up variant is in vfs.git#work.iov_iter. It still needs > review and testing; the patch Vitaly has posted in this thread plus 6 > followups, hopefully more readable than aggregate diff. > > Comments? BTW, there's something fishy in bio_copy_user_iov(). If the area we'd asked for had been too large for a single bio, we are going to create a bio and have bio_add_pc_page() eventually fill it up to limit. Then we return into __blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again. Fine, but... now we might have non-zero iter->iov_offset. And this bmd->is_our_pages = map_data ? 0 : 1; memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs); iov_iter_init(>iter, iter->type, bmd->iov, iter->nr_segs, iter->count); does not even look at iter->iov_offset. As the result, when it gets to bio_uncopy_user(), we copy the data from each bio into the *beginning* of the user area, overwriting that from the other bio. At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov; instead of that iov_iter_init() in there. I'm not sure how far back does it go; looks like "block: support large requests in blk_rq_map_user_iov" is the earliest possible point, but it might need more digging to make sure. v4.5+, if that's when the problems began... Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and force-pushed the result.
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote: > On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > > > IOW, the loop on failure exit should go through the bio, like > > __bio_unmap_user() > > does. We *also* need to put everything left unused in pages[], but only > > from the > > last iteration through iov_for_each(). > > > > Frankly, I would prefer to reuse the pages[], rather than append to it on > > each > > iteration. Used iov_iter_get_pages_alloc(), actually. > > Something like completely untested diff below, perhaps... > + unsigned n = PAGE_SIZE - offs; > + unsigned prev_bi_vcnt = bio->bi_vcnt; Sorry, that should've been followed by if (n > bytes) n = bytes; Anyway, a carved-up variant is in vfs.git#work.iov_iter. It still needs review and testing; the patch Vitaly has posted in this thread plus 6 followups, hopefully more readable than aggregate diff. Comments?
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote: > On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > > > IOW, the loop on failure exit should go through the bio, like > > __bio_unmap_user() > > does. We *also* need to put everything left unused in pages[], but only > > from the > > last iteration through iov_for_each(). > > > > Frankly, I would prefer to reuse the pages[], rather than append to it on > > each > > iteration. Used iov_iter_get_pages_alloc(), actually. > > Something like completely untested diff below, perhaps... > + unsigned n = PAGE_SIZE - offs; > + unsigned prev_bi_vcnt = bio->bi_vcnt; Sorry, that should've been followed by if (n > bytes) n = bytes; Anyway, a carved-up variant is in vfs.git#work.iov_iter. It still needs review and testing; the patch Vitaly has posted in this thread plus 6 followups, hopefully more readable than aggregate diff. Comments?
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > IOW, the loop on failure exit should go through the bio, like > __bio_unmap_user() > does. We *also* need to put everything left unused in pages[], but only from > the > last iteration through iov_for_each(). > > Frankly, I would prefer to reuse the pages[], rather than append to it on each > iteration. Used iov_iter_get_pages_alloc(), actually. Something like completely untested diff below, perhaps... diff --git a/block/bio.c b/block/bio.c index b38e962fa83e..b5fe23597b41 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1323,94 +1323,60 @@ struct bio *bio_map_user_iov(struct request_queue *q, const struct iov_iter *iter, gfp_t gfp_mask) { - int j; - int nr_pages = 0; - struct page **pages; struct bio *bio; - int cur_page = 0; - int ret, offset; + struct bio_vec *bvec; struct iov_iter i; - struct iovec iov; - - iov_for_each(iov, i, *iter) { - unsigned long uaddr = (unsigned long) iov.iov_base; - unsigned long len = iov.iov_len; - unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; - unsigned long start = uaddr >> PAGE_SHIFT; - - /* -* Overflow, abort -*/ - if (end < start) - return ERR_PTR(-EINVAL); - - nr_pages += end - start; - /* -* buffer must be aligned to at least logical block size for now -*/ - if (uaddr & queue_dma_alignment(q)) - return ERR_PTR(-EINVAL); - } + int ret, j; - if (!nr_pages) + if (!iov_iter_count(iter)) return ERR_PTR(-EINVAL); - bio = bio_kmalloc(gfp_mask, nr_pages); + bio = bio_kmalloc(gfp_mask, iov_iter_npages(iter, BIO_MAX_PAGES)); if (!bio) return ERR_PTR(-ENOMEM); - ret = -ENOMEM; - pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask); - if (!pages) - goto out; + i = *iter; + while (iov_iter_count()) { + struct page **pages; + size_t offs; + ssize_t bytes; + int npages, j; - iov_for_each(iov, i, *iter) { - unsigned long uaddr = (unsigned long) iov.iov_base; - unsigned long len = iov.iov_len; - unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; - unsigned long start = uaddr >> PAGE_SHIFT; - const int local_nr_pages = end - start; - const int page_limit = cur_page + local_nr_pages; - - ret = get_user_pages_fast(uaddr, local_nr_pages, - (iter->type & WRITE) != WRITE, - [cur_page]); - if (ret < local_nr_pages) { - ret = -EFAULT; + bytes = iov_iter_get_pages_alloc(, , LONG_MAX, ); + if (bytes <= 0) { + ret = bytes ? bytes : -EFAULT; goto out_unmap; } - offset = offset_in_page(uaddr); - for (j = cur_page; j < page_limit; j++) { - unsigned int bytes = PAGE_SIZE - offset; + npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE); - if (len <= 0) - break; + if (unlikely(offs & queue_dma_alignment(q))) { + ret = -EINVAL; + j = 0; + } else { + for (j = 0; j < npages; j++) { + struct page *page = pages[j]; + unsigned n = PAGE_SIZE - offs; + unsigned prev_bi_vcnt = bio->bi_vcnt; + + if (!bio_add_pc_page(q, bio, page, n, offs)) { + iov_iter_truncate(, 0); + break; + } + + if (bio->bi_vcnt == prev_bi_vcnt) + put_page(page); - if (bytes > len) - bytes = len; - - /* -* sorry... -*/ - if (bio_add_pc_page(q, bio, pages[j], bytes, offset) < - bytes) - break; - - len -= bytes; - offset = 0; + iov_iter_advance(, n); + bytes -= n; + offs = 0; + } } - - cur_page = j; -
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > IOW, the loop on failure exit should go through the bio, like > __bio_unmap_user() > does. We *also* need to put everything left unused in pages[], but only from > the > last iteration through iov_for_each(). > > Frankly, I would prefer to reuse the pages[], rather than append to it on each > iteration. Used iov_iter_get_pages_alloc(), actually. Something like completely untested diff below, perhaps... diff --git a/block/bio.c b/block/bio.c index b38e962fa83e..b5fe23597b41 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1323,94 +1323,60 @@ struct bio *bio_map_user_iov(struct request_queue *q, const struct iov_iter *iter, gfp_t gfp_mask) { - int j; - int nr_pages = 0; - struct page **pages; struct bio *bio; - int cur_page = 0; - int ret, offset; + struct bio_vec *bvec; struct iov_iter i; - struct iovec iov; - - iov_for_each(iov, i, *iter) { - unsigned long uaddr = (unsigned long) iov.iov_base; - unsigned long len = iov.iov_len; - unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; - unsigned long start = uaddr >> PAGE_SHIFT; - - /* -* Overflow, abort -*/ - if (end < start) - return ERR_PTR(-EINVAL); - - nr_pages += end - start; - /* -* buffer must be aligned to at least logical block size for now -*/ - if (uaddr & queue_dma_alignment(q)) - return ERR_PTR(-EINVAL); - } + int ret, j; - if (!nr_pages) + if (!iov_iter_count(iter)) return ERR_PTR(-EINVAL); - bio = bio_kmalloc(gfp_mask, nr_pages); + bio = bio_kmalloc(gfp_mask, iov_iter_npages(iter, BIO_MAX_PAGES)); if (!bio) return ERR_PTR(-ENOMEM); - ret = -ENOMEM; - pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask); - if (!pages) - goto out; + i = *iter; + while (iov_iter_count()) { + struct page **pages; + size_t offs; + ssize_t bytes; + int npages, j; - iov_for_each(iov, i, *iter) { - unsigned long uaddr = (unsigned long) iov.iov_base; - unsigned long len = iov.iov_len; - unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; - unsigned long start = uaddr >> PAGE_SHIFT; - const int local_nr_pages = end - start; - const int page_limit = cur_page + local_nr_pages; - - ret = get_user_pages_fast(uaddr, local_nr_pages, - (iter->type & WRITE) != WRITE, - [cur_page]); - if (ret < local_nr_pages) { - ret = -EFAULT; + bytes = iov_iter_get_pages_alloc(, , LONG_MAX, ); + if (bytes <= 0) { + ret = bytes ? bytes : -EFAULT; goto out_unmap; } - offset = offset_in_page(uaddr); - for (j = cur_page; j < page_limit; j++) { - unsigned int bytes = PAGE_SIZE - offset; + npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE); - if (len <= 0) - break; + if (unlikely(offs & queue_dma_alignment(q))) { + ret = -EINVAL; + j = 0; + } else { + for (j = 0; j < npages; j++) { + struct page *page = pages[j]; + unsigned n = PAGE_SIZE - offs; + unsigned prev_bi_vcnt = bio->bi_vcnt; + + if (!bio_add_pc_page(q, bio, page, n, offs)) { + iov_iter_truncate(, 0); + break; + } + + if (bio->bi_vcnt == prev_bi_vcnt) + put_page(page); - if (bytes > len) - bytes = len; - - /* -* sorry... -*/ - if (bio_add_pc_page(q, bio, pages[j], bytes, offset) < - bytes) - break; - - len -= bytes; - offset = 0; + iov_iter_advance(, n); + bytes -= n; + offs = 0; + } } - - cur_page = j; -
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sat, Sep 23, 2017 at 05:39:28PM +0100, Al Viro wrote: > On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote: > > bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if > > IO vector has small consecutive buffers belonging to the same page. > > bio_add_pc_page merges them into one, but the page reference is never > > dropped. > > > > Signed-off-by: Vitaly Mayatskikh> > > > diff --git a/block/bio.c b/block/bio.c > > index b38e962fa83e..10cd3b6bed27 100644 > > --- a/block/bio.c > > +++ b/block/bio.c > > @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, > > offset = offset_in_page(uaddr); > > for (j = cur_page; j < page_limit; j++) { > > unsigned int bytes = PAGE_SIZE - offset; > > + unsigned short prev_bi_vcnt = bio->bi_vcnt; > > > > if (len <= 0) > > break; > > @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, > > bytes) > > break; > > > > + /* > > +* check if vector was merged with previous > > +* drop page reference if needed > > +*/ > > + if (bio->bi_vcnt == prev_bi_vcnt) > > + put_page(pages[j]); > > + > > Except that now you've got double-puts on failure exits ;-/ IOW, the loop on failure exit should go through the bio, like __bio_unmap_user() does. We *also* need to put everything left unused in pages[], but only from the last iteration through iov_for_each(). Frankly, I would prefer to reuse the pages[], rather than append to it on each iteration. Used iov_iter_get_pages_alloc(), actually.
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Sat, Sep 23, 2017 at 05:39:28PM +0100, Al Viro wrote: > On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote: > > bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if > > IO vector has small consecutive buffers belonging to the same page. > > bio_add_pc_page merges them into one, but the page reference is never > > dropped. > > > > Signed-off-by: Vitaly Mayatskikh > > > > diff --git a/block/bio.c b/block/bio.c > > index b38e962fa83e..10cd3b6bed27 100644 > > --- a/block/bio.c > > +++ b/block/bio.c > > @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, > > offset = offset_in_page(uaddr); > > for (j = cur_page; j < page_limit; j++) { > > unsigned int bytes = PAGE_SIZE - offset; > > + unsigned short prev_bi_vcnt = bio->bi_vcnt; > > > > if (len <= 0) > > break; > > @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, > > bytes) > > break; > > > > + /* > > +* check if vector was merged with previous > > +* drop page reference if needed > > +*/ > > + if (bio->bi_vcnt == prev_bi_vcnt) > > + put_page(pages[j]); > > + > > Except that now you've got double-puts on failure exits ;-/ IOW, the loop on failure exit should go through the bio, like __bio_unmap_user() does. We *also* need to put everything left unused in pages[], but only from the last iteration through iov_for_each(). Frankly, I would prefer to reuse the pages[], rather than append to it on each iteration. Used iov_iter_get_pages_alloc(), actually.
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote: > bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if > IO vector has small consecutive buffers belonging to the same page. > bio_add_pc_page merges them into one, but the page reference is never > dropped. > > Signed-off-by: Vitaly Mayatskikh> > diff --git a/block/bio.c b/block/bio.c > index b38e962fa83e..10cd3b6bed27 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, > offset = offset_in_page(uaddr); > for (j = cur_page; j < page_limit; j++) { > unsigned int bytes = PAGE_SIZE - offset; > + unsigned short prev_bi_vcnt = bio->bi_vcnt; > > if (len <= 0) > break; > @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, > bytes) > break; > > + /* > + * check if vector was merged with previous > + * drop page reference if needed > + */ > + if (bio->bi_vcnt == prev_bi_vcnt) > + put_page(pages[j]); > + Except that now you've got double-puts on failure exits ;-/
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote: > bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if > IO vector has small consecutive buffers belonging to the same page. > bio_add_pc_page merges them into one, but the page reference is never > dropped. > > Signed-off-by: Vitaly Mayatskikh > > diff --git a/block/bio.c b/block/bio.c > index b38e962fa83e..10cd3b6bed27 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, > offset = offset_in_page(uaddr); > for (j = cur_page; j < page_limit; j++) { > unsigned int bytes = PAGE_SIZE - offset; > + unsigned short prev_bi_vcnt = bio->bi_vcnt; > > if (len <= 0) > break; > @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, > bytes) > break; > > + /* > + * check if vector was merged with previous > + * drop page reference if needed > + */ > + if (bio->bi_vcnt == prev_bi_vcnt) > + put_page(pages[j]); > + Except that now you've got double-puts on failure exits ;-/
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
Reproducer (needs SCSI disk): #include #include #include #include #include #include #include #include #include #define NR_IOS 1 #define NR_IOVECS 8 #define SG_IO 0x2285 int main(int argc, char *argv[]) { int fd, i, j; unsigned char *buf, *ptr, cdb[10]; sg_io_hdr_t io_hdr; sg_iovec_t iovec[NR_IOVECS]; if (argc < 2) { printf("Run: %s \n", argv[0]); exit(1); } buf = ptr = memalign(4096, NR_IOS * NR_IOVECS * 512); if (!buf) { printf("can't alloc memory\n"); exit(1); } fd = open(argv[1], 0); if (fd < 0) { printf("open %s failed: %d (%s)\n", argv[1], errno, strerror(errno)); exit(1); } io_hdr.interface_id = 'S'; io_hdr.cmd_len = sizeof(cdb); io_hdr.cmdp = cdb; io_hdr.dxfer_direction = SG_DXFER_FROM_DEV; io_hdr.dxfer_len = 512 * NR_IOVECS; io_hdr.dxferp = iovec; io_hdr.iovec_count = NR_IOVECS; cdb[0] = 0x28; // READ10 cdb[8] = NR_IOVECS; // sectors for (j = 0; j < NR_IOS; j++, ptr += 512) { for (i = 0; i < NR_IOVECS; i++) { iovec[i].iov_base = ptr; iovec[i].iov_len = 512; } if (ioctl(fd, SG_IO, _hdr)) { printf("IOCTL failed: %d (%s)\n", errno, strerror(errno)); exit(1); } } free(buf); close(fd); return 0; } # free -m totalusedfree shared buff/cache available Mem: 3827 463601 0 1783568 Swap: 0 0 0 # ./sgio-leak /dev/sdd # free -m totalusedfree shared buff/cache available Mem: 3827 853562 0 1783529 Swap: 0 0 0 [root@node-A ~]# free -m totalusedfree shared buff/cache available Mem: 3827 853628 0 1133561 Swap: 0 0 0 # ./sgio-leak /dev/sdd # free -m totalusedfree shared buff/cache available Mem: 3827 1243589 0 1133523 Swap: 0 0 0 -- wbr, Vitaly
Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov
Reproducer (needs SCSI disk): #include #include #include #include #include #include #include #include #include #define NR_IOS 1 #define NR_IOVECS 8 #define SG_IO 0x2285 int main(int argc, char *argv[]) { int fd, i, j; unsigned char *buf, *ptr, cdb[10]; sg_io_hdr_t io_hdr; sg_iovec_t iovec[NR_IOVECS]; if (argc < 2) { printf("Run: %s \n", argv[0]); exit(1); } buf = ptr = memalign(4096, NR_IOS * NR_IOVECS * 512); if (!buf) { printf("can't alloc memory\n"); exit(1); } fd = open(argv[1], 0); if (fd < 0) { printf("open %s failed: %d (%s)\n", argv[1], errno, strerror(errno)); exit(1); } io_hdr.interface_id = 'S'; io_hdr.cmd_len = sizeof(cdb); io_hdr.cmdp = cdb; io_hdr.dxfer_direction = SG_DXFER_FROM_DEV; io_hdr.dxfer_len = 512 * NR_IOVECS; io_hdr.dxferp = iovec; io_hdr.iovec_count = NR_IOVECS; cdb[0] = 0x28; // READ10 cdb[8] = NR_IOVECS; // sectors for (j = 0; j < NR_IOS; j++, ptr += 512) { for (i = 0; i < NR_IOVECS; i++) { iovec[i].iov_base = ptr; iovec[i].iov_len = 512; } if (ioctl(fd, SG_IO, _hdr)) { printf("IOCTL failed: %d (%s)\n", errno, strerror(errno)); exit(1); } } free(buf); close(fd); return 0; } # free -m totalusedfree shared buff/cache available Mem: 3827 463601 0 1783568 Swap: 0 0 0 # ./sgio-leak /dev/sdd # free -m totalusedfree shared buff/cache available Mem: 3827 853562 0 1783529 Swap: 0 0 0 [root@node-A ~]# free -m totalusedfree shared buff/cache available Mem: 3827 853628 0 1133561 Swap: 0 0 0 # ./sgio-leak /dev/sdd # free -m totalusedfree shared buff/cache available Mem: 3827 1243589 0 1133523 Swap: 0 0 0 -- wbr, Vitaly
[PATCH] fix unbalanced page refcounting in bio_map_user_iov
bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if IO vector has small consecutive buffers belonging to the same page. bio_add_pc_page merges them into one, but the page reference is never dropped. Signed-off-by: Vitaly Mayatskikhdiff --git a/block/bio.c b/block/bio.c index b38e962fa83e..10cd3b6bed27 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, offset = offset_in_page(uaddr); for (j = cur_page; j < page_limit; j++) { unsigned int bytes = PAGE_SIZE - offset; + unsigned short prev_bi_vcnt = bio->bi_vcnt; if (len <= 0) break; @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, bytes) break; + /* +* check if vector was merged with previous +* drop page reference if needed +*/ + if (bio->bi_vcnt == prev_bi_vcnt) + put_page(pages[j]); + len -= bytes; offset = 0; } -- wbr, Vitaly
[PATCH] fix unbalanced page refcounting in bio_map_user_iov
bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if IO vector has small consecutive buffers belonging to the same page. bio_add_pc_page merges them into one, but the page reference is never dropped. Signed-off-by: Vitaly Mayatskikh diff --git a/block/bio.c b/block/bio.c index b38e962fa83e..10cd3b6bed27 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, offset = offset_in_page(uaddr); for (j = cur_page; j < page_limit; j++) { unsigned int bytes = PAGE_SIZE - offset; + unsigned short prev_bi_vcnt = bio->bi_vcnt; if (len <= 0) break; @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, bytes) break; + /* +* check if vector was merged with previous +* drop page reference if needed +*/ + if (bio->bi_vcnt == prev_bi_vcnt) + put_page(pages[j]); + len -= bytes; offset = 0; } -- wbr, Vitaly