[PATCH v8 7/8] block: Consolidate bio_alloc_bioset(), bio_kmalloc()

2012-09-06 Thread Kent Overstreet
Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
differently because there was some almost-duplicated code - this fixes
some of that.

The important change is that previously bio_kmalloc() always set
bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
bio_alloc_bioset(). This would cause bio_has_data() to return true; I
don't know if this resulted in any actual bugs but it was certainly
wrong.
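
To make the nr_iovecs == 0 case concrete, here is a minimal userspace sketch. All model_* names are hypothetical stand-ins rather than kernel code, and it assumes bio_has_data() boils down to a NULL check on bi_io_vec, which is what the paragraph above implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel definitions. */
struct model_vec {
	void *page;
	unsigned int len;
};

struct model_bio {
	unsigned int max_vecs;
	struct model_vec *io_vec;	/* NULL means "no data" */
	struct model_vec inline_vecs[];	/* storage that follows the bio */
};

static struct model_bio *model_alloc(unsigned int nr_vecs)
{
	struct model_bio *bio;

	bio = calloc(1, sizeof(*bio) + nr_vecs * sizeof(struct model_vec));
	if (bio)
		bio->max_vecs = nr_vecs;
	return bio;
}

/* Old bio_kmalloc() behaviour as described: always point at inline vecs. */
struct model_bio *model_alloc_old(unsigned int nr_vecs)
{
	struct model_bio *bio = model_alloc(nr_vecs);

	if (bio)
		bio->io_vec = bio->inline_vecs;
	return bio;
}

/* Consolidated behaviour: leave io_vec NULL when nr_vecs == 0. */
struct model_bio *model_alloc_new(unsigned int nr_vecs)
{
	struct model_bio *bio = model_alloc(nr_vecs);

	if (bio && nr_vecs)
		bio->io_vec = bio->inline_vecs;
	return bio;
}

/* Assumed shape of the check: a bio "has data" if io_vec is set. */
int model_has_data(const struct model_bio *bio)
{
	return bio && bio->io_vec != NULL;
}
```

With this model, model_alloc_old(0) yields a bio that claims to have data while model_alloc_new(0) does not; both agree for nr_vecs > 0.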

bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
(BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
at least they're enforced closer together and hopefully they will be
fixed in a later patch.

This'll also help with some future cleanups - there are a fair number of
functions that allocate bios (e.g. bio_clone()), and now they don't have
to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
v7: Re-add dropped comments, improve patch description
---
 fs/bio.c            | 110 ++--
 include/linux/bio.h |  16 ++--
 2 files changed, 49 insertions(+), 77 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index d807fe2..357a3af 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -55,6 +55,7 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
  * IO code that does not need private memory pools.
  */
 struct bio_set *fs_bio_set;
+EXPORT_SYMBOL(fs_bio_set);
 
 /*
  * Our slab pool management
@@ -291,39 +292,58 @@ EXPORT_SYMBOL(bio_reset);
  * @bs:	the bio_set to allocate from.
  *
  * Description:
- *   bio_alloc_bioset will try its own mempool to satisfy the allocation.
- *   If %__GFP_WAIT is set then we will block on the internal pool waiting
- *   for a struct bio to become free.
- **/
+ *   If @bs is NULL, uses kmalloc() to allocate the bio; else the allocation is
+ *   backed by the @bs's mempool.
+ *
+ *   When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
+ *   able to allocate a bio. This is due to the mempool guarantees. To make this
+ *   work, callers must never allocate more than 1 bio at a time from this pool.
+ *   Callers that need to allocate more than 1 bio must always submit the
+ *   previously allocated bio for IO before attempting to allocate a new one.
+ *   Failure to do so can cause deadlocks under memory pressure.
+ *
+ *   RETURNS:
+ *   Pointer to new bio on success, NULL on failure.
+ */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
+	unsigned front_pad;
+	unsigned inline_vecs;
 	unsigned long idx = BIO_POOL_NONE;
 	struct bio_vec *bvl = NULL;
 	struct bio *bio;
 	void *p;
 
-	p = mempool_alloc(bs->bio_pool, gfp_mask);
+	if (!bs) {
+		if (nr_iovecs > UIO_MAXIOV)
+			return NULL;
+
+		p = kmalloc(sizeof(struct bio) +
+			    nr_iovecs * sizeof(struct bio_vec),
+			    gfp_mask);
+		front_pad = 0;
+		inline_vecs = nr_iovecs;
+	} else {
+		p = mempool_alloc(bs->bio_pool, gfp_mask);
+		front_pad = bs->front_pad;
+		inline_vecs = BIO_INLINE_VECS;
+	}
+
 	if (unlikely(!p))
 		return NULL;
-	bio = p + bs->front_pad;
 
+	bio = p + front_pad;
 	bio_init(bio);
-	bio->bi_pool = bs;
-
-	if (unlikely(!nr_iovecs))
-		goto out_set;
 
-	if (nr_iovecs <= BIO_INLINE_VECS) {
-		bvl = bio->bi_inline_vecs;
-		nr_iovecs = BIO_INLINE_VECS;
-	} else {
+	if (nr_iovecs > inline_vecs) {
 		bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
 		if (unlikely(!bvl))
 			goto err_free;
-
-		nr_iovecs = bvec_nr_vecs(idx);
+	} else if (nr_iovecs) {
+		bvl = bio->bi_inline_vecs;
 	}
-out_set:
+
+	bio->bi_pool = bs;
 	bio->bi_flags |= idx << BIO_POOL_OFFSET;
 	bio->bi_max_vecs = nr_iovecs;
 	bio->bi_io_vec = bvl;
@@ -335,62 +355,6 @@ err_free:
 }
 EXPORT_SYMBOL(bio_alloc_bioset);
 
-/**
- * bio_alloc - allocate a new bio, memory pool backed
- * @gfp_mask: allocation mask to use
- * @nr_iovecs: number of iovecs
- *
- * bio_alloc will allocate a bio and associated bio_vec array that can hold
- * at least @nr_iovecs entries. Allocations will be done from the
- * fs_bio_set. Also see @bio_alloc_bioset and @bio_kmalloc.
- *
- * If %__GFP_WAIT is set, then bio_alloc will always be able to allocate
- * a bio. This is due to the mempool guarantees. To make this work, callers
- * must never allocate more than 1 bio at a time from this pool. Callers
- * that need to allocate more than 1 bio must always submit the previously

[PATCH v9 3/9] dm: Use bioset's front_pad for dm_rq_clone_bio_info

2012-09-06 Thread Kent Overstreet
Previously, dm_rq_clone_bio_info needed to be freed by the bio's
destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
This gets rid of a memory allocation and means we can kill
dm_rq_bio_destructor.

The _rq_bio_info_cache kmem cache is unused now and needs to be deleted,
but due to the way io_pool is used and overloaded this looks not quite
trivial so I'm leaving it for a later patch.
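
For readers unfamiliar with front_pad, a rough userspace sketch of the idea follows. The model_* names are hypothetical and only mimic the layout trick: the pool allocates front_pad extra bytes ahead of each bio, so a wrapper struct with the bio as its last member can be recovered with container_of() and needs no separate allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same idea as the kernel's container_of(), spelled out for userspace. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_bio {
	int sector;
};

/* The bio must be the last member so the padding sits in front of it. */
struct model_clone_info {
	void *orig;
	void *tio;
	struct model_bio clone;
};

/* Hypothetical stand-in for a bio_set created with a front_pad. */
struct model_bioset {
	size_t front_pad;
};

struct model_bio *model_bio_alloc(const struct model_bioset *bs)
{
	char *p = calloc(1, bs->front_pad + sizeof(struct model_bio));

	/* Hand back a pointer past the padding, as bio_alloc_bioset() does. */
	return p ? (struct model_bio *)(p + bs->front_pad) : NULL;
}

void model_bio_free(const struct model_bioset *bs, struct model_bio *bio)
{
	/* Undo the offset before returning the memory. */
	free((char *)bio - bs->front_pad);
}
```

Creating the model bioset with front_pad = offsetof(struct model_clone_info, clone) mirrors the bioset_create() call in the diff below: one allocation then covers both the bookkeeping fields and the clone.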

v6: Fix comment on struct dm_rq_clone_bio_info, per Tejun

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Alasdair Kergon a...@redhat.com
Acked-by: Tejun Heo t...@kernel.org
---
 drivers/md/dm.c | 44 
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f43aaf6..f2eb730 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -86,12 +86,17 @@ struct dm_rq_target_io {
 };
 
 /*
- * For request-based dm.
- * One of these is allocated per bio.
+ * For request-based dm - the bio clones we allocate are embedded in these
+ * structs.
+ *
+ * We allocate these with bio_alloc_bioset, using the front_pad parameter when
+ * the bioset is created - this means the bio has to come at the end of the
+ * struct.
  */
 struct dm_rq_clone_bio_info {
 	struct bio *orig;
 	struct dm_rq_target_io *tio;
+	struct bio clone;
 };
 
 union map_info *dm_get_mapinfo(struct bio *bio)
@@ -211,6 +216,11 @@ struct dm_md_mempools {
 static struct kmem_cache *_io_cache;
 static struct kmem_cache *_tio_cache;
 static struct kmem_cache *_rq_tio_cache;
+
+/*
+ * Unused now, and needs to be deleted. But since io_pool is overloaded and it's
+ * still used for _io_cache, I'm leaving this for a later cleanup
+ */
 static struct kmem_cache *_rq_bio_info_cache;
 
 static int __init local_init(void)
@@ -467,16 +477,6 @@ static void free_rq_tio(struct dm_rq_target_io *tio)
 	mempool_free(tio, tio->md->tio_pool);
 }
 
-static struct dm_rq_clone_bio_info *alloc_bio_info(struct mapped_device *md)
-{
-	return mempool_alloc(md->io_pool, GFP_ATOMIC);
-}
-
-static void free_bio_info(struct dm_rq_clone_bio_info *info)
-{
-	mempool_free(info, info->tio->md->io_pool);
-}
-
 static int md_in_flight(struct mapped_device *md)
 {
 	return atomic_read(&md->pending[READ]) +
@@ -1460,30 +1460,17 @@ void dm_dispatch_request(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(dm_dispatch_request);
 
-static void dm_rq_bio_destructor(struct bio *bio)
-{
-	struct dm_rq_clone_bio_info *info = bio->bi_private;
-	struct mapped_device *md = info->tio->md;
-
-	free_bio_info(info);
-	bio_free(bio, md->bs);
-}
-
 static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig,
 				 void *data)
 {
 	struct dm_rq_target_io *tio = data;
-	struct mapped_device *md = tio->md;
-	struct dm_rq_clone_bio_info *info = alloc_bio_info(md);
-
-	if (!info)
-		return -ENOMEM;
+	struct dm_rq_clone_bio_info *info =
+		container_of(bio, struct dm_rq_clone_bio_info, clone);
 
 	info->orig = bio_orig;
 	info->tio = tio;
 	bio->bi_end_io = end_clone_bio;
 	bio->bi_private = info;
-	bio->bi_destructor = dm_rq_bio_destructor;
 
 	return 0;
 }
@@ -2718,7 +2705,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
 	if (!pools->tio_pool)
 		goto free_io_pool_and_out;
 
-	pools->bs = bioset_create(pool_size, 0);
+	pools->bs = bioset_create(pool_size,
+				  offsetof(struct dm_rq_clone_bio_info, clone));
 	if (!pools->bs)
 		goto free_tio_pool_and_out;
 
-- 
1.7.12


[PATCH v9 9/9] block: Add bio_clone_bioset(), bio_clone_kmalloc()

2012-09-06 Thread Kent Overstreet
Previously, there was bio_clone() but it only allocated from the fs bio
set; as a result various users were open coding it and using
__bio_clone().

This changes bio_clone() to become bio_clone_bioset(), and then we add
bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
the functionality the last patch added.

This will also help in a later patch changing how bio cloning works.
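
As a rough illustration of the wrapper arrangement described above, here is a userspace sketch. The model_* names are hypothetical; it assumes, based on the previous patch, that passing a NULL pool selects the kmalloc() path and that bio_clone() targets the fs bio set.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio_set; only counts allocations. */
struct model_pool {
	int allocs;
};

struct model_bio {
	struct model_pool *pool;	/* NULL: plain malloc(), no pool */
	unsigned int vcnt;
	int vecs[4];
};

static struct model_pool model_fs_pool;	/* plays the role of fs_bio_set */

/* The one real implementation; both wrappers funnel into it. */
struct model_bio *model_clone_bioset(const struct model_bio *src,
				     struct model_pool *pool)
{
	struct model_bio *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	if (pool)
		pool->allocs++;
	*b = *src;		/* copy the payload description */
	b->pool = pool;		/* remember where the clone came from */
	return b;
}

/* Wrapper: clone from the default pool. */
struct model_bio *model_clone(const struct model_bio *src)
{
	return model_clone_bioset(src, &model_fs_pool);
}

/* Wrapper: clone without any pool. */
struct model_bio *model_clone_kmalloc(const struct model_bio *src)
{
	return model_clone_bioset(src, NULL);
}
```

The point of the shape is that callers with their own bio set call the _bioset variant directly instead of open coding allocation plus __bio_clone().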

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
CC: Alasdair Kergon a...@redhat.com
CC: Boaz Harrosh bharr...@panasas.com
CC: Jeff Garzik j...@garzik.org
Acked-by: Jeff Garzik jgar...@redhat.com
---
 block/blk-core.c   |  8 +---
 drivers/block/osdblk.c |  3 +--
 drivers/md/dm-crypt.c  |  7 +--
 drivers/md/dm.c|  4 ++--
 drivers/md/md.c| 20 +---
 fs/bio.c   | 11 +++
 fs/exofs/ore.c |  5 ++---
 include/linux/bio.h| 17 ++---
 8 files changed, 29 insertions(+), 46 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b776cc9..82aab28 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2781,16 +2781,10 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 	blk_rq_init(NULL, rq);
 
 	__rq_for_each_bio(bio_src, rq_src) {
-		bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs);
+		bio = bio_clone_bioset(bio_src, gfp_mask, bs);
 		if (!bio)
 			goto free_and_out;
 
-		__bio_clone(bio, bio_src);
-
-		if (bio_integrity(bio_src) &&
-		    bio_integrity_clone(bio, bio_src, gfp_mask))
-			goto free_and_out;
-
 		if (bio_ctr && bio_ctr(bio, bio_src, data))
 			goto free_and_out;
 
diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
index 87311eb..1bbc681 100644
--- a/drivers/block/osdblk.c
+++ b/drivers/block/osdblk.c
@@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
 	struct bio *tmp, *new_chain = NULL, *tail = NULL;
 
 	while (old_chain) {
-		tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+		tmp = bio_clone_kmalloc(old_chain, gfpmask);
 		if (!tmp)
 			goto err_out;
 
-		__bio_clone(tmp, old_chain);
 		tmp->bi_bdev = NULL;
 		gfpmask &= ~__GFP_WAIT;
 		tmp->bi_next = NULL;
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3c0acba..bbf459b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -979,19 +979,14 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
 	 * copy the required bvecs because we need the original
 	 * one in order to decrypt the whole bio data *afterwards*.
 	 */
-	clone = bio_alloc_bioset(gfp, bio_segments(base_bio), cc->bs);
+	clone = bio_clone_bioset(base_bio, gfp, cc->bs);
 	if (!clone)
 		return 1;
 
 	crypt_inc_pending(io);
 
 	clone_init(io, clone);
-	clone->bi_idx = 0;
-	clone->bi_vcnt = bio_segments(base_bio);
-	clone->bi_size = base_bio->bi_size;
 	clone->bi_sector = cc->start + io->sector;
-	memcpy(clone->bi_io_vec, bio_iovec(base_bio),
-	       sizeof(struct bio_vec) * clone->bi_vcnt);
 
 	generic_make_request(clone);
 	return 0;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f2eb730..41afc66 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1129,8 +1129,8 @@ static void __issue_target_request(struct clone_info *ci, struct dm_target *ti,
 	 * ci->bio->bi_max_vecs is BIO_INLINE_VECS anyway, for both flush
 	 * and discard, so no need for concern about wasted bvec allocations.
 	 */
-	clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
-	__bio_clone(clone, ci->bio);
+	clone = bio_clone_bioset(ci->bio, GFP_NOIO, ci->md->bs);
+
 	if (len) {
 		clone->bi_sector = ci->sector;
 		clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 457ca84..7a2b079 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -173,28 +173,10 @@ EXPORT_SYMBOL_GPL(bio_alloc_mddev);
 struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
 			    struct mddev *mddev)
 {
-	struct bio *b;
-
 	if (!mddev || !mddev->bio_set)
 		return bio_clone(bio, gfp_mask);
 
-	b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
-	if (!b)
-		return NULL;
-
-	__bio_clone(b, bio);
-	if (bio_integrity(bio)) {
-		int ret;
-
-		ret = bio_integrity_clone(b, bio, gfp_mask);
-
-		if (ret < 0) {
-			bio_put(b);
-			return NULL;
-		}
-	}
-
-	return b;
+	return bio_clone_bioset

[PATCH v9 8/9] block: Consolidate bio_alloc_bioset(), bio_kmalloc()

2012-09-06 Thread Kent Overstreet
Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
differently because there was some almost-duplicated code - this fixes
some of that.

The important change is that previously bio_kmalloc() always set
bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
bio_alloc_bioset(). This would cause bio_has_data() to return true; I
don't know if this resulted in any actual bugs but it was certainly
wrong.

bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
(BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
at least they're enforced closer together and hopefully they will be
fixed in a later patch.

This'll also help with some future cleanups - there are a fair number of
functions that allocate bios (e.g. bio_clone()), and now they don't have
to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
v7: Re-add dropped comments, improve patch description
---
 fs/bio.c            | 110 ++--
 include/linux/bio.h |  16 ++--
 2 files changed, 49 insertions(+), 77 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index e6691f6..3a9e578 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -55,6 +55,7 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
  * IO code that does not need private memory pools.
  */
 struct bio_set *fs_bio_set;
+EXPORT_SYMBOL(fs_bio_set);
 
 /*
  * Our slab pool management
@@ -315,39 +316,58 @@ EXPORT_SYMBOL(bio_reset);
  * @bs:	the bio_set to allocate from.
  *
  * Description:
- *   bio_alloc_bioset will try its own mempool to satisfy the allocation.
- *   If %__GFP_WAIT is set then we will block on the internal pool waiting
- *   for a struct bio to become free.
- **/
+ *   If @bs is NULL, uses kmalloc() to allocate the bio; else the allocation is
+ *   backed by the @bs's mempool.
+ *
+ *   When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
+ *   able to allocate a bio. This is due to the mempool guarantees. To make this
+ *   work, callers must never allocate more than 1 bio at a time from this pool.
+ *   Callers that need to allocate more than 1 bio must always submit the
+ *   previously allocated bio for IO before attempting to allocate a new one.
+ *   Failure to do so can cause deadlocks under memory pressure.
+ *
+ *   RETURNS:
+ *   Pointer to new bio on success, NULL on failure.
+ */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
+	unsigned front_pad;
+	unsigned inline_vecs;
 	unsigned long idx = BIO_POOL_NONE;
 	struct bio_vec *bvl = NULL;
 	struct bio *bio;
 	void *p;
 
-	p = mempool_alloc(bs->bio_pool, gfp_mask);
+	if (!bs) {
+		if (nr_iovecs > UIO_MAXIOV)
+			return NULL;
+
+		p = kmalloc(sizeof(struct bio) +
+			    nr_iovecs * sizeof(struct bio_vec),
+			    gfp_mask);
+		front_pad = 0;
+		inline_vecs = nr_iovecs;
+	} else {
+		p = mempool_alloc(bs->bio_pool, gfp_mask);
+		front_pad = bs->front_pad;
+		inline_vecs = BIO_INLINE_VECS;
+	}
+
 	if (unlikely(!p))
 		return NULL;
-	bio = p + bs->front_pad;
 
+	bio = p + front_pad;
 	bio_init(bio);
-	bio->bi_pool = bs;
-
-	if (unlikely(!nr_iovecs))
-		goto out_set;
 
-	if (nr_iovecs <= BIO_INLINE_VECS) {
-		bvl = bio->bi_inline_vecs;
-		nr_iovecs = BIO_INLINE_VECS;
-	} else {
+	if (nr_iovecs > inline_vecs) {
 		bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
 		if (unlikely(!bvl))
 			goto err_free;
-
-		nr_iovecs = bvec_nr_vecs(idx);
+	} else if (nr_iovecs) {
+		bvl = bio->bi_inline_vecs;
 	}
-out_set:
+
+	bio->bi_pool = bs;
 	bio->bi_flags |= idx << BIO_POOL_OFFSET;
 	bio->bi_max_vecs = nr_iovecs;
 	bio->bi_io_vec = bvl;
@@ -359,62 +379,6 @@ err_free:
 }
 EXPORT_SYMBOL(bio_alloc_bioset);
 
-/**
- * bio_alloc - allocate a new bio, memory pool backed
- * @gfp_mask: allocation mask to use
- * @nr_iovecs: number of iovecs
- *
- * bio_alloc will allocate a bio and associated bio_vec array that can hold
- * at least @nr_iovecs entries. Allocations will be done from the
- * fs_bio_set. Also see @bio_alloc_bioset and @bio_kmalloc.
- *
- * If %__GFP_WAIT is set, then bio_alloc will always be able to allocate
- * a bio. This is due to the mempool guarantees. To make this work, callers
- * must never allocate more than 1 bio at a time from this pool. Callers
- * that need to allocate more than 1 bio must always submit the previously

[PATCH v8 8/8] block: Add bio_clone_bioset(), bio_clone_kmalloc()

2012-09-06 Thread Kent Overstreet
Previously, there was bio_clone() but it only allocated from the fs bio
set; as a result various users were open coding it and using
__bio_clone().

This changes bio_clone() to become bio_clone_bioset(), and then we add
bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
the functionality the last patch added.

This will also help in a later patch changing how bio cloning works.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
CC: Alasdair Kergon a...@redhat.com
CC: Boaz Harrosh bharr...@panasas.com
CC: Jeff Garzik j...@garzik.org
Acked-by: Jeff Garzik jgar...@redhat.com
---
 block/blk-core.c   |  8 +---
 drivers/block/osdblk.c |  3 +--
 drivers/md/dm-crypt.c  |  7 +--
 drivers/md/dm.c|  4 ++--
 drivers/md/md.c| 20 +---
 fs/bio.c   | 11 +++
 fs/exofs/ore.c |  5 ++---
 include/linux/bio.h| 17 ++---
 8 files changed, 29 insertions(+), 46 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b776cc9..82aab28 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2781,16 +2781,10 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 	blk_rq_init(NULL, rq);
 
 	__rq_for_each_bio(bio_src, rq_src) {
-		bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs);
+		bio = bio_clone_bioset(bio_src, gfp_mask, bs);
 		if (!bio)
 			goto free_and_out;
 
-		__bio_clone(bio, bio_src);
-
-		if (bio_integrity(bio_src) &&
-		    bio_integrity_clone(bio, bio_src, gfp_mask))
-			goto free_and_out;
-
 		if (bio_ctr && bio_ctr(bio, bio_src, data))
 			goto free_and_out;
 
diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
index 87311eb..1bbc681 100644
--- a/drivers/block/osdblk.c
+++ b/drivers/block/osdblk.c
@@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
 	struct bio *tmp, *new_chain = NULL, *tail = NULL;
 
 	while (old_chain) {
-		tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+		tmp = bio_clone_kmalloc(old_chain, gfpmask);
 		if (!tmp)
 			goto err_out;
 
-		__bio_clone(tmp, old_chain);
 		tmp->bi_bdev = NULL;
 		gfpmask &= ~__GFP_WAIT;
 		tmp->bi_next = NULL;
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3c0acba..bbf459b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -979,19 +979,14 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
 	 * copy the required bvecs because we need the original
 	 * one in order to decrypt the whole bio data *afterwards*.
 	 */
-	clone = bio_alloc_bioset(gfp, bio_segments(base_bio), cc->bs);
+	clone = bio_clone_bioset(base_bio, gfp, cc->bs);
 	if (!clone)
 		return 1;
 
 	crypt_inc_pending(io);
 
 	clone_init(io, clone);
-	clone->bi_idx = 0;
-	clone->bi_vcnt = bio_segments(base_bio);
-	clone->bi_size = base_bio->bi_size;
 	clone->bi_sector = cc->start + io->sector;
-	memcpy(clone->bi_io_vec, bio_iovec(base_bio),
-	       sizeof(struct bio_vec) * clone->bi_vcnt);
 
 	generic_make_request(clone);
 	return 0;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f2eb730..41afc66 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1129,8 +1129,8 @@ static void __issue_target_request(struct clone_info *ci, struct dm_target *ti,
 	 * ci->bio->bi_max_vecs is BIO_INLINE_VECS anyway, for both flush
 	 * and discard, so no need for concern about wasted bvec allocations.
 	 */
-	clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
-	__bio_clone(clone, ci->bio);
+	clone = bio_clone_bioset(ci->bio, GFP_NOIO, ci->md->bs);
+
 	if (len) {
 		clone->bi_sector = ci->sector;
 		clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 457ca84..7a2b079 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -173,28 +173,10 @@ EXPORT_SYMBOL_GPL(bio_alloc_mddev);
 struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
 			    struct mddev *mddev)
 {
-	struct bio *b;
-
 	if (!mddev || !mddev->bio_set)
 		return bio_clone(bio, gfp_mask);
 
-	b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
-	if (!b)
-		return NULL;
-
-	__bio_clone(b, bio);
-	if (bio_integrity(bio)) {
-		int ret;
-
-		ret = bio_integrity_clone(b, bio, gfp_mask);
-
-		if (ret < 0) {
-			bio_put(b);
-			return NULL;
-		}
-	}
-
-	return b;
+	return bio_clone_bioset

[PATCH v9 7/9] block: Kill bi_destructor

2012-09-06 Thread Kent Overstreet
Now that we've got generic code for freeing bios allocated from bio
pools, this isn't needed anymore.

This patch also makes bio_free() static, since without bi_destructor
there should be no need for it to be called anywhere else.

bio_free() is now only called from bio_put(), so we can refactor those a
bit - move some code from bio_put() to bio_free() and kill the redundant
bio->bi_next = NULL.
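
A small userspace sketch of why the destructor callback becomes unnecessary, under the assumption stated in this series that each bio records its originating pool (NULL meaning plain kmalloc()). The model_* names are hypothetical; the free path just inspects the recorded pool when the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio_set; counts frees for illustration. */
struct model_pool {
	size_t front_pad;
	int frees;
};

struct model_bio {
	int refcnt;
	struct model_pool *pool;	/* NULL when allocated without a pool */
};

static int model_plain_frees;	/* frees that took the non-pool path */

struct model_bio *model_bio_alloc(struct model_pool *pool)
{
	size_t pad = pool ? pool->front_pad : 0;
	char *p = calloc(1, pad + sizeof(struct model_bio));
	struct model_bio *bio;

	if (!p)
		return NULL;
	bio = (struct model_bio *)(p + pad);
	bio->refcnt = 1;
	bio->pool = pool;
	return bio;
}

/* One generic free routine: no per-bio destructor pointer needed. */
static void model_bio_free(struct model_bio *bio)
{
	struct model_pool *pool = bio->pool;

	if (pool) {
		pool->frees++;
		/* Undo the front padding before releasing the memory. */
		free((char *)bio - pool->front_pad);
	} else {
		model_plain_frees++;
		free(bio);
	}
}

/* Dropping the last reference frees the bio. */
void model_bio_put(struct model_bio *bio)
{
	assert(bio->refcnt > 0);
	if (--bio->refcnt == 0)
		model_bio_free(bio);
}
```

In this model the only caller of the free routine is the put routine, which matches the reason given above for making bio_free() static.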

v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
v7: No #define BIO_KMALLOC_POOL anymore

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 Documentation/block/biodoc.txt |  5 
 block/blk-core.c   |  2 +-
 fs/bio.c   | 64 +-
 include/linux/bio.h|  1 -
 include/linux/blk_types.h  |  3 --
 5 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index e418dc0..8df5e8e 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -465,7 +465,6 @@ struct bio {
 	bio_end_io_t *bi_end_io;  /* bi_end_io (bio) */
 	atomic_t bi_cnt;  /* pin count: free when it hits zero */
 	void *bi_private;
-   bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
 };
 
 With this multipage bio design:
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
 so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
 given size from these slabs.
 
-The bi_destructor() routine takes into account the possibility of the bio
-having originated from a different source (see later discussions on
-n/w to block transfers and kvec_cb)
-
 The bio_get() routine may be used to hold an extra reference on a bio prior
 to i/o submission, if the bio fields are likely to be accessed after the
 i/o is issued (since the bio may otherwise get freed in case i/o completion
diff --git a/block/blk-core.c b/block/blk-core.c
index 95c4935..b776cc9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2807,7 +2807,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 
 free_and_out:
 	if (bio)
-		bio_free(bio, bs);
+		bio_put(bio);
 	blk_rq_unprep_clone(rq);
 
 	return -ENOMEM;
diff --git a/fs/bio.c b/fs/bio.c
index 74ab3f0..e6691f6 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -233,26 +233,37 @@ fallback:
 	return bvl;
 }
 
-void bio_free(struct bio *bio, struct bio_set *bs)
+static void __bio_free(struct bio *bio)
 {
-	void *p;
-
-	if (bio_has_allocated_vec(bio))
-		bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
+	bio_disassociate_task(bio);
 
 	if (bio_integrity(bio))
 		bio_integrity_free(bio);
+}
 
-	/*
-	 * If we have front padding, adjust the bio pointer before freeing
-	 */
-	p = bio;
-	if (bs->front_pad)
+static void bio_free(struct bio *bio)
+{
+	struct bio_set *bs = bio->bi_pool;
+	void *p;
+
+	__bio_free(bio);
+
+	if (bs) {
+		if (bio_has_allocated_vec(bio))
+			bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
+
+		/*
+		 * If we have front padding, adjust the bio pointer before freeing
+		 */
+		p = bio;
 		p -= bs->front_pad;
 
-	mempool_free(p, bs->bio_pool);
+		mempool_free(p, bs->bio_pool);
+	} else {
+		/* Bio was allocated by bio_kmalloc() */
+		kfree(bio);
+	}
 }
-EXPORT_SYMBOL(bio_free);
 
 void bio_init(struct bio *bio)
 {
@@ -290,10 +301,7 @@ void bio_reset(struct bio *bio)
 {
 	unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
 
-	if (bio_integrity(bio))
-		bio_integrity_free(bio, bio->bi_pool);
-
-	bio_disassociate_task(bio);
+	__bio_free(bio);
 
 	memset(bio, 0, BIO_RESET_BYTES);
 	bio->bi_flags = flags|(1 << BIO_UPTODATE);
@@ -376,13 +384,6 @@ struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 }
 EXPORT_SYMBOL(bio_alloc);
 
-static void bio_kmalloc_destructor(struct bio *bio)
-{
-	if (bio_integrity(bio))
-		bio_integrity_free(bio);
-	kfree(bio);
-}
-
 /**
  * bio_kmalloc - allocate a bio for I/O using kmalloc()
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -409,7 +410,6 @@ struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 	bio->bi_flags |= BIO_POOL_NONE << BIO_POOL_OFFSET;
 	bio->bi_max_vecs = nr_iovecs;
 	bio->bi_io_vec = bio->bi_inline_vecs;
-	bio->bi_destructor = bio_kmalloc_destructor;
 
 	return bio;
 }
@@ -445,20 +445,8 @@ void bio_put(struct bio *bio)
 	/*
 	 * last put frees it
 	 */
-	if (atomic_dec_and_test(&bio->bi_cnt

[PATCH v8 6/8] block: Kill bi_destructor

2012-09-06 Thread Kent Overstreet
Now that we've got generic code for freeing bios allocated from bio
pools, this isn't needed anymore.

This patch also makes bio_free() static, since without bi_destructor
there should be no need for it to be called anywhere else.

While we're at it, since bio_free() is now only called from one place,
refactor it a bit and pull out __bio_free() for bio_reset() to use.

v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
v7: No #define BIO_KMALLOC_POOL anymore

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 Documentation/block/biodoc.txt |  5 
 block/blk-core.c   |  2 +-
 fs/bio.c   | 64 +-
 include/linux/bio.h|  1 -
 include/linux/blk_types.h  |  3 --
 5 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index e418dc0..8df5e8e 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -465,7 +465,6 @@ struct bio {
 	bio_end_io_t *bi_end_io;  /* bi_end_io (bio) */
 	atomic_t bi_cnt;  /* pin count: free when it hits zero */
 	void *bi_private;
-   bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
 };
 
 With this multipage bio design:
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
 so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
 given size from these slabs.
 
-The bi_destructor() routine takes into account the possibility of the bio
-having originated from a different source (see later discussions on
-n/w to block transfers and kvec_cb)
-
 The bio_get() routine may be used to hold an extra reference on a bio prior
 to i/o submission, if the bio fields are likely to be accessed after the
 i/o is issued (since the bio may otherwise get freed in case i/o completion
diff --git a/block/blk-core.c b/block/blk-core.c
index 95c4935..b776cc9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2807,7 +2807,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 
 free_and_out:
 	if (bio)
-		bio_free(bio, bs);
+		bio_put(bio);
 	blk_rq_unprep_clone(rq);
 
 	return -ENOMEM;
diff --git a/fs/bio.c b/fs/bio.c
index 208141f..d807fe2 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -233,26 +233,37 @@ fallback:
 	return bvl;
 }
 
-void bio_free(struct bio *bio, struct bio_set *bs)
+static void __bio_free(struct bio *bio)
 {
-	void *p;
-
-	if (bio_has_allocated_vec(bio))
-		bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
+	bio_disassociate_task(bio);
 
 	if (bio_integrity(bio))
 		bio_integrity_free(bio);
+}
 
-	/*
-	 * If we have front padding, adjust the bio pointer before freeing
-	 */
-	p = bio;
-	if (bs->front_pad)
+static void bio_free(struct bio *bio)
+{
+	struct bio_set *bs = bio->bi_pool;
+	void *p;
+
+	__bio_free(bio);
+
+	if (bs) {
+		if (bio_has_allocated_vec(bio))
+			bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
+
+		/*
+		 * If we have front padding, adjust the bio pointer before freeing
+		 */
+		p = bio;
 		p -= bs->front_pad;
 
-	mempool_free(p, bs->bio_pool);
+		mempool_free(p, bs->bio_pool);
+	} else {
+		/* Bio was allocated by bio_kmalloc() */
+		kfree(bio);
+	}
 }
-EXPORT_SYMBOL(bio_free);
 
 void bio_init(struct bio *bio)
 {
@@ -266,10 +277,7 @@ void bio_reset(struct bio *bio)
 {
 	unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
 
-	if (bio_integrity(bio))
-		bio_integrity_free(bio, bio->bi_pool);
-
-	bio_disassociate_task(bio);
+	__bio_free(bio);
 
 	memset(bio, 0, BIO_RESET_BYTES);
 	bio->bi_flags = flags|(1 << BIO_UPTODATE);
@@ -352,13 +360,6 @@ struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 }
 EXPORT_SYMBOL(bio_alloc);
 
-static void bio_kmalloc_destructor(struct bio *bio)
-{
-	if (bio_integrity(bio))
-		bio_integrity_free(bio);
-	kfree(bio);
-}
-
 /**
  * bio_kmalloc - allocate a bio for I/O using kmalloc()
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -385,7 +386,6 @@ struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 	bio->bi_flags |= BIO_POOL_NONE << BIO_POOL_OFFSET;
 	bio->bi_max_vecs = nr_iovecs;
 	bio->bi_io_vec = bio->bi_inline_vecs;
-	bio->bi_destructor = bio_kmalloc_destructor;
 
 	return bio;
 }
@@ -421,20 +421,8 @@ void bio_put(struct bio *bio)
 	/*
 	 * last put frees it
 	 */
-	if (atomic_dec_and_test(&bio->bi_cnt)) {
-		bio_disassociate_task(bio

[PATCH v9 2/9] block: Use bi_pool for bio_integrity_alloc()

2012-09-06 Thread Kent Overstreet
Now that bios keep track of where they were allocated from,
bio_integrity_alloc_bioset() becomes redundant.

Remove bio_integrity_alloc_bioset() and drop the bio_set argument from the
related functions and make them use bio->bi_pool.
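
The pool lookup this implies can be sketched in a few lines of userspace C. The model_* names are hypothetical; the fallback to a default pool when the bio carries none follows the bio_integrity_alloc() hunk in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio_set and struct bio. */
struct model_pool {
	const char *name;
};

struct model_bio {
	struct model_pool *pool;	/* where the bio was allocated from */
};

static struct model_pool model_fs_pool = { "fs" };	/* like fs_bio_set */

/* Pick the pool for integrity payloads from the bio itself. */
struct model_pool *model_integrity_pool(const struct model_bio *bio)
{
	assert(bio != NULL);
	/* Bios without a pool fall back to the default one. */
	return bio->pool ? bio->pool : &model_fs_pool;
}
```

Since the bio already knows its pool, callers no longer have to thread a bio_set argument through every integrity helper.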

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: Martin K. Petersen martin.peter...@oracle.com
Acked-by: Tejun Heo t...@kernel.org
---
 block/blk-core.c|  2 +-
 drivers/md/dm.c |  4 ++--
 drivers/md/md.c |  2 +-
 fs/bio-integrity.c  | 44 +++-
 fs/bio.c|  6 +++---
 include/linux/bio.h |  9 -
 6 files changed, 26 insertions(+), 41 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4b4dbdf..95c4935 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2788,7 +2788,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 		__bio_clone(bio, bio_src);
 
 		if (bio_integrity(bio_src) &&
-		    bio_integrity_clone(bio, bio_src, gfp_mask, bs))
+		    bio_integrity_clone(bio, bio_src, gfp_mask))
 			goto free_and_out;
 
 		if (bio_ctr && bio_ctr(bio, bio_src, data))
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 0c3d6dd..f43aaf6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1068,7 +1068,7 @@ static struct bio *split_bvec(struct bio *bio, sector_t 
sector,
clone-bi_flags |= 1  BIO_CLONED;
 
if (bio_integrity(bio)) {
-   bio_integrity_clone(clone, bio, GFP_NOIO, bs);
+   bio_integrity_clone(clone, bio, GFP_NOIO);
bio_integrity_trim(clone,
   bio_sector_offset(bio, idx, offset), len);
}
@@ -1094,7 +1094,7 @@ static struct bio *clone_bio(struct bio *bio, sector_t 
sector,
clone->bi_flags &= ~(1 << BIO_SEG_VALID);
 
if (bio_integrity(bio)) {
-   bio_integrity_clone(clone, bio, GFP_NOIO, bs);
+   bio_integrity_clone(clone, bio, GFP_NOIO);
 
if (idx != bio->bi_idx || clone->bi_size < bio->bi_size)
bio_integrity_trim(clone,
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b8eebe3..457ca84 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -186,7 +186,7 @@ struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
if (bio_integrity(bio)) {
int ret;
 
-   ret = bio_integrity_clone(b, bio, gfp_mask, mddev->bio_set);
+   ret = bio_integrity_clone(b, bio, gfp_mask);
 
if (ret < 0) {
bio_put(b);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index e85c04b..a3f28f3 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -70,23 +70,25 @@ static inline int use_bip_pool(unsigned int idx)
 }
 
 /**
- * bio_integrity_alloc_bioset - Allocate integrity payload and attach it to bio
+ * bio_integrity_alloc - Allocate integrity payload and attach it to bio
  * @bio:   bio to attach integrity metadata to
  * @gfp_mask:  Memory allocation mask
  * @nr_vecs:   Number of integrity metadata scatter-gather elements
- * @bs:bio_set to allocate from
  *
  * Description: This function prepares a bio for attaching integrity
  * metadata.  nr_vecs specifies the maximum number of pages containing
  * integrity metadata that can be attached.
  */
-struct bio_integrity_payload *bio_integrity_alloc_bioset(struct bio *bio,
-gfp_t gfp_mask,
-unsigned int nr_vecs,
-struct bio_set *bs)
+struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
+ gfp_t gfp_mask,
+ unsigned int nr_vecs)
 {
struct bio_integrity_payload *bip;
unsigned int idx = vecs_to_idx(nr_vecs);
+   struct bio_set *bs = bio->bi_pool;
+
+   if (!bs)
+   bs = fs_bio_set;
 
BUG_ON(bio == NULL);
bip = NULL;
@@ -114,37 +116,22 @@ struct bio_integrity_payload 
*bio_integrity_alloc_bioset(struct bio *bio,
 
return bip;
 }
-EXPORT_SYMBOL(bio_integrity_alloc_bioset);
-
-/**
- * bio_integrity_alloc - Allocate integrity payload and attach it to bio
- * @bio:   bio to attach integrity metadata to
- * @gfp_mask:  Memory allocation mask
- * @nr_vecs:   Number of integrity metadata scatter-gather elements
- *
- * Description: This function prepares a bio for attaching integrity
- * metadata.  nr_vecs specifies the maximum number of pages containing
- * integrity metadata that can be attached.
- */
-struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
- gfp_t gfp_mask,
- unsigned int nr_vecs)
-{
-   return

[PATCH v9 6/9] block: Add bio_reset()

2012-09-06 Thread Kent Overstreet
Reusing bios is something that's been highly frowned upon in the past,
but driver code keeps doing it anyways. If it's going to happen anyways,
we should provide a generic method.

This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
was open coding it, by doing a bio_init() and resetting bi_destructor.

This required reordering struct bio, but the block layer is not yet
nearly fast enough for any cacheline effects to matter here.

v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
bio->bi_flags are saved.
v6: Further commenting verbosity, per Tejun
v9: Add a function comment

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
Acked-by: Tejun Heo t...@kernel.org
---
 fs/bio.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/fs/bio.c b/fs/bio.c
index 208141f..74ab3f0 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -277,6 +277,30 @@ void bio_reset(struct bio *bio)
 EXPORT_SYMBOL(bio_reset);
 
 /**
+ * bio_reset - reinitialize a bio
+ * @bio:   bio to reset
+ *
+ * Description:
+ *   After calling bio_reset(), @bio will be in the same state as a freshly
+ *   allocated bio returned bio bio_alloc_bioset() - the only fields that are
+ *   preserved are the ones that are initialized by bio_alloc_bioset(). See
+ *   comment in struct bio.
+ */
+void bio_reset(struct bio *bio)
+{
+   unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
+
+   if (bio_integrity(bio))
+   bio_integrity_free(bio, bio->bi_pool);
+
+   bio_disassociate_task(bio);
+
+   memset(bio, 0, BIO_RESET_BYTES);
+   bio->bi_flags = flags|(1 << BIO_UPTODATE);
+}
+EXPORT_SYMBOL(bio_reset);
+
+/**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
  * @nr_iovecs: number of iovecs to pre-allocate
-- 
1.7.12

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v9 5/9] pktcdvd: Switch to bio_kmalloc()

2012-09-06 Thread Kent Overstreet
This is prep work for killing bi_destructor - previously, pktcdvd had
its own pkt_bio_alloc which was basically duplicating bio_kmalloc(),
necessitating its own bi_destructor implementation.

v5: Un-reorder some functions, to make the patch easier to review

Signed-off-by: Kent Overstreet koverstr...@google.com
Acked-by: Jiri Kosina jkos...@suse.cz
---
 drivers/block/pktcdvd.c | 52 +++--
 1 file changed, 7 insertions(+), 45 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index ba66e44..2e7de7a 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -522,38 +522,6 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
}
 }
 
-static void pkt_bio_destructor(struct bio *bio)
-{
-   kfree(bio->bi_io_vec);
-   kfree(bio);
-}
-
-static struct bio *pkt_bio_alloc(int nr_iovecs)
-{
-   struct bio_vec *bvl = NULL;
-   struct bio *bio;
-
-   bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
-   if (!bio)
-   goto no_bio;
-   bio_init(bio);
-
-   bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
-   if (!bvl)
-   goto no_bvl;
-
-   bio->bi_max_vecs = nr_iovecs;
-   bio->bi_io_vec = bvl;
-   bio->bi_destructor = pkt_bio_destructor;
-
-   return bio;
-
- no_bvl:
-   kfree(bio);
- no_bio:
-   return NULL;
-}
-
 /*
  * Allocate a packet_data struct
  */
@@ -567,7 +535,7 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
goto no_pkt;
 
pkt->frames = frames;
-   pkt->w_bio = pkt_bio_alloc(frames);
+   pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
if (!pkt->w_bio)
goto no_bio;
 
@@ -581,9 +549,10 @@ static struct packet_data *pkt_alloc_packet_data(int 
frames)
bio_list_init(&pkt->orig_bios);
 
for (i = 0; i < frames; i++) {
-   struct bio *bio = pkt_bio_alloc(1);
+   struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
if (!bio)
goto no_rd_bio;
+
pkt->r_bios[i] = bio;
}
 
@@ -,21 +1080,17 @@ static void pkt_gather_data(struct pktcdvd_device *pd, 
struct packet_data *pkt)
 * Schedule reads for missing parts of the packet.
 */
for (f = 0; f < pkt->frames; f++) {
-   struct bio_vec *vec;
-
int p, offset;
+
if (written[f])
continue;
+
bio = pkt->r_bios[f];
-   vec = bio->bi_io_vec;
-   bio_init(bio);
-   bio->bi_max_vecs = 1;
+   bio_reset(bio);
bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
bio->bi_bdev = pd->bdev;
bio->bi_end_io = pkt_end_io_read;
bio->bi_private = pkt;
-   bio->bi_io_vec = vec;
-   bio->bi_destructor = pkt_bio_destructor;
 
p = (f * CD_FRAMESIZE) / PAGE_SIZE;
offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
@@ -1418,14 +1383,11 @@ static void pkt_start_write(struct pktcdvd_device *pd, 
struct packet_data *pkt)
}
 
/* Start the write request */
-   bio_init(pkt->w_bio);
-   pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
+   bio_reset(pkt->w_bio);
pkt->w_bio->bi_sector = pkt->sector;
pkt->w_bio->bi_bdev = pd->bdev;
pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
pkt->w_bio->bi_private = pkt;
-   pkt->w_bio->bi_io_vec = bvec;
-   pkt->w_bio->bi_destructor = pkt_bio_destructor;
for (f = 0; f < pkt->frames; f++)
if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, 
bvec[f].bv_offset))
BUG();
-- 
1.7.12



[PATCH v9 4/9] block: Add bio_reset()

2012-09-06 Thread Kent Overstreet
Reusing bios is something that's been highly frowned upon in the past,
but driver code keeps doing it anyways. If it's going to happen anyways,
we should provide a generic method.

This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
was open coding it, by doing a bio_init() and resetting bi_destructor.

v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
bio->bi_flags are saved.
v6: Further commenting verbosity, per Tejun

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
Acked-by: Tejun Heo t...@kernel.org
---
 fs/bio.c  | 14 ++
 include/linux/bio.h   |  1 +
 include/linux/blk_types.h | 25 +++--
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index b14f71a..208141f 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -262,6 +262,20 @@ void bio_init(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_init);
 
+void bio_reset(struct bio *bio)
+{
+   unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
+
+   if (bio_integrity(bio))
+   bio_integrity_free(bio, bio->bi_pool);
+
+   bio_disassociate_task(bio);
+
+   memset(bio, 0, BIO_RESET_BYTES);
+   bio->bi_flags = flags|(1 << BIO_UPTODATE);
+}
+EXPORT_SYMBOL(bio_reset);
+
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a11f74b..76f6c25 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -226,6 +226,7 @@ extern void __bio_clone(struct bio *, struct bio *);
 extern struct bio *bio_clone(struct bio *, gfp_t);
 
 extern void bio_init(struct bio *);
+extern void bio_reset(struct bio *);
 
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned 
int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index af9dd9d..1b607c2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -59,12 +59,6 @@ struct bio {
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;
 
-   unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
-
-   atomic_t bi_cnt; /* pin count */
-
-   struct bio_vec  *bi_io_vec; /* the actual vec list */
-
bio_end_io_t *bi_end_io;
 
void *bi_private;
@@ -80,6 +74,16 @@ struct bio {
struct bio_integrity_payload *bi_integrity;  /* data integrity */
 #endif
 
+   /*
+* Everything starting with bi_max_vecs will be preserved by bio_reset()
+*/
+
+   unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
+
+   atomic_t bi_cnt; /* pin count */
+
+   struct bio_vec  *bi_io_vec; /* the actual vec list */
+
/* If bi_pool is non NULL, bi_destructor is not called */
struct bio_set  *bi_pool;
 
@@ -93,6 +97,8 @@ struct bio {
struct bio_vec  bi_inline_vecs[0];
 };
 
+#define BIO_RESET_BYTES offsetof(struct bio, bi_max_vecs)
+
 /*
  * bio flags
  */
@@ -108,6 +114,13 @@ struct bio {
 #define BIO_FS_INTEGRITY 9 /* fs owns integrity data, not block layer */
 #define BIO_QUIET  10  /* Make BIO Quiet */
 #define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
+
+/*
+ * Flags starting here get preserved by bio_reset() - this includes
+ * BIO_POOL_IDX()
+ */
+#define BIO_RESET_BITS 12
+
 #define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
 
 /*
-- 
1.7.12
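
The reset scheme in this patch (clear everything before bi_max_vecs, keep only the flag bits at or above BIO_RESET_BITS) can be tried out in isolation. What follows is a minimal userspace sketch, not the kernel code: struct toy_bio, its fields and the TOY_* macros are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct bio; field names are illustrative only. */
struct toy_bio {
	/* Per-I/O state: wiped on every reset. */
	unsigned long sector;
	unsigned long flags;
	void *private_data;

	/* Everything starting with max_vecs is preserved by a reset. */
	unsigned int max_vecs;
	void *pool;
};

/* Number of leading bytes that a reset clears. */
#define TOY_RESET_BYTES offsetof(struct toy_bio, max_vecs)
/* Flag bits at or above this index survive a reset. */
#define TOY_RESET_BITS  12
#define TOY_UPTODATE    0

static void toy_bio_reset(struct toy_bio *bio)
{
	unsigned long keep;

	assert(bio != NULL);

	/* Save only the high flag bits, which describe the allocation. */
	keep = bio->flags & (~0UL << TOY_RESET_BITS);

	/* Clear the per-I/O prefix of the struct in one call. */
	memset(bio, 0, TOY_RESET_BYTES);

	/* Restore the saved bits and mark the bio up to date again. */
	bio->flags = keep | (1UL << TOY_UPTODATE);
}
```

Placing the preserved fields at the end of the struct is what lets a single memset() bounded by offsetof() do the work; that is presumably why the patch reorders struct bio.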



[PATCH v9 1/9] block: Generalized bio pool freeing

2012-09-06 Thread Kent Overstreet
With the old code, when you allocate a bio from a bio pool you have to
implement your own destructor that knows how to find the bio pool the
bio was originally allocated from.

This adds a new field to struct bio (bi_pool) and changes
bio_alloc_bioset() to use it. This makes various bio destructors
unnecessary, so they're then deleted.
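
As a rough illustration of why a per-bio pool pointer removes the need for driver-specific destructors, here is a minimal userspace sketch. It is not the kernel implementation: struct toy_pool is just a counter backed by malloc(), and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool: only counts how many objects are handed out. */
struct toy_pool {
	int outstanding;
};

struct toy_bio {
	int refcount;
	struct toy_pool *pool;	/* NULL when the bio is not pool-backed */
};

static struct toy_bio *toy_bio_alloc(struct toy_pool *pool)
{
	struct toy_bio *bio = malloc(sizeof(*bio));

	if (!bio)
		return NULL;

	bio->refcount = 1;
	bio->pool = pool;	/* remember where the bio came from */
	if (pool)
		pool->outstanding++;
	return bio;
}

/* The last put returns the bio to its origin without any callback. */
static void toy_bio_put(struct toy_bio *bio)
{
	assert(bio && bio->refcount > 0);

	if (--bio->refcount)
		return;

	if (bio->pool)
		bio->pool->outstanding--;
	free(bio);
}
```

Because the bio itself records its origin, the generic put path can release it, and callers no longer have to stash the pool in bi_private or install a destructor just to find it again.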

v6: Explain the temporary if statement in bio_put

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
CC: Alasdair Kergon a...@redhat.com
CC: Nicholas Bellinger n...@linux-iscsi.org
CC: Lars Ellenberg lars.ellenb...@linbit.com
Acked-by: Tejun Heo t...@kernel.org
Acked-by: Nicholas Bellinger n...@linux-iscsi.org
---
 drivers/block/drbd/drbd_main.c  | 13 +
 drivers/md/dm-crypt.c   |  9 -
 drivers/md/dm-io.c  | 11 ---
 drivers/md/dm.c | 20 
 drivers/md/md.c | 28 
 drivers/target/target_core_iblock.c |  9 -
 fs/bio.c| 31 +--
 include/linux/blk_types.h   |  3 +++
 8 files changed, 21 insertions(+), 103 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index f93a032..f55683a 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -162,23 +162,12 @@ static const struct block_device_operations drbd_ops = {
.release = drbd_release,
 };
 
-static void bio_destructor_drbd(struct bio *bio)
-{
-   bio_free(bio, drbd_md_io_bio_set);
-}
-
 struct bio *bio_alloc_drbd(gfp_t gfp_mask)
 {
-   struct bio *bio;
-
if (!drbd_md_io_bio_set)
return bio_alloc(gfp_mask, 1);
 
-   bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
-   if (!bio)
-   return NULL;
-   bio->bi_destructor = bio_destructor_drbd;
-   return bio;
+   return bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
 }
 
 #ifdef __CHECKER__
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 664743d..3c0acba 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -798,14 +798,6 @@ static int crypt_convert(struct crypt_config *cc,
return 0;
 }
 
-static void dm_crypt_bio_destructor(struct bio *bio)
-{
-   struct dm_crypt_io *io = bio->bi_private;
-   struct crypt_config *cc = io->cc;
-
-   bio_free(bio, cc->bs);
-}
-
 /*
  * Generate a new unfragmented bio with the given size
  * This should never violate the device limitations
@@ -974,7 +966,6 @@ static void clone_init(struct dm_crypt_io *io, struct bio 
*clone)
clone->bi_end_io  = crypt_endio;
clone->bi_bdev    = cc->dev->bdev;
clone->bi_rw  = io->base_bio->bi_rw;
-   clone->bi_destructor = dm_crypt_bio_destructor;
 }
 
 static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index ea5dd28..1c46f97 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -249,16 +249,6 @@ static void vm_dp_init(struct dpages *dp, void *data)
dp->context_ptr = data;
 }
 
-static void dm_bio_destructor(struct bio *bio)
-{
-   unsigned region;
-   struct io *io;
-
-   retrieve_io_and_region_from_bio(bio, &io, &region);
-
-   bio_free(bio, io->client->bios);
-}
-
 /*
  * Functions for getting the pages from kernel memory.
  */
@@ -317,7 +307,6 @@ static void do_region(int rw, unsigned region, struct 
dm_io_region *where,
bio->bi_sector = where->sector + (where->count - remaining);
bio->bi_bdev = where->bdev;
bio->bi_end_io = endio;
-   bio->bi_destructor = dm_bio_destructor;
store_io_and_region_in_bio(bio, io, region);
 
if (rw & REQ_DISCARD) {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4e09b6f..0c3d6dd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -681,11 +681,6 @@ static void clone_endio(struct bio *bio, int error)
}
}
 
-   /*
-* Store md for cleanup instead of tio which is about to get freed.
-*/
-   bio->bi_private = md->bs;
-
free_tio(md, tio);
bio_put(bio);
dec_pending(io, error);
@@ -1032,11 +1027,6 @@ static void __map_bio(struct dm_target *ti, struct bio 
*clone,
/* error the io and bail out, or requeue it if needed */
md = tio->io->md;
dec_pending(tio->io, r);
-   /*
-* Store bio_set for cleanup.
-*/
-   clone->bi_end_io = NULL;
-   clone->bi_private = md->bs;
bio_put(clone);
free_tio(md, tio);
} else if (r) {
@@ -1055,13 +1045,6 @@ struct clone_info {
unsigned short idx;
 };
 
-static void dm_bio_destructor(struct bio *bio)
-{
-   struct bio_set *bs = bio->bi_private;
-
-   bio_free(bio, bs

Re: [dm-devel] [PATCH v8 3/8] dm: Use bioset's front_pad for dm_rq_clone_bio_info

2012-09-06 Thread Kent Overstreet
On Thu, Sep 06, 2012 at 12:21:15PM +0900, Jun'ichi Nomura wrote:
> On 09/06/12 05:27, Kent Overstreet wrote:
> > @@ -2718,7 +2705,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
> >     if (!pools->tio_pool)
> >         goto free_io_pool_and_out;
> >
> > -   pools->bs = bioset_create(pool_size, 0);
> > +   pools->bs = bioset_create(pool_size,
> > +                             offsetof(struct dm_rq_clone_bio_info, clone));
> >     if (!pools->bs)
> >         goto free_tio_pool_and_out;
>
> frontpad is not necessary if type is DM_TYPE_BIO_BASED.
>
> Other pool creation in that function do something like:
>   pools->bs = (type == DM_TYPE_BIO_BASED) ?
>     bioset_create(pool_size, 0) :
>     bioset_create(pool_size, offsetof(struct dm_rq_clone_bio_info, clone));
 

Eh, it doesn't really matter considering it's two pointers of padding
and struct bio + the inline vecs are something like 200 bytes, but I can
do it if it makes you happy. Can I get someone's acked-by?


[PATCH v10 3/8] dm: Use bioset's front_pad for dm_rq_clone_bio_info

2012-09-06 Thread Kent Overstreet
Previously, dm_rq_clone_bio_info needed to be freed by the bio's
destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
This gets rid of a memory allocation and means we can kill
dm_rq_bio_destructor.

The _rq_bio_info_cache kmem cache is unused now and needs to be deleted,
but due to the way io_pool is used and overloaded this looks not quite
trivial so I'm leaving it for a later patch.

v6: Fix comment on struct dm_rq_clone_bio_info, per Tejun

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Alasdair Kergon a...@redhat.com
Acked-by: Tejun Heo t...@kernel.org
---
 drivers/md/dm.c | 46 ++
 1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f43aaf6..33470f0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -86,12 +86,17 @@ struct dm_rq_target_io {
 };
 
 /*
- * For request-based dm.
- * One of these is allocated per bio.
+ * For request-based dm - the bio clones we allocate are embedded in these
+ * structs.
+ *
+ * We allocate these with bio_alloc_bioset, using the front_pad parameter when
+ * the bioset is created - this means the bio has to come at the end of the
+ * struct.
  */
 struct dm_rq_clone_bio_info {
struct bio *orig;
struct dm_rq_target_io *tio;
+   struct bio clone;
 };
 
 union map_info *dm_get_mapinfo(struct bio *bio)
@@ -211,6 +216,11 @@ struct dm_md_mempools {
 static struct kmem_cache *_io_cache;
 static struct kmem_cache *_tio_cache;
 static struct kmem_cache *_rq_tio_cache;
+
+/*
+ * Unused now, and needs to be deleted. But since io_pool is overloaded and 
it's
+ * still used for _io_cache, I'm leaving this for a later cleanup
+ */
 static struct kmem_cache *_rq_bio_info_cache;
 
 static int __init local_init(void)
@@ -467,16 +477,6 @@ static void free_rq_tio(struct dm_rq_target_io *tio)
mempool_free(tio, tio->md->tio_pool);
 }
 
-static struct dm_rq_clone_bio_info *alloc_bio_info(struct mapped_device *md)
-{
-   return mempool_alloc(md->io_pool, GFP_ATOMIC);
-}
-
-static void free_bio_info(struct dm_rq_clone_bio_info *info)
-{
-   mempool_free(info, info->tio->md->io_pool);
-}
-
 static int md_in_flight(struct mapped_device *md)
 {
return atomic_read(&md->pending[READ]) +
@@ -1460,30 +1460,17 @@ void dm_dispatch_request(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(dm_dispatch_request);
 
-static void dm_rq_bio_destructor(struct bio *bio)
-{
-   struct dm_rq_clone_bio_info *info = bio->bi_private;
-   struct mapped_device *md = info->tio->md;
-
-   free_bio_info(info);
-   bio_free(bio, md->bs);
-}
-
 static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig,
 void *data)
 {
struct dm_rq_target_io *tio = data;
-   struct mapped_device *md = tio->md;
-   struct dm_rq_clone_bio_info *info = alloc_bio_info(md);
-
-   if (!info)
-   return -ENOMEM;
+   struct dm_rq_clone_bio_info *info =
+   container_of(bio, struct dm_rq_clone_bio_info, clone);
 
info->orig = bio_orig;
info->tio = tio;
bio->bi_end_io = end_clone_bio;
bio->bi_private = info;
-   bio->bi_destructor = dm_rq_bio_destructor;
 
return 0;
 }
@@ -2718,7 +2705,10 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned 
type, unsigned integrity)
if (!pools->tio_pool)
goto free_io_pool_and_out;
 
-   pools->bs = bioset_create(pool_size, 0);
+   pools->bs = (type == DM_TYPE_BIO_BASED) ?
+   bioset_create(pool_size, 0) :
+   bioset_create(pool_size,
+ offsetof(struct dm_rq_clone_bio_info, clone));
if (!pools->bs)
goto free_tio_pool_and_out;
 
-- 
1.7.12
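
The front_pad approach used by this patch can be pictured with a small userspace sketch: the allocator reserves extra bytes in front of the bio, and container_of() recovers the enclosing struct from the bio pointer. This is only an illustration under stated assumptions; the toy_* names are hypothetical and calloc() stands in for the bioset's mempool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of struct bio. */
struct toy_bio {
	void *private_data;
};

/* The embedded bio comes last, so the front padding sits right before it. */
struct toy_clone_info {
	struct toy_bio *orig;
	void *tio;
	struct toy_bio clone;
};

/* Userspace equivalent of the kernel's container_of() helper. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate front_pad bytes followed by a bio; return a pointer to the bio. */
static struct toy_bio *toy_alloc_with_front_pad(size_t front_pad)
{
	char *p = calloc(1, front_pad + sizeof(struct toy_bio));

	return p ? (struct toy_bio *)(p + front_pad) : NULL;
}

/* Free the whole allocation, starting at the beginning of the padding. */
static void toy_free_with_front_pad(struct toy_bio *bio, size_t front_pad)
{
	free((char *)bio - front_pad);
}
```

With front_pad set to offsetof(struct toy_clone_info, clone), one allocation yields both the bookkeeping fields and the bio, so there is no second allocation to fail and nothing extra to free in an error path. In this sketch the padding amounts to two pointers.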



[PATCH v10 4/8] block: Add bio_reset()

2012-09-06 Thread Kent Overstreet
Reusing bios is something that's been highly frowned upon in the past,
but driver code keeps doing it anyways. If it's going to happen anyways,
we should provide a generic method.

This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
was open coding it, by doing a bio_init() and resetting bi_destructor.

This required reordering struct bio, but the block layer is not yet
nearly fast enough for any cacheline effects to matter here.

v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
bio->bi_flags are saved.
v6: Further commenting verbosity, per Tejun
v9: Add a function comment

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
Acked-by: Tejun Heo t...@kernel.org
---
 fs/bio.c  | 24 
 include/linux/bio.h   |  1 +
 include/linux/blk_types.h | 25 +++--
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index b14f71a..919ee9a 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -263,6 +263,30 @@ void bio_init(struct bio *bio)
 EXPORT_SYMBOL(bio_init);
 
 /**
+ * bio_reset - reinitialize a bio
+ * @bio:   bio to reset
+ *
+ * Description:
+ *   After calling bio_reset(), @bio will be in the same state as a freshly
+ *   allocated bio returned bio bio_alloc_bioset() - the only fields that are
+ *   preserved are the ones that are initialized by bio_alloc_bioset(). See
+ *   comment in struct bio.
+ */
+void bio_reset(struct bio *bio)
+{
+   unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
+
+   if (bio_integrity(bio))
+   bio_integrity_free(bio);
+
+   bio_disassociate_task(bio);
+
+   memset(bio, 0, BIO_RESET_BYTES);
+   bio->bi_flags = flags|(1 << BIO_UPTODATE);
+}
+EXPORT_SYMBOL(bio_reset);
+
+/**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
  * @nr_iovecs: number of iovecs to pre-allocate
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a11f74b..76f6c25 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -226,6 +226,7 @@ extern void __bio_clone(struct bio *, struct bio *);
 extern struct bio *bio_clone(struct bio *, gfp_t);
 
 extern void bio_init(struct bio *);
+extern void bio_reset(struct bio *);
 
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned 
int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index af9dd9d..1b607c2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -59,12 +59,6 @@ struct bio {
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;
 
-   unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
-
-   atomic_t bi_cnt; /* pin count */
-
-   struct bio_vec  *bi_io_vec; /* the actual vec list */
-
bio_end_io_t *bi_end_io;
 
void *bi_private;
@@ -80,6 +74,16 @@ struct bio {
struct bio_integrity_payload *bi_integrity;  /* data integrity */
 #endif
 
+   /*
+* Everything starting with bi_max_vecs will be preserved by bio_reset()
+*/
+
+   unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
+
+   atomic_t bi_cnt; /* pin count */
+
+   struct bio_vec  *bi_io_vec; /* the actual vec list */
+
/* If bi_pool is non NULL, bi_destructor is not called */
struct bio_set  *bi_pool;
 
@@ -93,6 +97,8 @@ struct bio {
struct bio_vec  bi_inline_vecs[0];
 };
 
+#define BIO_RESET_BYTES offsetof(struct bio, bi_max_vecs)
+
 /*
  * bio flags
  */
@@ -108,6 +114,13 @@ struct bio {
 #define BIO_FS_INTEGRITY 9 /* fs owns integrity data, not block layer */
 #define BIO_QUIET  10  /* Make BIO Quiet */
 #define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
+
+/*
+ * Flags starting here get preserved by bio_reset() - this includes
+ * BIO_POOL_IDX()
+ */
+#define BIO_RESET_BITS 12
+
 #define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
 
 /*
-- 
1.7.12



[PATCH v10 8/8] block: Add bio_clone_bioset(), bio_clone_kmalloc()

2012-09-06 Thread Kent Overstreet
Previously, there was bio_clone() but it only allocated from the fs bio
set; as a result various users were open coding it and using
__bio_clone().

This changes bio_clone() to become bio_clone_bioset(), and then we add
bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
the functionality the last patch added.

This will also help in a later patch changing how bio cloning works.
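
The wrapper arrangement described above can be sketched in userspace as follows. This is an illustration only: the toy_* names are hypothetical, malloc() stands in for both allocation paths, and the mapping of each wrapper to a pool (a shared default pool for the plain clone, no pool for the kmalloc variant) is an assumption rather than something taken from the diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool that only counts allocations made through it. */
struct toy_pool {
	int allocations;
};

/* Stand-in for the shared filesystem bio set (assumed default pool). */
static struct toy_pool toy_fs_pool;

struct toy_bio {
	unsigned long sector;
	unsigned int size;
	struct toy_pool *pool;	/* NULL when not pool-backed */
};

/* The one real implementation: copy src, record where the copy came from. */
static struct toy_bio *toy_clone_bioset(const struct toy_bio *src,
					struct toy_pool *pool)
{
	struct toy_bio *b;

	assert(src != NULL);

	b = malloc(sizeof(*b));
	if (!b)
		return NULL;

	*b = *src;
	b->pool = pool;
	if (pool)
		pool->allocations++;
	return b;
}

/* Thin wrappers: the only difference is which pool they pass down. */
static struct toy_bio *toy_clone(const struct toy_bio *src)
{
	return toy_clone_bioset(src, &toy_fs_pool);
}

static struct toy_bio *toy_clone_kmalloc(const struct toy_bio *src)
{
	return toy_clone_bioset(src, NULL);
}
```

Callers that previously open-coded an allocation followed by __bio_clone() can then pick the wrapper matching their allocation source, or call the bioset variant directly with a private pool.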

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
CC: Alasdair Kergon a...@redhat.com
CC: Boaz Harrosh bharr...@panasas.com
CC: Jeff Garzik j...@garzik.org
Acked-by: Jeff Garzik jgar...@redhat.com
---
 block/blk-core.c   |  8 +---
 drivers/block/osdblk.c |  3 +--
 drivers/md/dm-crypt.c  |  7 +--
 drivers/md/dm.c|  4 ++--
 drivers/md/md.c| 20 +---
 fs/bio.c   | 11 +++
 fs/exofs/ore.c |  5 ++---
 include/linux/bio.h| 17 ++---
 8 files changed, 29 insertions(+), 46 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b776cc9..82aab28 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2781,16 +2781,10 @@ int blk_rq_prep_clone(struct request *rq, struct 
request *rq_src,
blk_rq_init(NULL, rq);
 
__rq_for_each_bio(bio_src, rq_src) {
-   bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs);
+   bio = bio_clone_bioset(bio_src, gfp_mask, bs);
if (!bio)
goto free_and_out;
 
-   __bio_clone(bio, bio_src);
-
-   if (bio_integrity(bio_src) &&
-   bio_integrity_clone(bio, bio_src, gfp_mask))
-   goto free_and_out;
-
if (bio_ctr && bio_ctr(bio, bio_src, data))
goto free_and_out;
 
diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
index 87311eb..1bbc681 100644
--- a/drivers/block/osdblk.c
+++ b/drivers/block/osdblk.c
@@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, 
gfp_t gfpmask)
struct bio *tmp, *new_chain = NULL, *tail = NULL;
 
while (old_chain) {
-   tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+   tmp = bio_clone_kmalloc(old_chain, gfpmask);
if (!tmp)
goto err_out;
 
-   __bio_clone(tmp, old_chain);
tmp->bi_bdev = NULL;
gfpmask &= ~__GFP_WAIT;
tmp->bi_next = NULL;
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3c0acba..bbf459b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -979,19 +979,14 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t 
gfp)
 * copy the required bvecs because we need the original
 * one in order to decrypt the whole bio data *afterwards*.
 */
-   clone = bio_alloc_bioset(gfp, bio_segments(base_bio), cc->bs);
+   clone = bio_clone_bioset(base_bio, gfp, cc->bs);
if (!clone)
return 1;
 
crypt_inc_pending(io);
 
clone_init(io, clone);
-   clone->bi_idx = 0;
-   clone->bi_vcnt = bio_segments(base_bio);
-   clone->bi_size = base_bio->bi_size;
clone->bi_sector = cc->start + io->sector;
-   memcpy(clone->bi_io_vec, bio_iovec(base_bio),
-  sizeof(struct bio_vec) * clone->bi_vcnt);
 
generic_make_request(clone);
return 0;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 33470f0..8378797 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1129,8 +1129,8 @@ static void __issue_target_request(struct clone_info *ci, 
struct dm_target *ti,
 * ci->bio->bi_max_vecs is BIO_INLINE_VECS anyway, for both flush
 * and discard, so no need for concern about wasted bvec allocations.
 */
-   clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
-   __bio_clone(clone, ci->bio);
+   clone = bio_clone_bioset(ci->bio, GFP_NOIO, ci->md->bs);
+
if (len) {
clone->bi_sector = ci->sector;
clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 457ca84..7a2b079 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -173,28 +173,10 @@ EXPORT_SYMBOL_GPL(bio_alloc_mddev);
 struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
struct mddev *mddev)
 {
-   struct bio *b;
-
if (!mddev || !mddev->bio_set)
return bio_clone(bio, gfp_mask);
 
-   b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
-   if (!b)
-   return NULL;
-
-   __bio_clone(b, bio);
-   if (bio_integrity(bio)) {
-   int ret;
-
-   ret = bio_integrity_clone(b, bio, gfp_mask);
-
-   if (ret < 0) {
-   bio_put(b);
-   return NULL;
-   }
-   }
-
-   return b;
+   return bio_clone_bioset

[PATCH v10 7/8] block: Consolidate bio_alloc_bioset(), bio_kmalloc()

2012-09-06 Thread Kent Overstreet
Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
different because there was some almost-duplicated code - this fixes
some of that.

The important change is that previously bio_kmalloc() always set
bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
bio_alloc_bioset(). This would cause bio_has_data() to return true; I
don't know if this resulted in any actual bugs but it was certainly
wrong.

bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
(BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
at least they're enforced closer together and hopefully they will be
fixed in a later patch.

This'll also help with some future cleanups - there are a fair number of
functions that allocate bios (e.g. bio_clone()), and now they don't have
to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().
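
The unified allocation path described above can be sketched in userspace as follows: a NULL pool selects the kmalloc-style path with all vecs inline, a non-NULL pool gets a fixed number of inline vecs with larger requests stored externally, and a request for zero vecs leaves the vec pointer NULL. All toy_* names and the TOY_INLINE_VECS value are hypothetical, calloc() stands in for both kmalloc() and the mempool, and toy_bio_has_data() assumes the real helper tests the vec pointer.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAXIOV      1024	/* mirrors the UIO_MAXIOV limit quoted above */
#define TOY_INLINE_VECS 4	/* hypothetical inline capacity of pooled bios */

struct toy_vec { void *page; };
struct toy_pool { int unused; };

struct toy_bio {
	struct toy_pool *pool;
	unsigned int max_vecs;
	struct toy_vec *io_vec;		/* NULL when the bio carries no vecs */
	int vecs_external;		/* set when io_vec was allocated separately */
	struct toy_vec inline_vecs[];
};

static struct toy_bio *toy_bio_alloc(unsigned int nr, struct toy_pool *pool)
{
	unsigned int inline_vecs;
	struct toy_bio *bio;

	if (!pool) {
		/* kmalloc-style path: every requested vec lives inline. */
		if (nr > TOY_MAXIOV)
			return NULL;
		inline_vecs = nr;
	} else {
		/* Pool-style path: fixed-size objects, fixed inline capacity. */
		inline_vecs = TOY_INLINE_VECS;
	}

	bio = calloc(1, sizeof(*bio) + inline_vecs * sizeof(struct toy_vec));
	if (!bio)
		return NULL;

	bio->io_vec = NULL;
	if (nr > inline_vecs) {
		bio->io_vec = calloc(nr, sizeof(struct toy_vec));
		if (!bio->io_vec) {
			free(bio);
			return NULL;
		}
		bio->vecs_external = 1;
	} else if (nr) {
		bio->io_vec = bio->inline_vecs;
	}

	bio->pool = pool;
	bio->max_vecs = nr;
	return bio;
}

static void toy_bio_free(struct toy_bio *bio)
{
	if (bio->vecs_external)
		free(bio->io_vec);
	free(bio);
}

/* Assumed to match the real helper: data is present iff io_vec is set. */
static int toy_bio_has_data(const struct toy_bio *bio)
{
	return bio && bio->io_vec != NULL;
}
```

The point of the sketch is the shared tail of the function: once the two paths have chosen inline_vecs, the zero-vec case is handled identically for both, which is the behavioural difference the description calls out.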

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
v7: Re-add dropped comments, improve patch description
---
 fs/bio.c| 110 ++--
 include/linux/bio.h |  16 ++--
 2 files changed, 49 insertions(+), 77 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 736ef12..191b9b8 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -55,6 +55,7 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] 
__read_mostly = {
  * IO code that does not need private memory pools.
  */
 struct bio_set *fs_bio_set;
+EXPORT_SYMBOL(fs_bio_set);
 
 /*
  * Our slab pool management
@@ -301,39 +302,58 @@ EXPORT_SYMBOL(bio_reset);
  * @bs:    the bio_set to allocate from.
  *
  * Description:
- *   bio_alloc_bioset will try its own mempool to satisfy the allocation.
- *   If %__GFP_WAIT is set then we will block on the internal pool waiting
- *   for a struct bio to become free.
- **/
+ *   If @bs is NULL, uses kmalloc() to allocate the bio; else the allocation is
+ *   backed by the @bs's mempool.
+ *
+ *   When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
+ *   able to allocate a bio. This is due to the mempool guarantees. To make 
this
+ *   work, callers must never allocate more than 1 bio at a time from this 
pool.
+ *   Callers that need to allocate more than 1 bio must always submit the
+ *   previously allocated bio for IO before attempting to allocate a new one.
+ *   Failure to do so can cause deadlocks under memory pressure.
+ *
+ *   RETURNS:
+ *   Pointer to new bio on success, NULL on failure.
+ */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
+	unsigned front_pad;
+	unsigned inline_vecs;
 	unsigned long idx = BIO_POOL_NONE;
 	struct bio_vec *bvl = NULL;
 	struct bio *bio;
 	void *p;
 
-	p = mempool_alloc(bs->bio_pool, gfp_mask);
+	if (!bs) {
+		if (nr_iovecs > UIO_MAXIOV)
+			return NULL;
+
+		p = kmalloc(sizeof(struct bio) +
+			    nr_iovecs * sizeof(struct bio_vec),
+			    gfp_mask);
+		front_pad = 0;
+		inline_vecs = nr_iovecs;
+	} else {
+		p = mempool_alloc(bs->bio_pool, gfp_mask);
+		front_pad = bs->front_pad;
+		inline_vecs = BIO_INLINE_VECS;
+	}
+
 	if (unlikely(!p))
 		return NULL;
-	bio = p + bs->front_pad;
 
+	bio = p + front_pad;
 	bio_init(bio);
-	bio->bi_pool = bs;
-
-	if (unlikely(!nr_iovecs))
-		goto out_set;
 
-	if (nr_iovecs <= BIO_INLINE_VECS) {
-		bvl = bio->bi_inline_vecs;
-		nr_iovecs = BIO_INLINE_VECS;
-	} else {
+	if (nr_iovecs > inline_vecs) {
 		bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
 		if (unlikely(!bvl))
 			goto err_free;
-
-		nr_iovecs = bvec_nr_vecs(idx);
+	} else if (nr_iovecs) {
+		bvl = bio->bi_inline_vecs;
 	}
-out_set:
+
+	bio->bi_pool = bs;
 	bio->bi_flags |= idx << BIO_POOL_OFFSET;
 	bio->bi_max_vecs = nr_iovecs;
 	bio->bi_io_vec = bvl;
@@ -345,62 +365,6 @@ err_free:
 }
 EXPORT_SYMBOL(bio_alloc_bioset);
 
-/**
- * bio_alloc - allocate a new bio, memory pool backed
- * @gfp_mask: allocation mask to use
- * @nr_iovecs: number of iovecs
- *
- * bio_alloc will allocate a bio and associated bio_vec array that can hold
- * at least @nr_iovecs entries. Allocations will be done from the
- * fs_bio_set. Also see @bio_alloc_bioset and @bio_kmalloc.
- *
- * If %__GFP_WAIT is set, then bio_alloc will always be able to allocate
- * a bio. This is due to the mempool guarantees. To make this work, callers
- * must never allocate more than 1 bio at a time from this pool. Callers
- * that need to allocate more than 1 bio must always submit the previously

[PATCH v10 6/8] block: Kill bi_destructor

2012-09-06 Thread Kent Overstreet
Now that we've got generic code for freeing bios allocated from bio
pools, this isn't needed anymore.

This patch also makes bio_free() static, since without bi_destructor
there should be no need for it to be called anywhere else.

bio_free() is now only called from bio_put(), so we can refactor those a
bit - move some code from bio_put() to bio_free() and kill the redundant
bio->bi_next = NULL.

v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
v7: No #define BIO_KMALLOC_POOL anymore

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 Documentation/block/biodoc.txt |  5 
 block/blk-core.c   |  2 +-
 fs/bio.c   | 64 +-
 include/linux/bio.h|  1 -
 include/linux/blk_types.h  |  3 --
 5 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index e418dc0..8df5e8e 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -465,7 +465,6 @@ struct bio {
        bio_end_io_t     *bi_end_io;  /* bi_end_io (bio) */
        atomic_t         bi_cnt;      /* pin count: free when it hits zero */
        void             *bi_private;
-   bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
 };
 
 With this multipage bio design:
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
 so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
 given size from these slabs.
 
-The bi_destructor() routine takes into account the possibility of the bio
-having originated from a different source (see later discussions on
-n/w to block transfers and kvec_cb)
-
 The bio_get() routine may be used to hold an extra reference on a bio prior
 to i/o submission, if the bio fields are likely to be accessed after the
 i/o is issued (since the bio may otherwise get freed in case i/o completion
diff --git a/block/blk-core.c b/block/blk-core.c
index 95c4935..b776cc9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2807,7 +2807,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 
 free_and_out:
 	if (bio)
-		bio_free(bio, bs);
+		bio_put(bio);
 	blk_rq_unprep_clone(rq);
 
 	return -ENOMEM;
diff --git a/fs/bio.c b/fs/bio.c
index 919ee9a..736ef12 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -233,26 +233,37 @@ fallback:
 	return bvl;
 }
 
-void bio_free(struct bio *bio, struct bio_set *bs)
+static void __bio_free(struct bio *bio)
 {
-	void *p;
-
-	if (bio_has_allocated_vec(bio))
-		bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
+	bio_disassociate_task(bio);
 
 	if (bio_integrity(bio))
 		bio_integrity_free(bio);
+}
 
-	/*
-	 * If we have front padding, adjust the bio pointer before freeing
-	 */
-	p = bio;
-	if (bs->front_pad)
+static void bio_free(struct bio *bio)
+{
+	struct bio_set *bs = bio->bi_pool;
+	void *p;
+
+	__bio_free(bio);
+
+	if (bs) {
+		if (bio_has_allocated_vec(bio))
+			bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
+
+		/*
+		 * If we have front padding, adjust the bio pointer before freeing
+		 */
+		p = bio;
 		p -= bs->front_pad;
 
-	mempool_free(p, bs->bio_pool);
+		mempool_free(p, bs->bio_pool);
+	} else {
+		/* Bio was allocated by bio_kmalloc() */
+		kfree(bio);
+	}
 }
-EXPORT_SYMBOL(bio_free);
 
 void bio_init(struct bio *bio)
 {
@@ -276,10 +287,7 @@ void bio_reset(struct bio *bio)
 {
 	unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
 
-	if (bio_integrity(bio))
-		bio_integrity_free(bio);
-
-	bio_disassociate_task(bio);
+	__bio_free(bio);
 
 	memset(bio, 0, BIO_RESET_BYTES);
 	bio->bi_flags = flags|(1 << BIO_UPTODATE);
@@ -362,13 +370,6 @@ struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 }
 EXPORT_SYMBOL(bio_alloc);
 
-static void bio_kmalloc_destructor(struct bio *bio)
-{
-	if (bio_integrity(bio))
-		bio_integrity_free(bio);
-	kfree(bio);
-}
-
 /**
  * bio_kmalloc - allocate a bio for I/O using kmalloc()
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -395,7 +396,6 @@ struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 	bio->bi_flags |= BIO_POOL_NONE << BIO_POOL_OFFSET;
 	bio->bi_max_vecs = nr_iovecs;
 	bio->bi_io_vec = bio->bi_inline_vecs;
-	bio->bi_destructor = bio_kmalloc_destructor;
 
 	return bio;
 }
@@ -431,20 +431,8 @@ void bio_put(struct bio *bio)
 	/*
 	 * last put frees it
 	 */
-	if (atomic_dec_and_test(&bio->bi_cnt

[PATCH v10 0/8] Block cleanups

2012-09-06 Thread Kent Overstreet
Screwed up the bio_reset() patch in the last patch series when I went to edit
the description, fixed that here.

Only other change is the dm patch - made the front_pad conditional on
DM_TYPE_BIO_BASED.

Kent Overstreet (8):
  block: Generalized bio pool freeing
  block: Ues bi_pool for bio_integrity_alloc()
  dm: Use bioset's front_pad for dm_rq_clone_bio_info
  block: Add bio_reset()
  pktcdvd: Switch to bio_kmalloc()
  block: Kill bi_destructor
  block: Consolidate bio_alloc_bioset(), bio_kmalloc()
  block: Add bio_clone_bioset(), bio_clone_kmalloc()

 Documentation/block/biodoc.txt  |   5 -
 block/blk-core.c|  10 +-
 drivers/block/drbd/drbd_main.c  |  13 +--
 drivers/block/osdblk.c  |   3 +-
 drivers/block/pktcdvd.c |  52 ++---
 drivers/md/dm-crypt.c   |  16 +--
 drivers/md/dm-io.c  |  11 --
 drivers/md/dm.c |  74 -
 drivers/md/md.c |  44 +---
 drivers/target/target_core_iblock.c |   9 --
 fs/bio-integrity.c  |  44 +++-
 fs/bio.c| 206 
 fs/exofs/ore.c  |   5 +-
 include/linux/bio.h |  44 +---
 include/linux/blk_types.h   |  27 +++--
 15 files changed, 195 insertions(+), 368 deletions(-)

-- 
1.7.12

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v10 5/8] pktcdvd: Switch to bio_kmalloc()

2012-09-06 Thread Kent Overstreet
This is prep work for killing bi_destructor - previously, pktcdvd had
its own pkt_bio_alloc which was basically duplicating bio_kmalloc(),
necessitating its own bi_destructor implementation.

v5: Un-reorder some functions, to make the patch easier to review

Signed-off-by: Kent Overstreet koverstr...@google.com
Acked-by: Jiri Kosina jkos...@suse.cz
---
 drivers/block/pktcdvd.c | 52 +++--
 1 file changed, 7 insertions(+), 45 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index ba66e44..2e7de7a 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -522,38 +522,6 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
 	}
 }
 
-static void pkt_bio_destructor(struct bio *bio)
-{
-	kfree(bio->bi_io_vec);
-	kfree(bio);
-}
-
-static struct bio *pkt_bio_alloc(int nr_iovecs)
-{
-	struct bio_vec *bvl = NULL;
-	struct bio *bio;
-
-	bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
-	if (!bio)
-		goto no_bio;
-	bio_init(bio);
-
-	bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
-	if (!bvl)
-		goto no_bvl;
-
-	bio->bi_max_vecs = nr_iovecs;
-	bio->bi_io_vec = bvl;
-	bio->bi_destructor = pkt_bio_destructor;
-
-	return bio;
-
- no_bvl:
-	kfree(bio);
- no_bio:
-	return NULL;
-}
-
 /*
  * Allocate a packet_data struct
  */
@@ -567,7 +535,7 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
 		goto no_pkt;
 
 	pkt->frames = frames;
-	pkt->w_bio = pkt_bio_alloc(frames);
+	pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
 	if (!pkt->w_bio)
 		goto no_bio;
 
@@ -581,9 +549,10 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
 	bio_list_init(&pkt->orig_bios);
 
 	for (i = 0; i < frames; i++) {
-		struct bio *bio = pkt_bio_alloc(1);
+		struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
 		if (!bio)
 			goto no_rd_bio;
+
 		pkt->r_bios[i] = bio;
 	}
 
@@ -,21 +1080,17 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
 	 * Schedule reads for missing parts of the packet.
 	 */
 	for (f = 0; f < pkt->frames; f++) {
-		struct bio_vec *vec;
-
 		int p, offset;
+
 		if (written[f])
 			continue;
+
 		bio = pkt->r_bios[f];
-		vec = bio->bi_io_vec;
-		bio_init(bio);
-		bio->bi_max_vecs = 1;
+		bio_reset(bio);
 		bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
 		bio->bi_bdev = pd->bdev;
 		bio->bi_end_io = pkt_end_io_read;
 		bio->bi_private = pkt;
-		bio->bi_io_vec = vec;
-		bio->bi_destructor = pkt_bio_destructor;
 
 		p = (f * CD_FRAMESIZE) / PAGE_SIZE;
 		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
@@ -1418,14 +1383,11 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 	}
 
 	/* Start the write request */
-	bio_init(pkt->w_bio);
-	pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
+	bio_reset(pkt->w_bio);
 	pkt->w_bio->bi_sector = pkt->sector;
 	pkt->w_bio->bi_bdev = pd->bdev;
 	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
 	pkt->w_bio->bi_private = pkt;
-	pkt->w_bio->bi_io_vec = bvec;
-	pkt->w_bio->bi_destructor = pkt_bio_destructor;
 	for (f = 0; f < pkt->frames; f++)
 		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
 			BUG();
-- 
1.7.12



[PATCH v10 2/8] block: Ues bi_pool for bio_integrity_alloc()

2012-09-06 Thread Kent Overstreet
Now that bios keep track of where they were allocated from,
bio_integrity_alloc_bioset() becomes redundant.

Remove bio_integrity_alloc_bioset() and drop bio_set argument from the
related functions and make them use bio->bi_pool.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: Martin K. Petersen martin.peter...@oracle.com
Acked-by: Tejun Heo t...@kernel.org
---
 block/blk-core.c|  2 +-
 drivers/md/dm.c |  4 ++--
 drivers/md/md.c |  2 +-
 fs/bio-integrity.c  | 44 +++-
 fs/bio.c|  6 +++---
 include/linux/bio.h |  9 -
 6 files changed, 26 insertions(+), 41 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4b4dbdf..95c4935 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2788,7 +2788,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 		__bio_clone(bio, bio_src);
 
 		if (bio_integrity(bio_src) &&
-		    bio_integrity_clone(bio, bio_src, gfp_mask, bs))
+		    bio_integrity_clone(bio, bio_src, gfp_mask))
 			goto free_and_out;
 
 		if (bio_ctr && bio_ctr(bio, bio_src, data))
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 0c3d6dd..f43aaf6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1068,7 +1068,7 @@ static struct bio *split_bvec(struct bio *bio, sector_t sector,
 	clone->bi_flags |= 1 << BIO_CLONED;
 
 	if (bio_integrity(bio)) {
-		bio_integrity_clone(clone, bio, GFP_NOIO, bs);
+		bio_integrity_clone(clone, bio, GFP_NOIO);
 		bio_integrity_trim(clone,
 				   bio_sector_offset(bio, idx, offset), len);
 	}
@@ -1094,7 +1094,7 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,
 	clone->bi_flags &= ~(1 << BIO_SEG_VALID);
 
 	if (bio_integrity(bio)) {
-		bio_integrity_clone(clone, bio, GFP_NOIO, bs);
+		bio_integrity_clone(clone, bio, GFP_NOIO);
 
 		if (idx != bio->bi_idx || clone->bi_size < bio->bi_size)
 			bio_integrity_trim(clone,
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b8eebe3..457ca84 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -186,7 +186,7 @@ struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
 	if (bio_integrity(bio)) {
 		int ret;
 
-		ret = bio_integrity_clone(b, bio, gfp_mask, mddev->bio_set);
+		ret = bio_integrity_clone(b, bio, gfp_mask);
 
 		if (ret < 0) {
 			bio_put(b);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index e85c04b..a3f28f3 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -70,23 +70,25 @@ static inline int use_bip_pool(unsigned int idx)
 }
 
 /**
- * bio_integrity_alloc_bioset - Allocate integrity payload and attach it to bio
+ * bio_integrity_alloc - Allocate integrity payload and attach it to bio
  * @bio:   bio to attach integrity metadata to
  * @gfp_mask:  Memory allocation mask
  * @nr_vecs:   Number of integrity metadata scatter-gather elements
- * @bs:bio_set to allocate from
  *
  * Description: This function prepares a bio for attaching integrity
  * metadata.  nr_vecs specifies the maximum number of pages containing
  * integrity metadata that can be attached.
  */
-struct bio_integrity_payload *bio_integrity_alloc_bioset(struct bio *bio,
-							 gfp_t gfp_mask,
-							 unsigned int nr_vecs,
-							 struct bio_set *bs)
+struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
+						  gfp_t gfp_mask,
+						  unsigned int nr_vecs)
 {
 	struct bio_integrity_payload *bip;
 	unsigned int idx = vecs_to_idx(nr_vecs);
+	struct bio_set *bs = bio->bi_pool;
+
+	if (!bs)
+		bs = fs_bio_set;
 
 	BUG_ON(bio == NULL);
 	bip = NULL;
@@ -114,37 +116,22 @@ struct bio_integrity_payload *bio_integrity_alloc_bioset(struct bio *bio,
 
 	return bip;
 }
-EXPORT_SYMBOL(bio_integrity_alloc_bioset);
-
-/**
- * bio_integrity_alloc - Allocate integrity payload and attach it to bio
- * @bio:   bio to attach integrity metadata to
- * @gfp_mask:  Memory allocation mask
- * @nr_vecs:   Number of integrity metadata scatter-gather elements
- *
- * Description: This function prepares a bio for attaching integrity
- * metadata.  nr_vecs specifies the maximum number of pages containing
- * integrity metadata that can be attached.
- */
-struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
-						  gfp_t gfp_mask,
-						  unsigned int nr_vecs)
-{
-	return

[PATCH v10 1/8] block: Generalized bio pool freeing

2012-09-06 Thread Kent Overstreet
With the old code, when you allocate a bio from a bio pool you have to
implement your own destructor that knows how to find the bio pool the
bio was originally allocated from.

This adds a new field to struct bio (bi_pool) and changes
bio_alloc_bioset() to use it. This makes various bio destructors
unnecessary, so they're then deleted.
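
The idea can be sketched in user-space C as follows. This is only an illustration: struct toy_pool, struct toy_bio, toy_alloc() and toy_put() are hypothetical names, and treating a NULL pool as "plain malloc" is a simplification of mine that anticipates the later patches in this series (in this patch a NULL bi_pool still falls back to bi_destructor).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool: only counts the objects currently handed out. */
struct toy_pool {
	int outstanding;
};

struct toy_bio {
	struct toy_pool *pool;	/* plays the role of bi_pool; NULL = plain malloc */
	int refcnt;
};

/* Allocate an object and record which pool it belongs to. */
static struct toy_bio *toy_alloc(struct toy_pool *pool)
{
	struct toy_bio *bio = malloc(sizeof(*bio));

	if (!bio)
		return NULL;
	bio->pool = pool;
	bio->refcnt = 1;
	if (pool)
		pool->outstanding++;
	return bio;
}

/*
 * Drop one reference. The last put releases the object through the pool
 * recorded at allocation time, so no per-owner destructor is needed.
 * Returns 1 if the object was freed, 0 otherwise.
 */
static int toy_put(struct toy_bio *bio)
{
	assert(bio->refcnt > 0);
	if (--bio->refcnt)
		return 0;
	if (bio->pool)
		bio->pool->outstanding--;
	free(bio);
	return 1;
}
```

Because the object carries its own pool pointer, the owner no longer has to stash the pool somewhere reachable from the destructor, which is what the dm and drbd destructors removed below were doing.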

v6: Explain the temporary if statement in bio_put

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
CC: Alasdair Kergon a...@redhat.com
CC: Nicholas Bellinger n...@linux-iscsi.org
CC: Lars Ellenberg lars.ellenb...@linbit.com
Acked-by: Tejun Heo t...@kernel.org
Acked-by: Nicholas Bellinger n...@linux-iscsi.org
---
 drivers/block/drbd/drbd_main.c  | 13 +
 drivers/md/dm-crypt.c   |  9 -
 drivers/md/dm-io.c  | 11 ---
 drivers/md/dm.c | 20 
 drivers/md/md.c | 28 
 drivers/target/target_core_iblock.c |  9 -
 fs/bio.c| 31 +--
 include/linux/blk_types.h   |  3 +++
 8 files changed, 21 insertions(+), 103 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index f93a032..f55683a 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -162,23 +162,12 @@ static const struct block_device_operations drbd_ops = {
 	.release = drbd_release,
 };
 
-static void bio_destructor_drbd(struct bio *bio)
-{
-	bio_free(bio, drbd_md_io_bio_set);
-}
-
 struct bio *bio_alloc_drbd(gfp_t gfp_mask)
 {
-	struct bio *bio;
-
 	if (!drbd_md_io_bio_set)
 		return bio_alloc(gfp_mask, 1);
 
-	bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
-	if (!bio)
-		return NULL;
-	bio->bi_destructor = bio_destructor_drbd;
-	return bio;
+	return bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
 }
 
 #ifdef __CHECKER__
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 664743d..3c0acba 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -798,14 +798,6 @@ static int crypt_convert(struct crypt_config *cc,
 	return 0;
 }
 
-static void dm_crypt_bio_destructor(struct bio *bio)
-{
-	struct dm_crypt_io *io = bio->bi_private;
-	struct crypt_config *cc = io->cc;
-
-	bio_free(bio, cc->bs);
-}
-
 /*
  * Generate a new unfragmented bio with the given size
  * This should never violate the device limitations
@@ -974,7 +966,6 @@ static void clone_init(struct dm_crypt_io *io, struct bio *clone)
 	clone->bi_end_io  = crypt_endio;
 	clone->bi_bdev    = cc->dev->bdev;
 	clone->bi_rw      = io->base_bio->bi_rw;
-	clone->bi_destructor = dm_crypt_bio_destructor;
 }
 
 static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index ea5dd28..1c46f97 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -249,16 +249,6 @@ static void vm_dp_init(struct dpages *dp, void *data)
 	dp->context_ptr = data;
 }
 
-static void dm_bio_destructor(struct bio *bio)
-{
-	unsigned region;
-	struct io *io;
-
-	retrieve_io_and_region_from_bio(bio, &io, &region);
-
-	bio_free(bio, io->client->bios);
-}
-
 /*
  * Functions for getting the pages from kernel memory.
  */
@@ -317,7 +307,6 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
 		bio->bi_sector = where->sector + (where->count - remaining);
 		bio->bi_bdev = where->bdev;
 		bio->bi_end_io = endio;
-		bio->bi_destructor = dm_bio_destructor;
 		store_io_and_region_in_bio(bio, io, region);
 
 		if (rw & REQ_DISCARD) {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4e09b6f..0c3d6dd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -681,11 +681,6 @@ static void clone_endio(struct bio *bio, int error)
 		}
 	}
 
-	/*
-	 * Store md for cleanup instead of tio which is about to get freed.
-	 */
-	bio->bi_private = md->bs;
-
 	free_tio(md, tio);
 	bio_put(bio);
 	dec_pending(io, error);
@@ -1032,11 +1027,6 @@ static void __map_bio(struct dm_target *ti, struct bio *clone,
 		/* error the io and bail out, or requeue it if needed */
 		md = tio->io->md;
 		dec_pending(tio->io, r);
-		/*
-		 * Store bio_set for cleanup.
-		 */
-		clone->bi_end_io = NULL;
-		clone->bi_private = md->bs;
 		bio_put(clone);
 		free_tio(md, tio);
 	} else if (r) {
@@ -1055,13 +1045,6 @@ struct clone_info {
 	unsigned short idx;
 };
 
-static void dm_bio_destructor(struct bio *bio)
-{
-	struct bio_set *bs = bio->bi_private;
-
-	bio_free(bio, bs

Re: [PATCH] virtio-blk: Fix kconfig option

2012-09-06 Thread Kent Overstreet
On Fri, Sep 07, 2012 at 09:10:25AM +0930, Rusty Russell wrote:
> Kent Overstreet koverstr...@google.com writes:
>
> > On Thu, Sep 06, 2012 at 12:49:56PM +0300, Michael S. Tsirkin wrote:
> >> On Thu, Sep 06, 2012 at 02:25:12AM -0700, Kent Overstreet wrote:
> >> > Do you not understand the difference between depends and selects?
> >> > Or did you not read my original mail?
>
> Now you're getting insulting.

Yes, but at least I'm not being intentionally obtuse.

> It's normal for options to depend on other options.  Sometimes they're
> directly nested (eg. E1000 depends on NETDEVICES, and it's nested under
> that option), sometimes they're not (eg. E1000 depends on PCI, which is
> selected elsewhere).
>
> The fact that you are only just realizing this is not Michael's problem.

Like I said, I'm well aware of that. The issue here isn't the
dependency, it's that it depends on something that isn't exposed
anywhere!

Think about it from the user's pov. They check what VIRTIO_BLK depends
on - just VIRTIO.

So they try to figure out how to flip on VIRTIO, or what VIRTIO even is.

See how that last step might be problematic? CONFIG_VIRTIO is not
exposed! It doesn't even seem to control anything!

Go back to your example. Checking the dependencies for E1000 would tell
you the user needs to flip on CONFIG_PCI. Done. Easy.

User checks the dependencies here and... what do _you_ expect people to
do?

Look, depending on a kconfig option that's supposed to be user
controllable but isn't exposed anywhere is flat out broken. The fact
that it's in a different submenu just makes it worse.

The problem is that VIRTIO_BLK's dependencies are not actually specified
in the kconfig. If it depends on VIRTIO_PCI, that's what the kconfig
should say. If it depends on having any of multiple virtio backends
enabled, then specify that!

depends VIRTIO_PCI || VIRTIO_WHATEVER

Or if you really want to have a fake config option that's enabled if you
have any virtio backend enabled, fix the damn comments and naming!

How is anyone supposed to know that CONFIG_VIRTIO really means any
virtio backend? Call it VIRTIO_ANY_BACKEND if that's what it really is.

And, if that is what you're doing with CONFIG_VIRTIO (I'm still not
sure) the comment at the top of drivers/virtio/Kconfig is _wrong_:

# Virtio always gets selected by whoever wants it.
VIRTIO
tristate

How is _anyone_ supposed to know that really means VIRTIO gets selected
by things that provide a virtio backend?

C'mon, you've had to debug other people's code before. What would _you_
think if you were tripped up by something like that?

> >> > Flip off everything in drivers -> virtio
> >> >
> >> > Now go to drivers -> block and try to turn on virtio-blk.
> >> >
> >> > It's not listed!
> >>
> >> Yes. Because you disabled all virtio backends.
> >> It does not make sense to have any frontends.
> >
> > How's a user - or even another kernel developer who isn't familiar with
> > virtio - supposed to know that?
>
> I get annoyed that menuconfig doesn't show options whose dependencies
> aren't possible, too.  (I got bitten the other way: it doesn't show
> dependencies which can't be disabled, and I was trying to turn KALLSYMS
> off).
>
> But as I found out just last week, the '/' key allows you to find any
> option, and shows what dependencies it has, and their values.

Yep, use it all the time. 


[PATCH v10 3/8] dm: Use bioset's front_pad for dm_rq_clone_bio_info

2012-09-07 Thread Kent Overstreet
Previously, dm_rq_clone_bio_info needed to be freed by the bio's
destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
This gets rid of a memory allocation and means we can kill
dm_rq_bio_destructor.

The _rq_bio_info_cache kmem cache is unused now and needs to be deleted,
but due to the way io_pool is used and overloaded this looks not quite
trivial so I'm leaving it for a later patch.
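
A minimal user-space sketch of the front_pad trick, under stated assumptions: struct toy_bio, struct toy_clone_info, toy_container_of() and the toy_*_padded() helpers are hypothetical stand-ins for the kernel's struct bio, struct dm_rq_clone_bio_info, container_of() and the bioset allocation path. The point is only that one allocation covers both the wrapper and the bio, and that the wrapper can be recovered from the bio pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio. */
struct toy_bio {
	int dummy;
};

/* Wrapper with the bio as its last member, as the layout requires. */
struct toy_clone_info {
	void *orig;
	void *tio;
	struct toy_bio clone;
};

/* Simplified container_of(): step back from a member to its container. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate front_pad bytes in front of the bio and return the bio pointer. */
static struct toy_bio *toy_alloc_padded(size_t front_pad)
{
	char *p = malloc(front_pad + sizeof(struct toy_bio));

	return p ? (struct toy_bio *)(p + front_pad) : NULL;
}

/* Undo the offset before handing the memory back to the allocator. */
static void toy_free_padded(struct toy_bio *bio, size_t front_pad)
{
	assert(bio != NULL);
	free((char *)bio - front_pad);
}
```

Passing offsetof(struct toy_clone_info, clone) as front_pad makes the padding exactly the part of the wrapper that precedes the bio, so freeing the bio frees the wrapper too and no separate allocation (or destructor) is needed.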

v6: Fix comment on struct dm_rq_clone_bio_info, per Tejun

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Alasdair Kergon a...@redhat.com
Acked-by: Tejun Heo t...@kernel.org
---
 drivers/md/dm.c | 46 ++
 1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f43aaf6..33470f0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -86,12 +86,17 @@ struct dm_rq_target_io {
 };
 
 /*
- * For request-based dm.
- * One of these is allocated per bio.
+ * For request-based dm - the bio clones we allocate are embedded in these
+ * structs.
+ *
+ * We allocate these with bio_alloc_bioset, using the front_pad parameter when
+ * the bioset is created - this means the bio has to come at the end of the
+ * struct.
  */
 struct dm_rq_clone_bio_info {
 	struct bio *orig;
 	struct dm_rq_target_io *tio;
+	struct bio clone;
 };
 
 union map_info *dm_get_mapinfo(struct bio *bio)
@@ -211,6 +216,11 @@ struct dm_md_mempools {
 static struct kmem_cache *_io_cache;
 static struct kmem_cache *_tio_cache;
 static struct kmem_cache *_rq_tio_cache;
+
+/*
+ * Unused now, and needs to be deleted. But since io_pool is overloaded and it's
+ * still used for _io_cache, I'm leaving this for a later cleanup
+ */
 static struct kmem_cache *_rq_bio_info_cache;
 
 static int __init local_init(void)
@@ -467,16 +477,6 @@ static void free_rq_tio(struct dm_rq_target_io *tio)
 	mempool_free(tio, tio->md->tio_pool);
 }
 
-static struct dm_rq_clone_bio_info *alloc_bio_info(struct mapped_device *md)
-{
-	return mempool_alloc(md->io_pool, GFP_ATOMIC);
-}
-
-static void free_bio_info(struct dm_rq_clone_bio_info *info)
-{
-	mempool_free(info, info->tio->md->io_pool);
-}
-
 static int md_in_flight(struct mapped_device *md)
 {
 	return atomic_read(&md->pending[READ]) +
@@ -1460,30 +1460,17 @@ void dm_dispatch_request(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(dm_dispatch_request);
 
-static void dm_rq_bio_destructor(struct bio *bio)
-{
-	struct dm_rq_clone_bio_info *info = bio->bi_private;
-	struct mapped_device *md = info->tio->md;
-
-	free_bio_info(info);
-	bio_free(bio, md->bs);
-}
-
 static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig,
 				 void *data)
 {
 	struct dm_rq_target_io *tio = data;
-	struct mapped_device *md = tio->md;
-	struct dm_rq_clone_bio_info *info = alloc_bio_info(md);
-
-	if (!info)
-		return -ENOMEM;
+	struct dm_rq_clone_bio_info *info =
+		container_of(bio, struct dm_rq_clone_bio_info, clone);
 
 	info->orig = bio_orig;
 	info->tio = tio;
 	bio->bi_end_io = end_clone_bio;
 	bio->bi_private = info;
-	bio->bi_destructor = dm_rq_bio_destructor;
 
 	return 0;
 }
@@ -2718,7 +2705,10 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
 	if (!pools->tio_pool)
 		goto free_io_pool_and_out;
 
-	pools->bs = bioset_create(pool_size, 0);
+	pools->bs = (type == DM_TYPE_BIO_BASED) ?
+		bioset_create(pool_size, 0) :
+		bioset_create(pool_size,
+			      offsetof(struct dm_rq_clone_bio_info, clone));
 	if (!pools->bs)
 		goto free_tio_pool_and_out;
 
 
-- 
1.7.12



Re: [PATCH v10 0/8] Block cleanups

2012-09-07 Thread Kent Overstreet
On Thu, Sep 06, 2012 at 07:37:10PM -0600, Jens Axboe wrote:
> On 2012-09-06 17:48, Tejun Heo wrote:
> > Hello, guys.
> >
> > (cc'ing Jens, Alasdair and Neil)
> >
> > On Thu, Sep 06, 2012 at 03:34:54PM -0700, Kent Overstreet wrote:
> >> Screwed up the bio_reset() patch in the last patch series when I went to edit
> >> the description, fixed that here.
> >>
> >> Only other change is the dm patch - made the front_pad conditional on
> >> DM_TYPE_BIO_BASED.
> >>
> >> Kent Overstreet (8):
> >>   block: Generalized bio pool freeing
> >>   block: Ues bi_pool for bio_integrity_alloc()
> >>   dm: Use bioset's front_pad for dm_rq_clone_bio_info
> >>   block: Add bio_reset()
> >>   pktcdvd: Switch to bio_kmalloc()
> >>   block: Kill bi_destructor
> >>   block: Consolidate bio_alloc_bioset(), bio_kmalloc()
> >>   block: Add bio_clone_bioset(), bio_clone_kmalloc()
> >
> > This series looks good to me now.  If someone can ack the dm patch, I
> > think it's good to go.  Jens, what do you think?
>
> Looks OK to me too. I'll run a quick perf test on this tomorrow, and get
> it queued up for 3.7 if it looks fine.

Thanks!

> Kent, care to resend 3+5 to me?

You should have it now.


Re: [PATCH v10 4/8] block: Add bio_reset()

2012-09-07 Thread Kent Overstreet
On Thu, Sep 06, 2012 at 07:34:18PM -0600, Jens Axboe wrote:
> On 2012-09-06 16:34, Kent Overstreet wrote:
> > Reusing bios is something that's been highly frowned upon in the past,
> > but driver code keeps doing it anyways. If it's going to happen anyways,
> > we should provide a generic method.
> >
> > This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
> > was open coding it, by doing a bio_init() and resetting bi_destructor.
> >
> > This required reordering struct bio, but the block layer is not yet
> > nearly fast enough for any cacheline effects to matter here.
>
> That's an odd and misplaced comment. Was just doing testing today at 5M
> IOPS, and even years back we've had cache effects for O_DIRECT in higher
> speed setups.

Ah, I wasn't aware that you were pushing that many iops through the
block layer - most I've tested myself was around 1M. It wouldn't
surprise me if cache effects in struct bio mattered around 5M...

> That said, we haven't done cache analysis in a long time. So moving
> members around isn't necessarily a huge deal.

Ok, good to know. I've got another patch coming later that reorders
struct bio a bit more, for immutable bvecs (bi_sector, bi_size, bi_idx
go into a struct bvec_iter together).

> Lastly, this isn't a great commit message for other reasons. Anyone can
> see that it moves members around. It'd be a lot better to explain _why_
> it is reordering the struct.

Yeah, I suppose so. Will keep that in mind for the next patch.

>
> BTW, I looked over the rest of the patches, and it looks OK to me.

Resent them. Thanks!


[PATCH 0/2] Avoid deadlocks with bio allocation

2012-09-07 Thread Kent Overstreet
These patches were part of the block cleanups series I just sent out, but I
split them off. Nothing has changed in them lately; the last thing I added was
a bit of logic in the punt-to-rescuer code so that it only punts bios that were
allocated from the current bio_set.

Kent Overstreet (2):
  block: Reorder struct bio_set
  block: Avoid deadlocks with bio allocation by stacking drivers

 fs/bio.c| 87 +++--
 include/linux/bio.h | 75 +
 2 files changed, 126 insertions(+), 36 deletions(-)

-- 
1.7.12



[PATCH 2/2] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-07 Thread Kent Overstreet
Previously, if we ever tried to allocate more than once from the same bio
set while running under generic_make_request() (i.e. a stacking block
driver), we risked deadlock.

This is because of the code in generic_make_request() that converts
recursion to iteration; any bios we submit won't actually be submitted
(so they can complete and eventually be freed) until after we return -
this means if we allocate a second bio, we're blocking the first one
from ever being freed.

Thus if enough threads call into a stacking block driver at the same
time with bios that need multiple splits, and the bio_set's reserve gets
used up, we deadlock.

This can be worked around in the driver code - we could check if we're
running under generic_make_request(), then mask out __GFP_WAIT when we
go to allocate a bio, and if the allocation fails punt to workqueue and
retry the allocation.

But this is tricky and not a generic solution. This patch solves it for
all users by inverting the previously described technique. We allocate a
rescuer workqueue for each bio_set, and then in the allocation code, if
there are bios on current->bio_list that we would be blocking, we punt
them to the rescuer workqueue to be submitted.
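
As a rough illustration of the scheme described above (a userspace model, not the patch's code), the sketch below tries a non-blocking allocation first, punts any pending bios to a "rescuer" when the reserve is empty, and then retries. Every name here (model_pool, model_ctx, alloc_and_queue() and so on) is hypothetical, and the rescuer is assumed to complete the punted bios instantly, which a real workqueue does not do.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_POOL_RESERVE 2	/* assumption: same value as BIO_POOL_SIZE */

struct model_pool {
	int free_slots;		/* objects left in the mempool reserve */
};

struct model_ctx {
	int pending;		/* bios parked on the current->bio_list analogue */
	int rescued;		/* bios handed to the rescuer so far */
};

/* Non-blocking attempt: succeeds only while the reserve is non-empty. */
static bool pool_try_alloc(struct model_pool *p)
{
	if (p->free_slots == 0)
		return false;
	p->free_slots--;
	return true;
}

/*
 * Model of the rescuer: the pending bios get submitted by someone else,
 * complete, and give their slots back to the pool (instantly, here).
 */
static void punt_to_rescuer(struct model_pool *p, struct model_ctx *c)
{
	c->rescued += c->pending;
	p->free_slots += c->pending;
	c->pending = 0;
}

/*
 * Allocate one bio and park it as pending. Returns false where a blocking
 * allocation would sleep forever because nothing pending can complete.
 */
static bool alloc_and_queue(struct model_pool *p, struct model_ctx *c,
			    bool use_rescuer)
{
	if (!pool_try_alloc(p)) {
		if (!use_rescuer || c->pending == 0)
			return false;
		punt_to_rescuer(p, c);
		if (!pool_try_alloc(p))
			return false;
	}
	c->pending++;
	return true;
}
```

With a reserve of two and no rescuer, the third allocation in a row cannot succeed; with the rescuer enabled, the same sequence keeps making progress because the parked bios are handed off before the retry.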

Tested it by forcing the rescue codepath to be taken (by disabling the
first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
of arbitrary bio splitting) and verified that the rescuer was being
invoked.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 fs/bio.c| 87 +++--
 include/linux/bio.h |  9 ++
 2 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 13e9567..244007f 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -295,6 +295,43 @@ void bio_reset(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_reset);
 
+static void bio_alloc_rescue(struct work_struct *work)
+{
+   struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
+   struct bio *bio;
+
+   while (1) {
+   spin_lock(&bs->rescue_lock);
+   bio = bio_list_pop(&bs->rescue_list);
+   spin_unlock(&bs->rescue_lock);
+
+   if (!bio)
+   break;
+
+   generic_make_request(bio);
+   }
+}
+
+static void punt_bios_to_rescuer(struct bio_set *bs)
+{
+   struct bio_list punt, nopunt;
+   struct bio *bio;
+
+   bio_list_init(&punt);
+   bio_list_init(&nopunt);
+
+   while ((bio = bio_list_pop(current->bio_list)))
+   bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
+
+   *current->bio_list = nopunt;
+
+   spin_lock(&bs->rescue_lock);
+   bio_list_merge(&bs->rescue_list, &punt);
+   spin_unlock(&bs->rescue_lock);
+
+   queue_work(bs->rescue_workqueue, &bs->rescue_work);
+}
+
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -317,6 +354,7 @@ EXPORT_SYMBOL(bio_reset);
  */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
+   gfp_t saved_gfp = gfp_mask;
unsigned front_pad;
unsigned inline_vecs;
unsigned long idx = BIO_POOL_NONE;
@@ -334,13 +372,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
front_pad = 0;
inline_vecs = nr_iovecs;
} else {
+   /*
+* generic_make_request() converts recursion to iteration; this
+* means if we're running beneath it, any bios we allocate and
+* submit will not be submitted (and thus freed) until after we
+* return.
+*
+* This exposes us to a potential deadlock if we allocate
+* multiple bios from the same bio_set() while running
+* underneath generic_make_request(). If we were to allocate
+* multiple bios (say a stacking block driver that was splitting
+* bios), we would deadlock if we exhausted the mempool's
+* reserve.
+*
+* We solve this, and guarantee forward progress, with a rescuer
+* workqueue per bio_set. If we go to allocate and there are
+* bios on current->bio_list, we first try the allocation
+* without __GFP_WAIT; if that fails, we punt those bios we
+* would be blocking to the rescuer workqueue before we retry
+* with the original gfp_flags.
+*/
+
+   if (current->bio_list && !bio_list_empty(current->bio_list))
+   gfp_mask &= ~__GFP_WAIT;
+retry:
p = mempool_alloc(bs->bio_pool, gfp_mask);
front_pad = bs->front_pad;
inline_vecs = BIO_INLINE_VECS;
}
 
if (unlikely(!p))
-   return NULL;
+   goto err;
 
bio = p + front_pad

[PATCH 1/2] block: Reorder struct bio_set

2012-09-07 Thread Kent Overstreet
This is prep work for the next patch, which embeds a struct bio_list in
struct bio_set.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 include/linux/bio.h | 66 ++---
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 52b9cbc..a2bfe3a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -298,39 +298,6 @@ static inline int bio_associate_current(struct bio *bio) { return -ENOENT; }
 static inline void bio_disassociate_task(struct bio *bio) { }
 #endif /* CONFIG_BLK_CGROUP */
 
-/*
- * bio_set is used to allow other portions of the IO system to
- * allocate their own private memory pools for bio and iovec structures.
- * These memory pools in turn all allocate from the bio_slab
- * and the bvec_slabs[].
- */
-#define BIO_POOL_SIZE 2
-#define BIOVEC_NR_POOLS 6
-#define BIOVEC_MAX_IDX (BIOVEC_NR_POOLS - 1)
-
-struct bio_set {
-   struct kmem_cache *bio_slab;
-   unsigned int front_pad;
-
-   mempool_t *bio_pool;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
-   mempool_t *bio_integrity_pool;
-#endif
-   mempool_t *bvec_pool;
-};
-
-struct biovec_slab {
-   int nr_vecs;
-   char *name;
-   struct kmem_cache *slab;
-};
-
-/*
- * a small number of entries is fine, not going to be performance critical.
- * basically we just need to survive
- */
-#define BIO_SPLIT_ENTRIES 2
-
 #ifdef CONFIG_HIGHMEM
 /*
  * remember never ever reenable interrupts between a bvec_kmap_irq and
@@ -505,6 +472,39 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
return bio;
 }
 
+/*
+ * bio_set is used to allow other portions of the IO system to
+ * allocate their own private memory pools for bio and iovec structures.
+ * These memory pools in turn all allocate from the bio_slab
+ * and the bvec_slabs[].
+ */
+#define BIO_POOL_SIZE 2
+#define BIOVEC_NR_POOLS 6
+#define BIOVEC_MAX_IDX (BIOVEC_NR_POOLS - 1)
+
+struct bio_set {
+   struct kmem_cache *bio_slab;
+   unsigned int front_pad;
+
+   mempool_t *bio_pool;
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+   mempool_t *bio_integrity_pool;
+#endif
+   mempool_t *bvec_pool;
+};
+
+struct biovec_slab {
+   int nr_vecs;
+   char *name;
+   struct kmem_cache *slab;
+};
+
+/*
+ * a small number of entries is fine, not going to be performance critical.
+ * basically we just need to survive
+ */
+#define BIO_SPLIT_ENTRIES 2
+
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 
 #define bip_vec_idx(bip, idx)  (&(bip->bip_vec[(idx)]))
-- 
1.7.12



Re: [PATCH v10 4/8] block: Add bio_reset()

2012-09-07 Thread Kent Overstreet
On Fri, Sep 07, 2012 at 04:06:45PM -0600, Jens Axboe wrote:
> On 2012-09-07 15:55, Jens Axboe wrote:
> > On 2012-09-07 14:58, Kent Overstreet wrote:
> >> On Thu, Sep 06, 2012 at 07:34:18PM -0600, Jens Axboe wrote:
> >>> On 2012-09-06 16:34, Kent Overstreet wrote:
> >>>> Reusing bios is something that's been highly frowned upon in the past,
> >>>> but driver code keeps doing it anyways. If it's going to happen anyways,
> >>>> we should provide a generic method.
> >>>>
> >>>> This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
> >>>> was open coding it, by doing a bio_init() and resetting bi_destructor.
> >>>>
> >>>> This required reordering struct bio, but the block layer is not yet
> >>>> nearly fast enough for any cacheline effects to matter here.
> >>>
> >>> That's an odd and misplaced comment. Was just doing testing today at 5M
> >>> IOPS, and even years back we've had cache effects for O_DIRECT in higher
> >>> speed setups.
> >>
> >> Ah, I wasn't aware that you were pushing that many iops through the
> >> block layer - most I've tested myself was around 1M. It wouldn't
> >> surprise me if cache effects in struct bio mattered around 5M...
> >
> > 5M is nothing, just did 13.5M :-)
> >
> > But we can reshuffle for now. As mentioned, we're way overdue for a
> > decent look at cache profiling in any case.
>
> No ill effects seen so far, fwiw:
>
>   read : io=1735.8GB, bw=53690MB/s, iops=13745K, runt= 33104msec

Cool!

I'd be really curious to see a profile. Of the patches I've got queued
up I don't think anything's going to significantly affect performance
yet, but I'm hoping the cleanups/immutable bvec stuff/efficient bio
splitting enables some performance gains.

Well, it certainly will for stacking drivers, but I'm less sure what
it's going to look like running on just a raw flash device.

My end goal is making generic_make_request handle arbitrary sized bios,
and have (efficient) splitting happen as required. This'll get rid of a
bunch of code and complexity in the upper layers, in bio_add_page() and
elsewhere. More in the stacking drivers - merge_bvec_fn is horrendous to
support.

I think I might be able to efficiently get rid of the
segments-after-merging precalculating, and just have segments merged
once. That'd get rid of a couple fields in struct bio, and get it under
2 cachelines last I counted.

Course, all this doesn't matter as much for 4k bios so it may just be a
wash for you.


[PATCH 0/9] Prep work for immutable bio vecs

2012-09-07 Thread Kent Overstreet
Random assortment of refactoring and trivial cleanups.

Immutable bio vecs and efficient bio splitting require auditing and removing
pretty much all bi_idx uses, among other things.

The reason is that with immutable bio vecs we can't use the bvec array
directly; if we have a partially completed bvec, that'll be indicated with a
field in struct bvec_iter (which gets embedded in struct bio) - bi_bvec_done.

bio_for_each_segment() will handle this transparently, so code needs to be
converted to use it or some other generic accessor.

Also, bio splitting means that when a driver gets a bio, bi_idx and
bi_bvec_done may both be nonzero. Again, just need to use generic accessors.
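
To illustrate why direct indexing stops working, here is a small userspace sketch of an iterator that tracks a segment index plus the bytes already consumed in that segment, while leaving the segment array untouched. It is only a model under assumptions: struct model_bvec, struct model_iter, model_advance() and model_cur() are hypothetical names, with idx and bvec_done standing in for bi_idx and bi_bvec_done.

```c
#include <assert.h>

struct model_bvec {
	unsigned int len;	/* bytes in this segment */
	unsigned int offset;	/* start offset within the page */
};

struct model_iter {
	unsigned int size;	/* bytes remaining (bi_size analogue) */
	unsigned int idx;	/* current segment (bi_idx analogue) */
	unsigned int bvec_done;	/* bytes consumed in the current segment */
};

/*
 * Advance the iterator by 'bytes' without writing to the segment array.
 * The caller must not advance past the remaining size.
 */
static void model_advance(const struct model_bvec *bv, struct model_iter *it,
			  unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;

	while (bytes) {
		unsigned int left = bv[it->idx].len - it->bvec_done;

		if (bytes < left) {
			it->bvec_done += bytes;
			break;
		}
		bytes -= left;
		it->idx++;
		it->bvec_done = 0;
	}
}

/*
 * What a generic accessor would hand out for the current position: the
 * stored segment adjusted by bvec_done. Only valid while it->size > 0.
 */
static struct model_bvec model_cur(const struct model_bvec *bv,
				   const struct model_iter *it)
{
	struct model_bvec cur;
	unsigned int left = bv[it->idx].len - it->bvec_done;

	cur.offset = bv[it->idx].offset + it->bvec_done;
	cur.len = left < it->size ? left : it->size;
	return cur;
}
```

Code that reads bv[it->idx] directly would see the unadjusted offset and length after a partial advance, which is the kind of mistake the generic accessors are meant to rule out.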

Kent Overstreet (9):
  block: Convert integrity to bvec_alloc_bs(), and a bugfix
  block: Add bio_advance()
  block: Refactor blk_update_request()
  md: Convert md_trim_bio() to use bio_advance()
  block: Add bio_end()
  block: Use bio_sectors() more consistently
  block: Don't use bi_idx in bio_split() or require it to be 0
  block: Remove bi_idx references
  block: Remove some unnecessary bi_vcnt usage

 block/blk-core.c |  86 -
 block/cfq-iosched.c  |   7 +-
 block/deadline-iosched.c |   2 +-
 drivers/block/aoe/aoeblk.c   |   2 +-
 drivers/block/aoe/aoecmd.c   |   2 +-
 drivers/block/brd.c  |   3 +-
 drivers/block/drbd/drbd_req.c|   8 +-
 drivers/block/floppy.c   |   1 -
 drivers/block/pktcdvd.c  |   8 +-
 drivers/block/ps3vram.c  |   2 +-
 drivers/md/dm-raid1.c|   2 +-
 drivers/md/dm-stripe.c   |   2 +-
 drivers/md/dm-verity.c   |   4 +-
 drivers/md/faulty.c  |   6 +-
 drivers/md/linear.c  |   3 +-
 drivers/md/md.c  |  19 ++---
 drivers/md/raid0.c   |   9 +--
 drivers/md/raid1.c   |  21 +++--
 drivers/md/raid10.c  |  28 +++
 drivers/md/raid5.c   |  22 +++---
 drivers/message/fusion/mptsas.c  |   6 +-
 drivers/s390/block/dcssblk.c |   3 +-
 drivers/scsi/libsas/sas_expander.c   |   6 +-
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  10 +--
 fs/bio-integrity.c   | 128 ++-
 fs/bio.c |  48 +++-
 fs/btrfs/extent_io.c |   3 +-
 fs/buffer.c  |   1 -
 fs/gfs2/lops.c   |   2 +-
 fs/jfs/jfs_logmgr.c  |   2 -
 fs/logfs/dev_bdev.c  |   5 --
 include/linux/bio.h  |   8 +-
 include/trace/events/block.h |  10 +--
 mm/page_io.c |   1 -
 34 files changed, 192 insertions(+), 278 deletions(-)

-- 
1.7.12



[PATCH 4/9] md: Convert md_trim_bio() to use bio_advance()

2012-09-07 Thread Kent Overstreet
Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/md.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7a2b079..51ce48c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -190,25 +190,16 @@ void md_trim_bio(struct bio *bio, int offset, int size)
struct bio_vec *bvec;
int sofar = 0;
 
-   size <<= 9;
if (offset == 0 && size == bio->bi_size)
return;
 
-   bio->bi_sector += offset;
-   bio->bi_size = size;
-   offset <<= 9;
clear_bit(BIO_SEG_VALID, &bio->bi_flags);
 
-   while (bio->bi_idx < bio->bi_vcnt &&
-  bio->bi_io_vec[bio->bi_idx].bv_len <= offset) {
-   /* remove this whole bio_vec */
-   offset -= bio->bi_io_vec[bio->bi_idx].bv_len;
-   bio->bi_idx++;
-   }
-   if (bio->bi_idx < bio->bi_vcnt) {
-   bio->bi_io_vec[bio->bi_idx].bv_offset += offset;
-   bio->bi_io_vec[bio->bi_idx].bv_len -= offset;
-   }
+   bio_advance(bio, offset << 9);
+
+   size <<= 9;
+   bio->bi_size = size;
+
/* avoid any complications with bi_idx being non-zero*/
if (bio->bi_idx) {
memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_idx,
-- 
1.7.12



[PATCH 9/9] block: Remove some unnecessary bi_vcnt usage

2012-09-07 Thread Kent Overstreet
More prep work for immutable bvecs/efficient bio splitting - usage of
bi_vcnt has to be audited, so getting rid of all the unnecessary usage
makes that easier.

Plus, bio_segments() is really what this code wanted, as it respects the
current value of bi_idx.
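
A minimal userspace sketch of the difference, assuming bio_segments() amounts to the bi_vcnt - bi_idx expression that this series replaces in dm-verity. struct model_bio and model_bio_segments() are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>

struct model_bio {
	unsigned short vcnt;	/* total bio_vecs in the array (bi_vcnt) */
	unsigned short idx;	/* first unprocessed bio_vec (bi_idx) */
};

/* Segments still to be processed, counted from the current index. */
static unsigned int model_bio_segments(const struct model_bio *bio)
{
	return bio->vcnt - bio->idx;
}
```

For a bio with four bio_vecs of which three are already done, a raw "vcnt > 1" test still reports multiple segments, while the helper reports the single segment that is actually left.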

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/message/fusion/mptsas.c  |  6 +++---
 drivers/scsi/libsas/sas_expander.c   |  6 +++---
 drivers/scsi/mpt2sas/mpt2sas_transport.c | 10 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 551262e..5406a9f 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2235,10 +2235,10 @@ static int mptsas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
}
 
/* do we need to support multiple segments? */
-   if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u %u, rsp %u %u\n",
-   ioc->name, __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
-   rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
+   ioc->name, __func__, bio_segments(req->bio), blk_rq_bytes(req),
+   bio_segments(rsp->bio), blk_rq_bytes(rsp));
return -EINVAL;
}
 
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index efc6e72..ee331a7 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2151,10 +2151,10 @@ int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
}
 
/* do we need to support multiple segments? */
-   if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
printk("%s: multiple segments req %u %u, rsp %u %u\n",
-  __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
-  rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
+  __func__, bio_segments(req->bio), blk_rq_bytes(req),
+  bio_segments(rsp->bio), blk_rq_bytes(rsp));
return -EINVAL;
}
 
diff --git a/drivers/scsi/mpt2sas/mpt2sas_transport.c b/drivers/scsi/mpt2sas/mpt2sas_transport.c
index c6cf20f..403a57b 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_transport.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_transport.c
@@ -1939,7 +1939,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
ioc->transport_cmds.status = MPT2_CMD_PENDING;
 
/* Check if the request is split across multiple segments */
-   if (req->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1) {
u32 offset = 0;
 
/* Allocate memory and copy the request */
@@ -1971,7 +1971,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 
/* Check if the response needs to be populated across
 * multiple segments */
-   if (rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(rsp->bio) > 1) {
pci_addr_in = pci_alloc_consistent(ioc->pdev, blk_rq_bytes(rsp),
&pci_dma_in);
if (!pci_addr_in) {
@@ -2038,7 +2038,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
-   if (req->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(req) - 4), pci_dma_out);
} else {
@@ -2054,7 +2054,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
MPI2_SGE_FLAGS_END_OF_LIST);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
-   if (rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(rsp->bio) > 1) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(rsp) + 4), pci_dma_in);
} else {
@@ -2099,7 +2099,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
le16_to_cpu(mpi_reply->ResponseDataLength);
/* check if the resp needs to be copied from the allocated
 * pci mem */
-   if (rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(rsp->bio) > 1) {
u32 offset = 0;
u32 bytes_to_copy =
le16_to_cpu(mpi_reply->ResponseDataLength);
-- 
1.7.12


[PATCH 8/9] block: Remove bi_idx references

2012-09-07 Thread Kent Overstreet
These were harmless but unnecessary, and getting rid of them makes the
code easier to audit since most of them need to be removed.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/block/floppy.c | 1 -
 drivers/md/dm-verity.c | 2 +-
 drivers/md/raid10.c| 1 -
 fs/buffer.c| 1 -
 fs/jfs/jfs_logmgr.c| 2 --
 fs/logfs/dev_bdev.c| 5 -
 mm/page_io.c   | 1 -
 7 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index a7d6347..2941ce7 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3778,7 +3778,6 @@ static int __floppy_read_block_0(struct block_device *bdev)
bio_vec.bv_len = size;
bio_vec.bv_offset = 0;
bio.bi_vcnt = 1;
-   bio.bi_idx = 0;
bio.bi_size = size;
bio.bi_bdev = bdev;
bio.bi_sector = 0;
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 18ef6c5..6956626 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -496,7 +496,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio,
 
bio->bi_end_io = verity_end_io;
bio->bi_private = io;
-   io->io_vec_size = bio->bi_vcnt - bio->bi_idx;
+   io->io_vec_size = bio_segments(bio);
if (io->io_vec_size < DM_VERITY_IO_VEC_INLINE)
io->io_vec = io->io_vec_inline;
else
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index bbd08f5..6d06d83 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4249,7 +4249,6 @@ read_more:
read_bio->bi_flags &= ~(BIO_POOL_MASK - 1);
read_bio->bi_flags |= 1 << BIO_UPTODATE;
read_bio->bi_vcnt = 0;
-   read_bio->bi_idx = 0;
read_bio->bi_size = 0;
r10_bio->master_bio = read_bio;
r10_bio->read_slot = r10_bio->devs[r10_bio->read_slot].devnum;
diff --git a/fs/buffer.c b/fs/buffer.c
index 58e2e7b..38d8793 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2893,7 +2893,6 @@ int submit_bh(int rw, struct buffer_head * bh)
bio->bi_io_vec[0].bv_offset = bh_offset(bh);
 
bio->bi_vcnt = 1;
-   bio->bi_idx = 0;
bio->bi_size = bh->b_size;
 
bio->bi_end_io = end_bio_bh_io_sync;
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 2eb952c..8ae5e35 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2004,7 +2004,6 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
bio->bi_vcnt = 1;
-   bio->bi_idx = 0;
bio->bi_size = LOGPSIZE;
 
bio->bi_end_io = lbmIODone;
@@ -2145,7 +2144,6 @@ static void lbmStartIO(struct lbuf * bp)
bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
bio->bi_vcnt = 1;
-   bio->bi_idx = 0;
bio->bi_size = LOGPSIZE;
 
bio->bi_end_io = lbmIODone;
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index e784a21..550475c 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -32,7 +32,6 @@ static int sync_request(struct page *page, struct block_device *bdev, int rw)
bio_vec.bv_len = PAGE_SIZE;
bio_vec.bv_offset = 0;
bio.bi_vcnt = 1;
-   bio.bi_idx = 0;
bio.bi_size = PAGE_SIZE;
bio.bi_bdev = bdev;
bio.bi_sector = page->index * (PAGE_SIZE >> 9);
@@ -108,7 +107,6 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
if (i >= max_pages) {
/* Block layer cannot split bios :( */
bio->bi_vcnt = i;
-   bio->bi_idx = 0;
bio->bi_size = i * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
bio->bi_sector = ofs >> 9;
@@ -136,7 +134,6 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
unlock_page(page);
}
bio->bi_vcnt = nr_pages;
-   bio->bi_idx = 0;
bio->bi_size = nr_pages * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
bio->bi_sector = ofs >> 9;
@@ -202,7 +199,6 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
if (i >= max_pages) {
/* Block layer cannot split bios :( */
bio->bi_vcnt = i;
-   bio->bi_idx = 0;
bio->bi_size = i * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
bio->bi_sector = ofs >> 9;
@@ -224,7 +220,6 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
bio->bi_io_vec[i].bv_offset = 0;
}
bio->bi_vcnt = nr_pages;
-   bio->bi_idx = 0;
bio->bi_size = nr_pages * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
bio->bi_sector = ofs >> 9;
diff --git a/mm/page_io.c b/mm/page_io.c
index 78eee32..8d3c0c0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -35,7 +35,6 @@ static struct bio *get_swap_bio(gfp_t gfp_flags

[PATCH 7/9] block: Don't use bi_idx in bio_split() or require it to be 0

2012-09-07 Thread Kent Overstreet
Prep work for immutable bio_vecs/efficient bio splitting: they require
auditing and removing most uses of bi_idx.

So here we convert bio_split() to respect the current value of bi_idx
and use the bio_iovec() macro, instead of assuming bi_idx will be 0.
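
The following userspace sketch shows the idea under stated assumptions: bio_iovec() is taken to return a pointer to the bio_vec at the current index (as the aoeblk conversion in this series suggests), and the split copies that bio_vec instead of entry 0. All model_* names are hypothetical, and the real bio_split() does more than this.

```c
#include <assert.h>

struct model_bvec {
	unsigned int len;	/* bytes in this segment */
	unsigned int offset;	/* start offset within the page */
};

struct model_bio {
	struct model_bvec *io_vec;	/* segment array */
	unsigned short vcnt;		/* entries in io_vec */
	unsigned short idx;		/* current entry (bi_idx analogue) */
	unsigned int size;		/* remaining bytes */
};

/* Current segment, honouring idx rather than assuming entry 0. */
static struct model_bvec *model_bio_iovec(struct model_bio *bio)
{
	return &bio->io_vec[bio->idx];
}

/*
 * Split the one remaining segment after first_sectors 512-byte sectors.
 * bv1 receives the front part, bv2 the rest; the source array is not
 * modified.
 */
static void model_split(struct model_bio *bio, unsigned int first_sectors,
			struct model_bvec *bv1, struct model_bvec *bv2)
{
	unsigned int first_bytes = first_sectors << 9;

	/* mirrors the "exactly one segment left" precondition */
	assert(bio->vcnt - bio->idx == 1);

	*bv1 = *model_bio_iovec(bio);
	*bv2 = *model_bio_iovec(bio);
	bv2->offset += first_bytes;
	bv2->len -= first_bytes;
	bv1->len = first_bytes;
}
```

With idx == 1, copying io_vec[0] would split a segment that has already been processed; going through the accessor picks the segment that is actually pending.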

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/block/drbd/drbd_req.c | 6 +++---
 drivers/md/raid0.c| 3 +--
 drivers/md/raid10.c   | 3 +--
 fs/bio-integrity.c| 4 ++--
 fs/bio.c  | 7 +++
 5 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index af69a96..57eb253 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1155,11 +1155,11 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 
/* can this bio be split generically?
 * Maybe add our own split-arbitrary-bios function. */
-   if (bio->bi_vcnt != 1 || bio->bi_idx != 0 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
+   if (bio_segments(bio) != 1 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
/* rather error out here than BUG in bio_split */
dev_err(DEV, "bio would need to, but cannot, be split: "
-   "(vcnt=%u,idx=%u,size=%u,sector=%llu)\n",
-   bio->bi_vcnt, bio->bi_idx, bio->bi_size,
+   "(segments=%u,size=%u,sector=%llu)\n",
+   bio_segments(bio), bio->bi_size,
(unsigned long long)bio->bi_sector);
bio_endio(bio, -EINVAL);
} else {
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 387cb89..0587450 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -509,8 +509,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
sector_t sector = bio->bi_sector;
struct bio_pair *bp;
/* Sanity check -- queue functions should prevent this happening */
-   if (bio->bi_vcnt != 1 ||
-   bio->bi_idx != 0)
+   if (bio_segments(bio) != 1)
goto bad_map;
/* This is a one page bio that upper layers
 * refuse to split for us, so we need to split it.
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9715aaf..bbd08f5 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1081,8 +1081,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 || conf->prev.near_copies < conf->prev.raid_disks))) {
struct bio_pair *bp;
/* Sanity check -- queue functions should prevent this happening */
-   if (bio->bi_vcnt != 1 ||
-   bio->bi_idx != 0)
+   if (bio_segments(bio) != 1)
goto bad_map;
/* This is a one page bio that upper layers
 * refuse to split for us, so we need to split it.
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 1d64f7f..e8555a5 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -657,8 +657,8 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
bp->bio1.bi_integrity = &bp->bip1;
bp->bio2.bi_integrity = &bp->bip2;
 
-   bp->iv1 = bip->bip_vec[0];
-   bp->iv2 = bip->bip_vec[0];
+   bp->iv1 = bip->bip_vec[bip->bip_idx];
+   bp->iv2 = bip->bip_vec[bip->bip_idx];
 
bp->bip1.bip_vec = &bp->iv1;
bp->bip2.bip_vec = &bp->iv2;
diff --git a/fs/bio.c b/fs/bio.c
index a539664..126b264 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1585,8 +1585,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
bi->bi_sector + first_sectors);
 
-   BUG_ON(bi->bi_vcnt != 1);
-   BUG_ON(bi->bi_idx != 0);
+   BUG_ON(bio_segments(bi) != 1);
atomic_set(&bp->cnt, 3);
bp->error = 0;
bp->bio1 = *bi;
@@ -1595,8 +1594,8 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
bp->bio2.bi_size -= first_sectors << 9;
bp->bio1.bi_size = first_sectors << 9;
 
-   bp->bv1 = bi->bi_io_vec[0];
-   bp->bv2 = bi->bi_io_vec[0];
+   bp->bv1 = *bio_iovec(bi);
+   bp->bv2 = *bio_iovec(bi);
bp->bv2.bv_offset += first_sectors << 9;
bp->bv2.bv_len -= first_sectors << 9;
bp->bv1.bv_len = first_sectors << 9;
-- 
1.7.12



[PATCH 6/9] block: Use bio_sectors() more consistently

2012-09-07 Thread Kent Overstreet
Bunch of places in the code weren't using it where they could be -
this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
into a struct bvec_iter.
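
A small userspace sketch of the conversion, assuming bio_sectors() is the bi_size >> 9 expression that the hunks below replace. struct model_bio, model_bio_sectors() and model_past_end() are hypothetical names; the bounds check is modelled on the brd hunk.

```c
#include <assert.h>

#define MODEL_SECTOR_SHIFT 9	/* 512-byte sectors */

struct model_bio {
	unsigned long long sector;	/* start, in sectors (bi_sector) */
	unsigned int size;		/* remaining length in bytes (bi_size) */
};

/* Length of the bio in sectors. */
static unsigned int model_bio_sectors(const struct model_bio *bio)
{
	return bio->size >> MODEL_SECTOR_SHIFT;
}

/* brd-style check: does the bio run past the end of the device? */
static int model_past_end(const struct model_bio *bio,
			  unsigned long long capacity)
{
	return bio->sector + model_bio_sectors(bio) > capacity;
}
```

The helper only covers the length-in-sectors form. A byte offset computed as bi_sector << 9 is a different quantity, so a line of that shape is not a candidate for the same substitution.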

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/block/aoe/aoeblk.c   |  2 +-
 drivers/block/aoe/aoecmd.c   |  2 +-
 drivers/block/brd.c  |  3 +--
 drivers/block/pktcdvd.c  |  2 +-
 drivers/block/ps3vram.c  |  2 +-
 drivers/md/dm-raid1.c|  2 +-
 drivers/md/raid0.c   |  6 +++---
 drivers/md/raid1.c   | 17 -
 drivers/md/raid10.c  | 24 +++-
 drivers/md/raid5.c   |  8 
 include/trace/events/block.h | 10 +-
 11 files changed, 37 insertions(+), 41 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 321de7b..6e4420a 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -199,7 +199,7 @@ aoeblk_make_request(struct request_queue *q, struct bio *bio)
buf->bio = bio;
buf->resid = bio->bi_size;
buf->sector = bio->bi_sector;
-   buf->bv = &bio->bi_io_vec[bio->bi_idx];
+   buf->bv = bio_iovec(bio);
buf->bv_resid = buf->bv->bv_len;
WARN_ON(buf->bv_resid == 0);
buf->bv_off = buf->bv->bv_offset;
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index de0435e..2b52ebc 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -720,7 +720,7 @@ gettgt(struct aoedev *d, char *addr)
 static inline void
 diskstats(struct gendisk *disk, struct bio *bio, ulong duration, sector_t sector)
 {
-   unsigned long n_sect = bio->bi_size >> 9;
+   unsigned long n_sect = bio_sectors(bio);
const int rw = bio_data_dir(bio);
struct hd_struct *part;
int cpu;
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 531ceb3..d5c4978 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -334,8 +334,7 @@ static void brd_make_request(struct request_queue *q, struct bio *bio)
int err = -EIO;
 
sector = bio->bi_sector;
-   if (sector + (bio->bi_size >> SECTOR_SHIFT) >
-   get_capacity(bdev->bd_disk))
+   if (sector + bio_sectors(bio) > get_capacity(bdev->bd_disk))
goto out;
 
if (unlikely(bio->bi_rw & REQ_DISCARD)) {
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 8df3216..0824627 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2433,7 +2433,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		cloned_bio->bi_bdev = pd->bdev;
 		cloned_bio->bi_private = psd;
 		cloned_bio->bi_end_io = pkt_end_io_read_cloned;
-		pd->stats.secs_r += bio->bi_size >> 9;
+		pd->stats.secs_r += bio_sectors(bio);
 		pkt_queue_bio(pd, cloned_bio);
 		return;
 	}
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index f58cdcf..1ff38e8 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -553,7 +553,7 @@ static struct bio *ps3vram_do_bio(struct ps3_system_bus_device *dev,
 	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
 	int write = bio_data_dir(bio) == WRITE;
 	const char *op = write ? "write" : "read";
-	loff_t offset = bio->bi_sector << 9;
+	loff_t offset = bio_sectors(bio);
 	int error = 0;
 	struct bio_vec *bvec;
 	unsigned int i;
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index bc5ddba8..3dac2de 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -457,7 +457,7 @@ static void map_region(struct dm_io_region *io, struct mirror *m,
 {
 	io->bdev = m->dev->bdev;
 	io->sector = map_sector(m, bio);
-	io->count = bio->bi_size >> 9;
+	io->count = bio_sectors(bio);
 }
 
 static void hold_bio(struct mirror_set *ms, struct bio *bio)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index de63a1f..387cb89 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -484,11 +484,11 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
 {
 	if (likely(is_power_of_2(chunk_sects))) {
 		return chunk_sects >= ((bio->bi_sector & (chunk_sects-1))
-					+ (bio->bi_size >> 9));
+					+ bio_sectors(bio));
 	} else{
 		sector_t sector = bio->bi_sector;
 		return chunk_sects >= (sector_div(sector, chunk_sects)
-						+ (bio->bi_size >> 9));
+						+ bio_sectors(bio));
 	}
 }
 
@@ -542,7 +542,7 @@ bad_map:
 	printk("md/raid0:%s: make_request bug: can't convert block across chunks"
 		" or bigger than %dk %llu %d\n",
 	       mdname(mddev), chunk_sects / 2,
-	       (unsigned long long)bio->bi_sector, bio

[PATCH 5/9] block: Add bio_end()

2012-09-07 Thread Kent Overstreet
Just a little convenience macro. The main reason to add it now is to
prepare for immutable bio vecs: it'll reduce the size of the patch that
puts bi_sector/bi_size/bi_idx into a struct bvec_iter.
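
As a rough illustration of what the helper computes, here is a userspace
sketch. The struct below is a toy stand-in with only the two fields
involved, the helpers are written as inline functions rather than the
macro the patch describes, and bio_fits() is a hypothetical name used
only to show a typical caller. The actual definition in
include/linux/bio.h is not visible in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Toy stand-in for the kernel's struct bio; only the fields used here. */
struct bio {
	sector_t bi_sector;	/* first 512-byte sector of the I/O */
	unsigned int bi_size;	/* remaining I/O size in bytes */
};

/* Number of 512-byte sectors covered by the bio. */
static inline sector_t bio_sectors(const struct bio *bio)
{
	return bio->bi_size >> 9;
}

/* One past the last sector touched by the bio. */
static inline sector_t bio_end(const struct bio *bio)
{
	return bio->bi_sector + bio_sectors(bio);
}

/* Hypothetical caller: does the bio fit on a device of 'capacity' sectors? */
static inline int bio_fits(const struct bio *bio, sector_t capacity)
{
	return bio_end(bio) <= capacity;
}
```

Callers that previously open-coded bi_sector + bio_sectors(bio) only
need the one call, which is what keeps the later bvec_iter conversion
small.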

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 block/blk-core.c  |  2 +-
 block/cfq-iosched.c   |  7 ++-
 block/deadline-iosched.c  |  2 +-
 drivers/block/drbd/drbd_req.c |  2 +-
 drivers/block/pktcdvd.c   |  6 +++---
 drivers/md/dm-stripe.c|  2 +-
 drivers/md/dm-verity.c|  2 +-
 drivers/md/faulty.c   |  6 ++
 drivers/md/linear.c   |  3 +--
 drivers/md/raid1.c|  4 ++--
 drivers/md/raid5.c| 14 +++---
 drivers/s390/block/dcssblk.c  |  3 +--
 fs/btrfs/extent_io.c  |  3 +--
 fs/gfs2/lops.c|  2 +-
 include/linux/bio.h   |  1 +
 15 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 94cbdcc..d405e45 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1559,7 +1559,7 @@ static void handle_bad_sector(struct bio *bio)
 	printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
 			bdevname(bio->bi_bdev, b),
 			bio->bi_rw,
-			(unsigned long long)bio->bi_sector + bio_sectors(bio),
+			(unsigned long long)bio_end(bio),
 			(long long)(i_size_read(bio->bi_bdev->bd_inode) >> 9));
 
 	set_bit(BIO_EOF, &bio->bi_flags);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index fb52df9..8eae0f3 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1883,11 +1883,8 @@ cfq_find_rq_fmerge(struct cfq_data *cfqd, struct bio *bio)
 		return NULL;
 
 	cfqq = cic_to_cfqq(cic, cfq_bio_sync(bio));
-	if (cfqq) {
-		sector_t sector = bio->bi_sector + bio_sectors(bio);
-
-		return elv_rb_find(&cfqq->sort_list, sector);
-	}
+	if (cfqq)
+		return elv_rb_find(&cfqq->sort_list, bio_end(bio));
 
 	return NULL;
 }
diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index 599b12e..a3b4df9 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -132,7 +132,7 @@ deadline_merge(struct request_queue *q, struct request **req, struct bio *bio)
 	 * check for front merge
 	 */
 	if (dd->front_merges) {
-		sector_t sector = bio->bi_sector + bio_sectors(bio);
+		sector_t sector = bio_end(bio);
 
 		__rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector);
 		if (__rq) {
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 01b2ac6..af69a96 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 	/* to make some things easier, force alignment of requests within the
 	 * granularity of our hash tables */
 	s_enr = bio->bi_sector >> HT_SHIFT;
-	e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT : s_enr;
+	e_enr = (bio_end(bio) - 1) >> HT_SHIFT;
 
 	if (likely(s_enr == e_enr)) {
 		do {
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 2e7de7a..8df3216 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -901,7 +901,7 @@ static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
 			pd->iosched.successive_reads += bio->bi_size >> 10;
 		else {
 			pd->iosched.successive_reads = 0;
-			pd->iosched.last_write = bio->bi_sector + bio_sectors(bio);
+			pd->iosched.last_write = bio_end(bio);
 		}
 		if (pd->iosched.successive_reads >= HI_SPEED_SWITCH) {
 			if (pd->read_speed == pd->write_speed) {
@@ -2454,7 +2454,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 	zone = ZONE(bio->bi_sector, pd);
 	VPRINTK("pkt_make_request: start = %6llx stop = %6llx\n",
 		(unsigned long long)bio->bi_sector,
-		(unsigned long long)(bio->bi_sector + bio_sectors(bio)));
+		(unsigned long long)bio_end(bio));
 
 	/* Check if we have to split the bio */
 	{
@@ -2462,7 +2462,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		sector_t last_zone;
 		int first_sectors;
 
-		last_zone = ZONE(bio->bi_sector + bio_sectors(bio) - 1, pd);
+		last_zone = ZONE(bio_end(bio) - 1, pd);
 		if (last_zone != zone) {
 			BUG_ON(last_zone != zone + pd->settings.size);
 			first_sectors = last_zone - bio->bi_sector;
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index a087bf2..047dd08 100644
--- a/drivers/md/dm

[PATCH 3/9] block: Refactor blk_update_request()

2012-09-07 Thread Kent Overstreet
Converts it to use bio_advance(), simplifying it quite a bit in the
process.
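
To make the shape of the simplified loop easier to see, here is a
userspace sketch of it. The structs are toy stand-ins,
bio_complete_bytes() is a hypothetical replacement for req_bio_endio()
that only subtracts bytes, and request_complete_bytes() is an invented
name; none of this is the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields the loop touches. */
struct bio {
	struct bio *bi_next;	/* next bio in the request's chain */
	unsigned int bi_size;	/* bytes still outstanding */
	int done;		/* toy flag: set once the bio is fully completed */
};

struct request {
	struct bio *bio;	/* head of the chain of unfinished bios */
};

/* Hypothetical stand-in for req_bio_endio(): consume nbytes from the bio. */
static void bio_complete_bytes(struct bio *bio, unsigned int nbytes)
{
	assert(nbytes <= bio->bi_size);
	bio->bi_size -= nbytes;
	if (bio->bi_size == 0)
		bio->done = 1;
}

/* Complete up to nr_bytes of the request; returns the bytes consumed. */
static unsigned int request_complete_bytes(struct request *req,
					   unsigned int nr_bytes)
{
	unsigned int total_bytes = 0;

	while (req->bio) {
		struct bio *bio = req->bio;
		unsigned int bio_bytes =
			bio->bi_size < nr_bytes ? bio->bi_size : nr_bytes;

		/* A fully consumed bio is unlinked before it is completed. */
		if (bio_bytes == bio->bi_size)
			req->bio = bio->bi_next;

		bio_complete_bytes(bio, bio_bytes);

		total_bytes += bio_bytes;
		nr_bytes -= bio_bytes;

		if (!nr_bytes)
			break;
	}

	return total_bytes;
}
```

The per-bvec bookkeeping of the old loop disappears because a partially
completed bio is handled by the same call as a fully completed one.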

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 block/blk-core.c | 84 +++-
 1 file changed, 16 insertions(+), 68 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 82aab28..94cbdcc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -153,25 +153,19 @@ EXPORT_SYMBOL(blk_rq_init);
 static void req_bio_endio(struct request *rq, struct bio *bio,
 			  unsigned int nbytes, int error)
 {
+	/*
+	 * XXX: bio_endio() does this. only need this because of the weird
+	 * flush seq thing.
+	 */
 	if (error)
 		clear_bit(BIO_UPTODATE, &bio->bi_flags);
 	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
 		error = -EIO;
 
-	if (unlikely(nbytes > bio->bi_size)) {
-		printk(KERN_ERR "%s: want %u bytes done, %u left\n",
-		       __func__, nbytes, bio->bi_size);
-		nbytes = bio->bi_size;
-	}
-
 	if (unlikely(rq->cmd_flags & REQ_QUIET))
 		set_bit(BIO_QUIET, &bio->bi_flags);
 
-	bio->bi_size -= nbytes;
-	bio->bi_sector += (nbytes >> 9);
-
-	if (bio_integrity(bio))
-		bio_integrity_advance(bio, nbytes);
+	bio_advance(bio, nbytes);
 
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
@@ -2216,8 +2210,7 @@ EXPORT_SYMBOL(blk_fetch_request);
  **/
 bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 {
-	int total_bytes, bio_nbytes, next_idx = 0;
-	struct bio *bio;
+	int total_bytes;
 
 	if (!req->bio)
 		return false;
@@ -2261,56 +2254,21 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 
 	blk_account_io_completion(req, nr_bytes);
 
-	total_bytes = bio_nbytes = 0;
-	while ((bio = req->bio) != NULL) {
-		int nbytes;
+	total_bytes = 0;
+	while (req->bio) {
+		struct bio *bio = req->bio;
+		unsigned bio_bytes = min(bio->bi_size, nr_bytes);
 
-		if (nr_bytes >= bio->bi_size) {
+		if (bio_bytes == bio->bi_size)
 			req->bio = bio->bi_next;
-			nbytes = bio->bi_size;
-			req_bio_endio(req, bio, nbytes, error);
-			next_idx = 0;
-			bio_nbytes = 0;
-		} else {
-			int idx = bio->bi_idx + next_idx;
-
-			if (unlikely(idx >= bio->bi_vcnt)) {
-				blk_dump_rq_flags(req, "__end_that");
-				printk(KERN_ERR "%s: bio idx %d >= vcnt %d\n",
-				       __func__, idx, bio->bi_vcnt);
-				break;
-			}
-
-			nbytes = bio_iovec_idx(bio, idx)->bv_len;
-			BIO_BUG_ON(nbytes > bio->bi_size);
-
-			/*
-			 * not a complete bvec done
-			 */
-			if (unlikely(nbytes > nr_bytes)) {
-				bio_nbytes += nr_bytes;
-				total_bytes += nr_bytes;
-				break;
-			}
 
-			/*
-			 * advance to the next vector
-			 */
-			next_idx++;
-			bio_nbytes += nbytes;
-		}
+		req_bio_endio(req, bio, bio_bytes, error);
 
-		total_bytes += nbytes;
-		nr_bytes -= nbytes;
+		total_bytes += bio_bytes;
+		nr_bytes -= bio_bytes;
 
-		bio = req->bio;
-		if (bio) {
-			/*
-			 * end more in this run, or just return 'not-done'
-			 */
-			if (unlikely(nr_bytes <= 0))
-				break;
-		}
+		if (!nr_bytes)
+			break;
 	}
 
/*
@@ -2326,16 +2284,6 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 		return false;
 	}
 
-	/*
-	 * if the request wasn't completed, update state
-	 */
-	if (bio_nbytes) {
-		req_bio_endio(req, bio, bio_nbytes, error);
-		bio->bi_idx += next_idx;
-		bio_iovec(bio)->bv_offset += nr_bytes;
-		bio_iovec(bio)->bv_len -= nr_bytes;
-	}
-
 	req->__data_len -= total_bytes;
 	req->buffer = bio_data(req->bio);
 
-- 
1.7.12

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http

[PATCH 2/9] block: Add bio_advance()

2012-09-07 Thread Kent Overstreet
This is prep work for immutable bio vecs; we first want to centralize
where bvecs are modified.

Next two patches convert some existing code to use this function.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 fs/bio.c| 41 +
 include/linux/bio.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/fs/bio.c b/fs/bio.c
index 244007f..a539664 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -719,6 +719,47 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
 }
 EXPORT_SYMBOL(bio_add_page);
 
+/**
+ * bio_advance - increment/complete a bio by some number of bytes
+ * @bio:   bio to advance
+ * @bytes: number of bytes to complete
+ *
+ * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
+ * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
+ * be updated on the last bvec as well.
+ *
+ * @bio will then represent the remaining, uncompleted portion of the io.
+ */
+void bio_advance(struct bio *bio, unsigned bytes)
+{
+	if (bio_integrity(bio))
+		bio_integrity_advance(bio, bytes);
+
+	bio->bi_sector += bytes >> 9;
+	bio->bi_size -= bytes;
+
+	if (!bio->bi_size)
+		return;
+
+	while (bytes) {
+		if (unlikely(bio->bi_idx >= bio->bi_vcnt)) {
+			printk(KERN_ERR "%s: bio idx %d >= vcnt %d\n",
+			       __func__, bio->bi_idx, bio->bi_vcnt);
+			break;
+		}
+
+		if (bytes >= bio_iovec(bio)->bv_len) {
+			bytes -= bio_iovec(bio)->bv_len;
+			bio->bi_idx++;
+		} else {
+			bio_iovec(bio)->bv_len -= bytes;
+			bio_iovec(bio)->bv_offset += bytes;
+			bytes = 0;
+		}
+	}
+}
+EXPORT_SYMBOL(bio_advance);
+
 struct bio_map_data {
 	struct bio_vec *iovecs;
 	struct sg_iovec *sgvecs;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7873465..6763cdf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -248,6 +248,8 @@ extern void bio_endio(struct bio *, int);
 struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
 
+void bio_advance(struct bio *, unsigned);
+
 extern void bio_init(struct bio *);
 extern void bio_reset(struct bio *);
 
-- 
1.7.12
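
For readers who want to step through the arithmetic, the following is a
userspace model of the advance logic in this patch. The structs are toy
stand-ins with only the fields involved, the integrity handling is left
out, and bio_advance_model() is an invented name; treat it as a sketch
of the behaviour described in the kernel-doc comment, not as the kernel
function.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures; only the fields involved. */
struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bio {
	uint64_t bi_sector;		/* current start sector */
	unsigned int bi_size;		/* bytes remaining */
	unsigned short bi_idx;		/* index of the current bvec */
	unsigned short bi_vcnt;		/* number of bvecs */
	struct bio_vec *bi_io_vec;
};

/* Model of the advance logic; 'bytes' must not exceed bio->bi_size. */
static void bio_advance_model(struct bio *bio, unsigned int bytes)
{
	assert(bytes <= bio->bi_size);

	bio->bi_sector += bytes >> 9;
	bio->bi_size -= bytes;

	/* Fully completed: the bvec state is left untouched. */
	if (!bio->bi_size)
		return;

	while (bytes) {
		struct bio_vec *bv;

		if (bio->bi_idx >= bio->bi_vcnt)
			break;	/* inconsistent bio; the patch logs an error here */

		bv = &bio->bi_io_vec[bio->bi_idx];
		if (bytes >= bv->bv_len) {
			/* Whole bvec consumed: step to the next one. */
			bytes -= bv->bv_len;
			bio->bi_idx++;
		} else {
			/* Partial bvec: trim it in place. */
			bv->bv_len -= bytes;
			bv->bv_offset += bytes;
			bytes = 0;
		}
	}
}
```

Advancing a three-page bio by 5120 bytes, for example, moves bi_idx to
the second bvec and trims 1024 bytes off its front.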



[PATCH 1/9] block: Convert integrity to bvec_alloc_bs(), and a bugfix

2012-09-07 Thread Kent Overstreet
This adds a pointer to the bvec array to struct bio_integrity_payload,
instead of the bvecs always being inline; then the bvecs are allocated
with bvec_alloc_bs().

This is needed eventually for immutable bio vecs - immutable bvecs
aren't useful if we still have to copy them, hence the need for the
pointer. Less code is always nice too, though.

Also fix an amusing bug in bio_integrity_split() - struct bio_pair
doesn't have the integrity bvecs after the bio_integrity_payloads, so
there was a buffer overrun. The code was confusing pointers with arrays.
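
The layout change can be sketched in userspace as follows. This is an
illustration under assumptions: the struct and function names are
invented, INLINE_VECS only mirrors the BIP_INLINE_VECS value from the
diff, and calloc()/free() stand in for the mempool and bvec_alloc_bs()
calls. The point is that users always go through the pointer, which
either targets the inline storage or a separately allocated array.

```c
#include <assert.h>
#include <stdlib.h>

struct bio_vec {
	void *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

#define INLINE_VECS 4	/* assumption: mirrors BIP_INLINE_VECS in the patch */

/* Hypothetical payload: 'vec' is what users dereference, never the array. */
struct payload {
	struct bio_vec *vec;		/* points at the vecs actually in use */
	unsigned int nr_vecs;
	int vec_is_external;		/* nonzero if 'vec' was allocated separately */
	struct bio_vec inline_vecs[INLINE_VECS];
};

static struct payload *payload_alloc(unsigned int nr_vecs)
{
	struct payload *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;

	if (nr_vecs > INLINE_VECS) {
		/* Too many vecs for the inline storage: allocate them separately. */
		p->vec = calloc(nr_vecs, sizeof(*p->vec));
		if (!p->vec) {
			free(p);
			return NULL;
		}
		p->vec_is_external = 1;
	} else {
		p->vec = p->inline_vecs;
	}

	p->nr_vecs = nr_vecs;
	return p;
}

static void payload_free(struct payload *p)
{
	if (!p)
		return;
	if (p->vec_is_external)
		free(p->vec);
	free(p);
}
```

Code that treats the inline array and the pointer as interchangeable is
exactly the pointer-versus-array confusion the description mentions.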

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: Martin K. Petersen martin.peter...@oracle.com
---
 fs/bio-integrity.c  | 124 +---
 include/linux/bio.h |   5 ++-
 2 files changed, 43 insertions(+), 86 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index a3f28f3..1d64f7f 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -27,48 +27,11 @@
 #include <linux/workqueue.h>
 #include <linux/slab.h>
 
-struct integrity_slab {
-	struct kmem_cache *slab;
-	unsigned short nr_vecs;
-	char name[8];
-};
-
-#define IS(x) { .nr_vecs = x, .name = "bip-"__stringify(x) }
-struct integrity_slab bip_slab[BIOVEC_NR_POOLS] __read_mostly = {
-	IS(1), IS(4), IS(16), IS(64), IS(128), IS(BIO_MAX_PAGES),
-};
-#undef IS
+#define BIP_INLINE_VECS	4
 
+static struct kmem_cache *bip_slab;
 static struct workqueue_struct *kintegrityd_wq;
 
-static inline unsigned int vecs_to_idx(unsigned int nr)
-{
-	switch (nr) {
-	case 1:
-		return 0;
-	case 2 ... 4:
-		return 1;
-	case 5 ... 16:
-		return 2;
-	case 17 ... 64:
-		return 3;
-	case 65 ... 128:
-		return 4;
-	case 129 ... BIO_MAX_PAGES:
-		return 5;
-	default:
-		BUG();
-	}
-}
-
-static inline int use_bip_pool(unsigned int idx)
-{
-	if (idx == BIOVEC_MAX_IDX)
-		return 1;
-
-	return 0;
-}
-
 /**
  * bio_integrity_alloc - Allocate integrity payload and attach it to bio
  * @bio:   bio to attach integrity metadata to
@@ -84,37 +47,38 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
 						  unsigned int nr_vecs)
 {
 	struct bio_integrity_payload *bip;
-	unsigned int idx = vecs_to_idx(nr_vecs);
 	struct bio_set *bs = bio->bi_pool;
+	unsigned long idx = BIO_POOL_NONE;
+	unsigned inline_vecs;
+
+	if (!bs) {
+		bip = kmalloc(sizeof(struct bio_integrity_payload) +
+			      sizeof(struct bio_vec) * nr_vecs, gfp_mask);
+		inline_vecs = nr_vecs;
+	} else {
+		bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
+		inline_vecs = BIP_INLINE_VECS;
+	}
 
-	if (!bs)
-		bs = fs_bio_set;
-
-	BUG_ON(bio == NULL);
-	bip = NULL;
+	if (unlikely(!bip))
+		return NULL;
 
-	/* Lower order allocations come straight from slab */
-	if (!use_bip_pool(idx))
-		bip = kmem_cache_alloc(bip_slab[idx].slab, gfp_mask);
+	memset(bip, 0, sizeof(struct bio_integrity_payload));
 
-	/* Use mempool if lower order alloc failed or max vecs were requested */
-	if (bip == NULL) {
-		idx = BIOVEC_MAX_IDX;  /* so we free the payload properly later */
-		bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
-
-		if (unlikely(bip == NULL)) {
-			printk(KERN_ERR "%s: could not alloc bip\n", __func__);
-			return NULL;
-		}
+	if (nr_vecs > inline_vecs) {
+		bip->bip_vec = bvec_alloc_bs(gfp_mask, nr_vecs, &idx, bs);
+		if (!bip->bip_vec)
+			goto err;
 	}
 
-	memset(bip, 0, sizeof(*bip));
-
 	bip->bip_slab = idx;
 	bip->bip_bio = bio;
 	bio->bi_integrity = bip;
 
 	return bip;
+err:
+	mempool_free(bip, bs->bio_integrity_pool);
+	return NULL;
 }
 EXPORT_SYMBOL(bio_integrity_alloc);
 
@@ -130,20 +94,19 @@ void bio_integrity_free(struct bio *bio)
 	struct bio_integrity_payload *bip = bio->bi_integrity;
 	struct bio_set *bs = bio->bi_pool;
 
-	if (!bs)
-		bs = fs_bio_set;
-
-	BUG_ON(bip == NULL);
-
 	/* A cloned bio doesn't own the integrity metadata */
 	if (!bio_flagged(bio, BIO_CLONED) && !bio_flagged(bio, BIO_FS_INTEGRITY)
 	    && bip->bip_buf != NULL)
 		kfree(bip->bip_buf);
 
-	if (use_bip_pool(bip->bip_slab))
+	if (bs) {
+		if (bip->bip_slab != BIO_POOL_NONE)
+			bvec_free_bs(bs, bip->bip_vec, bip->bip_slab);
+
 		mempool_free(bip, bs->bio_integrity_pool);
-	else
-		kmem_cache_free(bip_slab[bip

Re: [dm-devel] [PATCH v10 4/8] block: Add bio_reset()

2012-09-07 Thread Kent Overstreet
On Sat, Sep 08, 2012 at 12:14:33AM +0100, Alasdair G Kergon wrote:
> As I indicated already in this discussion, dm started to use
> merge_bvec_fn as a cheap way of avoiding splitting and this improved
> overall efficiency.  Often it's better to pay the small price of calling
> that function to ensure the bio is created the right size in the first
> place so it won't have to get split later.

When I say cheap, I mean _cheap_:

	split = bio_clone_bioset(bio, gfp_flags, bs);

	bio_advance(bio, sectors << 9);
	split->bi_iter.bi_size = sectors << 9;

And the clone doesn't copy the bvecs - split->bi_io_vec ==
bio->bi_io_vec.
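
A userspace sketch of why this is cheap, under stated assumptions: the
structs are toy stand-ins, the iterator fields follow the bvec_iter
layout described elsewhere in this thread (bi_sector, bi_size, bi_idx,
bi_bvec_done), and bio_advance_iter() and bio_split_front() are
invented names. The clone is a plain struct copy, so both halves share
one read-only bvec array.

```c
#include <assert.h>
#include <stdint.h>

struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Assumed iterator layout: position lives here, not in the bvec array. */
struct bvec_iter {
	uint64_t bi_sector;
	unsigned int bi_size;		/* bytes remaining */
	unsigned int bi_idx;		/* current bvec */
	unsigned int bi_bvec_done;	/* bytes already consumed in that bvec */
};

struct bio {
	struct bvec_iter bi_iter;
	const struct bio_vec *bi_io_vec;	/* shared and never modified */
};

/* Advance only the iterator; the bvec array stays untouched. */
static void bio_advance_iter(struct bio *bio, unsigned int bytes)
{
	struct bvec_iter *it = &bio->bi_iter;

	assert(bytes <= it->bi_size);
	it->bi_sector += bytes >> 9;
	it->bi_size -= bytes;

	while (bytes) {
		unsigned int left =
			bio->bi_io_vec[it->bi_idx].bv_len - it->bi_bvec_done;

		if (bytes >= left) {
			bytes -= left;
			it->bi_idx++;
			it->bi_bvec_done = 0;
		} else {
			it->bi_bvec_done += bytes;
			bytes = 0;
		}
	}
}

/* Split off the first 'sectors' sectors; 'bio' keeps the remainder. */
static struct bio bio_split_front(struct bio *bio, unsigned int sectors)
{
	struct bio split = *bio;	/* shallow clone: shares bi_io_vec */

	assert(((uint64_t)sectors << 9) <= bio->bi_iter.bi_size);
	bio_advance_iter(bio, sectors << 9);
	split.bi_iter.bi_size = sectors << 9;
	return split;
}
```

No bvec is copied or rewritten at any point, which is where the cost
saving over the existing splitting code would come from.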

> I'm as yet unconvinced that removing merge_bvec_fn would be an overall
> win.  Some of Kent's other changes that make splitting cheaper will
> improve the balance in some situations, but that might be handled by
> simplifying the merge_bvec_fn calculations in those situations.
> (Or changing the mechanism to avoid repeatedly performing the mapping
> when it hasn't changed.)

The current situation is what causes you to repeatedly do the mapping
lookup, since you'll often get contiguous bios that don't need to be
split at the mapping level (because of other requirements of the
underlying devices or because implementing merge_bvec_fn correctly was
too hard).

Splitting only when required is going to _improve_ that.

> IOW Any proposal to remove merge_bvec_fn from dm needs careful
> evaluation to ensure it doesn't introduce any significant
> performance regressions for some sets of users.

There's also the 1000+ lines of deleted code to consider. In my
immutable bvec branch I've deleted over 400 lines of code, and that's
without actually trying to delete code. Getting rid of merge_bvec_fn
deletes another 800 lines of code on top of that.

CPU-wise, there won't be any performance regressions. The only cause for
concern I can think of is where the upper layer could've made use of
partial completions - i.e. it submitted a 1 MB bio instead of a bunch of
128k bios, but it could've made use of that first 128k if it went to a
different device and completed sooner.

Only thing I know of that'd be affected by that though is readahead, and
I have a couple ideas for easily solving that if it actually becomes an
issue.


Re: [PATCH 2/2] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-09 Thread Kent Overstreet
On Sat, Sep 08, 2012 at 12:36:41PM -0700, Tejun Heo wrote:
> (Restoring cc list from the previous discussion.  Please retain the cc
> list of the people who discussed in the previous postings.)
>
> On Fri, Sep 07, 2012 at 03:12:53PM -0700, Kent Overstreet wrote:
> > But this is tricky and not a generic solution. This patch solves it for
> > all users by inverting the previously described technique. We allocate a
> > rescuer workqueue for each bio_set, and then in the allocation code if
> > there are bios on current->bio_list we would be blocking, we punt them
> > to the rescuer workqueue to be submitted.
>
> It would be great if this explanation can be expanded a bit.  Why does
> it make the deadlock condition go away?  What are the restrictions -
> e.g. using other mempools for additional per-bio data structure
> wouldn't work, right?

Ok, I'll do that. New patch below.

> > +static void punt_bios_to_rescuer(struct bio_set *bs)
> > +{
> > +	struct bio_list punt, nopunt;
> > +	struct bio *bio;
> > +
> > +	bio_list_init(&punt);
> > +	bio_list_init(&nopunt);
> > +
> > +	while ((bio = bio_list_pop(current->bio_list)))
> > +		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> > +
> > +	*current->bio_list = nopunt;
>
> Why this is necessary needs explanation and it's done in rather
> unusual way.  I suppose the weirdness is from bio_list API
> restriction?

It's because bio_lists are singly linked, so deleting an entry from the
middle of the list would be a real pain - just much cleaner/simpler to
do it this way.
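
The pop-and-redistribute idiom can be tried out in userspace with a toy
singly linked list. Everything below is a sketch: the list helpers only
imitate the kernel's bio_list API, pool_id stands in for the bi_pool
comparison, and partition_by_pool() is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bio: 'pool_id' stands in for the bi_pool pointer comparison. */
struct bio {
	struct bio *bi_next;
	int pool_id;
};

struct bio_list {
	struct bio *head;
	struct bio *tail;
};

static void bio_list_init(struct bio_list *bl)
{
	bl->head = bl->tail = NULL;
}

/* Append to the tail, preserving submission order. */
static void bio_list_add(struct bio_list *bl, struct bio *bio)
{
	bio->bi_next = NULL;
	if (bl->tail)
		bl->tail->bi_next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

/* Remove and return the head, or NULL if the list is empty. */
static struct bio *bio_list_pop(struct bio_list *bl)
{
	struct bio *bio = bl->head;

	if (bio) {
		bl->head = bio->bi_next;
		if (!bl->head)
			bl->tail = NULL;
		bio->bi_next = NULL;
	}
	return bio;
}

/*
 * Move every bio from 'src' that belongs to 'pool_id' onto 'punt' and
 * leave the rest on 'src', keeping relative order in both lists.
 */
static void partition_by_pool(struct bio_list *src, int pool_id,
			      struct bio_list *punt)
{
	struct bio_list nopunt;
	struct bio *bio;

	assert(src != punt);
	bio_list_init(punt);
	bio_list_init(&nopunt);

	while ((bio = bio_list_pop(src)))
		bio_list_add(bio->pool_id == pool_id ? punt : &nopunt, bio);

	*src = nopunt;
}
```

Draining the list and rebuilding two lists avoids tracking a previous
pointer for mid-list deletion, at the cost of touching every entry once.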

> > +	spin_lock(&bs->rescue_lock);
> > +	bio_list_merge(&bs->rescue_list, &punt);
> > +	spin_unlock(&bs->rescue_lock);
> > +
> > +	queue_work(bs->rescue_workqueue, &bs->rescue_work);
> > +}
> > +
> >  /**
> >   * bio_alloc_bioset - allocate a bio for I/O
> >   * @gfp_mask:   the GFP_ mask given to the slab allocator
> > @@ -317,6 +354,7 @@ EXPORT_SYMBOL(bio_reset);
> >   */
> >  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> >  {
> > +	gfp_t saved_gfp = gfp_mask;
> >  	unsigned front_pad;
> >  	unsigned inline_vecs;
> >  	unsigned long idx = BIO_POOL_NONE;
> > @@ -334,13 +372,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> >  		front_pad = 0;
> >  		inline_vecs = nr_iovecs;
> >  	} else {
> > +		/*
> > +		 * generic_make_request() converts recursion to iteration; this
> > +		 * means if we're running beneath it, any bios we allocate and
> > +		 * submit will not be submitted (and thus freed) until after we
> > +		 * return.
> > +		 *
> > +		 * This exposes us to a potential deadlock if we allocate
> > +		 * multiple bios from the same bio_set() while running
> > +		 * underneath generic_make_request(). If we were to allocate
> > +		 * multiple bios (say a stacking block driver that was splitting
> > +		 * bios), we would deadlock if we exhausted the mempool's
> > +		 * reserve.
> > +		 *
> > +		 * We solve this, and guarantee forward progress, with a rescuer
> > +		 * workqueue per bio_set. If we go to allocate and there are
> > +		 * bios on current->bio_list, we first try the allocation
> > +		 * without __GFP_WAIT; if that fails, we punt those bios we
> > +		 * would be blocking to the rescuer workqueue before we retry
> > +		 * with the original gfp_flags.
> > +		 */
> > +
> > +		if (current->bio_list && !bio_list_empty(current->bio_list))
> > +			gfp_mask &= ~__GFP_WAIT;
> > +retry:
> >  		p = mempool_alloc(bs->bio_pool, gfp_mask);
> >  		front_pad = bs->front_pad;
> >  		inline_vecs = BIO_INLINE_VECS;
> >  	}
>
> Wouldn't the following be better?
>
> 	p = mempool_alloc(bs->bi_pool, gfp_mask);
> 	if (unlikely(!p) && gfp_mask != saved_gfp) {
> 		punt_bios_to_rescuer(bs);
> 		p = mempool_alloc(bs->bi_pool, saved_gfp);
> 	}

That'd require duplicating the error handling in two different places -
once for the initial allocation, once for the bvec allocation. And I
really hate writing code that does

	alloc_something()
	if (fail) {
		alloc_something_again()
	}

it just screams ugly to me.

> I really hope the usage restriction (don't mix with mempool) for
> bioset is clearly documented somewhere appropriate.

Good point, I'm adding it to bio_alloc_bioset's documentation.


commit 4edb21e0b749fc098c72edcb4f9abdeca6fc62cd
Author: Kent Overstreet koverstr...@google.com
Date:   Sun Sep 9 17:23:29 2012 -0700

block: Avoid deadlocks with bio allocation by stacking drivers

Previously, if we ever try to allocate more than once from the same bio
set while running under generic_make_request() (i.e. a stacking block
driver), we risk deadlock.

This is because of the code in generic_make_request() that converts
recursion to iteration; any bios we submit won't actually be submitted
(so they can complete and eventually be freed) until

Re: [PATCH 0/9] Prep work for immutable bio vecs

2012-09-09 Thread Kent Overstreet
On Sat, Sep 08, 2012 at 12:41:15PM -0700, Tejun Heo wrote:
> Hello, Kent.
>
> On Fri, Sep 07, 2012 at 03:59:11PM -0700, Kent Overstreet wrote:
> > Random assortment of refactoring and trivial cleanups.
> >
> > Immutable bio vecs and efficient bio splitting require auditing and removing
> > pretty much all bi_idx uses, among other things.
> >
> > The reason is that with immutable bio vecs we can't use the bvec array
> > directly; if we have a partially completed bvec, that'll be indicated with a
> > field in struct bvec_iter (which gets embedded in struct bio) - bi_bvec_done.
> >
> > bio_for_each_segment() will handle this transparently, so code needs to be
> > converted to use it or some other generic accessor.
> >
> > Also, bio splitting means that when a driver gets a bio, bi_idx and
> > bi_bvec_done may both be nonzero. Again, just need to use generic accessors.
>
> There are three pending patchsets and I don't know how they're
> supposed to come together.  Please explain on which tree the patches
> are based and how they stack and preferably provide a git branch.

Both of the new ones depend on the block cleanups series, but they
shouldn't depend on each other (bar perhaps some trivial merge
conflicts in bio.h)

Git repo - block_stuff branch on http://evilpiepirate.org/git/linux-bcache.git/


Re: [PATCH 2/2] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-10 Thread Kent Overstreet
On Mon, Sep 10, 2012 at 10:22:10AM -0700, Tejun Heo wrote:
> Hello, Kent.
>
> On Sun, Sep 09, 2012 at 05:28:10PM -0700, Kent Overstreet wrote:
> > > > +	while ((bio = bio_list_pop(current->bio_list)))
> > > > +		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> > > > +
> > > > +	*current->bio_list = nopunt;
> > >
> > > Why this is necessary needs explanation and it's done in rather
> > > unusual way.  I suppose the weirdness is from bio_list API
> > > restriction?
> >
> > It's because bio_lists are singly linked, so deleting an entry from the
> > middle of the list would be a real pain - just much cleaner/simpler to
> > do it this way.
>
> Yeah, I wonder how beneficial that singly linked list is.  Eh well...

Well, this is the first time I can think of that it's come up, and IMO
this is no less clean a way of writing it... just a bit unusual in C,
it feels more functional to me instead of imperative.

> > > Wouldn't the following be better?
> > >
> > > 	p = mempool_alloc(bs->bi_pool, gfp_mask);
> > > 	if (unlikely(!p) && gfp_mask != saved_gfp) {
> > > 		punt_bios_to_rescuer(bs);
> > > 		p = mempool_alloc(bs->bi_pool, saved_gfp);
> > > 	}
> >
> > That'd require duplicating the error handling in two different places -
> > once for the initial allocation, once for the bvec allocation. And I
> > really hate writing code that does
> >
> > 	alloc_something()
> > 	if (fail) {
> > 		alloc_something_again()
> > 	}
> >
> > it just screams ugly to me.
>
> I don't know.  That at least represents what's going on and goto'ing
> back and forth is hardly pretty.  Sometimes the code gets much uglier
> / unwieldy and we have to live with gotos.  Here, that doesn't seem to
> be the case.

I think this is really more personal preference than anything, but:

Setting gfp_mask = saved_gfp after calling punt_bios_to_rescuer() is
really the correct thing to do, and makes the code clearer IMO: once
we've run punt_bios_to_rescuer() we don't need to mask out __GFP_WAIT
(not until the next time a bio is submitted, really).

This matters a bit for the bvl allocation too: if we call
punt_bios_to_rescuer() for the bio allocation, there's no point doing
it again.

So to be rigorously correct, your way would have to be

	p = mempool_alloc(bs->bio_pool, gfp_mask);
	if (!p && gfp_mask != saved_gfp) {
		punt_bios_to_rescuer(bs);
		gfp_mask = saved_gfp;
		p = mempool_alloc(bs->bio_pool, gfp_mask);
	}

And at that point, why duplicate that line of code? It doesn't matter that
much, but IMO a goto retry better labels what's actually going on (it's
something that's not uncommon in the kernel and if I see a retry label
in a function I pretty immediately have an idea of what's going on).

So we could do

retry:
	p = mempool_alloc(bs->bio_pool, gfp_mask);
	if (!p && gfp_mask != saved_gfp) {
		punt_bios_to_rescuer(bs);
		gfp_mask = saved_gfp;
		goto retry;
	}

(side note: not that it really matters here, but gcc will inline the
bvec_alloc_bs() call if it's not duplicated, I've never seen it
consolidate duplicated code and /then/ inline based off that)

This does have the advantage that we're not freeing and reallocating the
bio like Vivek pointed out, but I'm not a huge fan of having the
punting/retry logic in the main code path.

I don't care that much though. I'd prefer not to have the actual
allocations duplicated, but it's starting to feel like bikeshedding to
me.
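
For anyone following the control flow rather than the style argument,
here is a userspace toy of the try-without-waiting, punt, then retry
sequence. It is a sketch under assumptions: toy_pool, TOY_GFP_WAIT and
all function names are invented, punting is modelled as instantly
returning the stuck elements to the pool, and a blocking allocation on
an empty pool is reported as TOY_WOULD_SLEEP instead of sleeping.

```c
#include <assert.h>

#define TOY_GFP_WAIT 0x1u	/* stand-in for __GFP_WAIT */

/* Toy pool: counts elements instead of managing real memory. */
struct toy_pool {
	int free_slots;		/* elements available right now */
	int stuck_slots;	/* elements held by bios parked on the caller's list */
	int punts;		/* how many times the rescuer was invoked */
};

enum toy_result { TOY_FAIL = 0, TOY_OK = 1, TOY_WOULD_SLEEP = 2 };

/* Non-blocking attempts fail on an empty pool; blocking ones would sleep. */
static enum toy_result toy_pool_alloc(struct toy_pool *pool, unsigned int gfp)
{
	if (pool->free_slots > 0) {
		pool->free_slots--;
		return TOY_OK;
	}
	return (gfp & TOY_GFP_WAIT) ? TOY_WOULD_SLEEP : TOY_FAIL;
}

/*
 * Hypothetical stand-in for punt_bios_to_rescuer(): the parked bios get
 * submitted elsewhere, complete, and return their elements to the pool.
 */
static void toy_punt_to_rescuer(struct toy_pool *pool)
{
	pool->free_slots += pool->stuck_slots;
	pool->stuck_slots = 0;
	pool->punts++;
}

static enum toy_result toy_alloc_with_rescue(struct toy_pool *pool,
					     unsigned int gfp)
{
	unsigned int saved_gfp = gfp;
	enum toy_result ret;

	/* Blocking while we hold up other bios could deadlock: don't wait yet. */
	if (pool->stuck_slots > 0)
		gfp &= ~TOY_GFP_WAIT;
retry:
	ret = toy_pool_alloc(pool, gfp);
	if (ret == TOY_FAIL && gfp != saved_gfp) {
		toy_punt_to_rescuer(pool);
		gfp = saved_gfp;	/* safe to wait from here on */
		goto retry;
	}
	return ret;
}
```

This uses the goto form from the message above; the duplicated-call
form behaves identically in the model, so the choice is purely about
readability.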

> > +static void punt_bios_to_rescuer(struct bio_set *bs)
> > +{
> > +	struct bio_list punt, nopunt;
> > +	struct bio *bio;
> > +
> > +	/*
> > +	 * Don't want to punt all bios on current->bio_list; if there was a bio
> > +	 * on there for a stacking driver higher up in the stack, processing it
> > +	 * could require allocating bios from this bio_set, and we don't want to
> > +	 * do that from our own rescuer.
>
> Hmmm... isn't it more like we must process only the bios which are
> from this bio_set to have any kind of forward-progress guarantee?  The
> above sounds like it's just something undesirable.

Yeah, that'd be better, I'll change it.


Re: [PATCH 2/2] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-10 Thread Kent Overstreet
On Mon, Sep 10, 2012 at 01:40:10PM -0700, Tejun Heo wrote:
> Hello, Kent.
>
> On Mon, Sep 10, 2012 at 01:24:35PM -0700, Kent Overstreet wrote:
> > And at that point, why duplicate that line of code? It doesn't matter that
> > much, but IMO a goto retry better labels what's actually going on (it's
> > something that's not uncommon in the kernel and if I see a retry label
> > in a function I pretty immediately have an idea of what's going on).
> >
> > So we could do
> >
> > retry:
> > 	p = mempool_alloc(bs->bio_pool, gfp_mask);
> > 	if (!p && gfp_mask != saved_gfp) {
> > 		punt_bios_to_rescuer(bs);
> > 		gfp_mask = saved_gfp;
> > 		goto retry;
> > 	}
>
> Yes, we do retry loops if that makes the code simpler.  Doing that to
> save one extra alloc call, I don't think so.

Simpler isn't really an objective thing though. To me the goto version
is more obvious/idiomatic.

Eh. I'll do it your way, but consider this a formal objection :p


Re: [PATCH 2/2] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-10 Thread Kent Overstreet
On Mon, Sep 10, 2012 at 02:37:10PM -0700, Tejun Heo wrote:
> On Mon, Sep 10, 2012 at 02:33:49PM -0700, Kent Overstreet wrote:
> > Simpler isn't really an objective thing though. To me the goto version
> > is more obvious/idiomatic.
> >
> > Eh. I'll do it your way, but consider this a formal objection :p
>
> Thanks. :)

Here's current version. Good enough for an acked-by?

commit df7e63cbffa3065fcc4ba2b9a93418d7c7312243
Author: Kent Overstreet koverstr...@google.com
Date:   Mon Sep 10 14:33:46 2012 -0700

block: Avoid deadlocks with bio allocation by stacking drivers

Previously, if we ever try to allocate more than once from the same bio
set while running under generic_make_request() (i.e. a stacking block
driver), we risk deadlock.

This is because of the code in generic_make_request() that converts
recursion to iteration; any bios we submit won't actually be submitted
(so they can complete and eventually be freed) until after we return -
this means if we allocate a second bio, we're blocking the first one
from ever being freed.

Thus if enough threads call into a stacking block driver at the same
time with bios that need multiple splits, and the bio_set's reserve gets
used up, we deadlock.

This can be worked around in the driver code - we could check if we're
running under generic_make_request(), then mask out __GFP_WAIT when we
go to allocate a bio, and if the allocation fails punt to workqueue and
retry the allocation.

But this is tricky and not a generic solution. This patch solves it for
all users by inverting the previously described technique. We allocate a
rescuer workqueue for each bio_set, and then in the allocation code if
there are bios on current->bio_list we would be blocking, we punt them
to the rescuer workqueue to be submitted.

This guarantees forward progress for bio allocations under
generic_make_request() provided each bio is submitted before allocating
the next, and provided the bios are freed after they complete.

Note that this doesn't do anything for allocation from other mempools.
Instead of allocating per bio data structures from a mempool, code
should use bio_set's front_pad.

Tested it by forcing the rescue codepath to be taken (by disabling the
first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
of arbitrary bio splitting) and verified that the rescuer was being
invoked.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk

diff --git a/fs/bio.c b/fs/bio.c
index 13e9567..4783e31 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -295,6 +295,54 @@ void bio_reset(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_reset);
 
+static void bio_alloc_rescue(struct work_struct *work)
+{
+	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
+	struct bio *bio;
+
+	while (1) {
+		spin_lock(&bs->rescue_lock);
+		bio = bio_list_pop(&bs->rescue_list);
+		spin_unlock(&bs->rescue_lock);
+
+		if (!bio)
+			break;
+
+		generic_make_request(bio);
+	}
+}
+
+static void punt_bios_to_rescuer(struct bio_set *bs)
+{
+	struct bio_list punt, nopunt;
+	struct bio *bio;
+
+	/*
+	 * In order to guarantee forward progress we must punt only bios that
+	 * were allocated from this bio_set; otherwise, if there was a bio on
+	 * there for a stacking driver higher up in the stack, processing it
+	 * could require allocating bios from this bio_set, and doing that from
+	 * our own rescuer would be bad.
+	 *
+	 * Since bio lists are singly linked, pop them all instead of trying to
+	 * remove from the middle of the list:
+	 */
+
+	bio_list_init(&punt);
+	bio_list_init(&nopunt);
+
+	while ((bio = bio_list_pop(current->bio_list)))
+		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
+
+	*current->bio_list = nopunt;
+
+	spin_lock(&bs->rescue_lock);
+	bio_list_merge(&bs->rescue_list, &punt);
+	spin_unlock(&bs->rescue_lock);
+
+	queue_work(bs->rescue_workqueue, &bs->rescue_work);
+}
+
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -312,11 +360,27 @@ EXPORT_SYMBOL(bio_reset);
  *   previously allocated bio for IO before attempting to allocate a new one.
  *   Failure to do so can cause deadlocks under memory pressure.
  *
+ *   Note that when running under generic_make_request() (i.e. any block
+ *   driver), bios are not submitted until after you return - see the code in
+ *   generic_make_request() that converts recursion into iteration, to prevent
+ *   stack overflows.
+ *
+ *   This would normally mean allocating multiple bios under
+ *   generic_make_request() would be susceptible to deadlocks, but we have
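
The list handling in punt_bios_to_rescuer() (pop everything, sort each entry into one of two lists, put the leftovers back) can be tried out in userspace. The sketch below is an illustration only: struct node, struct list and the integer owner field are hypothetical stand-ins for struct bio, struct bio_list and bi_pool, and no locking or workqueue is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio and struct bio_list. */
struct node {
	struct node *next;
	int owner;		/* plays the role of bio->bi_pool */
};

struct list {
	struct node *head, *tail;
};

static void list_init(struct list *l)
{
	l->head = l->tail = NULL;
}

/* Append at the tail so relative order is preserved. */
static void list_add(struct list *l, struct node *n)
{
	n->next = NULL;
	if (l->tail)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
}

static struct node *list_pop(struct list *l)
{
	struct node *n = l->head;

	if (n) {
		l->head = n->next;
		if (!l->head)
			l->tail = NULL;
		n->next = NULL;
	}
	return n;
}

/*
 * Move every node owned by @owner from @pending to @punt. Because the list
 * is singly linked, drain it completely and rebuild it rather than
 * unlinking from the middle.
 */
static void partition_by_owner(struct list *pending, struct list *punt,
			       int owner)
{
	struct list keep;
	struct node *n;

	assert(pending != NULL && punt != NULL);
	list_init(&keep);

	while ((n = list_pop(pending)))
		list_add(n->owner == owner ? punt : &keep, n);

	*pending = keep;
}
```

Both output lists keep the original relative order of their entries, which is why appending at the tail matters here.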

Re: [dm-devel] [PATCH 2/2] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-10 Thread Kent Overstreet
On Mon, Sep 10, 2012 at 11:50:57PM +0100, Alasdair G Kergon wrote:
 On Mon, Sep 10, 2012 at 03:09:10PM -0700, Tejun Heo wrote:
  On Mon, Sep 10, 2012 at 02:56:33PM -0700, Kent Overstreet wrote:
   commit df7e63cbffa3065fcc4ba2b9a93418d7c7312243
   Author: Kent Overstreet koverstr...@google.com
   Date:   Mon Sep 10 14:33:46 2012 -0700
   
   block: Avoid deadlocks with bio allocation by stacking drivers
 
   Note that this doesn't do anything for allocation from other mempools.
 
 Note that dm has several cases of this, so this patch should not be used with
 dm yet.

That just means it won't affect dm one way or the other for those
allocations.

 Mikulas is studying those cases to see whether anything like this
 might be feasible/sensible or not.

I've got a patch that eliminates one of the per bio mempools in dm, and
I'll probably work on the rest after I finish off with immutable biovecs
- which is mostly done, just cleaning up/testing/pushing patches in now.


commit 8754349145edfc791450d3ad54c19f0f3715c86c
Author: Kent Overstreet koverstr...@google.com
Date:   Tue Sep 4 06:17:56 2012 -0700

dm: Use bioset's front_pad for dm_target_io

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f2eb730..3cf39b0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -71,6 +71,7 @@ struct dm_target_io {
struct dm_io *io;
struct dm_target *ti;
union map_info info;
+   struct bio clone;
 };
 
 /*
@@ -174,7 +175,7 @@ struct mapped_device {
 * io objects are allocated from here.
 */
mempool_t *io_pool;
-   mempool_t *tio_pool;
+   mempool_t *rq_tio_pool;
 
struct bio_set *bs;
 
@@ -214,15 +215,8 @@ struct dm_md_mempools {
 
 #define MIN_IOS 256
 static struct kmem_cache *_io_cache;
-static struct kmem_cache *_tio_cache;
 static struct kmem_cache *_rq_tio_cache;
 
-/*
- * Unused now, and needs to be deleted. But since io_pool is overloaded and it's
- * still used for _io_cache, I'm leaving this for a later cleanup
- */
-static struct kmem_cache *_rq_bio_info_cache;
-
 static int __init local_init(void)
 {
int r = -ENOMEM;
@@ -232,22 +226,13 @@ static int __init local_init(void)
if (!_io_cache)
return r;
 
-   /* allocate a slab for the target ios */
-   _tio_cache = KMEM_CACHE(dm_target_io, 0);
-   if (!_tio_cache)
-   goto out_free_io_cache;
-
_rq_tio_cache = KMEM_CACHE(dm_rq_target_io, 0);
if (!_rq_tio_cache)
-   goto out_free_tio_cache;
-
-   _rq_bio_info_cache = KMEM_CACHE(dm_rq_clone_bio_info, 0);
-   if (!_rq_bio_info_cache)
-   goto out_free_rq_tio_cache;
+   goto out_free_io_cache;
 
r = dm_uevent_init();
if (r)
-   goto out_free_rq_bio_info_cache;
+   goto out_free_rq_tio_cache;
 
_major = major;
r = register_blkdev(_major, _name);
@@ -261,12 +246,8 @@ static int __init local_init(void)
 
 out_uevent_exit:
dm_uevent_exit();
-out_free_rq_bio_info_cache:
-   kmem_cache_destroy(_rq_bio_info_cache);
 out_free_rq_tio_cache:
kmem_cache_destroy(_rq_tio_cache);
-out_free_tio_cache:
-   kmem_cache_destroy(_tio_cache);
 out_free_io_cache:
kmem_cache_destroy(_io_cache);
 
@@ -275,9 +256,7 @@ out_free_io_cache:
 
 static void local_exit(void)
 {
-   kmem_cache_destroy(_rq_bio_info_cache);
kmem_cache_destroy(_rq_tio_cache);
-   kmem_cache_destroy(_tio_cache);
kmem_cache_destroy(_io_cache);
unregister_blkdev(_major, _name);
dm_uevent_exit();
@@ -461,20 +440,15 @@ static void free_io(struct mapped_device *md, struct dm_io *io)
 	mempool_free(io, md->io_pool);
 }
 
-static void free_tio(struct mapped_device *md, struct dm_target_io *tio)
-{
-	mempool_free(tio, md->tio_pool);
-}
-
 static struct dm_rq_target_io *alloc_rq_tio(struct mapped_device *md,
 					    gfp_t gfp_mask)
 {
-	return mempool_alloc(md->tio_pool, gfp_mask);
+	return mempool_alloc(md->rq_tio_pool, gfp_mask);
 }
 
 static void free_rq_tio(struct dm_rq_target_io *tio)
 {
-	mempool_free(tio, tio->md->tio_pool);
+	mempool_free(tio, tio->md->rq_tio_pool);
 }
 
 static int md_in_flight(struct mapped_device *md)
@@ -658,7 +632,6 @@ static void clone_endio(struct bio *bio, int error)
 	int r = 0;
 	struct dm_target_io *tio = bio->bi_private;
 	struct dm_io *io = tio->io;
-	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
 	if (!bio_flagged(bio, BIO_UPTODATE) && !error)
@@ -681,7 +654,6 @@ static void clone_endio(struct bio *bio, int error)
 		}
 	}
 
-	free_tio(md, tio);
 	bio_put(bio);
 	dec_pending(io, error);
 }
@@ -998,13 +970,16 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 }
 EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
 
-static void __map_bio(struct

Re: [dm-devel] [PATCH 2/2] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-10 Thread Kent Overstreet
On Mon, Sep 10, 2012 at 04:01:01PM -0700, Tejun Heo wrote:
 Hello,
 
 On Mon, Sep 10, 2012 at 3:50 PM, Alasdair G Kergon a...@redhat.com wrote:
   Note that this doesn't do anything for allocation from other 
   mempools.
 
  Note that dm has several cases of this, so this patch should not be used 
  with
  dm yet.  Mikulas is studying those cases to see whether anything like this
  might be feasible/sensible or not.
 
 IIUC, Kent posted a patch which converts all of them to use front-pad
 (there's no reason not to, really). This better come after that but
 it's not like this is gonna break something which isn't broken now.

Not all, I only did the easy one - you know how dm has all those crazy
abstraction layers? They've got multiple per bio allocations because of
that; the core dm code does one, and then some other code takes that
struct dm_io* and allocates its own state pointing to that (which then
points to the original bio...)

So front_pad should still work, but you need to have say dm_crypt pass
the amount of front pad it needs to the core dm code when it creates the
bio_set, and then dm crypt can use container_of(struct dm_io) and embed
like everything does that use the bio_set front pad.

*I'm probably misremembering all the names.
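
To make the front_pad idea concrete, here is a small userspace sketch. Everything in it is a hypothetical stand-in: fake_bio for struct bio, fake_target_io for a per-bio driver structure with the bio embedded as its last member, and fake_bio_alloc() for a bio_set that was created with front_pad bytes reserved in front of every bio. The point is only that one allocation yields both objects and that container_of() recovers the driver state from the bio pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio. */
struct fake_bio {
	long sector;
};

/* Per-bio driver state; the bio must be the last member. */
struct fake_target_io {
	void *ti;
	long info;
	struct fake_bio clone;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate a bio with @front_pad bytes reserved in front of it. */
static struct fake_bio *fake_bio_alloc(size_t front_pad)
{
	char *p = calloc(1, front_pad + sizeof(struct fake_bio));

	return p ? (struct fake_bio *)(p + front_pad) : NULL;
}

static void fake_bio_free(struct fake_bio *bio, size_t front_pad)
{
	free((char *)bio - front_pad);
}

/* Recover the driver state that lives in the bio's front pad. */
static struct fake_target_io *tio_from_bio(struct fake_bio *bio)
{
	assert(bio != NULL);
	return container_of(bio, struct fake_target_io, clone);
}
```

A caller would pass offsetof(struct fake_target_io, clone) as the front pad, which is the value a layered driver would have to hand to the core code when the pool is created.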


Re: [Drbd-dev] FLUSH/FUA documentation code discrepancy

2012-09-10 Thread Kent Overstreet
On Mon, Sep 10, 2012 at 04:06:54PM -0700, Tejun Heo wrote:
 Hello, again.
 
 cc'ing Kent and Vivek.  The original thread is at
 
   http://thread.gmane.org/gmane.linux.network.drbd.devel/2130
 
 On Mon, Sep 10, 2012 at 03:54:42PM -0700, Tejun Heo wrote:
   We can possibly work around that by introducing an additional submitter 
   thread,
   or at least our own list where we queue assembled bios until the lower
   level device queue drains.
   
   But we'd rather have the elevator see the FLUSH/FUA,
   and treat them as at least a soft barrier/reorder boundary.
   
   I may be wrong here, but all the necessary bits for this seem to be in
   place already, if the information would even reach the elevator in one
   way or other, and not be completely stripped away early.
   
   What would you rather see, the elevator recognizing reorder boundaries?
   Or additional higher level queueing and extra thread/work queue/whatever?
   
   Both are fine with me, I'm just asking for an opinion.
  
  First of all, using FLUSH/FUA for such purpose is an error-prone
  abuse.  You're trying to exploit an implementation detail which may
  change at any time.  I think what you want is to be able to specify
  REQ_SOFTBARRIER on bio submission, which shouldn't be too hard but I'm
  still lost why this is necessary.  Can you please explain it a bit
  more?
 
 The problem with exposing REQ_SOFTBARRIER at bio submission is that it
 would require block layer not to reorder bios while passing through
 stacked drivers until it reaches a rq-based driver.  I *suspect* this
 has been true until now but Kent's pending patch to fix possible
 deadlock issue breaks that.
 
   http://thread.gmane.org/gmane.linux.kernel.bcache.devel/1017/focus=1356250
 
 As for what the resolution should be, urgh... I don't know. :(

No, that ordering has definitely not been preserved before - any
stacking driver would need either a giant global lock around its
make_request fn and everything else, or it'd need to serialize all bios
with some kind of linked list or something until they got to the
underlying devices.

And if that wasn't bad enough performance wise, getting the ordering
semantics correct when you've got multiple underlying devices...
*shudder*

I thought these kinds of ordering requirements were what we were getting
away from when old style barriers went away in 2.6.38.


Re: [Drbd-dev] FLUSH/FUA documentation code discrepancy

2012-09-10 Thread Kent Overstreet
cc'ing Neil

On Mon, Sep 10, 2012 at 04:06:54PM -0700, Tejun Heo wrote:
 Hello, again.
 
 cc'ing Kent and Vivek.  The original thread is at
 
   http://thread.gmane.org/gmane.linux.network.drbd.devel/2130
 
 On Mon, Sep 10, 2012 at 03:54:42PM -0700, Tejun Heo wrote:
   We can possibly work around that by introducing an additional submitter 
   thread,
   or at least our own list where we queue assembled bios until the lower
   level device queue drains.
   
   But we'd rather have the elevator see the FLUSH/FUA,
   and treat them as at least a soft barrier/reorder boundary.
   
   I may be wrong here, but all the necessary bits for this seem to be in
   place already, if the information would even reach the elevator in one
   way or other, and not be completely stripped away early.
   
   What would you rather see, the elevator recognizing reorder boundaries?
   Or additional higher level queueing and extra thread/work queue/whatever?
   
   Both are fine with me, I'm just asking for an opinion.
  
  First of all, using FLUSH/FUA for such purpose is an error-prone
  abuse.  You're trying to exploit an implementation detail which may
  change at any time.  I think what you want is to be able to specify
  REQ_SOFTBARRIER on bio submission, which shouldn't be too hard but I'm
  still lost why this is necessary.  Can you please explain it a bit
  more?
 
 The problem with exposing REQ_SOFTBARRIER at bio submission is that it
 would require block layer not to reorder bios while passing through
 stacked drivers until it reaches a rq-based driver.  I *suspect* this
 has been true until now but Kent's pending patch to fix possible
 deadlock issue breaks that.

Yeah, you might be right about that. I think Neil Brown would know
better than I if this ordering was ever explicitly broken.

But I don't think anything else is relying on that kind of ordering any
more.

   http://thread.gmane.org/gmane.linux.kernel.bcache.devel/1017/focus=1356250
 
 As for what the resolution should be, urgh... I don't know. :(


[PATCH v2 00/26] Prep work for immutable bio vecs

2012-09-10 Thread Kent Overstreet
Random assortment of refactoring and trivial cleanups;

Immutable bio vecs and efficient bio splitting require auditing and
removing pretty much all bi_idx uses, among other things.

The reason is that with immutable bio vecs we can't use the bvec array
directly; if we have a partially completed bvec, that'll be indicated
with a field in struct bvec_iter (which gets embedded in struct bio) -
bi_bvec_done.

bio_for_each_segment() will handle this transparently, so code needs to
be converted to use it or some other generic accessor.

Also, bio splitting means that when a driver gets a bio, bi_idx and
bi_bvec_done may both be nonzero. Again, just need to use generic
accessors.
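
A rough userspace model of the idea, for illustration only: the segment array is never modified, and all progress lives in a small iterator. struct seg and struct seg_iter are hypothetical stand-ins for struct bio_vec and struct bvec_iter, with done playing the role of bi_bvec_done; the real field layout may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio_vec; never written after setup. */
struct seg {
	unsigned offset, len;
};

/* All mutable state: bytes remaining, current segment, and how much of
 * the current segment has already been completed. */
struct seg_iter {
	unsigned size, idx, done;
};

/* Advance the iterator by @bytes without touching the segment array. */
static void iter_advance(const struct seg *v, struct seg_iter *it,
			 unsigned bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;

	while (bytes) {
		unsigned left = v[it->idx].len - it->done;
		unsigned n = bytes < left ? bytes : left;

		bytes -= n;
		it->done += n;
		if (it->done == v[it->idx].len) {
			it->idx++;
			it->done = 0;
		}
	}
}

/* The current segment as seen through the iterator (a copy, adjusted for
 * partial completion and clamped to the bytes remaining). */
static struct seg iter_cur(const struct seg *v, const struct seg_iter *it)
{
	struct seg s = v[it->idx];

	s.offset += it->done;
	s.len -= it->done;
	if (s.len > it->size)
		s.len = it->size;
	return s;
}
```

Code that only ever looks at segments through an accessor like iter_cur() keeps working when idx and done are both nonzero on entry, which is the situation a split bio creates.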

v2: Patch series now has all the prep work to be done before abstracting
out the bio iterator, I think.

Kent Overstreet (26):
  block: Convert integrity to bvec_alloc_bs(), and a bugfix
  block: Add bio_advance()
  block: Refactor blk_update_request()
  md: Convert md_trim_bio() to use bio_advance()
  block: Add bio_end()
  block: Use bio_sectors() more consistently
  block: Don't use bi_idx in bio_split() or require it to be 0
  block: Remove bi_idx references
  block: Remove some unnecessary bi_vcnt usage
  block: Add submit_bio_wait(), remove from md
  raid10: Use bio_reset()
  raid1: use bio_reset()
  raid5: use bio_reset()
  raid1: Refactor narrow_write_error() to not use bi_idx
  block: Add bio_copy_data()
  pktcdvd: use bio_copy_data()
  pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
  raid1: use bio_copy_data()
  bounce: Refactor __blk_queue_bounce to not use bi_io_vec
  block: Add bio_for_each_segment_all()
  block: Convert some code to bio_for_each_segment_all()
  block: Add bio_alloc_pages()
  raid1: use bio_alloc_pages()
  block: Add an explicit bio flag for bios that own their bvec
  bio-integrity: Add explicit field for owner of bip_buf
  block: Add BIO_SUBMITTED flag, kill BIO_CLONED

 block/blk-core.c |  88 +++-
 block/cfq-iosched.c  |   7 +-
 block/deadline-iosched.c |   2 +-
 drivers/block/aoe/aoeblk.c   |   2 +-
 drivers/block/aoe/aoecmd.c   |   2 +-
 drivers/block/brd.c  |   3 +-
 drivers/block/drbd/drbd_req.c|   8 +-
 drivers/block/floppy.c   |   1 -
 drivers/block/pktcdvd.c  | 102 --
 drivers/block/ps3vram.c  |   2 +-
 drivers/md/dm-crypt.c|   3 +-
 drivers/md/dm-raid1.c|   2 +-
 drivers/md/dm-stripe.c   |   2 +-
 drivers/md/dm-verity.c   |   4 +-
 drivers/md/dm.c  |   1 -
 drivers/md/faulty.c  |   6 +-
 drivers/md/linear.c  |   3 +-
 drivers/md/md.c  |  19 +--
 drivers/md/raid0.c   |   9 +-
 drivers/md/raid1.c   | 131 ++
 drivers/md/raid10.c  |  78 +++
 drivers/md/raid5.c   |  50 +++
 drivers/message/fusion/mptsas.c  |   6 +-
 drivers/s390/block/dcssblk.c |   3 +-
 drivers/scsi/libsas/sas_expander.c   |   6 +-
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  10 +-
 fs/bio-integrity.c   | 134 ++
 fs/bio.c | 226 +++
 fs/btrfs/extent_io.c |   3 +-
 fs/buffer.c  |   1 -
 fs/direct-io.c   |   8 +-
 fs/exofs/ore.c   |   2 +-
 fs/exofs/ore_raid.c  |   2 +-
 fs/gfs2/lops.c   |   2 +-
 fs/jfs/jfs_logmgr.c  |   2 -
 fs/logfs/dev_bdev.c  |   5 -
 include/linux/bio.h  |  34 +++--
 include/linux/blk_types.h|   3 +-
 include/trace/events/block.h |  10 +-
 mm/bounce.c  |  75 +++---
 mm/page_io.c |   1 -
 41 files changed, 471 insertions(+), 587 deletions(-)

-- 
1.7.12



[PATCH v2 03/26] block: Refactor blk_update_request()

2012-09-10 Thread Kent Overstreet
Converts it to use bio_advance(), simplifying it quite a bit in the
process.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 block/blk-core.c | 84 +++-
 1 file changed, 16 insertions(+), 68 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2d739ca..55c833c9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -153,25 +153,19 @@ EXPORT_SYMBOL(blk_rq_init);
 static void req_bio_endio(struct request *rq, struct bio *bio,
 			  unsigned int nbytes, int error)
 {
+	/*
+	 * XXX: bio_endio() does this. only need this because of the weird
+	 * flush seq thing.
+	 */
 	if (error)
 		clear_bit(BIO_UPTODATE, &bio->bi_flags);
 	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
 		error = -EIO;
 
-	if (unlikely(nbytes > bio->bi_size)) {
-		printk(KERN_ERR "%s: want %u bytes done, %u left\n",
-		       __func__, nbytes, bio->bi_size);
-		nbytes = bio->bi_size;
-	}
-
 	if (unlikely(rq->cmd_flags & REQ_QUIET))
 		set_bit(BIO_QUIET, &bio->bi_flags);
 
-	bio->bi_size -= nbytes;
-	bio->bi_sector += (nbytes >> 9);
-
-	if (bio_integrity(bio))
-		bio_integrity_advance(bio, nbytes);
+	bio_advance(bio, nbytes);
 
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
@@ -2214,8 +2208,7 @@ EXPORT_SYMBOL(blk_fetch_request);
  **/
 bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 {
-	int total_bytes, bio_nbytes, next_idx = 0;
-	struct bio *bio;
+	int total_bytes;
 
 	if (!req->bio)
 		return false;
@@ -2259,56 +2252,21 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 
 	blk_account_io_completion(req, nr_bytes);
 
-	total_bytes = bio_nbytes = 0;
-	while ((bio = req->bio) != NULL) {
-		int nbytes;
+	total_bytes = 0;
+	while (req->bio) {
+		struct bio *bio = req->bio;
+		unsigned bio_bytes = min(bio->bi_size, nr_bytes);
 
-		if (nr_bytes >= bio->bi_size) {
+		if (bio_bytes == bio->bi_size)
 			req->bio = bio->bi_next;
-			nbytes = bio->bi_size;
-			req_bio_endio(req, bio, nbytes, error);
-			next_idx = 0;
-			bio_nbytes = 0;
-		} else {
-			int idx = bio->bi_idx + next_idx;
-
-			if (unlikely(idx >= bio->bi_vcnt)) {
-				blk_dump_rq_flags(req, "__end_that");
-				printk(KERN_ERR "%s: bio idx %d >= vcnt %d\n",
-				       __func__, idx, bio->bi_vcnt);
-				break;
-			}
-
-			nbytes = bio_iovec_idx(bio, idx)->bv_len;
-			BIO_BUG_ON(nbytes > bio->bi_size);
-
-			/*
-			 * not a complete bvec done
-			 */
-			if (unlikely(nbytes > nr_bytes)) {
-				bio_nbytes += nr_bytes;
-				total_bytes += nr_bytes;
-				break;
-			}
 
-			/*
-			 * advance to the next vector
-			 */
-			next_idx++;
-			bio_nbytes += nbytes;
-		}
+		req_bio_endio(req, bio, bio_bytes, error);
 
-		total_bytes += nbytes;
-		nr_bytes -= nbytes;
+		total_bytes += bio_bytes;
+		nr_bytes -= bio_bytes;
 
-		bio = req->bio;
-		if (bio) {
-			/*
-			 * end more in this run, or just return 'not-done'
-			 */
-			if (unlikely(nr_bytes <= 0))
-				break;
-		}
+		if (!nr_bytes)
+			break;
 	}
 
 	/*
@@ -2324,16 +2282,6 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 		return false;
 	}
 
-	/*
-	 * if the request wasn't completed, update state
-	 */
-	if (bio_nbytes) {
-		req_bio_endio(req, bio, bio_nbytes, error);
-		bio->bi_idx += next_idx;
-		bio_iovec(bio)->bv_offset += nr_bytes;
-		bio_iovec(bio)->bv_len -= nr_bytes;
-	}
-
 	req->__data_len -= total_bytes;
 	req->buffer = bio_data(req->bio);
 
-- 
1.7.12
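
The shape of the simplified completion loop can be exercised in userspace. This is a sketch under stated assumptions, not the kernel code: fake_bio and fake_req are hypothetical stand-ins, and shrinking size stands in for what req_bio_endio()/bio_advance() would do to the real bio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio and struct request. */
struct fake_bio {
	struct fake_bio *next;
	unsigned size;		/* bytes still outstanding in this bio */
};

struct fake_req {
	struct fake_bio *bio;	/* head of the chain of unfinished bios */
};

/*
 * Complete up to @nr_bytes across the request's bio chain and return the
 * number of bytes actually completed. A bio that is fully consumed is
 * unlinked; a partially consumed one stays at the head, shortened.
 */
static unsigned complete_bytes(struct fake_req *req, unsigned nr_bytes)
{
	unsigned total = 0;

	assert(req != NULL);
	while (req->bio) {
		struct fake_bio *bio = req->bio;
		unsigned n = bio->size < nr_bytes ? bio->size : nr_bytes;

		if (n == bio->size)
			req->bio = bio->next;

		bio->size -= n;		/* stand-in for bio_advance() */

		total += n;
		nr_bytes -= n;

		if (!nr_bytes)
			break;
	}
	return total;
}
```

Because a partial completion is recorded in the bio itself, there is no separate bookkeeping pass after the loop for the bio that was only partly finished.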


[PATCH v2 07/26] block: Don't use bi_idx in bio_split() or require it to be 0

2012-09-10 Thread Kent Overstreet
Prep work for immutable bio_vecs/efficient bio splitting: they require
auditing and removing most uses of bi_idx.

So here we convert bio_split() to respect the current value of bi_idx
and use the bio_iovec() macro, instead of assuming bi_idx will be 0.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/block/drbd/drbd_req.c | 6 +++---
 drivers/md/raid0.c| 3 +--
 drivers/md/raid10.c   | 3 +--
 fs/bio-integrity.c| 4 ++--
 fs/bio.c  | 7 +++
 5 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index af69a96..57eb253 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1155,11 +1155,11 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 
 		/* can this bio be split generically?
 		 * Maybe add our own split-arbitrary-bios function. */
-		if (bio->bi_vcnt != 1 || bio->bi_idx != 0 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
+		if (bio_segments(bio) != 1 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
 			/* rather error out here than BUG in bio_split */
 			dev_err(DEV, "bio would need to, but cannot, be split: "
-			    "(vcnt=%u,idx=%u,size=%u,sector=%llu)\n",
-			    bio->bi_vcnt, bio->bi_idx, bio->bi_size,
+			    "(segments=%u,size=%u,sector=%llu)\n",
+			    bio_segments(bio), bio->bi_size,
 			    (unsigned long long)bio->bi_sector);
 			bio_endio(bio, -EINVAL);
 		} else {
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 387cb89..0587450 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -509,8 +509,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
 		sector_t sector = bio->bi_sector;
 		struct bio_pair *bp;
 		/* Sanity check -- queue functions should prevent this happening */
-		if (bio->bi_vcnt != 1 ||
-		    bio->bi_idx != 0)
+		if (bio_segments(bio) != 1)
 			goto bad_map;
 		/* This is a one page bio that upper layers
 		 * refuse to split for us, so we need to split it.
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9715aaf..bbd08f5 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1081,8 +1081,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 			 || conf->prev.near_copies < conf->prev.raid_disks))) {
 		struct bio_pair *bp;
 		/* Sanity check -- queue functions should prevent this happening */
-		if (bio->bi_vcnt != 1 ||
-		    bio->bi_idx != 0)
+		if (bio_segments(bio) != 1)
 			goto bad_map;
 		/* This is a one page bio that upper layers
 		 * refuse to split for us, so we need to split it.
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 1d64f7f..e8555a5 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -657,8 +657,8 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
 	bp->bio1.bi_integrity = &bp->bip1;
 	bp->bio2.bi_integrity = &bp->bip2;
 
-	bp->iv1 = bip->bip_vec[0];
-	bp->iv2 = bip->bip_vec[0];
+	bp->iv1 = bip->bip_vec[bip->bip_idx];
+	bp->iv2 = bip->bip_vec[bip->bip_idx];
 
 	bp->bip1.bip_vec = &bp->iv1;
 	bp->bip2.bip_vec = &bp->iv2;
diff --git a/fs/bio.c b/fs/bio.c
index 07587c0..addeac2 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1616,8 +1616,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
 	trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
 				bi->bi_sector + first_sectors);
 
-	BUG_ON(bi->bi_vcnt != 1);
-	BUG_ON(bi->bi_idx != 0);
+	BUG_ON(bio_segments(bi) != 1);
 	atomic_set(&bp->cnt, 3);
 	bp->error = 0;
 	bp->bio1 = *bi;
@@ -1626,8 +1625,8 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
 	bp->bio2.bi_size -= first_sectors << 9;
 	bp->bio1.bi_size = first_sectors << 9;
 
-	bp->bv1 = bi->bi_io_vec[0];
-	bp->bv2 = bi->bi_io_vec[0];
+	bp->bv1 = *bio_iovec(bi);
+	bp->bv2 = *bio_iovec(bi);
 	bp->bv2.bv_offset += first_sectors << 9;
 	bp->bv2.bv_len -= first_sectors << 9;
 	bp->bv1.bv_len = first_sectors << 9;
-- 
1.7.12
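
The arithmetic bio_split() applies to the single segment it handles is easy to check in isolation. The sketch below is a userspace illustration with hypothetical names (struct seg, struct seg_pair, split_seg()); it copies the current segment twice and trims each copy, using 512-byte sectors as in the patch. The range check is an addition for the sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio_vec: offset and length within a page. */
struct seg {
	unsigned offset, len;
};

struct seg_pair {
	struct seg first, second;
};

/*
 * Split one segment after @first_sectors 512-byte sectors. Both halves
 * start as copies of the current segment and are then trimmed.
 */
static struct seg_pair split_seg(struct seg cur, unsigned first_sectors)
{
	unsigned bytes = first_sectors << 9;
	struct seg_pair p;

	assert(bytes < cur.len);	/* sketch-only sanity check */

	p.first = cur;
	p.second = cur;

	p.second.offset += bytes;
	p.second.len -= bytes;
	p.first.len = bytes;
	return p;
}
```

Starting from "the current segment" rather than index 0 is the whole point of the patch: the caller decides which segment is current, and the split logic does not care where in the array it sits.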



[PATCH v2 06/26] block: Use bio_sectors() more consistently

2012-09-10 Thread Kent Overstreet
Bunch of places in the code weren't using it where they could be -
this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
into a struct bvec_iter.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/block/aoe/aoeblk.c   |  2 +-
 drivers/block/aoe/aoecmd.c   |  2 +-
 drivers/block/brd.c  |  3 +--
 drivers/block/pktcdvd.c  |  2 +-
 drivers/block/ps3vram.c  |  2 +-
 drivers/md/dm-raid1.c|  2 +-
 drivers/md/raid0.c   |  6 +++---
 drivers/md/raid1.c   | 17 -
 drivers/md/raid10.c  | 24 +++-
 drivers/md/raid5.c   |  8 
 include/trace/events/block.h | 10 +-
 11 files changed, 37 insertions(+), 41 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 321de7b..6e4420a 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -199,7 +199,7 @@ aoeblk_make_request(struct request_queue *q, struct bio *bio)
 	buf->bio = bio;
 	buf->resid = bio->bi_size;
 	buf->sector = bio->bi_sector;
-	buf->bv = &bio->bi_io_vec[bio->bi_idx];
+	buf->bv = bio_iovec(bio);
 	buf->bv_resid = buf->bv->bv_len;
 	WARN_ON(buf->bv_resid == 0);
 	buf->bv_off = buf->bv->bv_offset;
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index de0435e..2b52ebc 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -720,7 +720,7 @@ gettgt(struct aoedev *d, char *addr)
 static inline void
 diskstats(struct gendisk *disk, struct bio *bio, ulong duration, sector_t sector)
 {
-	unsigned long n_sect = bio->bi_size >> 9;
+	unsigned long n_sect = bio_sectors(bio);
 	const int rw = bio_data_dir(bio);
 	struct hd_struct *part;
 	int cpu;
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 531ceb3..d5c4978 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -334,8 +334,7 @@ static void brd_make_request(struct request_queue *q, struct bio *bio)
 	int err = -EIO;
 
 	sector = bio->bi_sector;
-	if (sector + (bio->bi_size >> SECTOR_SHIFT) >
-						get_capacity(bdev->bd_disk))
+	if (sector + bio_sectors(bio) > get_capacity(bdev->bd_disk))
 		goto out;
 
 	if (unlikely(bio->bi_rw & REQ_DISCARD)) {
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 8df3216..0824627 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2433,7 +2433,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		cloned_bio->bi_bdev = pd->bdev;
 		cloned_bio->bi_private = psd;
 		cloned_bio->bi_end_io = pkt_end_io_read_cloned;
-		pd->stats.secs_r += bio->bi_size >> 9;
+		pd->stats.secs_r += bio_sectors(bio);
 		pkt_queue_bio(pd, cloned_bio);
 		return;
 	}
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index f58cdcf..1ff38e8 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -553,7 +553,7 @@ static struct bio *ps3vram_do_bio(struct ps3_system_bus_device *dev,
 	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
 	int write = bio_data_dir(bio) == WRITE;
 	const char *op = write ? "write" : "read";
-	loff_t offset = bio->bi_sector << 9;
+	loff_t offset = bio_sectors(bio);
 	int error = 0;
 	struct bio_vec *bvec;
 	unsigned int i;
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index bc5ddba8..3dac2de 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -457,7 +457,7 @@ static void map_region(struct dm_io_region *io, struct mirror *m,
 {
 	io->bdev = m->dev->bdev;
 	io->sector = map_sector(m, bio);
-	io->count = bio->bi_size >> 9;
+	io->count = bio_sectors(bio);
 }
 
 static void hold_bio(struct mirror_set *ms, struct bio *bio)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index de63a1f..387cb89 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -484,11 +484,11 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
 {
 	if (likely(is_power_of_2(chunk_sects))) {
 		return chunk_sects >= ((bio->bi_sector & (chunk_sects-1))
-					+ (bio->bi_size >> 9));
+					+ bio_sectors(bio));
 	} else{
 		sector_t sector = bio->bi_sector;
 		return chunk_sects >= (sector_div(sector, chunk_sects)
-						+ (bio->bi_size >> 9));
+						+ bio_sectors(bio));
 	}
 }
 
@@ -542,7 +542,7 @@ bad_map:
 	printk("md/raid0:%s: make_request bug: can't convert block across chunks"
 		" or bigger than %dk %llu %d\n",
 	       mdname(mddev), chunk_sects / 2,
-	       (unsigned long long)bio->bi_sector, bio

[PATCH v2 15/26] block: Add bio_copy_data()

2012-09-10 Thread Kent Overstreet
This gets open coded quite a bit and it's tricky to get right, so make a
generic version and convert some existing users over to it instead.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 fs/bio.c| 70 +
 include/linux/bio.h |  2 ++
 2 files changed, 72 insertions(+)

diff --git a/fs/bio.c b/fs/bio.c
index 1342a16..7fb9f4e 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -827,6 +827,76 @@ void bio_advance(struct bio *bio, unsigned bytes)
 }
 EXPORT_SYMBOL(bio_advance);
 
+/**
+ * bio_copy_data - copy contents of data buffers from one chain of bios to
+ * another
+ * @src: source bio list
+ * @dst: destination bio list
+ *
+ * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
+ * @src and @dst as linked lists of bios.
+ *
+ * Stops when it reaches the end of either @src or @dst - that is, copies
+ * min(src-bi_size, dst-bi_size) bytes (or the equivalent for lists of bios).
+ */
+void bio_copy_data(struct bio *dst, struct bio *src)
+{
+   struct bio_vec *src_bv, *dst_bv;
+   unsigned src_offset, dst_offset, bytes;
+   void *src_p, *dst_p;
+
+   src_bv = bio_iovec(src);
+   dst_bv = bio_iovec(dst);
+
+	src_offset = src_bv->bv_offset;
+	dst_offset = dst_bv->bv_offset;
+
+	while (1) {
+		if (src_offset == src_bv->bv_offset + src_bv->bv_len) {
+			src_bv++;
+			if (src_bv == bio_iovec_idx(src, src->bi_vcnt)) {
+				src = src->bi_next;
+				if (!src)
+					break;
+
+				src_bv = bio_iovec(src);
+			}
+
+			src_offset = src_bv->bv_offset;
+		}
+
+		if (dst_offset == dst_bv->bv_offset + dst_bv->bv_len) {
+			dst_bv++;
+			if (dst_bv == bio_iovec_idx(dst, dst->bi_vcnt)) {
+				dst = dst->bi_next;
+				if (!dst)
+					break;
+
+				dst_bv = bio_iovec(dst);
+			}
+
+			dst_offset = dst_bv->bv_offset;
+		}
+
+		bytes = min(dst_bv->bv_offset + dst_bv->bv_len - dst_offset,
+			    src_bv->bv_offset + src_bv->bv_len - src_offset);
+
+		src_p = kmap_atomic(src_bv->bv_page);
+		dst_p = kmap_atomic(dst_bv->bv_page);
+
+		memcpy(dst_p + dst_bv->bv_offset,
+		       src_p + src_bv->bv_offset,
+		       bytes);
+
+		kunmap_atomic(dst_p);
+		kunmap_atomic(src_p);
+
+		src_offset += bytes;
+		dst_offset += bytes;
+   }
+}
+EXPORT_SYMBOL(bio_copy_data);
+
 struct bio_map_data {
struct bio_vec *iovecs;
struct sg_iovec *sgvecs;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 949c48a..92015ce 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -286,6 +286,8 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
 }
 #endif
 
+extern void bio_copy_data(struct bio *dst, struct bio *src);
+
 extern struct bio *bio_copy_user(struct request_queue *, struct rq_map_data *,
 unsigned long, unsigned int, int, gfp_t);
 extern struct bio *bio_copy_user_iov(struct request_queue *,
-- 
1.7.12

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
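
As an illustration of the copy loop in the patch above, here is a minimal userspace sketch, not the kernel code. The `seg`, `chain` and `cursor` types and `chain_copy_data()` are hypothetical stand-ins for `struct bio_vec`, `struct bio` and `bio_copy_data()`; plain byte buffers take the place of pages and `kmap_atomic()`. One deliberate difference from the patch as posted: the sketch indexes each `memcpy()` by the running position inside the segment rather than by the segment's starting offset, so a segment that is consumed in several pieces is copied piece by piece.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct bio_vec. */
struct seg {
	unsigned char *buf;	/* plays the role of bv_page */
	size_t off;		/* bv_offset */
	size_t len;		/* bv_len */
};

/* Hypothetical userspace stand-in for struct bio. */
struct chain {
	struct seg *segs;	/* bi_io_vec */
	size_t nsegs;		/* bi_vcnt */
	struct chain *next;	/* bi_next */
};

/* Position inside a chain: current bio, segment, bytes already consumed. */
struct cursor {
	struct chain *c;
	size_t idx;
	size_t done;
};

/* Move to a segment that still has bytes left; return 0 at end of chain. */
static int cursor_settle(struct cursor *cur)
{
	while (cur->c) {
		if (cur->idx == cur->c->nsegs) {
			cur->c = cur->c->next;
			cur->idx = 0;
			cur->done = 0;
		} else if (cur->done == cur->c->segs[cur->idx].len) {
			cur->idx++;
			cur->done = 0;
		} else {
			return 1;
		}
	}
	return 0;
}

/* Copy until either chain runs out; returns the number of bytes copied. */
static size_t chain_copy_data(struct chain *dst, struct chain *src)
{
	struct cursor d = { dst, 0, 0 }, s = { src, 0, 0 };
	size_t total = 0;

	assert(dst && src);

	while (cursor_settle(&s) && cursor_settle(&d)) {
		struct seg *sv = &s.c->segs[s.idx];
		struct seg *dv = &d.c->segs[d.idx];
		size_t bytes = sv->len - s.done;

		if (dv->len - d.done < bytes)
			bytes = dv->len - d.done;

		/* Offsets include the bytes already consumed in each segment. */
		memcpy(dv->buf + dv->off + d.done,
		       sv->buf + sv->off + s.done, bytes);

		s.done += bytes;
		d.done += bytes;
		total += bytes;
	}
	return total;
}
```

The return value is the smaller of the two chains' total lengths, which matches the "stops when it reaches the end of either @src or @dst" rule in the kernel-doc comment.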


[PATCH v2 16/26] pktcdvd: use bio_copy_data()

2012-09-10 Thread Kent Overstreet
Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: Jiri Kosina jkos...@suse.cz
---
 drivers/block/pktcdvd.c | 79 -
 1 file changed, 12 insertions(+), 67 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 0824627..1079a77 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -948,31 +948,6 @@ static int pkt_set_segment_merging(struct pktcdvd_device *pd, struct request_que
 }
 
 /*
- * Copy CD_FRAMESIZE bytes from src_bio into a destination page
- */
-static void pkt_copy_bio_data(struct bio *src_bio, int seg, int offs, struct page *dst_page, int dst_offs)
-{
-	unsigned int copy_size = CD_FRAMESIZE;
-
-	while (copy_size > 0) {
-		struct bio_vec *src_bvl = bio_iovec_idx(src_bio, seg);
-		void *vfrom = kmap_atomic(src_bvl->bv_page) +
-			src_bvl->bv_offset + offs;
-		void *vto = page_address(dst_page) + dst_offs;
-		int len = min_t(int, copy_size, src_bvl->bv_len - offs);
-
-		BUG_ON(len < 0);
-		memcpy(vto, vfrom, len);
-		kunmap_atomic(vfrom);
-
-		seg++;
-		offs = 0;
-		dst_offs += len;
-		copy_size -= len;
-	}
-}
-
-/*
  * Copy all data for this packet to pkt->pages[], so that
  * a) The number of required segments for the write bio is minimized, which
  *    is necessary for some scsi controllers.
@@ -1325,55 +1300,35 @@ try_next_bio:
  */
 static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 {
-	struct bio *bio;
 	int f;
-	int frames_write;
 	struct bio_vec *bvec = pkt->w_bio->bi_io_vec;
 
+	bio_reset(pkt->w_bio);
+	pkt->w_bio->bi_sector = pkt->sector;
+	pkt->w_bio->bi_bdev = pd->bdev;
+	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
+	pkt->w_bio->bi_private = pkt;
+
+	/* XXX: locking? */
 	for (f = 0; f < pkt->frames; f++) {
 		bvec[f].bv_page = pkt->pages[(f * CD_FRAMESIZE) / PAGE_SIZE];
 		bvec[f].bv_offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
+		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
+			BUG();
 	}
+	VPRINTK(DRIVER_NAME": vcnt=%d\n", pkt->w_bio->bi_vcnt);
 
 	/*
 	 * Fill-in bvec with data from orig_bios.
 	 */
-	frames_write = 0;
 	spin_lock(&pkt->lock);
-	bio_list_for_each(bio, &pkt->orig_bios) {
-		int segment = bio->bi_idx;
-		int src_offs = 0;
-		int first_frame = (bio->bi_sector - pkt->sector) / (CD_FRAMESIZE >> 9);
-		int num_frames = bio->bi_size / CD_FRAMESIZE;
-		BUG_ON(first_frame < 0);
-		BUG_ON(first_frame + num_frames > pkt->frames);
-		for (f = first_frame; f < first_frame + num_frames; f++) {
-			struct bio_vec *src_bvl = bio_iovec_idx(bio, segment);
-
-			while (src_offs >= src_bvl->bv_len) {
-				src_offs -= src_bvl->bv_len;
-				segment++;
-				BUG_ON(segment >= bio->bi_vcnt);
-				src_bvl = bio_iovec_idx(bio, segment);
-			}
+	bio_copy_data(pkt->w_bio, pkt->orig_bios.head);
 
-			if (src_bvl->bv_len - src_offs >= CD_FRAMESIZE) {
-				bvec[f].bv_page = src_bvl->bv_page;
-				bvec[f].bv_offset = src_bvl->bv_offset + src_offs;
-			} else {
-				pkt_copy_bio_data(bio, segment, src_offs,
-						  bvec[f].bv_page, bvec[f].bv_offset);
-			}
-			src_offs += CD_FRAMESIZE;
-			frames_write++;
-		}
-	}
 	pkt_set_state(pkt, PACKET_WRITE_WAIT_STATE);
 	spin_unlock(&pkt->lock);
 
 	VPRINTK("pkt_start_write: Writing %d frames for zone %llx\n",
-		frames_write, (unsigned long long)pkt->sector);
-	BUG_ON(frames_write != pkt->write_size);
+		pkt->write_size, (unsigned long long)pkt->sector);
 
 	if (test_bit(PACKET_MERGE_SEGS, &pd->flags) || (pkt->write_size < pkt->frames)) {
 		pkt_make_local_copy(pkt, bvec);
@@ -1383,16 +1338,6 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 	}
 
 	/* Start the write request */
-	bio_reset(pkt->w_bio);
-	pkt->w_bio->bi_sector = pkt->sector;
-	pkt->w_bio->bi_bdev = pd->bdev;
-	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
-	pkt->w_bio->bi_private = pkt;
-	for (f = 0; f < pkt->frames; f++)
-		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
-			BUG();
-   VPRINTK

[PATCH v2 26/26] block: Add BIO_SUBMITTED flag, kill BIO_CLONED

2012-09-10 Thread Kent Overstreet
BIO_CLONED wasn't very useful, and didn't have very clear semantics, so
kill it.

Replace it with a more useful flag - BIO_SUBMITTED means the bio has
been passed to generic_make_request() and the bvec can no longer be
modified.

Roll both changes into the same patch so we can steal the old bit for
the new flag.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 block/blk-core.c  | 2 ++
 drivers/md/dm.c   | 1 -
 fs/bio-integrity.c| 1 -
 fs/bio.c  | 8 +---
 include/linux/blk_types.h | 2 +-
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 97511cb..1d4e893 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1638,6 +1638,8 @@ generic_make_request_checks(struct bio *bio)
 
 	might_sleep();
 
+	bio->bi_flags |= 1 << BIO_SUBMITTED;
+
 	if (bio_check_eod(bio, nr_sectors))
 		goto end_io;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8378797..777e70d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1065,7 +1065,6 @@ static struct bio *split_bvec(struct bio *bio, sector_t sector,
 	clone->bi_size = to_bytes(len);
 	clone->bi_io_vec->bv_offset = offset;
 	clone->bi_io_vec->bv_len = clone->bi_size;
-	clone->bi_flags |= 1 << BIO_CLONED;
 
 	if (bio_integrity(bio)) {
 		bio_integrity_clone(clone, bio, GFP_NOIO);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 462a131..a77a566 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -621,7 +621,6 @@ void bio_integrity_trim(struct bio *bio, unsigned int offset,
 
 	BUG_ON(bip == NULL);
 	BUG_ON(bi == NULL);
-	BUG_ON(!bio_flagged(bio, BIO_CLONED));
 
 	nr_sectors = bio_integrity_hw_sectors(bi, sectors);
 	bip->bip_sector = bip->bip_sector + offset;
diff --git a/fs/bio.c b/fs/bio.c
index 5e91e36..d3b6e2a 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -531,7 +531,7 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
 	 */
 	bio->bi_sector = bio_src->bi_sector;
 	bio->bi_bdev = bio_src->bi_bdev;
-	bio->bi_flags |= 1 << BIO_CLONED;
+	bio->bi_flags |= (bio_src->bi_flags & (1 << BIO_SUBMITTED));
 	bio->bi_rw = bio_src->bi_rw;
 	bio->bi_vcnt = bio_src->bi_vcnt;
 	bio->bi_size = bio_src->bi_size;
@@ -604,9 +604,9 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	struct bio_vec *bvec;
 
 	/*
-	 * cloned bio must not modify vec list
+	 * submitted bio must not modify vec list
 	 */
-	if (unlikely(bio_flagged(bio, BIO_CLONED)))
+	if (unlikely(bio_flagged(bio, BIO_SUBMITTED)))
 		return 0;
 
 	if (((bio->bi_size + len) >> 9) > max_sectors)
@@ -844,6 +844,8 @@ int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
 	int i;
 	struct bio_vec *bv;
 
+	BUG_ON(bio_flagged(bio, BIO_SUBMITTED));
+
 	bio_for_each_segment_all(bv, bio, i) {
 		bv->bv_page = alloc_page(gfp_mask);
 		if (!bv->bv_page) {
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index e9375cf..fb49107 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -103,7 +103,7 @@ struct bio {
 #define BIO_RW_BLOCK	1	/* RW_AHEAD set, and read/write would block */
 #define BIO_EOF		2	/* out-out-bounds error */
 #define BIO_SEG_VALID	3	/* bi_phys_segments valid */
-#define BIO_CLONED	4	/* doesn't own data */
+#define BIO_SUBMITTED	4	/* bio has been submitted */
 #define BIO_BOUNCED	5	/* bio is a bounce bio */
 #define BIO_USER_MAPPED 6	/* contains user pages */
 #define BIO_EOPNOTSUPP	7	/* not supported */
-- 
1.7.12
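
To make the intended semantics of the new flag concrete, here is a small userspace model, offered as a sketch rather than as the kernel implementation. `model_bio`, `model_add_page()`, `model_submit()` and `model_clone()` are hypothetical names; only the bit number 4 and the three behaviours (set on submission, refuse to grow the vec list afterwards, carry the bit over to clones) are taken from the patch.

```c
#include <assert.h>

enum { MODEL_SUBMITTED = 4 };	/* bit number reused from BIO_CLONED */

#define MODEL_MAX_VECS 4	/* hypothetical capacity of the model */

struct model_bio {
	unsigned long flags;
	unsigned vcnt;
	unsigned lens[MODEL_MAX_VECS];
};

static int model_flagged(const struct model_bio *bio, int flag)
{
	return (bio->flags & (1UL << flag)) != 0;
}

/* Like __bio_add_page(): returns the bytes added, or 0 if refused. */
static unsigned model_add_page(struct model_bio *bio, unsigned len)
{
	if (model_flagged(bio, MODEL_SUBMITTED) || bio->vcnt == MODEL_MAX_VECS)
		return 0;
	bio->lens[bio->vcnt++] = len;
	return len;
}

/* Like the new line in generic_make_request_checks(). */
static void model_submit(struct model_bio *bio)
{
	bio->flags |= 1UL << MODEL_SUBMITTED;
}

/* Like __bio_clone(): the clone keeps the source's submitted bit. */
static void model_clone(struct model_bio *clone, const struct model_bio *src)
{
	assert(clone != src);
	*clone = *src;
	clone->flags = src->flags & (1UL << MODEL_SUBMITTED);
}
```

In this model a bio accepts pages until `model_submit()` is called; after that, both the bio and any clone made from it refuse further `model_add_page()` calls.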


[PATCH v2 25/26] bio-integrity: Add explicit field for owner of bip_buf

2012-09-10 Thread Kent Overstreet
This was the only real user of BIO_CLONED, which didn't have very clear
semantics. Convert to its own flag so we can get rid of BIO_CLONED.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: Martin K. Petersen martin.peter...@oracle.com
---
 fs/bio-integrity.c  | 5 ++---
 include/linux/bio.h | 1 +
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index e8555a5..462a131 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -94,9 +94,7 @@ void bio_integrity_free(struct bio *bio)
 	struct bio_integrity_payload *bip = bio->bi_integrity;
 	struct bio_set *bs = bio->bi_pool;
 
-	/* A cloned bio doesn't own the integrity metadata */
-	if (!bio_flagged(bio, BIO_CLONED) && !bio_flagged(bio, BIO_FS_INTEGRITY)
-	    && bip->bip_buf != NULL)
+	if (bip->bip_owns_buf)
 		kfree(bip->bip_buf);
 
 	if (bs) {
@@ -382,6 +380,7 @@ int bio_integrity_prep(struct bio *bio)
 		return -EIO;
 	}
 
+	bip->bip_owns_buf = 1;
 	bip->bip_buf = buf;
 	bip->bip_size = len;
 	bip->bip_sector = bio->bi_sector;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index edd66f3..f429d0f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -178,6 +178,7 @@ struct bio_integrity_payload {
 	unsigned short		bip_slab;	/* slab the bip came from */
 	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
 	unsigned short		bip_idx;	/* current bip_vec index */
+	unsigned		bip_owns_buf:1;	/* should free bip_buf */
 
 	struct work_struct	bip_work;	/* I/O completion */
 
-- 
1.7.12


[PATCH v2 24/26] block: Add an explicit bio flag for bios that own their bvec

2012-09-10 Thread Kent Overstreet
This is for the new bio splitting code. When we split a bio, if the
split occurred on a bvec boundary we reuse the bvec for the new bio. But
that means bio_free() can't free it, hence the explicit flag.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
Acked-by: Tejun Heo t...@kernel.org
---
 fs/bio.c  | 4 +++-
 include/linux/bio.h   | 5 -
 include/linux/blk_types.h | 1 +
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 65e6eac..5e91e36 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -250,7 +250,7 @@ static void bio_free(struct bio *bio)
 	__bio_free(bio);
 
 	if (bs) {
-		if (bio_has_allocated_vec(bio))
+		if (bio_flagged(bio, BIO_OWNS_VEC))
 			bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
 
 		/*
@@ -449,6 +449,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 
 		if (unlikely(!bvl))
 			goto err_free;
+
+		bio->bi_flags |= 1 << BIO_OWNS_VEC;
 	} else if (nr_iovecs) {
 		bvl = bio->bi_inline_vecs;
 	}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index bd45154..edd66f3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -85,11 +85,6 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }
 
-static inline int bio_has_allocated_vec(struct bio *bio)
-{
-	return bio->bi_io_vec && bio->bi_io_vec != bio->bi_inline_vecs;
-}
-
 /*
  * will die
  */
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 3eefbb2..e9375cf 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -117,6 +117,7 @@ struct bio {
  * BIO_POOL_IDX()
  */
 #define BIO_RESET_BITS	12
+#define BIO_OWNS_VEC	12	/* bio_free() should free bvec */
 
 #define bio_flagged(bio, flag)	((bio)->bi_flags & (1 << (flag)))
 
-- 
1.7.12
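
The reason a pointer comparison is no longer enough can be shown with a small userspace model. This is only a sketch under assumptions: `model_bio`, `model_alloc()`, `model_split()` and `model_free()` are hypothetical, and `model_split()` merely imitates the "split on a bvec boundary reuses the parent's bvec" case described in the commit message; the actual splitting code is not part of this patch. Only the bit number 12 is taken from the diff.

```c
#include <assert.h>
#include <stdlib.h>

enum { MODEL_OWNS_VEC = 12 };	/* bit number taken from the patch */

struct model_vec {
	void *page;
	unsigned len;
};

struct model_bio {
	unsigned long flags;
	struct model_vec *io_vec;
	unsigned vcnt;
};

static int vec_frees;	/* counts how many vec arrays were released */

/* Allocate a bio that owns a freshly allocated vec array (nr must be > 0). */
static struct model_bio *model_alloc(unsigned nr)
{
	struct model_bio *bio;

	assert(nr > 0);
	bio = calloc(1, sizeof(*bio));
	if (!bio)
		return NULL;
	bio->io_vec = calloc(nr, sizeof(*bio->io_vec));
	if (!bio->io_vec) {
		free(bio);
		return NULL;
	}
	bio->vcnt = nr;
	bio->flags |= 1UL << MODEL_OWNS_VEC;
	return bio;
}

/* Split at a vec boundary: the new bio borrows the tail of the parent's array. */
static struct model_bio *model_split(struct model_bio *parent, unsigned at)
{
	struct model_bio *split;

	assert(at <= parent->vcnt);
	split = calloc(1, sizeof(*split));
	if (!split)
		return NULL;
	split->io_vec = parent->io_vec + at;
	split->vcnt = parent->vcnt - at;
	parent->vcnt = at;
	/* MODEL_OWNS_VEC stays clear: the array belongs to the parent. */
	return split;
}

/* Free the vec array only when the flag says this bio owns it. */
static void model_free(struct model_bio *bio)
{
	if (bio->flags & (1UL << MODEL_OWNS_VEC)) {
		free(bio->io_vec);
		vec_frees++;
	}
	free(bio);
}
```

Note that the split bio's `io_vec` is non-NULL and is not an inline array, so a check in the style of the removed `bio_has_allocated_vec()` would have treated it as owned; the explicit flag avoids that.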


[PATCH v2 23/26] raid1: use bio_alloc_pages()

2012-09-10 Thread Kent Overstreet
Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/raid1.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d30b4cb..18b743a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -92,7 +92,6 @@ static void r1bio_pool_free(void *r1_bio, void *data)
 static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct pool_info *pi = data;
-	struct page *page;
 	struct r1bio *r1_bio;
 	struct bio *bio;
 	int i, j;
@@ -122,14 +121,10 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 		j = 1;
 	while(j--) {
 		bio = r1_bio->bios[j];
-		for (i = 0; i < RESYNC_PAGES; i++) {
-			page = alloc_page(gfp_flags);
-			if (unlikely(!page))
-				goto out_free_pages;
+		bio->bi_vcnt = RESYNC_PAGES;
 
-			bio->bi_io_vec[i].bv_page = page;
-			bio->bi_vcnt = i+1;
-		}
+		if (bio_alloc_pages(bio, gfp_flags))
+			goto out_free_bio;
 	}
 	/* If not user-requests, copy the page pointers to all bios */
 	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
@@ -143,11 +138,6 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 
 	return r1_bio;
 
-out_free_pages:
-	for (j=0 ; j < pi->raid_disks; j++)
-		for (i=0; i < r1_bio->bios[j]->bi_vcnt ; i++)
-			put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
-	j = -1;
 out_free_bio:
 	while (++j < pi->raid_disks)
 		bio_put(r1_bio->bios[j]);
-- 
1.7.12


[PATCH v2 22/26] block: Add bio_alloc_pages()

2012-09-10 Thread Kent Overstreet
More utility code to replace stuff that's getting open coded.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 fs/bio.c| 28 
 include/linux/bio.h |  1 +
 2 files changed, 29 insertions(+)

diff --git a/fs/bio.c b/fs/bio.c
index d88ad77..65e6eac 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -828,6 +828,34 @@ void bio_advance(struct bio *bio, unsigned bytes)
 EXPORT_SYMBOL(bio_advance);
 
 /**
+ * bio_alloc_pages - allocates a single page for each bvec in a bio
+ * @bio: bio to allocate pages for
+ * @gfp_mask: flags for allocation
+ *
+ * Allocates pages up to @bio->bi_vcnt.
+ *
+ * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
+ * freed.
+ */
+int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
+{
+	int i;
+	struct bio_vec *bv;
+
+	bio_for_each_segment_all(bv, bio, i) {
+		bv->bv_page = alloc_page(gfp_mask);
+		if (!bv->bv_page) {
+			while (bv-- != bio->bi_io_vec)
+				__free_page(bv->bv_page);
+			return -ENOMEM;
+		}
+	}
+
+   return 0;
+}
+EXPORT_SYMBOL(bio_alloc_pages);
+
+/**
  * bio_copy_data - copy contents of data buffers from one chain of bios to
  * another
  * @src: source bio list
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b433ff8..bd45154 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -289,6 +289,7 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
 #endif
 
 extern void bio_copy_data(struct bio *dst, struct bio *src);
+extern int bio_alloc_pages(struct bio *bio, gfp_t gfp);
 
 extern struct bio *bio_copy_user(struct request_queue *, struct rq_map_data *,
 unsigned long, unsigned int, int, gfp_t);
-- 
1.7.12
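
The allocate-then-roll-back pattern used by bio_alloc_pages() can be exercised in userspace with a fault injector. The following is a sketch only: `model_vec`, `model_alloc_page()`, `model_free_page()`, `model_alloc_pages()`, `alloc_budget` and `live_pages` are all hypothetical names, and `malloc()` stands in for `alloc_page()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_vec {
	void *page;
};

static int alloc_budget = -1;	/* fault injector: < 0 means unlimited */
static int live_pages;		/* pages currently allocated */

static void *model_alloc_page(void)
{
	void *p;

	if (alloc_budget == 0)
		return NULL;	/* simulated allocation failure */
	p = malloc(4096);
	if (p) {
		if (alloc_budget > 0)
			alloc_budget--;
		live_pages++;
	}
	return p;
}

static void model_free_page(void *p)
{
	free(p);
	live_pages--;
}

/* One page per vec; on failure free what was allocated and return -ENOMEM. */
static int model_alloc_pages(struct model_vec *vecs, unsigned vcnt)
{
	struct model_vec *bv;

	assert(vecs || !vcnt);

	for (bv = vecs; bv != vecs + vcnt; bv++) {
		bv->page = model_alloc_page();
		if (!bv->page) {
			/* Walk back over the entries filled so far. */
			while (bv != vecs) {
				--bv;
				model_free_page(bv->page);
				bv->page = NULL;
			}
			return -ENOMEM;
		}
	}
	return 0;
}
```

Setting `alloc_budget` to a small positive number before the call forces a failure part-way through, which makes it easy to check that no page is leaked.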


[PATCH v2 20/26] block: Add bio_for_each_segment_all()

2012-09-10 Thread Kent Overstreet
This is part of the immutable bvec prep work; bio_for_each_segment() is
going to have a different implementation so these need to be split
apart.

This change is also to better document the intent of code that's using
it - bio_for_each_segment_all() is only legal to use for code that owns
the bio.
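
The difference between the two iterators can be sketched in userspace. The macro bodies below are an assumption, since the bio.h hunk is cut off in this archive: they suppose that the `_all` variant starts at index 0 (as the replaced `__bio_for_each_segment(..., 0)` calls suggest) and that the plain variant starts at the current index. All `model_*` names and the two helper functions are hypothetical.

```c
#include <assert.h>

struct model_vec {
	unsigned len;
};

struct model_bio {
	struct model_vec *io_vec;
	unsigned vcnt;	/* total number of segments */
	unsigned idx;	/* first segment that has not completed yet */
};

#define model_for_each_segment_from(bv, bio, i, start)		\
	for ((i) = (start), (bv) = (bio)->io_vec + (i);		\
	     (i) < (bio)->vcnt;					\
	     (i)++, (bv)++)

/* Pending segments only: what code that does not own the bio may look at. */
#define model_for_each_segment(bv, bio, i)			\
	model_for_each_segment_from(bv, bio, i, (bio)->idx)

/* Every segment from index 0: only meaningful for the owner of the bio. */
#define model_for_each_segment_all(bv, bio, i)			\
	model_for_each_segment_from(bv, bio, i, 0)

static unsigned pending_bytes(struct model_bio *bio)
{
	struct model_vec *bv;
	unsigned i, sum = 0;

	assert(bio->idx <= bio->vcnt);
	model_for_each_segment(bv, bio, i)
		sum += bv->len;
	return sum;
}

static unsigned total_bytes(struct model_bio *bio)
{
	struct model_vec *bv;
	unsigned i, sum = 0;

	model_for_each_segment_all(bv, bio, i)
		sum += bv->len;
	return sum;
}
```

With three segments of 512, 1024 and 2048 bytes and `idx == 1`, `pending_bytes()` sums the last two segments while `total_bytes()` sums all three.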

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/md/raid1.c  |  2 +-
 fs/bio.c| 12 ++--
 fs/exofs/ore.c  |  2 +-
 fs/exofs/ore_raid.c |  2 +-
 include/linux/bio.h | 16 +---
 mm/bounce.c |  2 +-
 6 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6cd1fb2..ade95ac 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1283,7 +1283,7 @@ read_again:
 			 * know the original bi_idx, so we just free
 			 * them all
 			 */
-			__bio_for_each_segment(bvec, mbio, j, 0)
+			bio_for_each_segment_all(bvec, mbio, j)
 				bvec->bv_page = r1_bio->behind_bvecs[j].bv_page;
 			if (test_bit(WriteMostly, &conf->mirrors[i].rdev->flags))
 				atomic_inc(&r1_bio->behind_remaining);
diff --git a/fs/bio.c b/fs/bio.c
index 7fb9f4e..efdc437 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -959,7 +959,7 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
 	int iov_idx = 0;
 	unsigned int iov_off = 0;
 
-	__bio_for_each_segment(bvec, bio, i, 0) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		char *bv_addr = page_address(bvec->bv_page);
 		unsigned int bv_len = iovecs[i].bv_len;
 
@@ -1141,7 +1141,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 	return bio;
 cleanup:
 	if (!map_data)
-		bio_for_each_segment(bvec, bio, i)
+		bio_for_each_segment_all(bvec, bio, i)
 			__free_page(bvec->bv_page);
 
 	bio_put(bio);
@@ -1355,7 +1355,7 @@ static void __bio_unmap_user(struct bio *bio)
 	/*
 	 * make sure we dirty pages we wrote to
 	 */
-	__bio_for_each_segment(bvec, bio, i, 0) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		if (bio_data_dir(bio) == READ)
 			set_page_dirty_lock(bvec->bv_page);
 
@@ -1461,7 +1461,7 @@ static void bio_copy_kern_endio(struct bio *bio, int err)
 	int i;
 	char *p = bmd->sgvecs[0].iov_base;
 
-	__bio_for_each_segment(bvec, bio, i, 0) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		char *addr = page_address(bvec->bv_page);
 		int len = bmd->iovecs[i].bv_len;
 
@@ -1501,7 +1501,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
 	if (!reading) {
 		void *p = data;
 
-		bio_for_each_segment(bvec, bio, i) {
+		bio_for_each_segment_all(bvec, bio, i) {
 			char *addr = page_address(bvec->bv_page);
 
 			memcpy(addr, p, bvec->bv_len);
@@ -1780,7 +1780,7 @@ sector_t bio_sector_offset(struct bio *bio, unsigned short index,
 	if (index >= bio->bi_idx)
 		index = bio->bi_vcnt - 1;
 
-	__bio_for_each_segment(bv, bio, i, 0) {
+	bio_for_each_segment_all(bv, bio, i) {
 		if (i == index) {
 			if (offset > bv->bv_offset)
 				sectors += (offset - bv->bv_offset) / sector_sz;
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index f936cb5..b744228 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -401,7 +401,7 @@ static void _clear_bio(struct bio *bio)
 	struct bio_vec *bv;
 	unsigned i;
 
-	__bio_for_each_segment(bv, bio, i, 0) {
+	bio_for_each_segment_all(bv, bio, i) {
 		unsigned this_count = bv->bv_len;
 
 		if (likely(PAGE_SIZE == this_count))
diff --git a/fs/exofs/ore_raid.c b/fs/exofs/ore_raid.c
index 5f376d1..4dec928 100644
--- a/fs/exofs/ore_raid.c
+++ b/fs/exofs/ore_raid.c
@@ -432,7 +432,7 @@ static void _mark_read4write_pages_uptodate(struct ore_io_state *ios, int ret)
 		if (!bio)
 			continue;
 
-		__bio_for_each_segment(bv, bio, i, 0) {
+		bio_for_each_segment_all(bv, bio, i) {
 			struct page *page = bv->bv_page;
 
 			SetPageUptodate(page);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 92015ce..b433ff8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -137,16 +137,18 @@ static inline int bio_has_allocated_vec(struct bio *bio)
 #define bio_io_error(bio) bio_endio((bio), -EIO)
 
 /*
- * drivers should not use the __ version unless they _really_ want to
- * run through the entire bio and not just pending pieces
+ * drivers should _never_ use the all version - the bio may have been split

[PATCH v2 21/26] block: Convert some code to bio_for_each_segment_all()

2012-09-10 Thread Kent Overstreet
A few places in the code were either open coding or using the wrong
version - fix.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/dm-crypt.c |  3 +--
 drivers/md/raid1.c| 10 +++---
 fs/bio.c  | 20 ++--
 fs/direct-io.c|  8 
 4 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index bbf459b..f50798e 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -858,8 +858,7 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
 	unsigned int i;
 	struct bio_vec *bv;
 
-	for (i = 0; i < clone->bi_vcnt; i++) {
-		bv = bio_iovec_idx(clone, i);
+	bio_for_each_segment_all(bv, clone, i) {
 		BUG_ON(!bv->bv_page);
 		mempool_free(bv->bv_page, cc->page_pool);
 		bv->bv_page = NULL;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ade95ac..d30b4cb 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -921,7 +921,7 @@ static void alloc_behind_pages(struct bio *bio, struct r1bio *r1_bio)
 	if (unlikely(!bvecs))
 		return;
 
-	bio_for_each_segment(bvec, bio, i) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		bvecs[i] = *bvec;
 		bvecs[i].bv_page = alloc_page(GFP_NOIO);
 		if (unlikely(!bvecs[i].bv_page))
@@ -1276,12 +1276,8 @@ read_again:
 			struct bio_vec *bvec;
 			int j;
 
-			/* Yes, I really want the '__' version so that
-			 * we clear any unused pointer in the io_vec, rather
-			 * than leave them unchanged.  This is important
-			 * because when we come to free the pages, we won't
-			 * know the original bi_idx, so we just free
-			 * them all
+			/*
+			 * We trimmed the bio, so _all is legit
 			 */
 			bio_for_each_segment_all(bvec, mbio, j)
 				bvec->bv_page = r1_bio->behind_bvecs[j].bv_page;
diff --git a/fs/bio.c b/fs/bio.c
index efdc437..d88ad77 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1546,11 +1546,11 @@ EXPORT_SYMBOL(bio_copy_kern);
  */
 void bio_set_pages_dirty(struct bio *bio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
+	struct bio_vec *bvec;
 	int i;
 
-	for (i = 0; i < bio->bi_vcnt; i++) {
-		struct page *page = bvec[i].bv_page;
+	bio_for_each_segment_all(bvec, bio, i) {
+		struct page *page = bvec->bv_page;
 
 		if (page && !PageCompound(page))
 			set_page_dirty_lock(page);
@@ -1559,11 +1559,11 @@ void bio_set_pages_dirty(struct bio *bio)
 
 static void bio_release_pages(struct bio *bio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
+	struct bio_vec *bvec;
 	int i;
 
-	for (i = 0; i < bio->bi_vcnt; i++) {
-		struct page *page = bvec[i].bv_page;
+	bio_for_each_segment_all(bvec, bio, i) {
+		struct page *page = bvec->bv_page;
 
 		if (page)
 			put_page(page);
@@ -1612,16 +1612,16 @@ static void bio_dirty_fn(struct work_struct *work)
 
 void bio_check_pages_dirty(struct bio *bio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
+	struct bio_vec *bvec;
 	int nr_clean_pages = 0;
 	int i;
 
-	for (i = 0; i < bio->bi_vcnt; i++) {
-		struct page *page = bvec[i].bv_page;
+	bio_for_each_segment_all(bvec, bio, i) {
+		struct page *page = bvec->bv_page;
 
 		if (PageDirty(page) || PageCompound(page)) {
 			page_cache_release(page);
-			bvec[i].bv_page = NULL;
+			bvec->bv_page = NULL;
 		} else {
 			nr_clean_pages++;
 		}
diff --git a/fs/direct-io.c b/fs/direct-io.c
index f86c720..6089916 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -441,8 +441,8 @@ static struct bio *dio_await_one(struct dio *dio)
 static int dio_bio_complete(struct dio *dio, struct bio *bio)
 {
 	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec;
-	int page_no;
+	struct bio_vec *bvec;
+	unsigned i;
 
 	if (!uptodate)
 		dio->io_error = -EIO;
@@ -450,8 +450,8 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	if (dio->is_async && dio->rw == READ) {
 		bio_check_pages_dirty(bio);	/* transfers ownership */
 	} else {
-		for (page_no = 0; page_no < bio->bi_vcnt; page_no++) {
-			struct page *page = bvec[page_no].bv_page;
+		bio_for_each_segment_all(bvec, bio, i

[PATCH v2 19/26] bounce: Refactor __blk_queue_bounce to not use bi_io_vec

2012-09-10 Thread Kent Overstreet
A bunch of what __blk_queue_bounce() was doing was problematic for the
immutable bvec work; this cleans that up and the code is quite a bit
smaller, too.

The __bio_for_each_segment() in copy_to_high_bio_irq() was changed
because that one's looping over the original bio, not the bounce bio -
since the bounce code doesn't own that bio the __ version wasn't
correct.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 mm/bounce.c | 73 -
 1 file changed, 19 insertions(+), 54 deletions(-)

diff --git a/mm/bounce.c b/mm/bounce.c
index 0420867..3068300 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -101,7 +101,7 @@ static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 	struct bio_vec *tovec, *fromvec;
 	int i;
 
-	__bio_for_each_segment(tovec, to, i, 0) {
+	bio_for_each_segment(tovec, to, i) {
 		fromvec = from->bi_io_vec + i;
 
 		/*
@@ -181,78 +181,43 @@ static void bounce_end_io_read_isa(struct bio *bio, int err)
 static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 			       mempool_t *pool)
 {
-	struct page *page;
-	struct bio *bio = NULL;
-	int i, rw = bio_data_dir(*bio_orig);
+	struct bio *bio;
+	int rw = bio_data_dir(*bio_orig);
 	struct bio_vec *to, *from;
+	unsigned i;
 
-	bio_for_each_segment(from, *bio_orig, i) {
-		page = from->bv_page;
+	bio_for_each_segment(from, *bio_orig, i)
+		if (page_to_pfn(from->bv_page) > queue_bounce_pfn(q))
+			goto bounce;
 
-		/*
-		 * is destination page below bounce pfn?
-		 */
-		if (page_to_pfn(page) <= queue_bounce_pfn(q))
-			continue;
-
-		/*
-		 * irk, bounce it
-		 */
-		if (!bio) {
-			unsigned int cnt = (*bio_orig)->bi_vcnt;
+	return;
+bounce:
+	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
 
-			bio = bio_alloc(GFP_NOIO, cnt);
-			memset(bio->bi_io_vec, 0, cnt * sizeof(struct bio_vec));
-		}
-
+	bio_for_each_segment(to, bio, i) {
+		struct page *page = to->bv_page;
 
-		to = bio->bi_io_vec + i;
+		if (page_to_pfn(page) <= queue_bounce_pfn(q))
+			continue;
 
-		to->bv_page = mempool_alloc(pool, q->bounce_gfp);
-		to->bv_len = from->bv_len;
-		to->bv_offset = from->bv_offset;
 		inc_zone_page_state(to->bv_page, NR_BOUNCE);
+		to->bv_page = mempool_alloc(pool, q->bounce_gfp);
 
 		if (rw == WRITE) {
 			char *vto, *vfrom;
 
-			flush_dcache_page(from->bv_page);
+			flush_dcache_page(page);
+
 			vto = page_address(to->bv_page) + to->bv_offset;
-			vfrom = kmap(from->bv_page) + from->bv_offset;
+			vfrom = kmap_atomic(page) + to->bv_offset;
 			memcpy(vto, vfrom, to->bv_len);
-			kunmap(from->bv_page);
+			kunmap_atomic(vfrom);
 		}
 	}
 
-	/*
-	 * no pages bounced
-	 */
-	if (!bio)
-		return;
-
 	trace_block_bio_bounce(q, *bio_orig);
 
-	/*
-	 * at least one page was bounced, fill in possible non-highmem
-	 * pages
-	 */
-	__bio_for_each_segment(from, *bio_orig, i, 0) {
-		to = bio_iovec_idx(bio, i);
-		if (!to->bv_page) {
-			to->bv_page = from->bv_page;
-			to->bv_len = from->bv_len;
-			to->bv_offset = from->bv_offset;
-		}
-	}
-
-	bio->bi_bdev = (*bio_orig)->bi_bdev;
 	bio->bi_flags |= (1 << BIO_BOUNCED);
-	bio->bi_sector = (*bio_orig)->bi_sector;
-	bio->bi_rw = (*bio_orig)->bi_rw;
-
-	bio->bi_vcnt = (*bio_orig)->bi_vcnt;
-	bio->bi_idx = (*bio_orig)->bi_idx;
-	bio->bi_size = (*bio_orig)->bi_size;
 
 	if (pool == page_pool) {
 		bio->bi_end_io = bounce_end_io_write;
-- 
1.7.12


[PATCH v2 18/26] raid1: use bio_copy_data()

2012-09-10 Thread Kent Overstreet
This doesn't really delete any code _yet_, but once immutable bvecs are
done we can just delete the rest of the code in that loop.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/raid1.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b1072da..6cd1fb2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1895,10 +1895,9 @@ static int process_checks(struct r1bio *r1_bio)
 			else
 				bi->bv_len = size;
 			size -= PAGE_SIZE;
-			memcpy(page_address(bi->bv_page),
-			       page_address(pbio->bi_io_vec[j].bv_page),
-			       PAGE_SIZE);
 		}
+
+		bio_copy_data(sbio, pbio);
 	}
 	return 0;
 }
-- 
1.7.12


[PATCH v2 17/26] pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage

2012-09-10 Thread Kent Overstreet
In the short term this'll help with code auditing, and if this code ever
gets used now it's converted :)

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jiri Kosina jkos...@suse.cz
---
 drivers/block/pktcdvd.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 1079a77..5318ad39 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1156,16 +1156,15 @@ static int pkt_start_recovery(struct packet_data *pkt)
 	new_sector = new_block * (CD_FRAMESIZE >> 9);
 	pkt->sector = new_sector;
 
+	bio_reset(pkt->bio);
+	pkt->bio->bi_bdev = pd->bdev;
+	pkt->bio->bi_rw = REQ_WRITE;
 	pkt->bio->bi_sector = new_sector;
-	pkt->bio->bi_next = NULL;
-	pkt->bio->bi_flags = 1 << BIO_UPTODATE;
-	pkt->bio->bi_idx = 0;
-
-	BUG_ON(pkt->bio->bi_rw != REQ_WRITE);
-	BUG_ON(pkt->bio->bi_vcnt != pkt->frames);
-	BUG_ON(pkt->bio->bi_size != pkt->frames * CD_FRAMESIZE);
-	BUG_ON(pkt->bio->bi_end_io != pkt_end_io_packet_write);
-	BUG_ON(pkt->bio->bi_private != pkt);
+	pkt->bio->bi_size = pkt->frames * CD_FRAMESIZE;
+	pkt->bio->bi_vcnt = pkt->frames;
+
+	pkt->bio->bi_end_io = pkt_end_io_packet_write;
+	pkt->bio->bi_private = pkt;
 
 	drop_super(sb);
 	return 1;
-- 
1.7.12


[PATCH v2 14/26] raid1: Refactor narrow_write_error() to not use bi_idx

2012-09-10 Thread Kent Overstreet
More bi_idx removal. This code was just open coding bio_clone(). This
could probably be further improved by using bio_advance() instead of
skipping over null pages, but that'd be a larger rework.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/raid1.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index bd3e3b9..b1072da 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2052,8 +2052,6 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
struct mddev *mddev = r1_bio->mddev;
struct r1conf *conf = mddev->private;
struct md_rdev *rdev = conf->mirrors[i].rdev;
-   int vcnt, idx;
-   struct bio_vec *vec;
 
/* bio has the data to be written to device 'i' where
 * we just recently had a write error.
@@ -2081,30 +2079,32 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
& ~(sector_t)(block_sectors - 1))
- sector;
 
-   if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
-   vcnt = r1_bio->behind_page_count;
-   vec = r1_bio->behind_bvecs;
-   idx = 0;
-   while (vec[idx].bv_page == NULL)
-   idx++;
-   } else {
-   vcnt = r1_bio->master_bio->bi_vcnt;
-   vec = r1_bio->master_bio->bi_io_vec;
-   idx = r1_bio->master_bio->bi_idx;
-   }
while (sect_to_write) {
struct bio *wbio;
if (sectors > sect_to_write)
sectors = sect_to_write;
/* Write at 'sector' for 'sectors'*/
 
-   wbio = bio_alloc_mddev(GFP_NOIO, vcnt, mddev);
-   memcpy(wbio->bi_io_vec, vec, vcnt * sizeof(struct bio_vec));
-   wbio->bi_sector = r1_bio->sector;
+   if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
+   unsigned vcnt = r1_bio->behind_page_count;
+   struct bio_vec *vec = r1_bio->behind_bvecs;
+
+   while (!vec->bv_page) {
+   vec++;
+   vcnt--;
+   }
+
+   wbio = bio_alloc_mddev(GFP_NOIO, vcnt, mddev);
+   memcpy(wbio->bi_io_vec, vec, vcnt * sizeof(struct bio_vec));
+
+   wbio->bi_vcnt = vcnt;
+   } else {
+   wbio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);
+   }
+
wbio->bi_rw = WRITE;
-   wbio->bi_vcnt = vcnt;
+   wbio->bi_sector = r1_bio->sector;
wbio->bi_size = r1_bio->sectors << 9;
-   wbio->bi_idx = idx;
 
md_trim_bio(wbio, sector - r1_bio->sector, sectors);
wbio->bi_sector += rdev->data_offset;
-- 
1.7.12
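For anyone auditing the BehindIO branch: the skip over leading bvecs that have no page can be modelled in userspace as below. This is only a sketch. struct mini_bvec and skip_null_pages() are hypothetical names, and the bound on *vcnt is an added assumption (the loop in the patch itself relies on a non-NULL page always being present).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: only the page pointer matters here. */
struct mini_bvec {
	void *bv_page;
};

/*
 * Skip leading entries whose bv_page is NULL, shrinking *vcnt to match.
 * Returns a pointer to the first entry that has a page, or to one past the
 * end if every entry is empty (the bound is an assumption, see above).
 */
static struct mini_bvec *skip_null_pages(struct mini_bvec *vec, unsigned *vcnt)
{
	assert(vec != NULL && vcnt != NULL);

	while (*vcnt && !vec->bv_page) {
		vec++;
		(*vcnt)--;
	}
	return vec;
}
```

The returned pointer and the adjusted count are what the new code hands to bio_alloc_mddev() and memcpy().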



[PATCH v2 13/26] raid5: use bio_reset()

2012-09-10 Thread Kent Overstreet
Had to shuffle the code around a bit (where bi_rw and bi_end_io were
set), but there shouldn't really be anything tricky here.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/raid5.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7c19dbe..ebe43f7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -561,14 +561,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
bi = &sh->dev[i].req;
rbi = &sh->dev[i].rreq; /* For writing to replacement */
 
-   bi->bi_rw = rw;
-   rbi->bi_rw = rw;
-   if (rw & WRITE) {
-   bi->bi_end_io = raid5_end_write_request;
-   rbi->bi_end_io = raid5_end_write_request;
-   } else
-   bi->bi_end_io = raid5_end_read_request;
-
rcu_read_lock();
rrdev = rcu_dereference(conf->disks[i].replacement);
smp_mb(); /* Ensure that if rrdev is NULL, rdev won't be */
@@ -643,7 +635,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 
set_bit(STRIPE_IO_STARTED, &sh->state);
 
+   bio_reset(bi);
bi->bi_bdev = rdev->bdev;
+   bi->bi_rw = rw;
+   bi->bi_end_io = (rw & WRITE)
+   ? raid5_end_write_request
+   : raid5_end_read_request;
+   bi->bi_private = sh;
+
pr_debug("%s: for %llu schedule op %ld on disc %d\n",
__func__, (unsigned long long)sh->sector,
bi->bi_rw, i);
@@ -657,12 +656,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
if (test_bit(R5_ReadNoMerge, &sh->dev[i].flags))
bi->bi_rw |= REQ_FLUSH;
 
-   bi->bi_flags = 1 << BIO_UPTODATE;
-   bi->bi_idx = 0;
bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
bi->bi_io_vec[0].bv_offset = 0;
bi->bi_size = STRIPE_SIZE;
-   bi->bi_next = NULL;
if (rrdev)
set_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags);
generic_make_request(bi);
@@ -674,7 +670,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 
set_bit(STRIPE_IO_STARTED, &sh->state);
 
+   bio_reset(rbi);
rbi->bi_bdev = rrdev->bdev;
+   rbi->bi_rw = rw;
+   rbi->bi_end_io = (rw & WRITE)
+   ? raid5_end_write_request
+   : raid5_end_read_request;
+   rbi->bi_private = sh;
+
pr_debug("%s: for %llu schedule op %ld on "
 "replacement disc %d\n",
__func__, (unsigned long long)sh->sector,
@@ -686,12 +689,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
else
rbi->bi_sector = (sh->sector
  + rrdev->data_offset);
-   rbi->bi_flags = 1 << BIO_UPTODATE;
-   rbi->bi_idx = 0;
rbi->bi_io_vec[0].bv_len = STRIPE_SIZE;
rbi->bi_io_vec[0].bv_offset = 0;
rbi->bi_size = STRIPE_SIZE;
-   rbi->bi_next = NULL;
generic_make_request(rbi);
}
if (!rdev && !rrdev) {
-- 
1.7.12



[PATCH v2 12/26] raid1: use bio_reset()

2012-09-10 Thread Kent Overstreet
I couldn't figure out what sbio->bi_end_io in process_checks() was
supposed to be, so I took the easy way out.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/raid1.c | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ee85154..bd3e3b9 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1835,6 +1835,7 @@ static int process_checks(struct r1bio *r1_bio)
int primary;
int i;
int vcnt;
+   bio_end_io_t *bi_end_io;
 
for (primary = 0; primary < conf->raid_disks * 2; primary++)
if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
@@ -1876,13 +1877,11 @@ static int process_checks(struct r1bio *r1_bio)
continue;
}
/* fixup the bio for reuse */
+   bi_end_io = sbio->bi_end_io;
+   bio_reset(sbio);
+
sbio->bi_vcnt = vcnt;
sbio->bi_size = r1_bio->sectors << 9;
-   sbio->bi_idx = 0;
-   sbio->bi_phys_segments = 0;
-   sbio->bi_flags &= ~(BIO_POOL_MASK - 1);
-   sbio->bi_flags |= 1 << BIO_UPTODATE;
-   sbio->bi_next = NULL;
sbio->bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
@@ -2426,18 +2425,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
for (i = 0; i < conf->raid_disks * 2; i++) {
struct md_rdev *rdev;
bio = r1_bio->bios[i];
-
-   /* take from bio_init */
-   bio->bi_next = NULL;
-   bio->bi_flags &= ~(BIO_POOL_MASK-1);
-   bio->bi_flags |= 1 << BIO_UPTODATE;
-   bio->bi_rw = READ;
-   bio->bi_vcnt = 0;
-   bio->bi_idx = 0;
-   bio->bi_phys_segments = 0;
-   bio->bi_size = 0;
-   bio->bi_end_io = NULL;
-   bio->bi_private = NULL;
+   bio_reset(bio);
 
rdev = rcu_dereference(conf->mirrors[i].rdev);
if (rdev == NULL ||
-- 
1.7.12
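The open-coded sequences removed above all follow the same pattern: clear the per-I/O state of a bio, mark it up to date, then fill in the fields for the next submission. A rough userspace model of that pattern follows. Everything prefixed mini_ is hypothetical; the model treats every field before bi_max_vecs as per-I/O state and ignores any flag bits the real helper may preserve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mini_bio;
typedef void (*mini_end_io_t)(struct mini_bio *bio, int error);

/* Hypothetical cut-down bio: per-I/O fields first, allocation state last. */
struct mini_bio {
	unsigned long bi_flags;
	unsigned int bi_vcnt;
	unsigned int bi_idx;
	unsigned int bi_size;
	mini_end_io_t bi_end_io;
	void *bi_private;
	unsigned int bi_max_vecs;	/* must survive a reset */
};

#define MINI_BIO_UPTODATE	0
#define MINI_BIO_RESET_BYTES	offsetof(struct mini_bio, bi_max_vecs)

/* Zero everything up to bi_max_vecs, then mark the bio up to date. */
static void mini_bio_reset(struct mini_bio *bio)
{
	assert(bio != NULL);
	memset(bio, 0, MINI_BIO_RESET_BYTES);
	bio->bi_flags = 1UL << MINI_BIO_UPTODATE;
}

/* Placeholder completion handler, used only to have something to save. */
static void mini_noop_end_io(struct mini_bio *bio, int error)
{
	(void)bio;
	(void)error;
}

/*
 * Reuse a bio for another I/O. The reset clears bi_end_io, so a caller
 * that wants to keep its handler has to save it first and put it back.
 */
static void mini_reuse_bio(struct mini_bio *bio, unsigned int vcnt,
			   unsigned int bytes)
{
	mini_end_io_t saved = bio->bi_end_io;

	mini_bio_reset(bio);
	bio->bi_vcnt = vcnt;
	bio->bi_size = bytes;
	bio->bi_end_io = saved;
}
```

The point of the model is the ordering: anything assigned before the reset is lost, which is why the conversions above move the bi_rw, bi_end_io and bi_private assignments to after the bio_reset() call.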



[PATCH v2 11/26] raid10: Use bio_reset()

2012-09-10 Thread Kent Overstreet
More prep work for immutable bio vecs, mainly getting rid of references
to bi_idx.

bio_reset was being open coded in a few places. The one in sync_request
was a bit nontrivial to convert, so could use some extra eyeballs.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/raid10.c | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f001c1b..6b83207 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1958,13 +1958,10 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 * First we need to fixup bv_offset, bv_len and
 * bi_vecs, as the read request might have corrupted these
 */
+   bio_reset(tbio);
+
tbio->bi_vcnt = vcnt;
tbio->bi_size = r10_bio->sectors << 9;
-   tbio->bi_idx = 0;
-   tbio->bi_phys_segments = 0;
-   tbio->bi_flags &= ~(BIO_POOL_MASK - 1);
-   tbio->bi_flags |= 1 << BIO_UPTODATE;
-   tbio->bi_next = NULL;
tbio->bi_rw = WRITE;
tbio->bi_private = r10_bio;
tbio->bi_sector = r10_bio->devs[i].addr;
@@ -2970,6 +2967,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
}
}
bio = r10_bio->devs[0].bio;
+   bio_reset(bio);
bio->bi_next = biolist;
biolist = bio;
bio->bi_private = r10_bio;
@@ -2994,6 +2992,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
rdev = mirror->rdev;
if (!test_bit(In_sync, &rdev->flags)) {
bio = r10_bio->devs[1].bio;
+   bio_reset(bio);
bio->bi_next = biolist;
biolist = bio;
bio->bi_private = r10_bio;
@@ -3022,6 +3021,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
if (rdev == NULL || bio == NULL ||
test_bit(Faulty, &rdev->flags))
break;
+   bio_reset(bio);
bio->bi_next = biolist;
biolist = bio;
bio->bi_private = r10_bio;
@@ -3120,7 +3120,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
r10_bio->devs[i].repl_bio->bi_end_io = NULL;
 
bio = r10_bio->devs[i].bio;
-   bio->bi_end_io = NULL;
+   bio_reset(bio);
clear_bit(BIO_UPTODATE, &bio->bi_flags);
if (conf->mirrors[d].rdev == NULL ||
test_bit(Faulty, &conf->mirrors[d].rdev->flags))
@@ -3157,6 +3157,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 
/* Need to set up for writing to the replacement */
bio = r10_bio->devs[i].repl_bio;
+   bio_reset(bio);
clear_bit(BIO_UPTODATE, &bio->bi_flags);
 
sector = r10_bio->devs[i].addr;
@@ -3190,17 +3191,6 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
}
}
 
-   for (bio = biolist; bio ; bio=bio->bi_next) {
-
-   bio->bi_flags &= ~(BIO_POOL_MASK - 1);
-   if (bio->bi_end_io)
-   bio->bi_flags |= 1 << BIO_UPTODATE;
-   bio->bi_vcnt = 0;
-   bio->bi_idx = 0;
-   bio->bi_phys_segments = 0;
-   bio->bi_size = 0;
-   }
-
nr_sectors = 0;
if (sector_nr + max_sync < max_sector)
max_sector = sector_nr + max_sync;
@@ -4253,17 +4243,14 @@ read_more:
}
if (!rdev2 || test_bit(Faulty, &rdev2->flags))
continue;
+
+   bio_reset(b);
b->bi_bdev = rdev2->bdev;
b->bi_sector = r10_bio->devs[s/2].addr + rdev2->new_data_offset;
b->bi_private = r10_bio;
b->bi_end_io = end_reshape_write;
b->bi_rw = WRITE;
-   b->bi_flags &= ~(BIO_POOL_MASK - 1);
-   b->bi_flags |= 1 << BIO_UPTODATE;
b->bi_next = blist;
-   b->bi_vcnt = 0;
-   b->bi_idx = 0;
-   b->bi_size = 0;
blist = b;
}
 
-- 
1.7.12


[PATCH v2 10/26] block: Add submit_bio_wait(), remove from md

2012-09-10 Thread Kent Overstreet
Random cleanup - this code was duplicated and it's not really specific
to md.

Also added the ability to return the actual error code.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/raid1.c  | 19 ---
 drivers/md/raid10.c | 19 ---
 fs/bio.c| 36 
 include/linux/bio.h |  1 +
 4 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 2488440..ee85154 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2048,25 +2048,6 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
}
 }
 
-static void bi_complete(struct bio *bio, int error)
-{
-   complete((struct completion *)bio->bi_private);
-}
-
-static int submit_bio_wait(int rw, struct bio *bio)
-{
-   struct completion event;
-   rw |= REQ_SYNC;
-
-   init_completion(&event);
-   bio->bi_private = &event;
-   bio->bi_end_io = bi_complete;
-   submit_bio(rw, bio);
-   wait_for_completion(&event);
-
-   return test_bit(BIO_UPTODATE, &bio->bi_flags);
-}
-
 static int narrow_write_error(struct r1bio *r1_bio, int i)
 {
struct mddev *mddev = r1_bio->mddev;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6d06d83..f001c1b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2410,25 +2410,6 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
}
 }
 
-static void bi_complete(struct bio *bio, int error)
-{
-   complete((struct completion *)bio->bi_private);
-}
-
-static int submit_bio_wait(int rw, struct bio *bio)
-{
-   struct completion event;
-   rw |= REQ_SYNC;
-
-   init_completion(&event);
-   bio->bi_private = &event;
-   bio->bi_end_io = bi_complete;
-   submit_bio(rw, bio);
-   wait_for_completion(&event);
-
-   return test_bit(BIO_UPTODATE, &bio->bi_flags);
-}
-
 static int narrow_write_error(struct r10bio *r10_bio, int i)
 {
struct bio *bio = r10_bio->master_bio;
diff --git a/fs/bio.c b/fs/bio.c
index addeac2..1342a16 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -750,6 +750,42 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
 }
 EXPORT_SYMBOL(bio_add_page);
 
+struct submit_bio_ret {
+   struct completion event;
+   int error;
+};
+
+static void submit_bio_wait_endio(struct bio *bio, int error)
+{
+   struct submit_bio_ret *ret = bio->bi_private;
+
+   ret->error = error;
+   complete(&ret->event);
+}
+
+/**
+ * submit_bio_wait - submit a bio, and wait until it completes
+ * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
+ * @bio: The struct bio which describes the I/O
+ *
+ * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
+ * bio_endio() on failure.
+ */
+int submit_bio_wait(int rw, struct bio *bio)
+{
+   struct submit_bio_ret ret;
+
+   rw |= REQ_SYNC;
+   init_completion(&ret.event);
+   bio->bi_private = &ret;
+   bio->bi_end_io = submit_bio_wait_endio;
+   submit_bio(rw, bio);
+   wait_for_completion(&ret.event);
+
+   return ret.error;
+}
+}
+EXPORT_SYMBOL(submit_bio_wait);
+
 /**
  * bio_advance - increment/complete a bio by some number of bytes
  * @bio:   bio to advance
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 92bff0e..949c48a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -249,6 +249,7 @@ extern void bio_endio(struct bio *, int);
 struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
 
+extern int submit_bio_wait(int rw, struct bio *bio);
 void bio_advance(struct bio *, unsigned);
 
 extern void bio_init(struct bio *);
-- 
1.7.12
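The new helper hangs a small on-stack struct off bi_private so the completion handler can pass the error code back to the waiter. A userspace sketch of that hand-off follows. All mini_ names are hypothetical, the "device" completes inline instead of asynchronously, and a plain flag stands in for struct completion, so this shows the data flow only, not the blocking behaviour.

```c
#include <assert.h>
#include <stddef.h>

struct mini_bio;
typedef void (*mini_end_io_t)(struct mini_bio *bio, int error);

/* Hypothetical cut-down bio. */
struct mini_bio {
	void *bi_private;
	mini_end_io_t bi_end_io;
	int simulated_error;	/* what the pretend device will report */
};

/* Counterpart of struct submit_bio_ret: completion state plus error. */
struct mini_submit_ret {
	int done;		/* stand-in for struct completion */
	int error;
};

/* Completion handler: record the error, then signal the waiter. */
static void mini_wait_endio(struct mini_bio *bio, int error)
{
	struct mini_submit_ret *ret = bio->bi_private;

	ret->error = error;
	ret->done = 1;
}

/* Stand-in for submit_bio(): completes the I/O immediately. */
static void mini_submit_bio(struct mini_bio *bio)
{
	bio->bi_end_io(bio, bio->simulated_error);
}

/* Returns 0 on success, or the error passed to the completion handler. */
static int mini_submit_bio_wait(struct mini_bio *bio)
{
	struct mini_submit_ret ret = { 0, 0 };

	bio->bi_private = &ret;
	bio->bi_end_io = mini_wait_endio;
	mini_submit_bio(bio);
	assert(ret.done);	/* real code would sleep in wait_for_completion() */

	return ret.error;
}
```

Note the return convention: the removed md copies returned the result of test_bit(BIO_UPTODATE, ...), i.e. nonzero on success, while the new helper is documented to return 0 on success, so callers that test the return value need to be checked against that.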



[PATCH v2 09/26] block: Remove some unnecessary bi_vcnt usage

2012-09-10 Thread Kent Overstreet
More prep work for immutable bvecs/efficient bio splitting - usage of
bi_vcnt has to be audited, so getting rid of all the unnecessary usage
makes that easier.

Plus, bio_segments() is really what this code wanted, as it respects the
current value of bi_idx.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/message/fusion/mptsas.c  |  6 +++---
 drivers/scsi/libsas/sas_expander.c   |  6 +++---
 drivers/scsi/mpt2sas/mpt2sas_transport.c | 10 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 551262e..5406a9f 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2235,10 +2235,10 @@ static int mptsas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
}
 
/* do we need to support multiple segments? */
-   if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u %u, rsp %u %u\n",
-   ioc->name, __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
-   rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
+   ioc->name, __func__, bio_segments(req->bio), blk_rq_bytes(req),
+   bio_segments(rsp->bio), blk_rq_bytes(rsp));
return -EINVAL;
}
 
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index efc6e72..ee331a7 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2151,10 +2151,10 @@ int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
}
 
/* do we need to support multiple segments? */
-   if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
printk("%s: multiple segments req %u %u, rsp %u %u\n",
-  __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
-  rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
+  __func__, bio_segments(req->bio), blk_rq_bytes(req),
+  bio_segments(rsp->bio), blk_rq_bytes(rsp));
return -EINVAL;
}
 
diff --git a/drivers/scsi/mpt2sas/mpt2sas_transport.c b/drivers/scsi/mpt2sas/mpt2sas_transport.c
index c6cf20f..403a57b 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_transport.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_transport.c
@@ -1939,7 +1939,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
ioc->transport_cmds.status = MPT2_CMD_PENDING;
 
/* Check if the request is split across multiple segments */
-   if (req->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1) {
u32 offset = 0;
 
/* Allocate memory and copy the request */
@@ -1971,7 +1971,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 
/* Check if the response needs to be populated across
 * multiple segments */
-   if (rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(rsp->bio) > 1) {
pci_addr_in = pci_alloc_consistent(ioc->pdev, blk_rq_bytes(rsp),
&pci_dma_in);
if (!pci_addr_in) {
@@ -2038,7 +2038,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
-   if (req->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(req) - 4), pci_dma_out);
} else {
@@ -2054,7 +2054,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
MPI2_SGE_FLAGS_END_OF_LIST);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
-   if (rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(rsp->bio) > 1) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(rsp) + 4), pci_dma_in);
} else {
@@ -2099,7 +2099,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
le16_to_cpu(mpi_reply->ResponseDataLength);
/* check if the resp needs to be copied from the allocated
 * pci mem */
-   if (rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(rsp->bio) > 1) {
u32 offset = 0;
u32 bytes_to_copy =
le16_to_cpu(mpi_reply->ResponseDataLength);
-- 
1.7.12
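To make the difference between the two checks concrete, here is a small userspace model. It assumes bio_segments() is the number of bvecs from bi_idx to bi_vcnt, which is the expression the dm-verity conversion in this series replaces; struct mini_bio and mini_bio_segments() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical cut-down bio: only the two fields involved in the count. */
struct mini_bio {
	unsigned short bi_vcnt;	/* total bvecs in the array */
	unsigned short bi_idx;	/* first bvec that has not been completed */
};

/* Segments still outstanding, i.e. bi_vcnt - bi_idx. */
static unsigned mini_bio_segments(const struct mini_bio *bio)
{
	assert(bio->bi_idx <= bio->bi_vcnt);
	return bio->bi_vcnt - bio->bi_idx;
}
```

For a bio with bi_vcnt = 4 that has already been advanced to bi_idx = 3, a bi_vcnt > 1 test reports multiple segments while the model reports one remaining segment, which is the case the commit message is referring to.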


[PATCH v2 08/26] block: Remove bi_idx references

2012-09-10 Thread Kent Overstreet
These were harmless but unnecessary, and getting rid of them makes the
code easier to audit since most of them need to be removed.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 drivers/block/floppy.c | 1 -
 drivers/md/dm-verity.c | 2 +-
 drivers/md/raid10.c| 1 -
 fs/buffer.c| 1 -
 fs/jfs/jfs_logmgr.c| 2 --
 fs/logfs/dev_bdev.c| 5 -
 mm/page_io.c   | 1 -
 7 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 95e52879..24e5cef 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3778,7 +3778,6 @@ static int __floppy_read_block_0(struct block_device *bdev)
bio_vec.bv_len = size;
bio_vec.bv_offset = 0;
bio.bi_vcnt = 1;
-   bio.bi_idx = 0;
bio.bi_size = size;
bio.bi_bdev = bdev;
bio.bi_sector = 0;
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 18ef6c5..6956626 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -496,7 +496,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio,
 
bio->bi_end_io = verity_end_io;
bio->bi_private = io;
-   io->io_vec_size = bio->bi_vcnt - bio->bi_idx;
+   io->io_vec_size = bio_segments(bio);
if (io->io_vec_size < DM_VERITY_IO_VEC_INLINE)
io->io_vec = io->io_vec_inline;
else
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index bbd08f5..6d06d83 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4249,7 +4249,6 @@ read_more:
read_bio->bi_flags &= ~(BIO_POOL_MASK - 1);
read_bio->bi_flags |= 1 << BIO_UPTODATE;
read_bio->bi_vcnt = 0;
-   read_bio->bi_idx = 0;
read_bio->bi_size = 0;
r10_bio->master_bio = read_bio;
r10_bio->read_slot = r10_bio->devs[r10_bio->read_slot].devnum;
diff --git a/fs/buffer.c b/fs/buffer.c
index 58e2e7b..38d8793 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2893,7 +2893,6 @@ int submit_bh(int rw, struct buffer_head * bh)
bio->bi_io_vec[0].bv_offset = bh_offset(bh);
 
bio->bi_vcnt = 1;
-   bio->bi_idx = 0;
bio->bi_size = bh->b_size;
 
bio->bi_end_io = end_bio_bh_io_sync;
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 2eb952c..8ae5e35 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2004,7 +2004,6 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
bio->bi_vcnt = 1;
-   bio->bi_idx = 0;
bio->bi_size = LOGPSIZE;
 
bio->bi_end_io = lbmIODone;
@@ -2145,7 +2144,6 @@ static void lbmStartIO(struct lbuf * bp)
bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
bio->bi_vcnt = 1;
-   bio->bi_idx = 0;
bio->bi_size = LOGPSIZE;
 
bio->bi_end_io = lbmIODone;
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index e784a21..550475c 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -32,7 +32,6 @@ static int sync_request(struct page *page, struct block_device *bdev, int rw)
bio_vec.bv_len = PAGE_SIZE;
bio_vec.bv_offset = 0;
bio.bi_vcnt = 1;
-   bio.bi_idx = 0;
bio.bi_size = PAGE_SIZE;
bio.bi_bdev = bdev;
bio.bi_sector = page->index * (PAGE_SIZE >> 9);
@@ -108,7 +107,6 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
if (i >= max_pages) {
/* Block layer cannot split bios :( */
bio->bi_vcnt = i;
-   bio->bi_idx = 0;
bio->bi_size = i * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
bio->bi_sector = ofs >> 9;
@@ -136,7 +134,6 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
unlock_page(page);
}
bio->bi_vcnt = nr_pages;
-   bio->bi_idx = 0;
bio->bi_size = nr_pages * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
bio->bi_sector = ofs >> 9;
@@ -202,7 +199,6 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
if (i >= max_pages) {
/* Block layer cannot split bios :( */
bio->bi_vcnt = i;
-   bio->bi_idx = 0;
bio->bi_size = i * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
bio->bi_sector = ofs >> 9;
@@ -224,7 +220,6 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
bio->bi_io_vec[i].bv_offset = 0;
}
bio->bi_vcnt = nr_pages;
-   bio->bi_idx = 0;
bio->bi_size = nr_pages * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
bio->bi_sector = ofs >> 9;
diff --git a/mm/page_io.c b/mm/page_io.c
index 78eee32..8d3c0c0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -35,7 +35,6 @@ static struct bio *get_swap_bio(gfp_t gfp_flags

[PATCH v2 05/26] block: Add bio_end()

2012-09-10 Thread Kent Overstreet
Just a little convenience macro - main reason to add it now is preparing
for immutable bio vecs, it'll reduce the size of the patch that puts
bi_sector/bi_size/bi_idx into a struct bvec_iter.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 block/blk-core.c  |  2 +-
 block/cfq-iosched.c   |  7 ++-
 block/deadline-iosched.c  |  2 +-
 drivers/block/drbd/drbd_req.c |  2 +-
 drivers/block/pktcdvd.c   |  6 +++---
 drivers/md/dm-stripe.c|  2 +-
 drivers/md/dm-verity.c|  2 +-
 drivers/md/faulty.c   |  6 ++
 drivers/md/linear.c   |  3 +--
 drivers/md/raid1.c|  4 ++--
 drivers/md/raid5.c| 14 +++---
 drivers/s390/block/dcssblk.c  |  3 +--
 fs/btrfs/extent_io.c  |  3 +--
 fs/gfs2/lops.c|  2 +-
 include/linux/bio.h   |  1 +
 15 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 55c833c9..97511cb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1557,7 +1557,7 @@ static void handle_bad_sector(struct bio *bio)
printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
bdevname(bio->bi_bdev, b),
bio->bi_rw,
-   (unsigned long long)bio->bi_sector + bio_sectors(bio),
+   (unsigned long long)bio_end(bio),
(long long)(i_size_read(bio->bi_bdev->bd_inode) >> 9));
 
set_bit(BIO_EOF, &bio->bi_flags);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index fb52df9..8eae0f3 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1883,11 +1883,8 @@ cfq_find_rq_fmerge(struct cfq_data *cfqd, struct bio *bio)
return NULL;
 
cfqq = cic_to_cfqq(cic, cfq_bio_sync(bio));
-   if (cfqq) {
-   sector_t sector = bio->bi_sector + bio_sectors(bio);
-
-   return elv_rb_find(&cfqq->sort_list, sector);
-   }
+   if (cfqq)
+   return elv_rb_find(&cfqq->sort_list, bio_end(bio));
 
return NULL;
 }
diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index 599b12e..a3b4df9 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -132,7 +132,7 @@ deadline_merge(struct request_queue *q, struct request **req, struct bio *bio)
 * check for front merge
 */
if (dd->front_merges) {
-   sector_t sector = bio->bi_sector + bio_sectors(bio);
+   sector_t sector = bio_end(bio);
 
__rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector);
if (__rq) {
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 01b2ac6..af69a96 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
/* to make some things easier, force alignment of requests within the
 * granularity of our hash tables */
s_enr = bio->bi_sector >> HT_SHIFT;
-   e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT : s_enr;
+   e_enr = (bio_end(bio) - 1) >> HT_SHIFT;
 
if (likely(s_enr == e_enr)) {
do {
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 2e7de7a..8df3216 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -901,7 +901,7 @@ static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
pd->iosched.successive_reads += bio->bi_size >> 10;
else {
pd->iosched.successive_reads = 0;
-   pd->iosched.last_write = bio->bi_sector + bio_sectors(bio);
+   pd->iosched.last_write = bio_end(bio);
}
if (pd->iosched.successive_reads >= HI_SPEED_SWITCH) {
if (pd->read_speed == pd->write_speed) {
@@ -2454,7 +2454,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
zone = ZONE(bio->bi_sector, pd);
VPRINTK("pkt_make_request: start = %6llx stop = %6llx\n",
(unsigned long long)bio->bi_sector,
-   (unsigned long long)(bio->bi_sector + bio_sectors(bio)));
+   (unsigned long long)bio_end(bio));
 
/* Check if we have to split the bio */
{
@@ -2462,7 +2462,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
sector_t last_zone;
int first_sectors;
 
-   last_zone = ZONE(bio->bi_sector + bio_sectors(bio) - 1, pd);
+   last_zone = ZONE(bio_end(bio) - 1, pd);
if (last_zone != zone) {
BUG_ON(last_zone != zone + pd->settings.size);
first_sectors = last_zone - bio->bi_sector;
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index a087bf2..047dd08 100644
--- a/drivers/md/dm
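The conversions in this patch all replace the expression bi_sector + bio_sectors(bio), so the macro can be read as "first sector past the end of the bio". A userspace sketch under that reading follows; the mini_ names are hypothetical, and the 512-byte sector size and the sector-multiple precondition are assumptions of the model.

```c
#include <assert.h>

typedef unsigned long long mini_sector_t;

/* Hypothetical cut-down bio: start sector and remaining size in bytes. */
struct mini_bio {
	mini_sector_t bi_sector;
	unsigned int bi_size;
};

/* Size of the bio in 512-byte sectors. */
static mini_sector_t mini_bio_sectors(const struct mini_bio *bio)
{
	return bio->bi_size >> 9;
}

/* First sector past the end of the bio: bi_sector + bio_sectors(bio). */
static mini_sector_t mini_bio_end(const struct mini_bio *bio)
{
	assert((bio->bi_size & 511) == 0);	/* model assumes whole sectors */
	return bio->bi_sector + mini_bio_sectors(bio);
}
```

One case worth a second look when reviewing callers that use bio_end(bio) - 1: for a zero-length bio the model gives bi_sector - 1, whereas the drbd expression being replaced special-cased bi_size == 0 and used the start sector instead.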

[PATCH v2 04/26] md: Convert md_trim_bio() to use bio_advance()

2012-09-10 Thread Kent Overstreet
Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de
---
 drivers/md/md.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7a2b079..51ce48c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -190,25 +190,16 @@ void md_trim_bio(struct bio *bio, int offset, int size)
struct bio_vec *bvec;
int sofar = 0;
 
-   size <<= 9;
if (offset == 0 && size == bio->bi_size)
return;
 
-   bio->bi_sector += offset;
-   bio->bi_size = size;
-   offset <<= 9;
clear_bit(BIO_SEG_VALID, &bio->bi_flags);
 
-   while (bio->bi_idx < bio->bi_vcnt &&
-  bio->bi_io_vec[bio->bi_idx].bv_len <= offset) {
-   /* remove this whole bio_vec */
-   offset -= bio->bi_io_vec[bio->bi_idx].bv_len;
-   bio->bi_idx++;
-   }
-   if (bio->bi_idx < bio->bi_vcnt) {
-   bio->bi_io_vec[bio->bi_idx].bv_offset += offset;
-   bio->bi_io_vec[bio->bi_idx].bv_len -= offset;
-   }
+   bio_advance(bio, offset << 9);
+
+   size <<= 9;
+   bio->bi_size = size;
+
/* avoid any complications with bi_idx being non-zero*/
if (bio->bi_idx) {
memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_idx,
-- 
1.7.12



[PATCH v2 02/26] block: Add bio_advance()

2012-09-10 Thread Kent Overstreet
This is prep work for immutable bio vecs; we first want to centralize
where bvecs are modified.

Next two patches convert some existing code to use this function.

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
---
 fs/bio.c| 41 +
 include/linux/bio.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/fs/bio.c b/fs/bio.c
index 4783e31..07587c0 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -750,6 +750,47 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
 }
 EXPORT_SYMBOL(bio_add_page);
 
+/**
+ * bio_advance - increment/complete a bio by some number of bytes
+ * @bio:   bio to advance
+ * @bytes: number of bytes to complete
+ *
+ * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
+ * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
+ * be updated on the last bvec as well.
+ *
+ * @bio will then represent the remaining, uncompleted portion of the io.
+ */
+void bio_advance(struct bio *bio, unsigned bytes)
+{
+   if (bio_integrity(bio))
+   bio_integrity_advance(bio, bytes);
+
+   bio->bi_sector += bytes >> 9;
+   bio->bi_size -= bytes;
+
+   if (!bio->bi_size)
+   return;
+
+   while (bytes) {
+   if (unlikely(bio->bi_idx >= bio->bi_vcnt)) {
+   printk(KERN_ERR "%s: bio idx %d >= vcnt %d\n",
+  __func__, bio->bi_idx, bio->bi_vcnt);
+   break;
+   }
+
+   if (bytes >= bio_iovec(bio)->bv_len) {
+   bytes -= bio_iovec(bio)->bv_len;
+   bio->bi_idx++;
+   } else {
+   bio_iovec(bio)->bv_len -= bytes;
+   bio_iovec(bio)->bv_offset += bytes;
+   bytes = 0;
+   }
+   }
+}
+EXPORT_SYMBOL(bio_advance);
+
 struct bio_map_data {
struct bio_vec *iovecs;
struct sg_iovec *sgvecs;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7873465..6763cdf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -248,6 +248,8 @@ extern void bio_endio(struct bio *, int);
 struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
 
+void bio_advance(struct bio *, unsigned);
+
 extern void bio_init(struct bio *);
 extern void bio_reset(struct bio *);
 
-- 
1.7.12
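A worked example may help when checking callers of the new helper. The sketch below is a userspace model of the loop described in the kernel-doc comment, not the kernel function. The mini_ names are hypothetical, the integrity handling is left out, and the model assumes the start sector moves forward by bytes >> 9 (512-byte sectors).

```c
#include <assert.h>

/* Hypothetical cut-down bvec: length and offset within its page. */
struct mini_bvec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical cut-down bio. */
struct mini_bio {
	unsigned long long bi_sector;	/* start, in 512-byte sectors */
	unsigned int bi_size;		/* bytes remaining */
	unsigned int bi_idx;		/* current bvec */
	unsigned int bi_vcnt;		/* number of bvecs */
	struct mini_bvec *bi_io_vec;
};

/* Complete 'bytes' bytes from the front of the bio. */
static void mini_bio_advance(struct mini_bio *bio, unsigned int bytes)
{
	assert(bytes <= bio->bi_size);

	bio->bi_sector += bytes >> 9;
	bio->bi_size -= bytes;

	if (!bio->bi_size)
		return;

	while (bytes) {
		struct mini_bvec *bv;

		if (bio->bi_idx >= bio->bi_vcnt)
			break;	/* ran off the end of the array */

		bv = &bio->bi_io_vec[bio->bi_idx];
		if (bytes >= bv->bv_len) {
			/* this bvec is fully consumed, move to the next one */
			bytes -= bv->bv_len;
			bio->bi_idx++;
		} else {
			/* partial bvec: shrink it from the front */
			bv->bv_len -= bytes;
			bv->bv_offset += bytes;
			bytes = 0;
		}
	}
}
```

With two 4096-byte bvecs starting at sector 100, advancing by 4608 bytes consumes the first bvec whole and 512 bytes of the second: the model ends at sector 109 with 3584 bytes remaining, bi_idx 1, and the second bvec at offset 512 with length 3584.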



[PATCH v2 01/26] block: Convert integrity to bvec_alloc_bs(), and a bugfix

2012-09-10 Thread Kent Overstreet
This adds a pointer to the bvec array to struct bio_integrity_payload,
instead of the bvecs always being inline; then the bvecs are allocated
with bvec_alloc_bs().

This is needed eventually for immutable bio vecs - immutable bvecs
aren't useful if we still have to copy them, hence the need for the
pointer. Less code is always nice too, though.

Also fix an amusing bug in bio_integrity_split() - struct bio_pair
doesn't have the integrity bvecs after the bio_integrity_payloads, so
there was a buffer overrun. The code was confusing pointers with arrays.

Signed-off-by: Kent Overstreet <koverstr...@google.com>
CC: Jens Axboe <ax...@kernel.dk>
CC: Martin K. Petersen <martin.peter...@oracle.com>
---
 fs/bio-integrity.c  | 124 +---
 include/linux/bio.h |   5 ++-
 2 files changed, 43 insertions(+), 86 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index a3f28f3..1d64f7f 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -27,48 +27,11 @@
 #include <linux/workqueue.h>
 #include <linux/slab.h>
 
-struct integrity_slab {
-   struct kmem_cache *slab;
-   unsigned short nr_vecs;
-   char name[8];
-};
-
-#define IS(x) { .nr_vecs = x, .name = "bip-"__stringify(x) }
-struct integrity_slab bip_slab[BIOVEC_NR_POOLS] __read_mostly = {
-   IS(1), IS(4), IS(16), IS(64), IS(128), IS(BIO_MAX_PAGES),
-};
-#undef IS
+#define BIP_INLINE_VECS 4
 
+static struct kmem_cache *bip_slab;
 static struct workqueue_struct *kintegrityd_wq;
 
-static inline unsigned int vecs_to_idx(unsigned int nr)
-{
-   switch (nr) {
-   case 1:
-       return 0;
-   case 2 ... 4:
-       return 1;
-   case 5 ... 16:
-       return 2;
-   case 17 ... 64:
-       return 3;
-   case 65 ... 128:
-       return 4;
-   case 129 ... BIO_MAX_PAGES:
-       return 5;
-   default:
-       BUG();
-   }
-}
-
-static inline int use_bip_pool(unsigned int idx)
-{
-   if (idx == BIOVEC_MAX_IDX)
-       return 1;
-
-   return 0;
-}
-
 /**
  * bio_integrity_alloc - Allocate integrity payload and attach it to bio
  * @bio:   bio to attach integrity metadata to
@@ -84,37 +47,38 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
                                                   unsigned int nr_vecs)
 {
    struct bio_integrity_payload *bip;
-   unsigned int idx = vecs_to_idx(nr_vecs);
    struct bio_set *bs = bio->bi_pool;
+   unsigned long idx = BIO_POOL_NONE;
+   unsigned inline_vecs;
+
+   if (!bs) {
+       bip = kmalloc(sizeof(struct bio_integrity_payload) +
+                     sizeof(struct bio_vec) * nr_vecs, gfp_mask);
+       inline_vecs = nr_vecs;
+   } else {
+       bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
+       inline_vecs = BIP_INLINE_VECS;
+   }
 
-   if (!bs)
-       bs = fs_bio_set;
-
-   BUG_ON(bio == NULL);
-   bip = NULL;
+   if (unlikely(!bip))
+       return NULL;
 
-   /* Lower order allocations come straight from slab */
-   if (!use_bip_pool(idx))
-       bip = kmem_cache_alloc(bip_slab[idx].slab, gfp_mask);
+   memset(bip, 0, sizeof(struct bio_integrity_payload));
 
-   /* Use mempool if lower order alloc failed or max vecs were requested */
-   if (bip == NULL) {
-       idx = BIOVEC_MAX_IDX;  /* so we free the payload properly later */
-       bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
-
-       if (unlikely(bip == NULL)) {
-           printk(KERN_ERR "%s: could not alloc bip\n", __func__);
-           return NULL;
-       }
+   if (nr_vecs > inline_vecs) {
+       bip->bip_vec = bvec_alloc_bs(gfp_mask, nr_vecs, &idx, bs);
+       if (!bip->bip_vec)
+           goto err;
    }
 
-   memset(bip, 0, sizeof(*bip));
-
    bip->bip_slab = idx;
    bip->bip_bio = bio;
    bio->bi_integrity = bip;
 
    return bip;
+err:
+   mempool_free(bip, bs->bio_integrity_pool);
+   return NULL;
 }
 EXPORT_SYMBOL(bio_integrity_alloc);
 EXPORT_SYMBOL(bio_integrity_alloc);
 
@@ -130,20 +94,19 @@ void bio_integrity_free(struct bio *bio)
    struct bio_integrity_payload *bip = bio->bi_integrity;
    struct bio_set *bs = bio->bi_pool;
 
-   if (!bs)
-       bs = fs_bio_set;
-
-   BUG_ON(bip == NULL);
-
    /* A cloned bio doesn't own the integrity metadata */
    if (!bio_flagged(bio, BIO_CLONED) && !bio_flagged(bio, BIO_FS_INTEGRITY)
        && bip->bip_buf != NULL)
        kfree(bip->bip_buf);
 
-   if (use_bip_pool(bip->bip_slab))
+   if (bs) {
+       if (bip->bip_slab != BIO_POOL_NONE)
+           bvec_free_bs(bs, bip->bip_vec, bip->bip_slab);
+
        mempool_free(bip, bs->bio_integrity_pool);
-   else
-   kmem_cache_free(bip_slab[bip

Re: [dm-devel] [PATCH v10 8/8] block: Add bio_clone_bioset(), bio_clone_kmalloc()

2012-09-17 Thread Kent Overstreet
On Fri, Sep 14, 2012 at 10:50:59PM +0100, Alasdair G Kergon wrote:
> On Thu, Sep 06, 2012 at 03:35:02PM -0700, Kent Overstreet wrote:
> > Previously, there was bio_clone() but it only allocated from the fs bio
> > set; as a result various users were open coding it and using
> > __bio_clone().
>
> Explain in the header the reasoning behind the change to dm-crypt so that
> it no longer resets bi_idx to 0 too?
>
> > -   clone->bi_idx = 0;

Previously, it was open coding __bio_clone(), that's what the setting
bi_idx to 0 was from. With the change to bio_clone_bioset() that's no
longer necessary (and dangerous, since bi_idx needs to be consistent
with bi_sector/bi_size).


Re: [dm-devel] [PATCH v10 1/8] block: Generalized bio pool freeing

2012-09-17 Thread Kent Overstreet
On Fri, Sep 14, 2012 at 07:28:28PM +0100, Alasdair G Kergon wrote:
> On Thu, Sep 06, 2012 at 03:34:55PM -0700, Kent Overstreet wrote:
> > With the old code, when you allocate a bio from a bio pool you have to
> > implement your own destructor that knows how to find the bio pool the
> > bio was originally allocated from.
> >
> > This adds a new field to struct bio (bi_pool) and changes
> > bio_alloc_bioset() to use it. This makes various bio destructors
> > unnecessary, so they're then deleted.
> >
> > v6: Explain the temporary if statement in bio_put
> >
> This patch also silently reverts
> commit 4d7b38b7d944a79da3793b6c92d38682f3905ac9
> dm: clear bi_end_io on remapping failure
>
> Why?
>
> If it's intentional, please explain it in your patch header and
> copy Hannes to reconsider the matter.

Never noticed that was introduced in its own patch until you pointed it
out.

That isn't a very good patch - it says it's clearing bi_end_io as a
precaution, but as a precaution to what?

As far as I can tell, it was never necessary. The bio is about to be
freed - there shouldn't be any other references on it (__bio_map() is
called on freshly allocated bios, and bio_get() is never called in dm.c)
Nothing else should've been looking at bi_end_io, certainly the
destructor didn't.

Now that there's no destructor, it makes even less sense to have it -
after that bio_put() that bio isn't being touched by dm code anymore.


Re: [dm-devel] [PATCH v2 01/26] block: Convert integrity to bvec_alloc_bs(), and a bugfix

2012-09-17 Thread Kent Overstreet
On Wed, Sep 12, 2012 at 03:39:18PM -0400, Martin K. Petersen wrote:
> >>>>> "Kent" == Kent Overstreet <koverstr...@google.com> writes:
>
> Kent,
>
> Kent> To fix the bug first, I'd have to reorder struct bio_pair and then
> Kent> just delete two lines of code from bio_integrity_split(). But the
> Kent> reordering is unnecessary with the refactoring.
>
> Well, a bug is a bug and the fix needs to go into stable. So we will
> need a patch that does not depend on your changes.

Alright, good point.

> I don't have a problem with adding a pointer so clones can point to the
> parent's vector. But embedding the vector into the bip was a feature.
> If you check the git log you'll see that originally I did use separate
> vector allocations.

Looks like that was 7878cba9f0037f5599004b03a1260b32d9050360 - If I
follow your commit message your primary goal was to back the bip vecs by
a per bio set mempool?

I didn't break that (excepting the issue Vivek noted) - but it is true
that my patch adds another allocation (when nr_vecs > BIP_INLINE_VECS,
anyways).
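
The trade-off being discussed here can be modelled in user space. The sketch below is only an illustration: `struct toy_payload`, `toy_payload_alloc()` and `toy_payload_free()` are hypothetical names, not kernel API, and it assumes that a small payload points its vector pointer at the embedded array while a larger one pays for a second allocation.

```c
#include <stdlib.h>

#define TOY_INLINE_VECS 4  /* mirrors BIP_INLINE_VECS from the patch */

struct toy_vec {
	void *page;
	unsigned len;
	unsigned offset;
};

/* Hypothetical stand-in for struct bio_integrity_payload. */
struct toy_payload {
	struct toy_vec *vec;  /* inline_vecs, or a separately allocated array */
	int separate;         /* nonzero if vec has to be freed on its own */
	struct toy_vec inline_vecs[TOY_INLINE_VECS];
};

static struct toy_payload *toy_payload_alloc(unsigned nr_vecs)
{
	struct toy_payload *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;

	if (nr_vecs > TOY_INLINE_VECS) {
		/* The extra allocation mentioned in the reply above. */
		p->vec = calloc(nr_vecs, sizeof(*p->vec));
		if (!p->vec) {
			free(p);
			return NULL;
		}
		p->separate = 1;
	} else {
		/* Small payloads need no second allocation. */
		p->vec = p->inline_vecs;
	}
	return p;
}

static void toy_payload_free(struct toy_payload *p)
{
	if (p->separate)
		free(p->vec);
	free(p);
}
```

With this layout a clone could point `vec` at its parent's array instead of copying it, which is the sharing the patch is after.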

I don't know how big of a deal you think that extra allocation is. If
you're against it, this patch isn't really necessary for the immutable
bvecs I'm working on - just need it if we want integrity bvecs to be
shared like regular bvecs will be.

Something else I noticed is bio_integrity_add_page() doesn't merge bvecs
when possible, like the regular bio_add_page(). If changing it to merge
bvecs wouldn't break anything, then probably most integrity bvecs would
be under BIP_INLINE_VECS.
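
A rough user-space sketch of the merging idea, assuming the rule is "same page, and the new range starts exactly where the previous segment ends". `struct toy_list` and `toy_add_page()` are hypothetical names used for illustration; the page is modelled as a plain number.

```c
#define TOY_MAX_VECS 4  /* stands in for BIP_INLINE_VECS */

struct toy_vec {
	unsigned long page;  /* page identity, modelled as a number */
	unsigned offset;
	unsigned len;
};

struct toy_list {
	struct toy_vec vec[TOY_MAX_VECS];
	unsigned cnt;
};

/* Returns 0 on success, -1 if the list is full. */
static int toy_add_page(struct toy_list *l, unsigned long page,
			unsigned offset, unsigned len)
{
	if (l->cnt) {
		struct toy_vec *last = &l->vec[l->cnt - 1];

		/* Contiguous with the previous segment: grow it in place. */
		if (last->page == page && last->offset + last->len == offset) {
			last->len += len;
			return 0;
		}
	}

	if (l->cnt == TOY_MAX_VECS)
		return -1;

	l->vec[l->cnt].page = page;
	l->vec[l->cnt].offset = offset;
	l->vec[l->cnt].len = len;
	l->cnt++;
	return 0;
}
```

Adding two adjacent 512-byte ranges of the same page followed by a range on another page leaves two segments rather than three, which is why merging would keep more payloads within the inline limit.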

Thoughts?


Re: [PATCH v2 25/26] bio-integrity: Add explicit field for owner of bip_buf

2012-09-17 Thread Kent Overstreet
On Wed, Sep 12, 2012 at 03:41:36PM -0400, Martin K. Petersen wrote:
> >>>>> "Kent" == Kent Overstreet <koverstr...@google.com> writes:
>
> Kent> This was the only real user of BIO_CLONED, which didn't have very
> Kent> clear semantics. Convert to its own flag so we can get rid of
> Kent> BIO_CLONED.
>
> I already have a patch in my queue that moves all integrity-relevant
> flags from struct bio to the bip. So I'm ok with removing BIO_CLONED but
> I'll send a separate patch for the integrity flags.

Cool - have a git repo I can take a look at?


Re: [PATCH v2 02/26] block: Add bio_advance()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 02:58:27PM -0700, Tejun Heo wrote:
> On Mon, Sep 10, 2012 at 05:22:13PM -0700, Kent Overstreet wrote:
> > +/**
> > + * bio_advance - increment/complete a bio by some number of bytes
> > + * @bio:   bio to advance
> > + * @bytes: number of bytes to complete
> > + *
> > + * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
> > + * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
> > + * be updated on the last bvec as well.
> > + *
> > + * @bio will then represent the remaining, uncompleted portion of the io.
> > + */
> > +void bio_advance(struct bio *bio, unsigned bytes)
> > +{
> > +   if (bio_integrity(bio))
> > +       bio_integrity_advance(bio, bytes);
> > +
> > +   bio->bi_sector += bytes >> 0;
>
> Hmmm bytes >> 0?

Whoops...

> > +   bio->bi_size -= bytes;
> > +
> > +   if (!bio->bi_size)
> > +       return;
> > +
> > +   while (bytes) {
> > +       if (unlikely(bio->bi_idx >= bio->bi_vcnt)) {
> > +           printk(KERN_ERR "%s: bio idx %d >= vcnt %d\n",
>
> pr_err() is preferred but maybe WARN_ON_ONCE() is better fit here?
> This happening would be a bug, right?

I just cut and pasted that from blk_update_request(), which is what the
next patch refactors...

But yes it would be a bug. It gets converted to a BUG_ON() in a later
patch (not in this series), as this gets further abstracted into a
wrapper around bvec_advance_iter() which doesn't know about struct bio
(as bio integrity gets its own iterator).

Might drop it entirely, depending on what exactly I end up doing with
bi_vcnt...
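
For readers following the review, here is a minimal user-space model of what bio_advance() is meant to do. It assumes the intended shift is `>> 9` (bytes to 512-byte sectors), which is how the code removed from req_bio_endio() converts them. `struct toy_bio`, `struct toy_bvec` and `toy_bio_advance()` are hypothetical stand-ins, not kernel types, and the `assert()` is a sketch-only precondition that the posted patch does not have.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct bio_vec and struct bio. */
struct toy_bvec {
	unsigned len;     /* bytes left in this segment */
	unsigned offset;  /* offset of the first remaining byte */
};

struct toy_bio {
	unsigned long long sector;  /* first remaining 512-byte sector */
	unsigned size;              /* bytes remaining */
	unsigned idx;               /* current segment */
	unsigned vcnt;              /* number of segments */
	struct toy_bvec *vec;
};

/* Complete `bytes` bytes from the front of the bio. */
static void toy_bio_advance(struct toy_bio *bio, unsigned bytes)
{
	/* Sketch-only precondition; not part of the posted patch. */
	assert(bytes <= bio->size);

	bio->sector += bytes >> 9;  /* assumed fix: bytes to sectors */
	bio->size -= bytes;

	if (!bio->size)
		return;

	while (bytes) {
		struct toy_bvec *bv;

		if (bio->idx >= bio->vcnt)
			break;  /* would be a bug; the kernel code reports it */

		bv = &bio->vec[bio->idx];
		if (bytes >= bv->len) {
			/* Whole segment completed: move to the next one. */
			bytes -= bv->len;
			bio->idx++;
		} else {
			/* Partial segment: trim it from the front. */
			bv->len -= bytes;
			bv->offset += bytes;
			bytes = 0;
		}
	}
}
```

Advancing a bio of two 4096-byte segments by 4608 bytes moves the sector forward by 9, steps onto the second segment, and leaves that segment with 3584 bytes at offset 512.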


Re: [PATCH v2 03/26] block: Refactor blk_update_request()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 04:20:00PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Sep 10, 2012 at 05:22:14PM -0700, Kent Overstreet wrote:
> >  static void req_bio_endio(struct request *rq, struct bio *bio,
> >                            unsigned int nbytes, int error)
> >  {
> > +   /*
> > +    * XXX: bio_endio() does this. only need this because of the weird
> > +    * flush seq thing.
> > +    */
> >     if (error)
> >         clear_bit(BIO_UPTODATE, &bio->bi_flags);
> >     else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> >         error = -EIO;
>
> Isn't this also necessary to record errors on partial completions?

Ah yeah, you're right. Meant to delete that comment anyways.

> Other than that, I definitely like this.  It would be nice to note
> that the custom partial bio advancing in blk_update_request() is
> replaced with multiple calls to req_bio_endio().  I don't think it has
> any meaningful performance implications.  It's just nice to future
> readers of the commit.

The number of calls to req_bio_endio() isn't changing...
blk_update_request() called it for partial completions before. It's just
where the bio itself is updated that's getting shuffled around.

Or did you mean that bio_advance() is getting called on every bio
instead of the custom advancing in blk_update_request() before? That is
different, yeah - it's now always looping over the iovec, not just for
partial completions.

Yeah, I will note that in the commit message, in case Jens sees a
performance regression from it :)

> Also, it would be really nice if you can verify this actually works
> with partial blk_update_request().  sector update bug in the previous
> patch scares me a bit.  Implementing some debug hacks in the
> completion path might be the easiest way to verify.  A subtle bug here
> could be pretty painful.

Any suggestions on how to trigger partial updates?


Re: [PATCH v2 02/26] block: Add bio_advance()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 04:25:06PM -0700, Tejun Heo wrote:
> Hello,
>
> On Thu, Sep 20, 2012 at 04:13:08PM -0700, Kent Overstreet wrote:
> > I just cut and pasted that from blk_update_request(), which is what the
> > next patch refactors...
>
> Yeah, well, that was written when we didn't have WARNs.
>
> > But yes it would be a bug. It gets converted to a BUG_ON() in a later
> > patch (not in this series), as this gets further abstracted into a
> > wrapper around bvec_advance_iter() which doesn't know about struct bio
> > (as bio integrity gets its own iterator).
>
> WARN() generally preferable unless there's no way at all to continue.
> Storage layer could be a bit different if immediate danger for data
> corruption exists but the general consensus seems that we're too
> trigger happy with BUG_ON()s.

Yeah. Changed it to a WARN_ONCE().


Re: [PATCH v2 05/26] block: Add bio_end()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 04:32:25PM -0700, Tejun Heo wrote:
> On Mon, Sep 10, 2012 at 05:22:16PM -0700, Kent Overstreet wrote:
> > Just a little convenience macro - main reason to add it now is preparing
> > for immutable bio vecs, it'll reduce the size of the patch that puts
> > bi_sector/bi_size/bi_idx into a struct bvec_iter.
> >
> > Signed-off-by: Kent Overstreet <koverstr...@google.com>
> > CC: Jens Axboe <ax...@kernel.dk>
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index 6763cdf..92bff0e 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -67,6 +67,7 @@
> >  #define bio_offset(bio)        bio_iovec((bio))->bv_offset
> >  #define bio_segments(bio)      ((bio)->bi_vcnt - (bio)->bi_idx)
> >  #define bio_sectors(bio)       ((bio)->bi_size >> 9)
> > +#define bio_end(bio)           ((bio)->bi_sector + bio_sectors(bio))
>
> Maybe bio_end_sector() is a better name?  bio_end() looks a bit too
> close to bio_endio().

Bit verbose for my tastes, but I tend to be more terse than most :P I'm
used to bio_end(), but I'll probably change it.
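
As a quick illustration of what the macro computes, here is a user-space mirror of the two macros from the quoted hunk. `struct toy_extent` and the `toy_` prefixed names are hypothetical; only the arithmetic is taken from the patch.

```c
/* Hypothetical mirror of the two struct bio fields the macros read. */
struct toy_extent {
	unsigned long long bi_sector;  /* first sector, in 512-byte units */
	unsigned bi_size;              /* remaining bytes */
};

#define toy_bio_sectors(bio)  ((bio)->bi_size >> 9)
#define toy_bio_end(bio)      ((bio)->bi_sector + toy_bio_sectors(bio))
```

For an 8192-byte bio starting at sector 2048, `toy_bio_end()` yields 2064, the first sector past the end of the bio.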


Re: [PATCH v2 06/26] block: Use bio_sectors() more consistently

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 04:36:18PM -0700, Tejun Heo wrote:
> On Mon, Sep 10, 2012 at 05:22:17PM -0700, Kent Overstreet wrote:
> > diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
> > index 321de7b..6e4420a 100644
> > --- a/drivers/block/aoe/aoeblk.c
> > +++ b/drivers/block/aoe/aoeblk.c
> > @@ -199,7 +199,7 @@ aoeblk_make_request(struct request_queue *q, struct bio *bio)
> >     buf->bio = bio;
> >     buf->resid = bio->bi_size;
> >     buf->sector = bio->bi_sector;
> > -   buf->bv = &bio->bi_io_vec[bio->bi_idx];
> > +   buf->bv = bio_iovec(bio);
>
> Contamination?

Whoops, yes.

> Also, in general, please cc at least the maintainers of the files that
> you modify.

Meant to ask you about these patches that basically just rename things -
will do.


Re: [PATCH v2 03/26] block: Refactor blk_update_request()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 04:41:33PM -0700, Tejun Heo wrote:
> Hey,
>
> On Thu, Sep 20, 2012 at 04:36:32PM -0700, Kent Overstreet wrote:
> > > Other than that, I definitely like this.  It would be nice to note
> > > that the custom partial bio advancing in blk_update_request() is
> > > replaced with multiple calls to req_bio_endio().  I don't think it has
> > > any meaningful performance implications.  It's just nice to future
> > > readers of the commit.
> >
> > The number of calls to req_bio_endio() isn't changing...
> > blk_update_request() called it for partial completions before. It's just
> > where the bio itself is updated that's getting shuffled around.
> >
> > Or did you mean that bio_advance() is getting called on every bio
> > instead of the custom advancing in blk_update_request() before? That is
> > different, yeah - it's now always looping over the iovec, not just for
> > partial completions.
> >
> > Yeah, I will note that in the commit message, in case Jens sees a
> > performance regression from it :)
>
> I don't think there's any performance implication.  It's just nice to
> explain how the complexity went away.  If for nothing else, to point
> out how silly the original code was. :)

New patch below - that commit message have what you're after?

> > > Also, it would be really nice if you can verify this actually works
> > > with partial blk_update_request().  sector update bug in the previous
> > > patch scares me a bit.  Implementing some debug hacks in the
> > > completion path might be the easiest way to verify.  A subtle bug here
> > > could be pretty painful.
> >
> > Any suggestions on how to trigger partial updates?
>
> ide along with many legacy drivers do it.  Any SCSI driver including
> libata only does full completion.  I don't know.  Even just trying to
> call the function and comparing before & after with the original code
> would be good.  I'd like to see at least some form of verification
> because the manifested bugs could be extremely nasty and difficult to
> track down.

Multiple partial completions should have the same semantics as a single
full completion, so maybe I'll try rigging up some test code that wraps
blk_update_request(), turning full completions into partial completions,
and verifies stuff...


commit fef0ddc82214f87de71ec6fb051eb28a6de0be74
Author: Kent Overstreet <koverstr...@google.com>
Date:   Thu Sep 20 16:38:30 2012 -0700

    block: Refactor blk_update_request()

    Converts it to use bio_advance(), simplifying it quite a bit in the
    process.

    Note that req_bio_endio() now always calls bio_advance() - which means
    it always loops over the biovec, not just on partial completions. Don't
    expect it to affect performance, but worth noting.

    Signed-off-by: Kent Overstreet <koverstr...@google.com>
    CC: Jens Axboe <ax...@kernel.dk>

diff --git a/block/blk-core.c b/block/blk-core.c
index 2d739ca..9f8cb16 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -158,20 +158,10 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
    else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
        error = -EIO;
 
-   if (unlikely(nbytes > bio->bi_size)) {
-       printk(KERN_ERR "%s: want %u bytes done, %u left\n",
-              __func__, nbytes, bio->bi_size);
-       nbytes = bio->bi_size;
-   }
-
    if (unlikely(rq->cmd_flags & REQ_QUIET))
        set_bit(BIO_QUIET, &bio->bi_flags);
 
-   bio->bi_size -= nbytes;
-   bio->bi_sector += (nbytes >> 9);
-
-   if (bio_integrity(bio))
-       bio_integrity_advance(bio, nbytes);
+   bio_advance(bio, nbytes);
 
    /* don't actually finish bio if it's part of flush sequence */
    if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
@@ -2214,8 +2204,7 @@ EXPORT_SYMBOL(blk_fetch_request);
  **/
 bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 {
-   int total_bytes, bio_nbytes, next_idx = 0;
-   struct bio *bio;
+   int total_bytes;
 
    if (!req->bio)
        return false;
@@ -2259,56 +2248,21 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 
    blk_account_io_completion(req, nr_bytes);
 
-   total_bytes = bio_nbytes = 0;
-   while ((bio = req->bio) != NULL) {
-       int nbytes;
+   total_bytes = 0;
+   while (req->bio) {
+       struct bio *bio = req->bio;
+       unsigned bio_bytes = min(bio->bi_size, nr_bytes);
 
-       if (nr_bytes >= bio->bi_size) {
+       if (bio_bytes == bio->bi_size)
            req->bio = bio->bi_next;
-           nbytes = bio->bi_size;
-           req_bio_endio(req, bio, nbytes, error);
-           next_idx = 0;
-           bio_nbytes = 0;
-       } else {
-           int idx = bio->bi_idx + next_idx;
-
-           if (unlikely(idx >= bio->bi_vcnt)) {
-   blk_dump_rq_flags(req, __end_that

Re: [PATCH v2 07/26] block: Don't use bi_idx in bio_split() or require it to be 0

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 04:45:44PM -0700, Tejun Heo wrote:
> On Mon, Sep 10, 2012 at 05:22:18PM -0700, Kent Overstreet wrote:
> > Prep work for immutable bio_vecs/efficient bio splitting: they require
> > auditing and removing most uses of bi_idx.
> >
> > So here we convert bio_split() to respect the current value of bi_idx
> > and use the bio_iovec() macro, instead of assuming bi_idx will be 0.
>
> I find the description a bit cryptic.

Yeah, I wasn't able to come up with a better description at the time...
how's this:

Change bio_split() to respect the current value of bi_idx

In the current code bio_split() won't be seeing partially completed bios
so this doesn't change any behaviour, but this makes the code a bit
clearer as to what bio_split() actually requires.

The immediate purpose of the patch is removing unnecessary bi_idx
references, but the end goal is to allow partially completed bios to be
submitted, which along with immutable biovecs enables efficient bio
splitting.


 
> > Signed-off-by: Kent Overstreet <koverstr...@google.com>
> > CC: Jens Axboe <ax...@kernel.dk>
> > ---
> >  drivers/block/drbd/drbd_req.c | 6 +++---
> >  drivers/md/raid0.c            | 3 +--
> >  drivers/md/raid10.c           | 3 +--
> >  fs/bio-integrity.c            | 4 ++--
> >  fs/bio.c                      | 7 +++----
> >  5 files changed, 10 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> > index af69a96..57eb253 100644
> > --- a/drivers/block/drbd/drbd_req.c
> > +++ b/drivers/block/drbd/drbd_req.c
> > @@ -1155,11 +1155,11 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
> >
> >     /* can this bio be split generically?
> >      * Maybe add our own split-arbitrary-bios function. */
> > -   if (bio->bi_vcnt != 1 || bio->bi_idx != 0 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
> > +   if (bio_segments(bio) != 1 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
> >         /* rather error out here than BUG in bio_split */
> >         dev_err(DEV, "bio would need to, but cannot, be split: "
> > -               "(vcnt=%u,idx=%u,size=%u,sector=%llu)\n",
> > -               bio->bi_vcnt, bio->bi_idx, bio->bi_size,
> > +               "(segments=%u,size=%u,sector=%llu)\n",
> > +               bio_segments(bio), bio->bi_size,
> >                 (unsigned long long)bio->bi_sector);
> >         bio_endio(bio, -EINVAL);
> >     } else {
> > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> > index 387cb89..0587450 100644
> > --- a/drivers/md/raid0.c
> > +++ b/drivers/md/raid0.c
> > @@ -509,8 +509,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
> >         sector_t sector = bio->bi_sector;
> >         struct bio_pair *bp;
> >         /* Sanity check -- queue functions should prevent this happening */
> > -       if (bio->bi_vcnt != 1 ||
> > -           bio->bi_idx != 0)
> > +       if (bio_segments(bio) != 1)
> >             goto bad_map;
> >         /* This is a one page bio that upper layers
> >          * refuse to split for us, so we need to split it.
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index 9715aaf..bbd08f5 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -1081,8 +1081,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> >              || conf->prev.near_copies < conf->prev.raid_disks))) {
> >         struct bio_pair *bp;
> >         /* Sanity check -- queue functions should prevent this happening */
> > -       if (bio->bi_vcnt != 1 ||
> > -           bio->bi_idx != 0)
> > +       if (bio_segments(bio) != 1)
> >             goto bad_map;
> >         /* This is a one page bio that upper layers
> >          * refuse to split for us, so we need to split it.
>
> And wonder how the description applies to the above.
>
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -1616,8 +1616,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
> >     trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
> >                       bi->bi_sector + first_sectors);
> >
> > -   BUG_ON(bi->bi_vcnt != 1);
> > -   BUG_ON(bi->bi_idx != 0);
> > +   BUG_ON(bio_segments(bi) != 1);
> >     atomic_set(&bp->cnt, 3);
> >     bp->error = 0;
> >     bp->bio1 = *bi;
> > @@ -1626,8 +1625,8 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
> >     bp->bio2.bi_size -= first_sectors << 9;
> >     bp->bio1.bi_size = first_sectors << 9;
> >
> > -   bp->bv1 = bi->bi_io_vec[0];
> > -   bp->bv2 = bi->bi_io_vec[0];
> > +   bp->bv1 = *bio_iovec(bi);
> > +   bp->bv2 = *bio_iovec(bi);
>
> This conflicts with a recent commit from Martin.  You probably wanna
> rebase.
>
> Thanks.
>
> --
> tejun


Re: [PATCH v2 08/26] block: Remove bi_idx references

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 04:49:53PM -0700, Tejun Heo wrote:
> On Mon, Sep 10, 2012 at 05:22:19PM -0700, Kent Overstreet wrote:
> > These were harmless but unnecessary, and getting rid of them makes the
> > code easier to audit since most of them need to be removed.
>
> I find the descriptions a bit too terse.  Why do they need to be
> removed?  So, I suppose you wanted to say explicit initializations to
> 0 are unnecessary, but there are bio_segments() conversions too.
>
> The patch is simple and this isn't a big deal but I really hope for
> better (correct) descriptions.

It's because for the bvec iterator stuff and immutable bvecs, direct
bi_idx usage tends to be either wrong or unnecessary - I had to audit
all the uses in the kernel.

Reason for doing it now is a later patch moves bi_idx (also bi_sector
and bi_size) into a different struct - so doing these cleanup patches
first means a bit less code churn.


Re: [PATCH v2 15/26] block: Add bio_copy_data()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 05:06:32PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Sep 10, 2012 at 05:22:26PM -0700, Kent Overstreet wrote:
> > +void bio_copy_data(struct bio *dst, struct bio *src)
> > +{
> ...
> > +       src_p = kmap_atomic(src_bv->bv_page);
> > +       dst_p = kmap_atomic(dst_bv->bv_page);
> > +
> > +       memcpy(dst_p + dst_bv->bv_offset,
> > +              src_p + src_bv->bv_offset,
> > +              bytes);
> > +
> > +       kunmap_atomic(dst_p);
> > +       kunmap_atomic(src_p);
>
> Wrap these in preempt_disable/enable() to allow the function to be
> called from any context?

I checked the implementation of kmap_atomic(), it already does
preempt_disable() so it's safe in process context - if I understand
correctly it needs local_irq_save()/restore() to be safe in any context
and I figured calling this from irq context is not the norm so that
should be the caller's responsibility.
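
The quoted hunk elides the loop around the memcpy(). As a hedged user-space sketch of the general technique (copying between two segment lists whose boundaries do not line up), something like the following would do. `struct toy_seg` and `toy_copy_segments()` are hypothetical names; there is no page mapping here, and the byte count is returned only to make the sketch easy to check, which is not something the posted patch does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a mapped bio_vec: a buffer and its length. */
struct toy_seg {
	unsigned char *data;
	size_t len;
};

/* Copy until either list runs out; returns the number of bytes copied. */
static size_t toy_copy_segments(struct toy_seg *dst, size_t ndst,
				const struct toy_seg *src, size_t nsrc)
{
	size_t di = 0, si = 0;      /* current segment in each list */
	size_t doff = 0, soff = 0;  /* progress within the current segments */
	size_t copied = 0;

	while (di < ndst && si < nsrc) {
		size_t dleft = dst[di].len - doff;
		size_t sleft = src[si].len - soff;
		size_t bytes = dleft < sleft ? dleft : sleft;

		memcpy(dst[di].data + doff, src[si].data + soff, bytes);
		copied += bytes;
		doff += bytes;
		soff += bytes;

		/* Step to the next segment on whichever side is used up. */
		if (doff == dst[di].len) {
			di++;
			doff = 0;
		}
		if (soff == src[si].len) {
			si++;
			soff = 0;
		}
	}
	return copied;
}
```

Each iteration copies the largest run that fits in both current segments, so a 3+2 byte source fills a 2+3 byte destination in three memcpy() calls.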


Re: [PATCH v2 10/26] block: Add submit_bio_wait(), remove from md

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 04:56:43PM -0700, Tejun Heo wrote:
> On Mon, Sep 10, 2012 at 05:22:21PM -0700, Kent Overstreet wrote:
> > Random cleanup - this code was duplicated and it's not really specific
> > to md.
> >
> > Also added the ability to return the actual error code.
> >
> > Signed-off-by: Kent Overstreet <koverstr...@google.com>
> > CC: Jens Axboe <ax...@kernel.dk>
> > CC: NeilBrown <ne...@suse.de>
>
> Acked-by: Tejun Heo <t...@kernel.org>
>
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -249,6 +249,7 @@ extern void bio_endio(struct bio *, int);
> >  struct request_queue;
> >  extern int bio_phys_segments(struct request_queue *, struct bio *);
> >
> > +extern int submit_bio_wait(int rw, struct bio *bio);
> >  void bio_advance(struct bio *, unsigned);
>
> Heh, this is one of the reasons why I don't like extern on function
> prototypes.  It's not necessary and people end up jumping between the
> two forms.  :(

Yeah, I dislike it too but I was trying to follow the style in that file
- I did fix bio_advance() a few minutes ago.


Re: [PATCH v2 15/26] block: Add bio_copy_data()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 05:09:47PM -0700, Tejun Heo wrote:
> On Thu, Sep 20, 2012 at 05:06:32PM -0700, Tejun Heo wrote:
> > Hello,
> >
> > On Mon, Sep 10, 2012 at 05:22:26PM -0700, Kent Overstreet wrote:
> > > +void bio_copy_data(struct bio *dst, struct bio *src)
> > > +{
> > ...
> > > +       src_p = kmap_atomic(src_bv->bv_page);
> > > +       dst_p = kmap_atomic(dst_bv->bv_page);
> > > +
> > > +       memcpy(dst_p + dst_bv->bv_offset,
> > > +              src_p + src_bv->bv_offset,
> > > +              bytes);
> > > +
> > > +       kunmap_atomic(dst_p);
> > > +       kunmap_atomic(src_p);
> >
> > Wrap these in preempt_disable/enable() to allow the function to be
> > called from any context?
>
> Ooh, and maybe return the amount of copied data?

Possibly, but I think I want to wait until a user needs it before adding
something like that.

From looking at other code that copies bio data, a parameter that
specifies the amount of data to be copied might be more useful.

I'm not sure I've seen all the places where bio data is copied yet, so
I've just been waiting until I find more uses to make it do more.


Re: [PATCH v2 19/26] bounce: Refactor __blk_queue_bounce to not use bi_io_vec

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 05:25:55PM -0700, Tejun Heo wrote:
> Hello, Kent.
>
> On Mon, Sep 10, 2012 at 05:22:30PM -0700, Kent Overstreet wrote:
> > A bunch of what __blk_queue_bounce() was doing was problematic for the
> > immutable bvec work; this cleans that up and the code is quite a bit
> > smaller, too.
> >
> > The __bio_for_each_segment() in copy_to_high_bio_irq() was changed
> > because that one's looping over the original bio, not the bounce bio -
> > since the bounce code doesn't own that bio the __ version wasn't
> > correct.
>
> I do like the new implementation.  I think the function is broken
> before and after tho.  Allocating from fs_bio_set from block layer is
> never safe and nothing seems to prevent multiple allocators compete in
> the bounce page mempool.  This will need a separate bioset and the
> multiple mempool allocation would have to be put inside a mutex.

Yeah, I should've at least made a note of that.

I should really add "audit all uses of fs_bio_set" to my todo list.

 Also, how was this tested?

Changed queue_bounce_pfn() to return 0, forcing all io to be bounced.


Re: [PATCH v2 19/26] bounce: Refactor __blk_queue_bounce to not use bi_io_vec

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 05:27:06PM -0700, Tejun Heo wrote:
 On Mon, Sep 10, 2012 at 05:22:30PM -0700, Kent Overstreet wrote:
  A bunch of what __blk_queue_bounce() was doing was problematic for the
  immutable bvec work; this cleans that up and the code is quite a bit
  smaller, too.
  
  The __bio_for_each_segment() in copy_to_high_bio_irq() was changed
  because that one's looping over the original bio, not the bounce bio -
  since the bounce code doesn't own that bio the __ version wasn't
  correct.
 
 Also, I can't understand the above at all.  I can think why it
 wouldn't be necessary but why is it wrong because bounce code doesn't
 own it?

Another prep work thing - in current code, it isn't really wrong
(slightly inconsistent though).

But the idea is that anything that doesn't own the bio shouldn't assume
anything about bi_idx; the bounce code should loop over the bio starting
from wherever it was when the bio got to the bounce code, not the start
of the bio.

A later patch makes this clearer - __bio_for_each_segment() gets removed
in favor of bio_for_each_segment_all(), and it documents that
bio_for_each_segment_all() is only for code that owns the bio.
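The ownership rule described above can be modelled in a few lines of userspace C. This is only a sketch: `struct mini_bio`, `struct mini_vec` and both macros are simplified, hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct bio_vec and struct bio. */
struct mini_vec { unsigned int len; };

struct mini_bio {
	struct mini_vec *vecs; /* plays the role of bi_io_vec */
	unsigned int     vcnt; /* plays the role of bi_vcnt */
	unsigned int     idx;  /* plays the role of bi_idx */
};

/* For code that does not own the bio: start at the current position. */
#define mini_for_each_segment(bv, bio, i) \
	for ((i) = (bio)->idx, (bv) = &(bio)->vecs[(i)]; \
	     (i) < (bio)->vcnt; (i)++, (bv)++)

/* For the owner only: walk every segment, ignoring idx. */
#define mini_for_each_segment_all(bv, bio, i) \
	for ((i) = 0, (bv) = (bio)->vecs; (i) < (bio)->vcnt; (i)++, (bv)++)

/* Bytes from the current position onwards (what a non-owner should see). */
static unsigned int bytes_remaining(struct mini_bio *bio)
{
	struct mini_vec *bv;
	unsigned int i, sum = 0;

	mini_for_each_segment(bv, bio, i)
		sum += bv->len;
	return sum;
}

/* Bytes in the whole vector (only meaningful to the owner). */
static unsigned int bytes_total(struct mini_bio *bio)
{
	struct mini_vec *bv;
	unsigned int i, sum = 0;

	mini_for_each_segment_all(bv, bio, i)
		sum += bv->len;
	return sum;
}
```

With three segments of 512, 1024 and 2048 bytes and `idx` set to 1, the two helpers disagree, which is exactly the case where using the wrong loop in non-owning code would touch a segment that has already been completed.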


Re: [PATCH v2 21/26] block: Convert some code to bio_for_each_segment_all()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 05:38:32PM -0700, Tejun Heo wrote:
 On Mon, Sep 10, 2012 at 05:22:32PM -0700, Kent Overstreet wrote:
  A few places in the code were either open coding or using the wrong
  version - fix.
  
  Signed-off-by: Kent Overstreet koverstr...@google.com
  CC: Jens Axboe ax...@kernel.dk
  CC: NeilBrown ne...@suse.de
  ---
  --- a/drivers/md/raid1.c
  +++ b/drivers/md/raid1.c
  @@ -921,7 +921,7 @@ static void alloc_behind_pages(struct bio *bio, struct r1bio *r1_bio)
  if (unlikely(!bvecs))
  return;
   
  -   bio_for_each_segment(bvec, bio, i) {
  +   bio_for_each_segment_all(bvec, bio, i) {
 
 I don't get this conversion.  Why is this necessary?

Not necessary, just a consistency thing - this bio is a clone that md
owns (and the clone was trimmed, so we know bi_idx is 0).

Also, it wasn't an issue here but after the patch that introduces the
bvec iter it's no longer possible to modify the biovec through
bio_for_each_segment_all() - it doesn't increment a pointer to the
current bvec, you pass in a struct bio_vec (not a pointer) which is
updated with what the current biovec would be (taking into account
bi_bvec_done and bi_size).

So because of that it is IMO more worthwhile to be consistent about
bio_for_each_segment()/bio_for_each_segment_all() usage.

Suppose I should stick all that in the patch description.
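A rough userspace model of the by-value iteration described above may help. Everything here is an assumption made for illustration: `struct mini_iter` and its fields loosely mirror the index, "bytes done in the current vec" and "bytes left overall" state mentioned in the message, and none of the names come from the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec (the page pointer is omitted). */
struct mini_vec { unsigned int offset, len; };

/* Hypothetical iterator state kept separately from the vector itself. */
struct mini_iter {
	unsigned int idx;  /* which vec we are in */
	unsigned int done; /* bytes already completed in that vec */
	unsigned int size; /* bytes remaining in the whole I/O */
};

/* Build the current segment by value; the array itself is never exposed. */
static struct mini_vec iter_cur(const struct mini_vec *vecs, struct mini_iter it)
{
	struct mini_vec cur = vecs[it.idx];

	cur.offset += it.done;
	cur.len -= it.done;
	if (cur.len > it.size)
		cur.len = it.size;
	return cur;
}

/* Advance by at most the length of the current segment. */
static void iter_advance(const struct mini_vec *vecs, struct mini_iter *it,
			 unsigned int bytes)
{
	assert(bytes <= vecs[it->idx].len - it->done && bytes <= it->size);
	it->size -= bytes;
	it->done += bytes;
	if (it->done == vecs[it->idx].len) {
		it->idx++;
		it->done = 0;
	}
}
```

Because `iter_cur()` returns a copy, writing to the result has no effect on the underlying array, which is why code that needs to modify the vector has to use a loop that hands out pointers instead.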


Re: [PATCH v2 22/26] block: Add bio_alloc_pages()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 05:47:11PM -0700, Tejun Heo wrote:
 On Mon, Sep 10, 2012 at 05:22:33PM -0700, Kent Overstreet wrote:
  +   bio_for_each_segment_all(bv, bio, i) {
  +   bv->bv_page = alloc_page(gfp_mask);
  +   if (!bv->bv_page) {
  +   while (bv-- != bio->bi_io_vec)
  +   __free_page(bv->bv_page);
 
 I don't know.  I feel stupid.  I think it's because the loop variable
 changes between loop condition test and actual body of loop.  How
 about the following?  It is pointing to the member of the same array
 so I think it's not even violating pointer comparison rules.
 
   while (--bv >= bio->bi_io_vec)
   __free_page(bv->bv_page);

I can't remember why I did it that way, but I think I like yours better
- I'll change it.
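For reference, the allocate-then-unwind pattern being discussed can be sketched in userspace C. This version uses an index rather than a pointer for the unwind loop, which is a substitution on my part and not what either mail proposes; `demo_alloc()`, `demo_free()` and `alloc_all()` are hypothetical names, with `malloc()` standing in for `alloc_page()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio_vec: only the page pointer matters. */
struct mini_vec { void *page; };

static int allocs_left; /* demo allocator fails once this reaches zero */
static int live;        /* number of outstanding demo allocations */

/* Stand-in for alloc_page(): fails on demand so the unwind path runs. */
static void *demo_alloc(void)
{
	void *p;

	if (allocs_left == 0)
		return NULL;
	p = malloc(4096);
	if (p) {
		allocs_left--;
		live++;
	}
	return p;
}

/* Stand-in for __free_page(). */
static void demo_free(void *p)
{
	free(p);
	live--;
}

/* Fill every slot, or free what was allocated so far and report failure. */
static int alloc_all(struct mini_vec *vecs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		vecs[i].page = demo_alloc();
		if (!vecs[i].page) {
			/* Unwind: free slots [0, i) in reverse order. */
			while (i-- > 0) {
				demo_free(vecs[i].page);
				vecs[i].page = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

The index form never forms a pointer before the start of the array, at the cost of an extra variable.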


Re: [PATCH v2 23/26] raid1: use bio_alloc_pages()

2012-09-20 Thread Kent Overstreet
On Thu, Sep 20, 2012 at 05:48:27PM -0700, Tejun Heo wrote:
 On Mon, Sep 10, 2012 at 05:22:34PM -0700, Kent Overstreet wrote:
  Signed-off-by: Kent Overstreet koverstr...@google.com
  CC: Jens Axboe ax...@kernel.dk
  CC: NeilBrown ne...@suse.de
 
 I think it's better to merge this and the previous patch.  It's not
 like we're converting a lot of users.

Ok, will do.


Re: [PATCH v2 12/26] raid1: use bio_reset()

2012-09-11 Thread Kent Overstreet
On Tue, Sep 11, 2012 at 02:59:13PM +1000, NeilBrown wrote:
 On Mon, 10 Sep 2012 17:22:23 -0700 Kent Overstreet koverstr...@google.com
 wrote:
 
  I couldn't figure out what sbio->bi_end_io in process_checks() was
  supposed to be, so I took the easy way out.
 
 Almost.
 You save 'sbio->bi_end_io' to 'bi_end_io', then do nothing with it...

Whoops :) I think I must've gotten distracted and forgot to finish with
that patch, I wasn't setting bi_private either.

 
 A little way above the 'fixup the bio for reuse' comment you'll find:
 
   struct bio *sbio = r1_bio->bios[i];
 
   if (r1_bio->bios[i]->bi_end_io != end_sync_read)
   continue;
 
 which implies that if we don't 'continue', then sbio->bi_end_io ==
 end_sync_read.

Ahh. I remember reading that, but I missed that that was sbio that was
being checked.

 
 So I suspect you want to add
 sbio->bi_end_io = end_sync_read;
 somewhere after the 'bio_reset()'.
 
 If you happened to also fix that 'if' that I quoted so that it reads:
 
  if (sbio->bi_end_io != end_sync_read)
continue;

Will do! How's this look?


commit 40a4645a4346edd040066baedcf2184ac4211ba7
Author: Kent Overstreet koverstr...@google.com
Date:   Tue Sep 11 11:26:12 2012 -0700

raid1: use bio_reset()

Signed-off-by: Kent Overstreet koverstr...@google.com
CC: Jens Axboe ax...@kernel.dk
CC: NeilBrown ne...@suse.de

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ee85154..df68691 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1851,7 +1851,7 @@ static int process_checks(struct r1bio *r1_bio)
struct bio *sbio = r1_bio->bios[i];
int size;
 
-   if (r1_bio->bios[i]->bi_end_io != end_sync_read)
+   if (sbio->bi_end_io != end_sync_read)
continue;
 
if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
@@ -1876,16 +1876,15 @@ static int process_checks(struct r1bio *r1_bio)
continue;
}
/* fixup the bio for reuse */
+   bio_reset(sbio);
sbio->bi_vcnt = vcnt;
sbio->bi_size = r1_bio->sectors << 9;
-   sbio->bi_idx = 0;
-   sbio->bi_phys_segments = 0;
-   sbio->bi_flags &= ~(BIO_POOL_MASK - 1);
-   sbio->bi_flags |= 1 << BIO_UPTODATE;
-   sbio->bi_next = NULL;
sbio->bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
+   sbio->bi_end_io = end_sync_read;
+   sbio->bi_private = r1_bio;
+
size = sbio->bi_size;
for (j = 0; j < vcnt ; j++) {
struct bio_vec *bi;
@@ -2426,18 +2425,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
for (i = 0; i < conf->raid_disks * 2; i++) {
struct md_rdev *rdev;
bio = r1_bio->bios[i];
-
-   /* take from bio_init */
-   bio->bi_next = NULL;
-   bio->bi_flags &= ~(BIO_POOL_MASK-1);
-   bio->bi_flags |= 1 << BIO_UPTODATE;
-   bio->bi_rw = READ;
-   bio->bi_vcnt = 0;
-   bio->bi_idx = 0;
-   bio->bi_phys_segments = 0;
-   bio->bi_size = 0;
-   bio->bi_end_io = NULL;
-   bio->bi_private = NULL;
+   bio_reset(bio);
 
rdev = rcu_dereference(conf->mirrors[i].rdev);
if (rdev == NULL ||
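The reset-then-reinitialise pattern in the patch above, including the point from the review that the completion callback and private pointer have to be set again after the reset, can be sketched as a userspace model. `struct mini_bio`, `MINI_RESET_BYTES`, `mini_bio_reset()` and `mini_bio_prepare()` are hypothetical, and the split between cleared and preserved fields is an assumption for illustration rather than the kernel's layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*end_io_fn)(void *);

/* Hypothetical, simplified stand-in for struct bio. */
struct mini_bio {
	/* Per-I/O state: cleared on every reset. */
	unsigned long sector;
	unsigned int  size;
	unsigned int  idx;
	unsigned int  vcnt;
	end_io_fn     end_io;
	void         *private;
	/* Allocation state: must survive a reset. */
	unsigned int  max_vecs;
	void         *io_vec;
};

/* Everything before max_vecs is treated as per-I/O state. */
#define MINI_RESET_BYTES offsetof(struct mini_bio, max_vecs)

/* Assumes all-zero bytes represent NULL pointers, as on common platforms. */
static void mini_bio_reset(struct mini_bio *bio)
{
	memset(bio, 0, MINI_RESET_BYTES);
}

/* After a reset the caller has to set everything it relies on again. */
static void mini_bio_prepare(struct mini_bio *bio, unsigned long sector,
			     unsigned int size, end_io_fn end_io, void *private)
{
	mini_bio_reset(bio);
	bio->sector  = sector;
	bio->size    = size;
	bio->end_io  = end_io;
	bio->private = private;
}
```

Funnelling reuse through one helper like `mini_bio_prepare()` makes it harder to forget a field, which is the mistake the first version of the patch made with `bi_end_io` and `bi_private`.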


Re: [PATCH v2 13/26] raid5: use bio_reset()

2012-09-11 Thread Kent Overstreet
On Tue, Sep 11, 2012 at 03:03:26PM +1000, NeilBrown wrote:
 On Mon, 10 Sep 2012 17:22:24 -0700 Kent Overstreet koverstr...@google.com
 wrote:
 
  Had to shuffle the code around a bit (where bi_rw and bi_end_io were
  set), but shouldn't really be anything tricky here
  
  Signed-off-by: Kent Overstreet koverstr...@google.com
  CC: Jens Axboe ax...@kernel.dk
  CC: NeilBrown ne...@suse.de
  ---
   drivers/md/raid5.c | 28 ++--
   1 file changed, 14 insertions(+), 14 deletions(-)
  
  diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
  index 7c19dbe..ebe43f7 100644
  --- a/drivers/md/raid5.c
  +++ b/drivers/md/raid5.c
  @@ -561,14 +561,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
  bi = &sh->dev[i].req;
  rbi = &sh->dev[i].rreq; /* For writing to replacement */
   
  -   bi->bi_rw = rw;
  -   rbi->bi_rw = rw;
  -   if (rw & WRITE) {
  -   bi->bi_end_io = raid5_end_write_request;
  -   rbi->bi_end_io = raid5_end_write_request;
  -   } else
  -   bi->bi_end_io = raid5_end_read_request;
  -
  rcu_read_lock();
  rrdev = rcu_dereference(conf->disks[i].replacement);
  smp_mb(); /* Ensure that if rrdev is NULL, rdev won't be */
  @@ -643,7 +635,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
   
  set_bit(STRIPE_IO_STARTED, &sh->state);
   
  +   bio_reset(bi);
  bi->bi_bdev = rdev->bdev;
  +   bi->bi_rw = rw;
  +   bi->bi_end_io = (rw & WRITE)
  +   ? raid5_end_write_request
  +   : raid5_end_read_request;
  +   bi->bi_private = sh;
  +
  pr_debug("%s: for %llu schedule op %ld on disc %d\n",
  __func__, (unsigned long long)sh->sector,
  bi->bi_rw, i);
  @@ -657,12 +656,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
  if (test_bit(R5_ReadNoMerge, &sh->dev[i].flags))
  bi->bi_rw |= REQ_FLUSH;
   
  -   bi->bi_flags = 1 << BIO_UPTODATE;
  -   bi->bi_idx = 0;
  bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
  bi->bi_io_vec[0].bv_offset = 0;
  bi->bi_size = STRIPE_SIZE;
  -   bi->bi_next = NULL;
  if (rrdev)
  set_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags);
  generic_make_request(bi);
  @@ -674,7 +670,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
   
  set_bit(STRIPE_IO_STARTED, &sh->state);
   
  +   bio_reset(rbi);
  rbi->bi_bdev = rrdev->bdev;
  +   rbi->bi_rw = rw;
  +   rbi->bi_end_io = (rw & WRITE)
  +   ? raid5_end_write_request
  +   : raid5_end_read_request;
 
 'rbi->bi_end_io' can only ever be raid5_end_write_request.  We only get here
 on a write.
 I'd be OK with
 BUG_ON(!(rw & WRITE));
 but I don't want the condition in the assignment.

I was thinking to myself that if I was doing a bit more with that code,
I'd factor out all the code in the if (rdev) {} into a separate function
that was called twice there. But, I didn't do that here so you're right
- I'll change it and stick a BUG_ON() there.

 The rest looks quite sane.

Thanks!
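The agreed change (assert the invariant instead of branching on it) looks roughly like this in a userspace model. The names are hypothetical, `assert()` stands in for `BUG_ON()`, and the callbacks are dummies that only exist so the two choices are distinguishable.

```c
#include <assert.h>

#define MINI_WRITE 1u /* hypothetical stand-in for the WRITE bit in rw */

typedef void (*end_io_fn)(void);

static int reads_done, writes_done;

/* Dummy completion callbacks standing in for the raid5 end_io handlers. */
static void end_read(void)  { reads_done++; }
static void end_write(void) { writes_done++; }

/* Primary device: reached for both reads and writes, so it must choose. */
static end_io_fn pick_primary_end_io(unsigned int rw)
{
	return (rw & MINI_WRITE) ? end_write : end_read;
}

/*
 * Replacement device: only ever reached on a write, so state that as an
 * assertion rather than hiding it inside a conditional assignment.
 */
static end_io_fn pick_replacement_end_io(unsigned int rw)
{
	assert(rw & MINI_WRITE);
	return end_write;
}
```

The assertion documents the invariant for the reader and fails loudly if a later change ever sends a read down the replacement path.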

