from:"Ming Lin"

Re: [RFC PATCH 1/1] nbd: replace kill_bdev() with __invalidate_device()

2017-03-23 Thread Ming Lin

On Thu, Mar 23, 2017 at 10:51 AM, Josef Bacik  wrote:
> Yeah I think this is ok, I’ll throw it on my queue for fixes for this cycle.  
> Thanks,

Great. Thanks.

>
> Josef

Re: [RFC PATCH 1/1] nbd: replace kill_bdev() with __invalidate_device()

2017-03-23 Thread Ming Lin

On Thu, Mar 23, 2017 at 10:51 AM, Josef Bacik  wrote:
> Yeah I think this is ok, I’ll throw it on my queue for fixes for this cycle.  
> Thanks,

Great. Thanks.

>
> Josef

Re: [Nbd] [RFC PATCH 1/1] nbd: replace kill_bdev() with __invalidate_device()

2017-03-22 Thread Ming Lin

On Mon, Mar 20, 2017 at 3:58 PM, Ming Lin <m...@kernel.org> wrote:
> From: Ratna Manoj Bolla <manoj...@gmail.com>
>
> When a filesystem is mounted on a nbd device and on a disconnect, because
> of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are
> getting destroyed under mounted filesystem.
>
> After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
> followed by a sys_umount(),
> generic_shutdown_super()->...
> ->__sync_blockdev()->...
> -blkdev_writepages()->...
> ->do_invalidatepage()->...
> -discard_buffer()   is discarding superblock buffer_head assumed
> to be in mapped state by ext4_commit_super().
>
> [mlin: ported to 4.11-rc2]
> Signed-off-by: Ratna Manoj Bolla <manoj...@gmail.com
> ---
>  drivers/block/nbd.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index cb4ccfc..a6a3643 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -125,7 +125,8 @@ static const char *nbdcmd_to_ascii(int cmd)
>
>  static int nbd_size_clear(struct nbd_device *nbd, struct block_device *bdev)
>  {
> -   bd_set_size(bdev, 0);
> +   if (bdev->bd_openers <= 1)
> +   bd_set_size(bdev, 0);
> set_capacity(nbd->disk, 0);
> kobject_uevent(_to_dev(nbd)->kobj, KOBJ_CHANGE);
>
> @@ -603,6 +604,8 @@ static void nbd_reset(struct nbd_device *nbd)
>
>  static void nbd_bdev_reset(struct block_device *bdev)
>  {
> +   if (bdev->bd_openers > 1)
> +   return;
> set_device_ro(bdev, false);
> bdev->bd_inode->i_size = 0;
> if (max_part > 0) {
> @@ -666,7 +669,8 @@ static int nbd_clear_sock(struct nbd_device *nbd, struct 
> block_device *bdev)
>  {
> sock_shutdown(nbd);
> nbd_clear_que(nbd);
> -   kill_bdev(bdev);
> +
> +   __invalidate_device(bdev, true);
> nbd_bdev_reset(bdev);
> /*
>  * We want to give the run thread a chance to wait for everybody
> --
> 1.8.3.1

Hi Josef,

Any comments?

Thanks,
Ming

Re: [Nbd] [RFC PATCH 1/1] nbd: replace kill_bdev() with __invalidate_device()

2017-03-22 Thread Ming Lin

On Mon, Mar 20, 2017 at 3:58 PM, Ming Lin  wrote:
> From: Ratna Manoj Bolla 
>
> When a filesystem is mounted on a nbd device and on a disconnect, because
> of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are
> getting destroyed under mounted filesystem.
>
> After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
> followed by a sys_umount(),
> generic_shutdown_super()->...
> ->__sync_blockdev()->...
> -blkdev_writepages()->...
> ->do_invalidatepage()->...
> -discard_buffer()   is discarding superblock buffer_head assumed
> to be in mapped state by ext4_commit_super().
>
> [mlin: ported to 4.11-rc2]
> Signed-off-by: Ratna Manoj Bolla  ---
>  drivers/block/nbd.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index cb4ccfc..a6a3643 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -125,7 +125,8 @@ static const char *nbdcmd_to_ascii(int cmd)
>
>  static int nbd_size_clear(struct nbd_device *nbd, struct block_device *bdev)
>  {
> -   bd_set_size(bdev, 0);
> +   if (bdev->bd_openers <= 1)
> +   bd_set_size(bdev, 0);
> set_capacity(nbd->disk, 0);
> kobject_uevent(_to_dev(nbd)->kobj, KOBJ_CHANGE);
>
> @@ -603,6 +604,8 @@ static void nbd_reset(struct nbd_device *nbd)
>
>  static void nbd_bdev_reset(struct block_device *bdev)
>  {
> +   if (bdev->bd_openers > 1)
> +   return;
> set_device_ro(bdev, false);
> bdev->bd_inode->i_size = 0;
> if (max_part > 0) {
> @@ -666,7 +669,8 @@ static int nbd_clear_sock(struct nbd_device *nbd, struct 
> block_device *bdev)
>  {
> sock_shutdown(nbd);
> nbd_clear_que(nbd);
> -   kill_bdev(bdev);
> +
> +   __invalidate_device(bdev, true);
> nbd_bdev_reset(bdev);
> /*
>  * We want to give the run thread a chance to wait for everybody
> --
> 1.8.3.1

Hi Josef,

Any comments?

Thanks,
Ming

[RFC PATCH 1/1] nbd: replace kill_bdev() with __invalidate_device()

2017-03-20 Thread Ming Lin

From: Ratna Manoj Bolla 

When a filesystem is mounted on a nbd device and on a disconnect, because
of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are
getting destroyed under mounted filesystem.

After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
followed by a sys_umount(),
generic_shutdown_super()->...
->__sync_blockdev()->...
-blkdev_writepages()->...
->do_invalidatepage()->...
-discard_buffer()   is discarding superblock buffer_head assumed
to be in mapped state by ext4_commit_super().

[mlin: ported to 4.11-rc2]
Signed-off-by: Ratna Manoj Bolla bd_openers <= 1)
+   bd_set_size(bdev, 0);
set_capacity(nbd->disk, 0);
kobject_uevent(_to_dev(nbd)->kobj, KOBJ_CHANGE);
 
@@ -603,6 +604,8 @@ static void nbd_reset(struct nbd_device *nbd)
 
 static void nbd_bdev_reset(struct block_device *bdev)
 {
+   if (bdev->bd_openers > 1)
+   return;
set_device_ro(bdev, false);
bdev->bd_inode->i_size = 0;
if (max_part > 0) {
@@ -666,7 +669,8 @@ static int nbd_clear_sock(struct nbd_device *nbd, struct 
block_device *bdev)
 {
sock_shutdown(nbd);
nbd_clear_que(nbd);
-   kill_bdev(bdev);
+
+   __invalidate_device(bdev, true);
nbd_bdev_reset(bdev);
/*
 * We want to give the run thread a chance to wait for everybody
-- 
1.8.3.1

[RFC PATCH 0/1] nbd: fix crash when unmaping nbd device with fs still mounted

2017-03-20 Thread Ming Lin

Hi all,

I run into a BUG_ON(!buffer_mapped(bh)) crash with below script.

 $ rbd-nbd map mypool/myimg
 $ mkfs.ext4 /dev/nbd0
 $ mount /dev/nbd0 /mnt/
 $ rbd-nbd unmap /dev/nbd0
 $ umount /mnt

[ 1248.870131] kernel BUG at /home/mlin/linux/fs/buffer.c:3103!
[ 1248.871214] invalid opcode:  [#1] SMP
[ 1248.879468] CPU: 0 PID: 2450 Comm: umount Tainted: GE   
4.11.0-rc2+ #2
[ 1248.896579] Call Trace:
[ 1248.897056]  __sync_dirty_buffer+0x6e/0xe0
[ 1248.897870]  ext4_commit_super+0x1eb/0x290 [ext4]
[ 1248.898795]  ext4_put_super+0x2fa/0x3c0 [ext4]
[ 1248.899662]  generic_shutdown_super+0x6f/0x100
[ 1248.900525]  kill_block_super+0x27/0x70
[ 1248.901257]  deactivate_locked_super+0x43/0x70
[ 1248.902112]  deactivate_super+0x46/0x60
[ 1248.902869]  cleanup_mnt+0x3f/0x80
[ 1248.903526]  __cleanup_mnt+0x12/0x20
[ 1248.904218]  task_work_run+0x83/0xb0
[ 1248.904941]  exit_to_usermode_loop+0x59/0x7b
[ 1248.905769]  do_syscall_64+0x165/0x180
[ 1248.907603]  entry_SYSCALL64_slow_path+0x25/0x25

Last year, Ratna posted a patch to fix it.
https://lkml.org/lkml/2016/4/20/257

Ratna's script to reproduce the bug.

 $ qemu-img create -f qcow2 f.img 1G
 $ mkfs.ext4 f.img
 $ qemu-nbd -c /dev/nbd0 f.img
 $ mount /dev/nbd0 dir
 $ killall -KILL qemu-nbd
 $ sleep 1
 $ ls dir
 $ umount dir

I ported Rantna's patch to 4.11-rc2 and confirmed that it fixes the crash.

Jan Kara had some comments about this bug:
http://www.kernelhub.org/?p=2=361407

I hope to fix this bug in the upstream kernel first and then back port it to 
our production system.

Please see "PATCH 1/1" for detail.

Thanks,
Ming

[RFC PATCH 1/1] nbd: replace kill_bdev() with __invalidate_device()

2017-03-20 Thread Ming Lin

From: Ratna Manoj Bolla 

When a filesystem is mounted on a nbd device and on a disconnect, because
of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are
getting destroyed under mounted filesystem.

After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
followed by a sys_umount(),
generic_shutdown_super()->...
->__sync_blockdev()->...
-blkdev_writepages()->...
->do_invalidatepage()->...
-discard_buffer()   is discarding superblock buffer_head assumed
to be in mapped state by ext4_commit_super().

[mlin: ported to 4.11-rc2]
Signed-off-by: Ratna Manoj Bolla bd_openers <= 1)
+   bd_set_size(bdev, 0);
set_capacity(nbd->disk, 0);
kobject_uevent(_to_dev(nbd)->kobj, KOBJ_CHANGE);
 
@@ -603,6 +604,8 @@ static void nbd_reset(struct nbd_device *nbd)
 
 static void nbd_bdev_reset(struct block_device *bdev)
 {
+   if (bdev->bd_openers > 1)
+   return;
set_device_ro(bdev, false);
bdev->bd_inode->i_size = 0;
if (max_part > 0) {
@@ -666,7 +669,8 @@ static int nbd_clear_sock(struct nbd_device *nbd, struct 
block_device *bdev)
 {
sock_shutdown(nbd);
nbd_clear_que(nbd);
-   kill_bdev(bdev);
+
+   __invalidate_device(bdev, true);
nbd_bdev_reset(bdev);
/*
 * We want to give the run thread a chance to wait for everybody
-- 
1.8.3.1

[RFC PATCH 0/1] nbd: fix crash when unmaping nbd device with fs still mounted

2017-03-20 Thread Ming Lin

Hi all,

I run into a BUG_ON(!buffer_mapped(bh)) crash with below script.

 $ rbd-nbd map mypool/myimg
 $ mkfs.ext4 /dev/nbd0
 $ mount /dev/nbd0 /mnt/
 $ rbd-nbd unmap /dev/nbd0
 $ umount /mnt

[ 1248.870131] kernel BUG at /home/mlin/linux/fs/buffer.c:3103!
[ 1248.871214] invalid opcode:  [#1] SMP
[ 1248.879468] CPU: 0 PID: 2450 Comm: umount Tainted: GE   
4.11.0-rc2+ #2
[ 1248.896579] Call Trace:
[ 1248.897056]  __sync_dirty_buffer+0x6e/0xe0
[ 1248.897870]  ext4_commit_super+0x1eb/0x290 [ext4]
[ 1248.898795]  ext4_put_super+0x2fa/0x3c0 [ext4]
[ 1248.899662]  generic_shutdown_super+0x6f/0x100
[ 1248.900525]  kill_block_super+0x27/0x70
[ 1248.901257]  deactivate_locked_super+0x43/0x70
[ 1248.902112]  deactivate_super+0x46/0x60
[ 1248.902869]  cleanup_mnt+0x3f/0x80
[ 1248.903526]  __cleanup_mnt+0x12/0x20
[ 1248.904218]  task_work_run+0x83/0xb0
[ 1248.904941]  exit_to_usermode_loop+0x59/0x7b
[ 1248.905769]  do_syscall_64+0x165/0x180
[ 1248.907603]  entry_SYSCALL64_slow_path+0x25/0x25

Last year, Ratna posted a patch to fix it.
https://lkml.org/lkml/2016/4/20/257

Ratna's script to reproduce the bug.

 $ qemu-img create -f qcow2 f.img 1G
 $ mkfs.ext4 f.img
 $ qemu-nbd -c /dev/nbd0 f.img
 $ mount /dev/nbd0 dir
 $ killall -KILL qemu-nbd
 $ sleep 1
 $ ls dir
 $ umount dir

I ported Rantna's patch to 4.11-rc2 and confirmed that it fixes the crash.

Jan Kara had some comments about this bug:
http://www.kernelhub.org/?p=2=361407

I hope to fix this bug in the upstream kernel first and then back port it to 
our production system.

Please see "PATCH 1/1" for detail.

Thanks,
Ming

Re: [PATCH 4/5] nvmet-rdma: add a NVMe over Fabrics RDMA target driver

2016-06-09 Thread Ming Lin

On Thu, Jun 9, 2016 at 2:42 PM, Steve Wise  wrote:

> Should the above error path actually goto a block that frees the rsps?  Like
> this?
>
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index c184ee5..8aaa36f 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1053,7 +1053,7 @@ nvmet_rdma_alloc_queue(struct nvmet_rdma_device *ndev,
> !queue->host_qid);
> if (IS_ERR(queue->cmds)) {
> ret = NVME_RDMA_CM_NO_RSC;
> -   goto out_free_cmds;
> +   goto out_free_responses;
> }
> }
>
> @@ -1073,6 +1073,8 @@ out_free_cmds:
> queue->recv_queue_size,
> !queue->host_qid);
> }
> +out_free_responses:
> +nvmet_rdma_free_rsps(queue);
>  out_ida_remove:
> ida_simple_remove(_rdma_queue_ida, queue->idx);
>  out_destroy_sq:

Yes. Nice catch.

Re: [PATCH 4/5] nvmet-rdma: add a NVMe over Fabrics RDMA target driver

2016-06-09 Thread Ming Lin

On Thu, Jun 9, 2016 at 2:42 PM, Steve Wise  wrote:

> Should the above error path actually goto a block that frees the rsps?  Like
> this?
>
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index c184ee5..8aaa36f 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1053,7 +1053,7 @@ nvmet_rdma_alloc_queue(struct nvmet_rdma_device *ndev,
> !queue->host_qid);
> if (IS_ERR(queue->cmds)) {
> ret = NVME_RDMA_CM_NO_RSC;
> -   goto out_free_cmds;
> +   goto out_free_responses;
> }
> }
>
> @@ -1073,6 +1073,8 @@ out_free_cmds:
> queue->recv_queue_size,
> !queue->host_qid);
> }
> +out_free_responses:
> +nvmet_rdma_free_rsps(queue);
>  out_ida_remove:
> ida_simple_remove(_rdma_queue_ida, queue->idx);
>  out_destroy_sq:

Yes. Nice catch.

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Ming Lin

On Tue, 2016-06-07 at 22:49 -0600, Jens Axboe wrote:
> On 06/06/2016 03:21 PM, Christoph Hellwig wrote:
> > From: Ming Lin <min...@ssi.samsung.com>
> > 
> > For some protocols like NVMe over Fabrics we need to be able to
> > send
> > initialization commands to a specific queue.
> > 
> > Based on an earlier patch from Christoph Hellwig <h...@lst.de>.
> > 
> > Signed-off-by: Ming Lin <min...@ssi.samsung.com>
> > Signed-off-by: Christoph Hellwig <h...@lst.de>
> > ---
> >   block/blk-mq.c | 33 +
> >   include/linux/blk-mq.h |  2 ++
> >   2 files changed, 35 insertions(+)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 29cbc1b..7bb45ed 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -266,6 +266,39 @@ struct request *blk_mq_alloc_request(struct
> > request_queue *q, int rw,
> >   }
> >   EXPORT_SYMBOL(blk_mq_alloc_request);
> > 
> > +struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> > int rw,
> > +   unsigned int flags, unsigned int hctx_idx)
> > +{
> > +   struct blk_mq_hw_ctx *hctx;
> > +   struct blk_mq_ctx *ctx;
> > +   struct request *rq;
> > +   struct blk_mq_alloc_data alloc_data;
> > +   int ret;
> > +
> > +   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
> > +   if (ret)
> > +   return ERR_PTR(ret);
> > +
> > +   hctx = q->queue_hw_ctx[hctx_idx];
> > +   ctx = __blk_mq_get_ctx(q, cpumask_first(hctx->cpumask));
> > +
> > +   blk_mq_set_alloc_data(_data, q, flags, ctx, hctx);
> > +
> > +   rq = __blk_mq_alloc_request(_data, rw);
> > +   if (!rq && !(flags & BLK_MQ_REQ_NOWAIT)) {
> > +   __blk_mq_run_hw_queue(hctx);
> > +
> > +   rq =  __blk_mq_alloc_request(_data, rw);
> > +   }
> 
> Why are we duplicating this code here? If NOWAIT isn't set, then
> we'll
> always return a request. bt_get() will run the queue for us, if it
> needs
> to. blk_mq_alloc_request() does this too, and I'm guessing that code
> was
> just copied. I'll fix that up. Looks like this should just be:
> 
>   rq = __blk_mq_alloc_request(_data, rw);
>   if (rq)
>   return rq;
> 
>   blk_queue_exit(q);
>   return ERR_PTR(-EWOULDBLOCK);
> 
> for this case.

Yes,

But the bt_get() reminds me that this patch actually has a problem.

blk_mq_alloc_request_hctx() ->
  __blk_mq_alloc_request() ->
    blk_mq_get_tag() -> 
      __blk_mq_get_tag() ->
        bt_get() ->
          blk_mq_put_ctx(data->ctx);

Here are blk_mq_get_ctx() and blk_mq_put_ctx().

static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
{   
return __blk_mq_get_ctx(q, get_cpu());
} 

static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
{
put_cpu();
}

blk_mq_alloc_request_hctx() calls __blk_mq_get_ctx() instead
of blk_mq_get_ctx(). Then reason is the "hctx" could belong to other
cpu. So blk_mq_get_ctx() doesn't work.

But then above put_cpu() in blk_mq_put_ctx() will trigger a WARNING
because we didn't do get_cpu() in blk_mq_alloc_request_hctx()

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Ming Lin

On Tue, 2016-06-07 at 22:49 -0600, Jens Axboe wrote:
> On 06/06/2016 03:21 PM, Christoph Hellwig wrote:
> > From: Ming Lin 
> > 
> > For some protocols like NVMe over Fabrics we need to be able to
> > send
> > initialization commands to a specific queue.
> > 
> > Based on an earlier patch from Christoph Hellwig .
> > 
> > Signed-off-by: Ming Lin 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >   block/blk-mq.c | 33 +
> >   include/linux/blk-mq.h |  2 ++
> >   2 files changed, 35 insertions(+)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 29cbc1b..7bb45ed 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -266,6 +266,39 @@ struct request *blk_mq_alloc_request(struct
> > request_queue *q, int rw,
> >   }
> >   EXPORT_SYMBOL(blk_mq_alloc_request);
> > 
> > +struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> > int rw,
> > +   unsigned int flags, unsigned int hctx_idx)
> > +{
> > +   struct blk_mq_hw_ctx *hctx;
> > +   struct blk_mq_ctx *ctx;
> > +   struct request *rq;
> > +   struct blk_mq_alloc_data alloc_data;
> > +   int ret;
> > +
> > +   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
> > +   if (ret)
> > +   return ERR_PTR(ret);
> > +
> > +   hctx = q->queue_hw_ctx[hctx_idx];
> > +   ctx = __blk_mq_get_ctx(q, cpumask_first(hctx->cpumask));
> > +
> > +   blk_mq_set_alloc_data(_data, q, flags, ctx, hctx);
> > +
> > +   rq = __blk_mq_alloc_request(_data, rw);
> > +   if (!rq && !(flags & BLK_MQ_REQ_NOWAIT)) {
> > +   __blk_mq_run_hw_queue(hctx);
> > +
> > +   rq =  __blk_mq_alloc_request(_data, rw);
> > +   }
> 
> Why are we duplicating this code here? If NOWAIT isn't set, then
> we'll
> always return a request. bt_get() will run the queue for us, if it
> needs
> to. blk_mq_alloc_request() does this too, and I'm guessing that code
> was
> just copied. I'll fix that up. Looks like this should just be:
> 
>   rq = __blk_mq_alloc_request(_data, rw);
>   if (rq)
>   return rq;
> 
>   blk_queue_exit(q);
>   return ERR_PTR(-EWOULDBLOCK);
> 
> for this case.

Yes,

But the bt_get() reminds me that this patch actually has a problem.

blk_mq_alloc_request_hctx() ->
  __blk_mq_alloc_request() ->
    blk_mq_get_tag() -> 
      __blk_mq_get_tag() ->
        bt_get() ->
          blk_mq_put_ctx(data->ctx);

Here are blk_mq_get_ctx() and blk_mq_put_ctx().

static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
{   
return __blk_mq_get_ctx(q, get_cpu());
} 

static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
{
put_cpu();
}

blk_mq_alloc_request_hctx() calls __blk_mq_get_ctx() instead
of blk_mq_get_ctx(). Then reason is the "hctx" could belong to other
cpu. So blk_mq_get_ctx() doesn't work.

But then above put_cpu() in blk_mq_put_ctx() will trigger a WARNING
because we didn't do get_cpu() in blk_mq_alloc_request_hctx()

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Ming Lin

On Tue, Jun 7, 2016 at 8:27 AM, Ming Lin <m...@kernel.org> wrote:
> On Tue, Jun 7, 2016 at 7:57 AM, Keith Busch <keith.bu...@intel.com> wrote:
>> On Mon, Jun 06, 2016 at 11:21:52PM +0200, Christoph Hellwig wrote:
>>> +struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
>>> + unsigned int flags, unsigned int hctx_idx)
>>> +{
>>> + struct blk_mq_hw_ctx *hctx;
>>> + struct blk_mq_ctx *ctx;
>>> + struct request *rq;
>>> + struct blk_mq_alloc_data alloc_data;
>>> + int ret;
>>> +
>>> + ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
>>> + if (ret)
>>> + return ERR_PTR(ret);
>>> +
>>> + hctx = q->queue_hw_ctx[hctx_idx];
>>
>> We probably want to check 'if (hctx_idx < q->nr_hw_queues)' before
>> getting the hctx. Even if hctx_idx was origially valid, it's possible
>> (though unlikely) blk_queue_enter waits on reallocating h/w contexts,
>> which can make hctx_idx invalid.
>
> Yes, I'll update it.

How about this?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7bb45ed..b59d2ef 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -279,6 +279,11 @@ struct request *blk_mq_alloc_request_hctx(struct
request_queue *q, int rw,
if (ret)
return ERR_PTR(ret);

+   if (hctx_idx >= q->nr_hw_queues) {
+   blk_queue_exit(q);
+   return ERR_PTR(-EINVAL);
+   }
+
hctx = q->queue_hw_ctx[hctx_idx];
ctx = __blk_mq_get_ctx(q, cpumask_first(hctx->cpumask));

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Ming Lin

On Tue, Jun 7, 2016 at 8:27 AM, Ming Lin  wrote:
> On Tue, Jun 7, 2016 at 7:57 AM, Keith Busch  wrote:
>> On Mon, Jun 06, 2016 at 11:21:52PM +0200, Christoph Hellwig wrote:
>>> +struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
>>> + unsigned int flags, unsigned int hctx_idx)
>>> +{
>>> + struct blk_mq_hw_ctx *hctx;
>>> + struct blk_mq_ctx *ctx;
>>> + struct request *rq;
>>> + struct blk_mq_alloc_data alloc_data;
>>> + int ret;
>>> +
>>> + ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
>>> + if (ret)
>>> + return ERR_PTR(ret);
>>> +
>>> + hctx = q->queue_hw_ctx[hctx_idx];
>>
>> We probably want to check 'if (hctx_idx < q->nr_hw_queues)' before
>> getting the hctx. Even if hctx_idx was origially valid, it's possible
>> (though unlikely) blk_queue_enter waits on reallocating h/w contexts,
>> which can make hctx_idx invalid.
>
> Yes, I'll update it.

How about this?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7bb45ed..b59d2ef 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -279,6 +279,11 @@ struct request *blk_mq_alloc_request_hctx(struct
request_queue *q, int rw,
if (ret)
return ERR_PTR(ret);

+   if (hctx_idx >= q->nr_hw_queues) {
+   blk_queue_exit(q);
+   return ERR_PTR(-EINVAL);
+   }
+
hctx = q->queue_hw_ctx[hctx_idx];
ctx = __blk_mq_get_ctx(q, cpumask_first(hctx->cpumask));

Re: NVMe over Fabrics target implementation

2016-06-07 Thread Ming Lin

On Tue, Jun 7, 2016 at 2:02 PM, Andy Grover  wrote:
> On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote:
>>
>> Hi HCH & Co,
>>
>> On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:
>>>
>>> This patch set adds a generic NVMe over Fabrics target. The
>>> implementation conforms to the NVMe 1.2b specification (which
>>> includes Fabrics) and provides the NVMe over Fabrics access
>>> to Linux block devices.
>>>
>>
>> Thanks for all of the development work by the fabric_linux_driver team
>> (HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year.
>>
>> Very excited to see this code get a public release now that NVMf
>> specification is out.  Now that it's in the wild, it's a good
>> opportunity to discuss some of the more interesting implementation
>> details, beyond the new NVMf wire-protocol itself.
>
>
> I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)?

http://www.nvmexpress.org/wp-content/uploads/NVM_Express_1_2_1_Gold_20160603.pdf
http://www.nvmexpress.org/wp-content/uploads/NVMe_over_Fabrics_1_0_Gold_20160605.pdf

Re: NVMe over Fabrics target implementation

2016-06-07 Thread Ming Lin

On Tue, Jun 7, 2016 at 2:02 PM, Andy Grover  wrote:
> On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote:
>>
>> Hi HCH & Co,
>>
>> On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:
>>>
>>> This patch set adds a generic NVMe over Fabrics target. The
>>> implementation conforms to the NVMe 1.2b specification (which
>>> includes Fabrics) and provides the NVMe over Fabrics access
>>> to Linux block devices.
>>>
>>
>> Thanks for all of the development work by the fabric_linux_driver team
>> (HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year.
>>
>> Very excited to see this code get a public release now that NVMf
>> specification is out.  Now that it's in the wild, it's a good
>> opportunity to discuss some of the more interesting implementation
>> details, beyond the new NVMf wire-protocol itself.
>
>
> I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)?

http://www.nvmexpress.org/wp-content/uploads/NVM_Express_1_2_1_Gold_20160603.pdf
http://www.nvmexpress.org/wp-content/uploads/NVMe_over_Fabrics_1_0_Gold_20160605.pdf

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Ming Lin

On Tue, Jun 7, 2016 at 7:57 AM, Keith Busch  wrote:
> On Mon, Jun 06, 2016 at 11:21:52PM +0200, Christoph Hellwig wrote:
>> +struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
>> + unsigned int flags, unsigned int hctx_idx)
>> +{
>> + struct blk_mq_hw_ctx *hctx;
>> + struct blk_mq_ctx *ctx;
>> + struct request *rq;
>> + struct blk_mq_alloc_data alloc_data;
>> + int ret;
>> +
>> + ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
>> + if (ret)
>> + return ERR_PTR(ret);
>> +
>> + hctx = q->queue_hw_ctx[hctx_idx];
>
> We probably want to check 'if (hctx_idx < q->nr_hw_queues)' before
> getting the hctx. Even if hctx_idx was origially valid, it's possible
> (though unlikely) blk_queue_enter waits on reallocating h/w contexts,
> which can make hctx_idx invalid.

Yes, I'll update it.

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Ming Lin

On Tue, Jun 7, 2016 at 7:57 AM, Keith Busch  wrote:
> On Mon, Jun 06, 2016 at 11:21:52PM +0200, Christoph Hellwig wrote:
>> +struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
>> + unsigned int flags, unsigned int hctx_idx)
>> +{
>> + struct blk_mq_hw_ctx *hctx;
>> + struct blk_mq_ctx *ctx;
>> + struct request *rq;
>> + struct blk_mq_alloc_data alloc_data;
>> + int ret;
>> +
>> + ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
>> + if (ret)
>> + return ERR_PTR(ret);
>> +
>> + hctx = q->queue_hw_ctx[hctx_idx];
>
> We probably want to check 'if (hctx_idx < q->nr_hw_queues)' before
> getting the hctx. Even if hctx_idx was origially valid, it's possible
> (though unlikely) blk_queue_enter waits on reallocating h/w contexts,
> which can make hctx_idx invalid.

Yes, I'll update it.

[PATCH] blk-mq: clear q->mq_ops if init fail

2016-05-26 Thread Ming Lin

From: Ming Lin <min...@samsung.com>

blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops
was not cleared when blk_mq_init_allocated_queue() fails.
Then blk_cleanup_queue() calls blk_mq_free_queue() which will crash because:
- q->all_q_node is not added to all_q_list yet
- q->tag_set is NULL
- hctx was not setup yet or already freed

Fixed it by clearing q->mq_ops on error path.

Signed-off-by: Ming Lin <min...@samsung.com>
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 67bf8ed..86f08b1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2054,7 +2054,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct 
blk_mq_tag_set *set,
 
q->queue_ctx = alloc_percpu(struct blk_mq_ctx);
if (!q->queue_ctx)
-   return ERR_PTR(-ENOMEM);
+   goto err_exit;
 
q->queue_hw_ctx = kzalloc_node(nr_cpu_ids * sizeof(*(q->queue_hw_ctx)),
GFP_KERNEL, set->numa_node);
@@ -2118,6 +2118,8 @@ err_map:
kfree(q->queue_hw_ctx);
 err_percpu:
free_percpu(q->queue_ctx);
+err_exit:
+   q->mq_ops = NULL;
return ERR_PTR(-ENOMEM);
 }
 EXPORT_SYMBOL(blk_mq_init_allocated_queue);
-- 
1.9.1

[PATCH] blk-mq: clear q->mq_ops if init fail

2016-05-26 Thread Ming Lin

From: Ming Lin 

blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops
was not cleared when blk_mq_init_allocated_queue() fails.
Then blk_cleanup_queue() calls blk_mq_free_queue() which will crash because:
- q->all_q_node is not added to all_q_list yet
- q->tag_set is NULL
- hctx was not setup yet or already freed

Fixed it by clearing q->mq_ops on error path.

Signed-off-by: Ming Lin 
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 67bf8ed..86f08b1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2054,7 +2054,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct 
blk_mq_tag_set *set,
 
q->queue_ctx = alloc_percpu(struct blk_mq_ctx);
if (!q->queue_ctx)
-   return ERR_PTR(-ENOMEM);
+   goto err_exit;
 
q->queue_hw_ctx = kzalloc_node(nr_cpu_ids * sizeof(*(q->queue_hw_ctx)),
GFP_KERNEL, set->numa_node);
@@ -2118,6 +2118,8 @@ err_map:
kfree(q->queue_hw_ctx);
 err_percpu:
free_percpu(q->queue_ctx);
+err_exit:
+   q->mq_ops = NULL;
return ERR_PTR(-ENOMEM);
 }
 EXPORT_SYMBOL(blk_mq_init_allocated_queue);
-- 
1.9.1

Re: [PATCH v2] lib: make sg_pool tristate instead of bool

2016-04-23 Thread Ming Lin

On Sat, Apr 23, 2016 at 7:44 PM, Paul Gortmaker
<paul.gortma...@windriver.com> wrote:
> The recently added Kconfig controlling compilation of this code is:
>
> lib/Kconfig:config SG_POOL
> lib/Kconfig:def_bool n
>
> ...meaning that it currently is not being built as a module by anyone,
> and that tripped my audit looking for modular code that is essentially
> orphaned (i.e. module_exit, and .remove fcns in non-modular drivers.)
>
> In the following discussion, Ming Lin indicated that the original
> intention was to have it tristate, so here we convert it accordingly.
>
> Also fix up a couple spelling issues that appear in the surrounding
> patch context.
>
> Cc: Christoph Hellwig <h...@lst.de>
> Cc: Ming Lin <min...@ssi.samsung.com>
> Cc: Sagi Grimberg <s...@grimberg.me>
> Cc: Martin K. Petersen <martin.peter...@oracle.com>
> Signed-off-by: Paul Gortmaker <paul.gortma...@windriver.com>
> ---
>
> [v2: drop modular code removal patch in favour of supporting a modular
>  build via a one line Kconfig patch as per Ming's comments.  Build tested
>  for allmodconfig on ARM and x86-64 on linux-next.   ]
>
>  lib/Kconfig | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index e04f168f8f42..8de5868804b5 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -528,13 +528,13 @@ config SG_SPLIT
> help
>  Provides a helper to split scatterlists into chunks, each chunk being
>  a scatterlist. This should be selected by a driver or an API which
> -whishes to split a scatterlist amongst multiple DMA channels.
> +wishes to split a scatterlist amongst multiple DMA channels.
>
>  config SG_POOL
> -   def_bool n
> +   def_tristate n
> help
>  Provides a helper to allocate chained scatterlists. This should be
> -selected by a driver or an API which whishes to allocate chained
> +selected by a driver or an API which wishes to allocate chained
>  scatterlist.
>
>  #

Looks good.

Acked-by: Ming Lin <min...@ssi.samsung.com>

Thanks.

Re: [PATCH v2] lib: make sg_pool tristate instead of bool

2016-04-23 Thread Ming Lin

On Sat, Apr 23, 2016 at 7:44 PM, Paul Gortmaker
 wrote:
> The recently added Kconfig controlling compilation of this code is:
>
> lib/Kconfig:config SG_POOL
> lib/Kconfig:def_bool n
>
> ...meaning that it currently is not being built as a module by anyone,
> and that tripped my audit looking for modular code that is essentially
> orphaned (i.e. module_exit, and .remove fcns in non-modular drivers.)
>
> In the following discussion, Ming Lin indicated that the original
> intention was to have it tristate, so here we convert it accordingly.
>
> Also fix up a couple spelling issues that appear in the surrounding
> patch context.
>
> Cc: Christoph Hellwig 
> Cc: Ming Lin 
> Cc: Sagi Grimberg 
> Cc: Martin K. Petersen 
> Signed-off-by: Paul Gortmaker 
> ---
>
> [v2: drop modular code removal patch in favour of supporting a modular
>  build via a one line Kconfig patch as per Ming's comments.  Build tested
>  for allmodconfig on ARM and x86-64 on linux-next.   ]
>
>  lib/Kconfig | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index e04f168f8f42..8de5868804b5 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -528,13 +528,13 @@ config SG_SPLIT
> help
>  Provides a helper to split scatterlists into chunks, each chunk being
>  a scatterlist. This should be selected by a driver or an API which
> -whishes to split a scatterlist amongst multiple DMA channels.
> +wishes to split a scatterlist amongst multiple DMA channels.
>
>  config SG_POOL
> -   def_bool n
> +   def_tristate n
> help
>  Provides a helper to allocate chained scatterlists. This should be
> -selected by a driver or an API which whishes to allocate chained
> +selected by a driver or an API which wishes to allocate chained
>  scatterlist.
>
>  #

Looks good.

Acked-by: Ming Lin 

Thanks.

Re: [PATCH] lib: make sg_pool explicitly non-modular

2016-04-20 Thread Ming Lin

On Wed, 2016-04-20 at 23:02 -0400, Paul Gortmaker wrote:
> 
> > 
> > So the .config will have CONFIG_SG_POOL=m
> 
> That is impossible currently, since as per above, the variable is
> bool
> and not tristate.  Did you mean to make it tristate?

I didn't notice that.

Yes, it should be tristate.

Re: [PATCH] lib: make sg_pool explicitly non-modular

2016-04-20 Thread Ming Lin

On Wed, 2016-04-20 at 23:02 -0400, Paul Gortmaker wrote:
> 
> > 
> > So the .config will have CONFIG_SG_POOL=m
> 
> That is impossible currently, since as per above, the variable is
> bool
> and not tristate.  Did you mean to make it tristate?

I didn't notice that.

Yes, it should be tristate.

Re: [PATCH] lib: make sg_pool explicitly non-modular

2016-04-20 Thread Ming Lin

On Wed, 2016-04-20 at 15:13 -0400, Paul Gortmaker wrote:
> The recently added Kconfig controlling compilation of this code is:
> 
> lib/Kconfig:config SG_POOL
> lib/Kconfig:def_bool n
> 
> ...meaning that it currently is not being built as a module by anyone.
> 
> Lets remove the modular code that is essentially orphaned, so that
> when reading the driver there is no doubt it is builtin-only.
> 
> Since module_init translates to device_initcall in the non-modular
> case, the init ordering remains unchanged with this commit.  However
> one might want to consider moving it to subsys_initcall if it is to
> be ready ahead of SCSI drivers wanting this and using device_initcall.
> 
> Cc: Christoph Hellwig <h...@lst.de>
> Cc: Ming Lin <min...@ssi.samsung.com>
> Cc: Sagi Grimberg <s...@grimberg.me>
> Cc: Martin K. Petersen <martin.peter...@oracle.com>
> Signed-off-by: Paul Gortmaker <paul.gortma...@windriver.com>
> ---
>  lib/sg_pool.c | 17 ++---
>  1 file changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/sg_pool.c b/lib/sg_pool.c
> index 6dd30615a201..e2cf548b9610 100644
> --- a/lib/sg_pool.c
> +++ b/lib/sg_pool.c
> @@ -1,4 +1,4 @@
> -#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -156,17 +156,4 @@ cleanup_sdb:
>  
>   return -ENOMEM;
>  }
> -
> -static __exit void sg_pool_exit(void)
> -{
> - int i;
> -
> - for (i = 0; i < SG_MEMPOOL_NR; i++) {
> - struct sg_pool *sgp = sg_pools + i;
> - mempool_destroy(sgp->pool);
> - kmem_cache_destroy(sgp->slab);
> - }
> -}
> -
> -module_init(sg_pool_init);
> -module_exit(sg_pool_exit);
> +device_initcall(sg_pool_init);

For SCSI it's OK because always CONFIG_SCSI=y

But we may have a kernel .config with !CONFIG_SCSI and other
non-block-device driver may use this sg_pool.

So the .config will have CONFIG_SG_POOL=m

Re: [PATCH] lib: make sg_pool explicitly non-modular

2016-04-20 Thread Ming Lin

On Wed, 2016-04-20 at 15:13 -0400, Paul Gortmaker wrote:
> The recently added Kconfig controlling compilation of this code is:
> 
> lib/Kconfig:config SG_POOL
> lib/Kconfig:def_bool n
> 
> ...meaning that it currently is not being built as a module by anyone.
> 
> Lets remove the modular code that is essentially orphaned, so that
> when reading the driver there is no doubt it is builtin-only.
> 
> Since module_init translates to device_initcall in the non-modular
> case, the init ordering remains unchanged with this commit.  However
> one might want to consider moving it to subsys_initcall if it is to
> be ready ahead of SCSI drivers wanting this and using device_initcall.
> 
> Cc: Christoph Hellwig 
> Cc: Ming Lin 
> Cc: Sagi Grimberg 
> Cc: Martin K. Petersen 
> Signed-off-by: Paul Gortmaker 
> ---
>  lib/sg_pool.c | 17 ++---
>  1 file changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/sg_pool.c b/lib/sg_pool.c
> index 6dd30615a201..e2cf548b9610 100644
> --- a/lib/sg_pool.c
> +++ b/lib/sg_pool.c
> @@ -1,4 +1,4 @@
> -#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -156,17 +156,4 @@ cleanup_sdb:
>  
>   return -ENOMEM;
>  }
> -
> -static __exit void sg_pool_exit(void)
> -{
> - int i;
> -
> - for (i = 0; i < SG_MEMPOOL_NR; i++) {
> - struct sg_pool *sgp = sg_pools + i;
> - mempool_destroy(sgp->pool);
> - kmem_cache_destroy(sgp->slab);
> - }
> -}
> -
> -module_init(sg_pool_init);
> -module_exit(sg_pool_exit);
> +device_initcall(sg_pool_init);

For SCSI it's OK because always CONFIG_SCSI=y

But we may have a kernel .config with !CONFIG_SCSI and other
non-block-device driver may use this sg_pool.

So the .config will have CONFIG_SG_POOL=m

Re: [PATCH v3 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-11 Thread Ming Lin

On Mon, Apr 11, 2016 at 2:34 PM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>>>>>> "Ming" == Ming Lin <m...@kernel.org> writes:
>
> Ming> Are we ready to merge it?
>
> We're still missing an ack from Sagi.

Thought we already had a ack from Bart.
OK, let's get one more from Sagi.

Re: [PATCH v3 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-11 Thread Ming Lin

On Mon, Apr 11, 2016 at 2:34 PM, Martin K. Petersen
 wrote:
>>>>>> "Ming" == Ming Lin  writes:
>
> Ming> Are we ready to merge it?
>
> We're still missing an ack from Sagi.

Thought we already had a ack from Bart.
OK, let's get one more from Sagi.

Re: [PATCH v3 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-10 Thread Ming Lin

On Tue, Apr 5, 2016 at 7:55 AM, Tejun Heo <t...@kernel.org> wrote:
> On Mon, Apr 04, 2016 at 02:48:10PM -0700, Ming Lin wrote:
>> From: Ming Lin <min...@ssi.samsung.com>
>>
>> Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
>> we fit into a single scatterlist chunk.
>>
>> Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.
>>
>> Will move these 2 generic definitions to scatterlist.h later.
>>
>> Reviewed-by: Christoph Hellwig <h...@lst.de>
>> Acked-by: Bart Van Assche <bart.vanass...@sandisk.com> (for ib_srp changes)
>> Signed-off-by: Ming Lin <min...@ssi.samsung.com>
>
> For libata,
>
> Acked-by: Tejun Heo <t...@kernel.org>

Hi James,

Are we ready to merge it?

Thanks,
Ming

Re: [PATCH v3 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-10 Thread Ming Lin

On Tue, Apr 5, 2016 at 7:55 AM, Tejun Heo  wrote:
> On Mon, Apr 04, 2016 at 02:48:10PM -0700, Ming Lin wrote:
>> From: Ming Lin 
>>
>> Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
>> we fit into a single scatterlist chunk.
>>
>> Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.
>>
>> Will move these 2 generic definitions to scatterlist.h later.
>>
>> Reviewed-by: Christoph Hellwig 
>> Acked-by: Bart Van Assche  (for ib_srp changes)
>> Signed-off-by: Ming Lin 
>
> For libata,
>
> Acked-by: Tejun Heo 

Hi James,

Are we ready to merge it?

Thanks,
Ming

Re: [PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-04-07 Thread Ming Lin

On Thu, Apr 7, 2016 at 9:43 AM, Ming Lin <m...@kernel.org> wrote:
> On Thu, Apr 7, 2016 at 7:56 AM, Bart Van Assche
> <bart.vanass...@sandisk.com> wrote:
>> On 03/15/16 15:39, Ming Lin wrote:
>>>
>>> +static void sg_mempoll_free(struct scatterlist *sgl, unsigned int nents)
>>
>>
>> Please change mempoll into mempool.
>
> Good catch. Thanks Bart!

Hi Bart,

This is actually the previous old RFC patch.
The v2 and v3 patch series doesn't have this typo.

Thanks.

Re: [PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-04-07 Thread Ming Lin

On Thu, Apr 7, 2016 at 9:43 AM, Ming Lin  wrote:
> On Thu, Apr 7, 2016 at 7:56 AM, Bart Van Assche
>  wrote:
>> On 03/15/16 15:39, Ming Lin wrote:
>>>
>>> +static void sg_mempoll_free(struct scatterlist *sgl, unsigned int nents)
>>
>>
>> Please change mempoll into mempool.
>
> Good catch. Thanks Bart!

Hi Bart,

This is actually the previous old RFC patch.
The v2 and v3 patch series doesn't have this typo.

Thanks.

Re: [PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-04-07 Thread Ming Lin

On Thu, Apr 7, 2016 at 7:56 AM, Bart Van Assche
<bart.vanass...@sandisk.com> wrote:
> On 03/15/16 15:39, Ming Lin wrote:
>>
>> +static void sg_mempoll_free(struct scatterlist *sgl, unsigned int nents)
>
>
> Please change mempoll into mempool.

Good catch. Thanks Bart!

Re: [PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-04-07 Thread Ming Lin

On Thu, Apr 7, 2016 at 7:56 AM, Bart Van Assche
 wrote:
> On 03/15/16 15:39, Ming Lin wrote:
>>
>> +static void sg_mempoll_free(struct scatterlist *sgl, unsigned int nents)
>
>
> Please change mempoll into mempool.

Good catch. Thanks Bart!

[PATCH v3 0/5] mempool based chained scatterlist alloc/free api

2016-04-04 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

The fist 4 patches make the SG related definitions/structs/functions
in SCSI code generic and the last patch move it to lib/sg_pool.c.

v3:
  - Resend for Tejun to review. No code change since v2.
  - Add review/ack tags

v2:
  - do modification in scsi code first then move to lib/sg_pool.c
  - address Christoph's comments


Ming Lin (5):
  scsi: replace "scsi_data_buffer" with "sg_table" in SG functions
  scsi: replace "mq" with "first_chunk" in SG functions
  scsi: rename SG related struct and functions
  scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS
  lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

 drivers/ata/pata_icside.c   |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |   4 +-
 drivers/scsi/Kconfig|   1 +
 drivers/scsi/arm/cumana_2.c |   2 +-
 drivers/scsi/arm/eesox.c|   2 +-
 drivers/scsi/arm/powertec.c |   2 +-
 drivers/scsi/esas2r/esas2r_main.c   |   4 +-
 drivers/scsi/hisi_sas/hisi_sas.h|   2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c |   4 +-
 drivers/scsi/mpt3sas/mpt3sas_base.h |   2 +-
 drivers/scsi/scsi_debug.c   |   2 +-
 drivers/scsi/scsi_lib.c | 172 +---
 drivers/usb/storage/scsiglue.c  |   2 +-
 include/linux/scatterlist.h |  25 ++
 include/scsi/scsi.h |  19 
 include/scsi/scsi_host.h|   2 +-
 lib/Kconfig |   7 ++
 lib/Makefile|   1 +
 lib/sg_pool.c   | 172 
 19 files changed, 241 insertions(+), 186 deletions(-)
 create mode 100644 lib/sg_pool.c

-- 
1.9.1

[PATCH v3 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-04 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
we fit into a single scatterlist chunk.

Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.

Will move these 2 generic definitions to scatterlist.h later.

Reviewed-by: Christoph Hellwig <h...@lst.de>
Acked-by: Bart Van Assche <bart.vanass...@sandisk.com> (for ib_srp changes)
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/ata/pata_icside.c   |  2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |  4 ++--
 drivers/scsi/arm/cumana_2.c |  2 +-
 drivers/scsi/arm/eesox.c|  2 +-
 drivers/scsi/arm/powertec.c |  2 +-
 drivers/scsi/esas2r/esas2r_main.c   |  4 ++--
 drivers/scsi/hisi_sas/hisi_sas.h|  2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c |  4 ++--
 drivers/scsi/mpt3sas/mpt3sas_base.h |  2 +-
 drivers/scsi/scsi_debug.c   |  2 +-
 drivers/scsi/scsi_lib.c | 34 +-
 drivers/usb/storage/scsiglue.c  |  2 +-
 include/scsi/scsi.h |  8 
 include/scsi/scsi_host.h|  2 +-
 14 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/ata/pata_icside.c b/drivers/ata/pata_icside.c
index d7c7320..188f2f2 100644
--- a/drivers/ata/pata_icside.c
+++ b/drivers/ata/pata_icside.c
@@ -294,7 +294,7 @@ static int icside_dma_init(struct pata_icside_info *info)
 
 static struct scsi_host_template pata_icside_sht = {
ATA_BASE_SHT(DRV_NAME),
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
 };
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index ff21597..369a75e 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -81,7 +81,7 @@ MODULE_PARM_DESC(cmd_sg_entries,
 
 module_param(indirect_sg_entries, uint, 0444);
 MODULE_PARM_DESC(indirect_sg_entries,
-"Default max number of gather/scatter entries (default is 12, 
max is " __stringify(SCSI_MAX_SG_CHAIN_SEGMENTS) ")");
+"Default max number of gather/scatter entries (default is 12, 
max is " __stringify(SG_MAX_SEGMENTS) ")");
 
 module_param(allow_ext_sg, bool, 0444);
 MODULE_PARM_DESC(allow_ext_sg,
@@ -3097,7 +3097,7 @@ static int srp_parse_options(const char *buf, struct 
srp_target_port *target)
 
case SRP_OPT_SG_TABLESIZE:
if (match_int(args, ) || token < 1 ||
-   token > SCSI_MAX_SG_CHAIN_SEGMENTS) {
+   token > SG_MAX_SEGMENTS) {
pr_warn("bad max sg_tablesize parameter '%s'\n",
p);
goto out;
diff --git a/drivers/scsi/arm/cumana_2.c b/drivers/scsi/arm/cumana_2.c
index faa1bee..edce5f3 100644
--- a/drivers/scsi/arm/cumana_2.c
+++ b/drivers/scsi/arm/cumana_2.c
@@ -365,7 +365,7 @@ static struct scsi_host_template cumanascsi2_template = {
.eh_abort_handler   = fas216_eh_abort,
.can_queue  = 1,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.use_clustering = DISABLE_CLUSTERING,
.proc_name  = "cumanascsi2",
diff --git a/drivers/scsi/arm/eesox.c b/drivers/scsi/arm/eesox.c
index a8ad688..e93e047 100644
--- a/drivers/scsi/arm/eesox.c
+++ b/drivers/scsi/arm/eesox.c
@@ -484,7 +484,7 @@ static struct scsi_host_template eesox_template = {
.eh_abort_handler   = fas216_eh_abort,
.can_queue  = 1,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.use_clustering = DISABLE_CLUSTERING,
.proc_name  = "eesox",
diff --git a/drivers/scsi/arm/powertec.c b/drivers/scsi/arm/powertec.c
index 5e1b73e..79aa889 100644
--- a/drivers/scsi/arm/powertec.c
+++ b/drivers/scsi/arm/powertec.c
@@ -291,7 +291,7 @@ static struct scsi_host_template powertecscsi_template = {
 
.can_queue  = 8,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.cmd_per_lun= 2,
.use_clustering = ENABLE_CLUSTERING,
dif

[PATCH v3 0/5] mempool based chained scatterlist alloc/free api

2016-04-04 Thread Ming Lin

From: Ming Lin 

The fist 4 patches make the SG related definitions/structs/functions
in SCSI code generic and the last patch move it to lib/sg_pool.c.

v3:
  - Resend for Tejun to review. No code change since v2.
  - Add review/ack tags

v2:
  - do modification in scsi code first then move to lib/sg_pool.c
  - address Christoph's comments


Ming Lin (5):
  scsi: replace "scsi_data_buffer" with "sg_table" in SG functions
  scsi: replace "mq" with "first_chunk" in SG functions
  scsi: rename SG related struct and functions
  scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS
  lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

 drivers/ata/pata_icside.c   |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |   4 +-
 drivers/scsi/Kconfig|   1 +
 drivers/scsi/arm/cumana_2.c |   2 +-
 drivers/scsi/arm/eesox.c|   2 +-
 drivers/scsi/arm/powertec.c |   2 +-
 drivers/scsi/esas2r/esas2r_main.c   |   4 +-
 drivers/scsi/hisi_sas/hisi_sas.h|   2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c |   4 +-
 drivers/scsi/mpt3sas/mpt3sas_base.h |   2 +-
 drivers/scsi/scsi_debug.c   |   2 +-
 drivers/scsi/scsi_lib.c | 172 +---
 drivers/usb/storage/scsiglue.c  |   2 +-
 include/linux/scatterlist.h |  25 ++
 include/scsi/scsi.h |  19 
 include/scsi/scsi_host.h|   2 +-
 lib/Kconfig |   7 ++
 lib/Makefile|   1 +
 lib/sg_pool.c   | 172 
 19 files changed, 241 insertions(+), 186 deletions(-)
 create mode 100644 lib/sg_pool.c

-- 
1.9.1

[PATCH v3 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-04 Thread Ming Lin

From: Ming Lin 

Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
we fit into a single scatterlist chunk.

Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.

Will move these 2 generic definitions to scatterlist.h later.

Reviewed-by: Christoph Hellwig 
Acked-by: Bart Van Assche  (for ib_srp changes)
Signed-off-by: Ming Lin 
---
 drivers/ata/pata_icside.c   |  2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |  4 ++--
 drivers/scsi/arm/cumana_2.c |  2 +-
 drivers/scsi/arm/eesox.c|  2 +-
 drivers/scsi/arm/powertec.c |  2 +-
 drivers/scsi/esas2r/esas2r_main.c   |  4 ++--
 drivers/scsi/hisi_sas/hisi_sas.h|  2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c |  4 ++--
 drivers/scsi/mpt3sas/mpt3sas_base.h |  2 +-
 drivers/scsi/scsi_debug.c   |  2 +-
 drivers/scsi/scsi_lib.c | 34 +-
 drivers/usb/storage/scsiglue.c  |  2 +-
 include/scsi/scsi.h |  8 
 include/scsi/scsi_host.h|  2 +-
 14 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/ata/pata_icside.c b/drivers/ata/pata_icside.c
index d7c7320..188f2f2 100644
--- a/drivers/ata/pata_icside.c
+++ b/drivers/ata/pata_icside.c
@@ -294,7 +294,7 @@ static int icside_dma_init(struct pata_icside_info *info)
 
 static struct scsi_host_template pata_icside_sht = {
ATA_BASE_SHT(DRV_NAME),
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
 };
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index ff21597..369a75e 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -81,7 +81,7 @@ MODULE_PARM_DESC(cmd_sg_entries,
 
 module_param(indirect_sg_entries, uint, 0444);
 MODULE_PARM_DESC(indirect_sg_entries,
-"Default max number of gather/scatter entries (default is 12, 
max is " __stringify(SCSI_MAX_SG_CHAIN_SEGMENTS) ")");
+"Default max number of gather/scatter entries (default is 12, 
max is " __stringify(SG_MAX_SEGMENTS) ")");
 
 module_param(allow_ext_sg, bool, 0444);
 MODULE_PARM_DESC(allow_ext_sg,
@@ -3097,7 +3097,7 @@ static int srp_parse_options(const char *buf, struct 
srp_target_port *target)
 
case SRP_OPT_SG_TABLESIZE:
if (match_int(args, ) || token < 1 ||
-   token > SCSI_MAX_SG_CHAIN_SEGMENTS) {
+   token > SG_MAX_SEGMENTS) {
pr_warn("bad max sg_tablesize parameter '%s'\n",
p);
goto out;
diff --git a/drivers/scsi/arm/cumana_2.c b/drivers/scsi/arm/cumana_2.c
index faa1bee..edce5f3 100644
--- a/drivers/scsi/arm/cumana_2.c
+++ b/drivers/scsi/arm/cumana_2.c
@@ -365,7 +365,7 @@ static struct scsi_host_template cumanascsi2_template = {
.eh_abort_handler   = fas216_eh_abort,
.can_queue  = 1,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.use_clustering = DISABLE_CLUSTERING,
.proc_name  = "cumanascsi2",
diff --git a/drivers/scsi/arm/eesox.c b/drivers/scsi/arm/eesox.c
index a8ad688..e93e047 100644
--- a/drivers/scsi/arm/eesox.c
+++ b/drivers/scsi/arm/eesox.c
@@ -484,7 +484,7 @@ static struct scsi_host_template eesox_template = {
.eh_abort_handler   = fas216_eh_abort,
.can_queue  = 1,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.use_clustering = DISABLE_CLUSTERING,
.proc_name  = "eesox",
diff --git a/drivers/scsi/arm/powertec.c b/drivers/scsi/arm/powertec.c
index 5e1b73e..79aa889 100644
--- a/drivers/scsi/arm/powertec.c
+++ b/drivers/scsi/arm/powertec.c
@@ -291,7 +291,7 @@ static struct scsi_host_template powertecscsi_template = {
 
.can_queue  = 8,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.cmd_per_lun= 2,
.use_clustering = ENABLE_CLUSTERING,
diff --git a/drivers/scsi/esas2r/esas2r_main.c 
b/drivers/scsi/esas2r/esas2r_main.c
index 33581ba..2aca4d1 100644

[PATCH v3 2/5] scsi: replace "mq" with "first_chunk" in SG functions

2016-04-04 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Parameter "bool mq" is block driver specific.
Change it to "first_chunk" to make it more generic.

Reviewed-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/scsi_lib.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4229c18..9675353 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -583,33 +583,32 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct sg_table *table, bool mq)
+static void scsi_free_sgtable(struct sg_table *table, bool first_chunk)
 {
-   if (mq && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
+   if (first_chunk && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, scsi_sg_free);
 }
 
-static int scsi_alloc_sgtable(struct sg_table *table, int nents, bool mq)
+static int scsi_alloc_sgtable(struct sg_table *table, int nents,
+   struct scatterlist *first_chunk)
 {
-   struct scatterlist *first_chunk = NULL;
int ret;
 
BUG_ON(!nents);
 
-   if (mq) {
+   if (first_chunk) {
if (nents <= SCSI_MAX_SG_SEGMENTS) {
table->nents = table->orig_nents = nents;
sg_init_table(table->sgl, nents);
return 0;
}
-   first_chunk = table->sgl;
}
 
ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
   first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(table, mq);
+   scsi_free_sgtable(table, (bool)first_chunk);
return ret;
 }
 
@@ -1091,7 +1090,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
 * If sg table allocation fails, requeue request later.
 */
if (unlikely(scsi_alloc_sgtable(>table, req->nr_phys_segments,
-   req->mq_ctx != NULL)))
+   sdb->table.sgl)))
return BLKPREP_DEFER;
 
/* 
@@ -1163,7 +1162,8 @@ int scsi_init_io(struct scsi_cmnd *cmd)
 
ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-   if (scsi_alloc_sgtable(_sdb->table, ivecs, is_mq)) {
+   if (scsi_alloc_sgtable(_sdb->table, ivecs,
+   prot_sdb->table.sgl)) {
error = BLKPREP_DEFER;
goto err_exit;
}
-- 
1.9.1

[PATCH v3 2/5] scsi: replace "mq" with "first_chunk" in SG functions

2016-04-04 Thread Ming Lin

From: Ming Lin 

Parameter "bool mq" is block driver specific.
Change it to "first_chunk" to make it more generic.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lin 
---
 drivers/scsi/scsi_lib.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4229c18..9675353 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -583,33 +583,32 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct sg_table *table, bool mq)
+static void scsi_free_sgtable(struct sg_table *table, bool first_chunk)
 {
-   if (mq && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
+   if (first_chunk && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, scsi_sg_free);
 }
 
-static int scsi_alloc_sgtable(struct sg_table *table, int nents, bool mq)
+static int scsi_alloc_sgtable(struct sg_table *table, int nents,
+   struct scatterlist *first_chunk)
 {
-   struct scatterlist *first_chunk = NULL;
int ret;
 
BUG_ON(!nents);
 
-   if (mq) {
+   if (first_chunk) {
if (nents <= SCSI_MAX_SG_SEGMENTS) {
table->nents = table->orig_nents = nents;
sg_init_table(table->sgl, nents);
return 0;
}
-   first_chunk = table->sgl;
}
 
ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
   first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(table, mq);
+   scsi_free_sgtable(table, (bool)first_chunk);
return ret;
 }
 
@@ -1091,7 +1090,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
 * If sg table allocation fails, requeue request later.
 */
if (unlikely(scsi_alloc_sgtable(>table, req->nr_phys_segments,
-   req->mq_ctx != NULL)))
+   sdb->table.sgl)))
return BLKPREP_DEFER;
 
/* 
@@ -1163,7 +1162,8 @@ int scsi_init_io(struct scsi_cmnd *cmd)
 
ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-   if (scsi_alloc_sgtable(_sdb->table, ivecs, is_mq)) {
+   if (scsi_alloc_sgtable(_sdb->table, ivecs,
+   prot_sdb->table.sgl)) {
error = BLKPREP_DEFER;
goto err_exit;
}
-- 
1.9.1

[PATCH v3 1/5] scsi: replace "scsi_data_buffer" with "sg_table" in SG functions

2016-04-04 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Replace parameter "struct scsi_data_buffer" with "struct sg_table" in
SG alloc/free functions to make them generic.

Reviewed-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/scsi_lib.c | 41 +++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8106515..4229c18 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -583,14 +583,14 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct scsi_data_buffer *sdb, bool mq)
+static void scsi_free_sgtable(struct sg_table *table, bool mq)
 {
-   if (mq && sdb->table.orig_nents <= SCSI_MAX_SG_SEGMENTS)
+   if (mq && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(>table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
 }
 
-static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents, bool mq)
+static int scsi_alloc_sgtable(struct sg_table *table, int nents, bool mq)
 {
struct scatterlist *first_chunk = NULL;
int ret;
@@ -599,17 +599,17 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer 
*sdb, int nents, bool mq)
 
if (mq) {
if (nents <= SCSI_MAX_SG_SEGMENTS) {
-   sdb->table.nents = sdb->table.orig_nents = nents;
-   sg_init_table(sdb->table.sgl, nents);
+   table->nents = table->orig_nents = nents;
+   sg_init_table(table->sgl, nents);
return 0;
}
-   first_chunk = sdb->table.sgl;
+   first_chunk = table->sgl;
}
 
-   ret = __sg_alloc_table(>table, nents, SCSI_MAX_SG_SEGMENTS,
+   ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
   first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(sdb, mq);
+   scsi_free_sgtable(table, mq);
return ret;
 }
 
@@ -625,12 +625,17 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
 {
+   struct scsi_data_buffer *sdb;
+
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb, true);
-   if (cmd->request->next_rq && cmd->request->next_rq->special)
-   scsi_free_sgtable(cmd->request->next_rq->special, true);
+   scsi_free_sgtable(>sdb.table, true);
+   if (cmd->request->next_rq) {
+   sdb = cmd->request->next_rq->special;
+   if (sdb)
+   scsi_free_sgtable(>table, true);
+   }
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(cmd->prot_sdb, true);
+   scsi_free_sgtable(>prot_sdb->table, true);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -669,19 +674,19 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb, false);
+   scsi_free_sgtable(>sdb.table, false);
 
memset(>sdb, 0, sizeof(cmd->sdb));
 
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(cmd->prot_sdb, false);
+   scsi_free_sgtable(>prot_sdb->table, false);
 }
 
 static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-   scsi_free_sgtable(bidi_sdb, false);
+   scsi_free_sgtable(_sdb->table, false);
kmem_cache_free(scsi_sdb_cache, bidi_sdb);
cmd->request->next_rq->special = NULL;
 }
@@ -1085,7 +1090,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
/*
 * If sg table allocation fails, requeue request later.
 */
-   if (unlikely(scsi_alloc_sgtable(sdb, req->nr_phys_segments,
+   if (unlikely(scsi_alloc_sgtable(>table, req->nr_phys_segments,
req->mq_ctx != NULL)))
return BLKPREP_DEFER;
 
@@ -1158,7 +1163,7 @@ int scsi_init_io(struct scsi_cmnd *cmd)
 
ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-   if (scsi_alloc_sgtable(prot_sdb, ivecs, is_mq)) {
+   if (scsi_alloc_sgtable(_sdb->table, ivecs, is_mq)) {
error = BLKPREP_DEFER;
goto err_exit;
}
-- 
1.9.1

[PATCH v3 3/5] scsi: rename SG related struct and functions

2016-04-04 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Rename SCSI specific struct and functions to more genenic names.

Reviewed-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/scsi_lib.c | 52 -
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9675353..08134f6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -40,10 +40,10 @@
 #include "scsi_logging.h"
 
 
-#define SG_MEMPOOL_NR  ARRAY_SIZE(scsi_sg_pools)
+#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
 #define SG_MEMPOOL_SIZE2
 
-struct scsi_host_sg_pool {
+struct sg_pool {
size_t  size;
char*name;
struct kmem_cache   *slab;
@@ -54,7 +54,7 @@ struct scsi_host_sg_pool {
 #if (SCSI_MAX_SG_SEGMENTS < 32)
 #error SCSI_MAX_SG_SEGMENTS is too small (must be 32 or greater)
 #endif
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
+static struct sg_pool sg_pools[] = {
SP(8),
SP(16),
 #if (SCSI_MAX_SG_SEGMENTS > 32)
@@ -553,7 +553,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int scsi_sgtable_index(unsigned short nents)
+static inline unsigned int sg_pool_index(unsigned short nents)
 {
unsigned int index;
 
@@ -567,30 +567,30 @@ static inline unsigned int scsi_sgtable_index(unsigned 
short nents)
return index;
 }
 
-static void scsi_sg_free(struct scatterlist *sgl, unsigned int nents)
+static void sg_pool_free(struct scatterlist *sgl, unsigned int nents)
 {
-   struct scsi_host_sg_pool *sgp;
+   struct sg_pool *sgp;
 
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
+   sgp = sg_pools + sg_pool_index(nents);
mempool_free(sgl, sgp->pool);
 }
 
-static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
+static struct scatterlist *sg_pool_alloc(unsigned int nents, gfp_t gfp_mask)
 {
-   struct scsi_host_sg_pool *sgp;
+   struct sg_pool *sgp;
 
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
+   sgp = sg_pools + sg_pool_index(nents);
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct sg_table *table, bool first_chunk)
+static void sg_free_table_chained(struct sg_table *table, bool first_chunk)
 {
if (first_chunk && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, sg_pool_free);
 }
 
-static int scsi_alloc_sgtable(struct sg_table *table, int nents,
+static int sg_alloc_table_chained(struct sg_table *table, int nents,
struct scatterlist *first_chunk)
 {
int ret;
@@ -606,9 +606,9 @@ static int scsi_alloc_sgtable(struct sg_table *table, int 
nents,
}
 
ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
-  first_chunk, GFP_ATOMIC, scsi_sg_alloc);
+  first_chunk, GFP_ATOMIC, sg_pool_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(table, (bool)first_chunk);
+   sg_free_table_chained(table, (bool)first_chunk);
return ret;
 }
 
@@ -627,14 +627,14 @@ static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
struct scsi_data_buffer *sdb;
 
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb.table, true);
+   sg_free_table_chained(>sdb.table, true);
if (cmd->request->next_rq) {
sdb = cmd->request->next_rq->special;
if (sdb)
-   scsi_free_sgtable(>table, true);
+   sg_free_table_chained(>table, true);
}
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(>prot_sdb->table, true);
+   sg_free_table_chained(>prot_sdb->table, true);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -673,19 +673,19 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb.table, false);
+   sg_free_table_chained(>sdb.table, false);
 
memset(>sdb, 0, sizeof(cmd->sdb));
 
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(>prot_sdb->table, false);
+   sg_free_table_chained(>prot_sdb->table, false);
 }
 
 static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-   scsi_free_sgtable(_sdb->table, false);
+   sg_free_table_ch

[PATCH v3 1/5] scsi: replace "scsi_data_buffer" with "sg_table" in SG functions

2016-04-04 Thread Ming Lin

From: Ming Lin 

Replace parameter "struct scsi_data_buffer" with "struct sg_table" in
SG alloc/free functions to make them generic.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lin 
---
 drivers/scsi/scsi_lib.c | 41 +++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8106515..4229c18 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -583,14 +583,14 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct scsi_data_buffer *sdb, bool mq)
+static void scsi_free_sgtable(struct sg_table *table, bool mq)
 {
-   if (mq && sdb->table.orig_nents <= SCSI_MAX_SG_SEGMENTS)
+   if (mq && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(>table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
 }
 
-static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents, bool mq)
+static int scsi_alloc_sgtable(struct sg_table *table, int nents, bool mq)
 {
struct scatterlist *first_chunk = NULL;
int ret;
@@ -599,17 +599,17 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer 
*sdb, int nents, bool mq)
 
if (mq) {
if (nents <= SCSI_MAX_SG_SEGMENTS) {
-   sdb->table.nents = sdb->table.orig_nents = nents;
-   sg_init_table(sdb->table.sgl, nents);
+   table->nents = table->orig_nents = nents;
+   sg_init_table(table->sgl, nents);
return 0;
}
-   first_chunk = sdb->table.sgl;
+   first_chunk = table->sgl;
}
 
-   ret = __sg_alloc_table(>table, nents, SCSI_MAX_SG_SEGMENTS,
+   ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
   first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(sdb, mq);
+   scsi_free_sgtable(table, mq);
return ret;
 }
 
@@ -625,12 +625,17 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
 {
+   struct scsi_data_buffer *sdb;
+
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb, true);
-   if (cmd->request->next_rq && cmd->request->next_rq->special)
-   scsi_free_sgtable(cmd->request->next_rq->special, true);
+   scsi_free_sgtable(>sdb.table, true);
+   if (cmd->request->next_rq) {
+   sdb = cmd->request->next_rq->special;
+   if (sdb)
+   scsi_free_sgtable(>table, true);
+   }
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(cmd->prot_sdb, true);
+   scsi_free_sgtable(>prot_sdb->table, true);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -669,19 +674,19 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb, false);
+   scsi_free_sgtable(>sdb.table, false);
 
memset(>sdb, 0, sizeof(cmd->sdb));
 
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(cmd->prot_sdb, false);
+   scsi_free_sgtable(>prot_sdb->table, false);
 }
 
 static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-   scsi_free_sgtable(bidi_sdb, false);
+   scsi_free_sgtable(_sdb->table, false);
kmem_cache_free(scsi_sdb_cache, bidi_sdb);
cmd->request->next_rq->special = NULL;
 }
@@ -1085,7 +1090,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
/*
 * If sg table allocation fails, requeue request later.
 */
-   if (unlikely(scsi_alloc_sgtable(sdb, req->nr_phys_segments,
+   if (unlikely(scsi_alloc_sgtable(>table, req->nr_phys_segments,
req->mq_ctx != NULL)))
return BLKPREP_DEFER;
 
@@ -1158,7 +1163,7 @@ int scsi_init_io(struct scsi_cmnd *cmd)
 
ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-   if (scsi_alloc_sgtable(prot_sdb, ivecs, is_mq)) {
+   if (scsi_alloc_sgtable(_sdb->table, ivecs, is_mq)) {
error = BLKPREP_DEFER;
goto err_exit;
}
-- 
1.9.1

[PATCH v3 3/5] scsi: rename SG related struct and functions

2016-04-04 Thread Ming Lin

From: Ming Lin 

Rename SCSI specific struct and functions to more genenic names.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lin 
---
 drivers/scsi/scsi_lib.c | 52 -
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9675353..08134f6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -40,10 +40,10 @@
 #include "scsi_logging.h"
 
 
-#define SG_MEMPOOL_NR  ARRAY_SIZE(scsi_sg_pools)
+#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
 #define SG_MEMPOOL_SIZE2
 
-struct scsi_host_sg_pool {
+struct sg_pool {
size_t  size;
char*name;
struct kmem_cache   *slab;
@@ -54,7 +54,7 @@ struct scsi_host_sg_pool {
 #if (SCSI_MAX_SG_SEGMENTS < 32)
 #error SCSI_MAX_SG_SEGMENTS is too small (must be 32 or greater)
 #endif
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
+static struct sg_pool sg_pools[] = {
SP(8),
SP(16),
 #if (SCSI_MAX_SG_SEGMENTS > 32)
@@ -553,7 +553,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int scsi_sgtable_index(unsigned short nents)
+static inline unsigned int sg_pool_index(unsigned short nents)
 {
unsigned int index;
 
@@ -567,30 +567,30 @@ static inline unsigned int scsi_sgtable_index(unsigned 
short nents)
return index;
 }
 
-static void scsi_sg_free(struct scatterlist *sgl, unsigned int nents)
+static void sg_pool_free(struct scatterlist *sgl, unsigned int nents)
 {
-   struct scsi_host_sg_pool *sgp;
+   struct sg_pool *sgp;
 
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
+   sgp = sg_pools + sg_pool_index(nents);
mempool_free(sgl, sgp->pool);
 }
 
-static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
+static struct scatterlist *sg_pool_alloc(unsigned int nents, gfp_t gfp_mask)
 {
-   struct scsi_host_sg_pool *sgp;
+   struct sg_pool *sgp;
 
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
+   sgp = sg_pools + sg_pool_index(nents);
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct sg_table *table, bool first_chunk)
+static void sg_free_table_chained(struct sg_table *table, bool first_chunk)
 {
if (first_chunk && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, sg_pool_free);
 }
 
-static int scsi_alloc_sgtable(struct sg_table *table, int nents,
+static int sg_alloc_table_chained(struct sg_table *table, int nents,
struct scatterlist *first_chunk)
 {
int ret;
@@ -606,9 +606,9 @@ static int scsi_alloc_sgtable(struct sg_table *table, int 
nents,
}
 
ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
-  first_chunk, GFP_ATOMIC, scsi_sg_alloc);
+  first_chunk, GFP_ATOMIC, sg_pool_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(table, (bool)first_chunk);
+   sg_free_table_chained(table, (bool)first_chunk);
return ret;
 }
 
@@ -627,14 +627,14 @@ static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
struct scsi_data_buffer *sdb;
 
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb.table, true);
+   sg_free_table_chained(>sdb.table, true);
if (cmd->request->next_rq) {
sdb = cmd->request->next_rq->special;
if (sdb)
-   scsi_free_sgtable(>table, true);
+   sg_free_table_chained(>table, true);
}
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(>prot_sdb->table, true);
+   sg_free_table_chained(>prot_sdb->table, true);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -673,19 +673,19 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb.table, false);
+   sg_free_table_chained(>sdb.table, false);
 
memset(>sdb, 0, sizeof(cmd->sdb));
 
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(>prot_sdb->table, false);
+   sg_free_table_chained(>prot_sdb->table, false);
 }
 
 static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-   scsi_free_sgtable(_sdb->table, false);
+   sg_free_table_chained(_sdb->table, false);
kmem_cache_free(scsi_sdb_cache, bidi_sdb

[PATCH v3 5/5] lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

2016-04-04 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Now it's ready to move the mempool based SG chained allocator code from
SCSI driver to lib/sg_pool.c, which will be compiled only based on a Kconfig
symbol CONFIG_SG_POOL.

SCSI selects CONFIG_SG_POOL.

Reviewed-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/Kconfig|   1 +
 drivers/scsi/scsi_lib.c | 137 ---
 include/linux/scatterlist.h |  25 +++
 include/scsi/scsi.h |  19 -
 lib/Kconfig |   7 ++
 lib/Makefile|   1 +
 lib/sg_pool.c   | 172 
 7 files changed, 206 insertions(+), 156 deletions(-)
 create mode 100644 lib/sg_pool.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 0950567..98e5d51 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -17,6 +17,7 @@ config SCSI
tristate "SCSI device support"
depends on BLOCK
select SCSI_DMA if HAS_DMA
+   select SG_POOL
---help---
  If you want to use a SCSI hard disk, SCSI tape drive, SCSI CD-ROM or
  any other SCSI device under Linux, say Y and make sure that you know
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8f776f1..b920c5d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -14,8 +14,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -40,39 +38,6 @@
 #include "scsi_logging.h"
 
 
-#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
-#define SG_MEMPOOL_SIZE2
-
-struct sg_pool {
-   size_t  size;
-   char*name;
-   struct kmem_cache   *slab;
-   mempool_t   *pool;
-};
-
-#define SP(x) { .size = x, "sgpool-" __stringify(x) }
-#if (SG_CHUNK_SIZE < 32)
-#error SG_CHUNK_SIZE is too small (must be 32 or greater)
-#endif
-static struct sg_pool sg_pools[] = {
-   SP(8),
-   SP(16),
-#if (SG_CHUNK_SIZE > 32)
-   SP(32),
-#if (SG_CHUNK_SIZE > 64)
-   SP(64),
-#if (SG_CHUNK_SIZE > 128)
-   SP(128),
-#if (SG_CHUNK_SIZE > 256)
-#error SG_CHUNK_SIZE is too large (256 MAX)
-#endif
-#endif
-#endif
-#endif
-   SP(SG_CHUNK_SIZE)
-};
-#undef SP
-
 struct kmem_cache *scsi_sdb_cache;
 
 /*
@@ -553,65 +518,6 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int sg_pool_index(unsigned short nents)
-{
-   unsigned int index;
-
-   BUG_ON(nents > SG_CHUNK_SIZE);
-
-   if (nents <= 8)
-   index = 0;
-   else
-   index = get_count_order(nents) - 3;
-
-   return index;
-}
-
-static void sg_pool_free(struct scatterlist *sgl, unsigned int nents)
-{
-   struct sg_pool *sgp;
-
-   sgp = sg_pools + sg_pool_index(nents);
-   mempool_free(sgl, sgp->pool);
-}
-
-static struct scatterlist *sg_pool_alloc(unsigned int nents, gfp_t gfp_mask)
-{
-   struct sg_pool *sgp;
-
-   sgp = sg_pools + sg_pool_index(nents);
-   return mempool_alloc(sgp->pool, gfp_mask);
-}
-
-static void sg_free_table_chained(struct sg_table *table, bool first_chunk)
-{
-   if (first_chunk && table->orig_nents <= SG_CHUNK_SIZE)
-   return;
-   __sg_free_table(table, SG_CHUNK_SIZE, first_chunk, sg_pool_free);
-}
-
-static int sg_alloc_table_chained(struct sg_table *table, int nents,
-   struct scatterlist *first_chunk)
-{
-   int ret;
-
-   BUG_ON(!nents);
-
-   if (first_chunk) {
-   if (nents <= SG_CHUNK_SIZE) {
-   table->nents = table->orig_nents = nents;
-   sg_init_table(table->sgl, nents);
-   return 0;
-   }
-   }
-
-   ret = __sg_alloc_table(table, nents, SG_CHUNK_SIZE,
-  first_chunk, GFP_ATOMIC, sg_pool_alloc);
-   if (unlikely(ret))
-   sg_free_table_chained(table, (bool)first_chunk);
-   return ret;
-}
-
 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 {
if (cmd->request->cmd_type == REQ_TYPE_FS) {
@@ -2269,8 +2175,6 @@ EXPORT_SYMBOL(scsi_unblock_requests);
 
 int __init scsi_init_queue(void)
 {
-   int i;
-
scsi_sdb_cache = kmem_cache_create("scsi_data_buffer",
   sizeof(struct scsi_data_buffer),
   0, 0, NULL);
@@ -2279,53 +2183,12 @@ int __init scsi_init_queue(void)
return -ENOMEM;
}
 
-   for (i = 0; i < SG_MEMPOOL_NR; i++) {
-   struct sg_pool *sgp = sg_pools + i;
-   int size = sgp->size * sizeof(struct scatterlist);
-
-   sgp->slab = kmem_cache_create(sgp->name, size, 0,
-

[PATCH v3 5/5] lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

2016-04-04 Thread Ming Lin

From: Ming Lin 

Now it's ready to move the mempool based SG chained allocator code from
SCSI driver to lib/sg_pool.c, which will be compiled only based on a Kconfig
symbol CONFIG_SG_POOL.

SCSI selects CONFIG_SG_POOL.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lin 
---
 drivers/scsi/Kconfig|   1 +
 drivers/scsi/scsi_lib.c | 137 ---
 include/linux/scatterlist.h |  25 +++
 include/scsi/scsi.h |  19 -
 lib/Kconfig |   7 ++
 lib/Makefile|   1 +
 lib/sg_pool.c   | 172 
 7 files changed, 206 insertions(+), 156 deletions(-)
 create mode 100644 lib/sg_pool.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 0950567..98e5d51 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -17,6 +17,7 @@ config SCSI
tristate "SCSI device support"
depends on BLOCK
select SCSI_DMA if HAS_DMA
+   select SG_POOL
---help---
  If you want to use a SCSI hard disk, SCSI tape drive, SCSI CD-ROM or
  any other SCSI device under Linux, say Y and make sure that you know
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8f776f1..b920c5d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -14,8 +14,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -40,39 +38,6 @@
 #include "scsi_logging.h"
 
 
-#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
-#define SG_MEMPOOL_SIZE2
-
-struct sg_pool {
-   size_t  size;
-   char*name;
-   struct kmem_cache   *slab;
-   mempool_t   *pool;
-};
-
-#define SP(x) { .size = x, "sgpool-" __stringify(x) }
-#if (SG_CHUNK_SIZE < 32)
-#error SG_CHUNK_SIZE is too small (must be 32 or greater)
-#endif
-static struct sg_pool sg_pools[] = {
-   SP(8),
-   SP(16),
-#if (SG_CHUNK_SIZE > 32)
-   SP(32),
-#if (SG_CHUNK_SIZE > 64)
-   SP(64),
-#if (SG_CHUNK_SIZE > 128)
-   SP(128),
-#if (SG_CHUNK_SIZE > 256)
-#error SG_CHUNK_SIZE is too large (256 MAX)
-#endif
-#endif
-#endif
-#endif
-   SP(SG_CHUNK_SIZE)
-};
-#undef SP
-
 struct kmem_cache *scsi_sdb_cache;
 
 /*
@@ -553,65 +518,6 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int sg_pool_index(unsigned short nents)
-{
-   unsigned int index;
-
-   BUG_ON(nents > SG_CHUNK_SIZE);
-
-   if (nents <= 8)
-   index = 0;
-   else
-   index = get_count_order(nents) - 3;
-
-   return index;
-}
-
-static void sg_pool_free(struct scatterlist *sgl, unsigned int nents)
-{
-   struct sg_pool *sgp;
-
-   sgp = sg_pools + sg_pool_index(nents);
-   mempool_free(sgl, sgp->pool);
-}
-
-static struct scatterlist *sg_pool_alloc(unsigned int nents, gfp_t gfp_mask)
-{
-   struct sg_pool *sgp;
-
-   sgp = sg_pools + sg_pool_index(nents);
-   return mempool_alloc(sgp->pool, gfp_mask);
-}
-
-static void sg_free_table_chained(struct sg_table *table, bool first_chunk)
-{
-   if (first_chunk && table->orig_nents <= SG_CHUNK_SIZE)
-   return;
-   __sg_free_table(table, SG_CHUNK_SIZE, first_chunk, sg_pool_free);
-}
-
-static int sg_alloc_table_chained(struct sg_table *table, int nents,
-   struct scatterlist *first_chunk)
-{
-   int ret;
-
-   BUG_ON(!nents);
-
-   if (first_chunk) {
-   if (nents <= SG_CHUNK_SIZE) {
-   table->nents = table->orig_nents = nents;
-   sg_init_table(table->sgl, nents);
-   return 0;
-   }
-   }
-
-   ret = __sg_alloc_table(table, nents, SG_CHUNK_SIZE,
-  first_chunk, GFP_ATOMIC, sg_pool_alloc);
-   if (unlikely(ret))
-   sg_free_table_chained(table, (bool)first_chunk);
-   return ret;
-}
-
 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 {
if (cmd->request->cmd_type == REQ_TYPE_FS) {
@@ -2269,8 +2175,6 @@ EXPORT_SYMBOL(scsi_unblock_requests);
 
 int __init scsi_init_queue(void)
 {
-   int i;
-
scsi_sdb_cache = kmem_cache_create("scsi_data_buffer",
   sizeof(struct scsi_data_buffer),
   0, 0, NULL);
@@ -2279,53 +2183,12 @@ int __init scsi_init_queue(void)
return -ENOMEM;
}
 
-   for (i = 0; i < SG_MEMPOOL_NR; i++) {
-   struct sg_pool *sgp = sg_pools + i;
-   int size = sgp->size * sizeof(struct scatterlist);
-
-   sgp->slab = kmem_cache_create(sgp->name, size, 0,
-   SLAB_HWCACHE_ALIGN, NULL);
-   if (!sgp->slab) {
-

Re: [PATCH v2 5/5] lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

2016-04-04 Thread Ming Lin

On Mon, Apr 4, 2016 at 1:17 PM, Christoph Hellwig <h...@lst.de> wrote:
> On Mon, Apr 04, 2016 at 01:15:45PM -0700, Ming Lin wrote:
>> cleanup_sdb:
>> for (i = 0; i < SG_MEMPOOL_NR; i++) {
>> struct sg_pool *sgp = sg_pools + i;
>> if (sgp->pool)
>> mempool_destroy(sgp->pool);
>> if (sgp->slab)
>> kmem_cache_destroy(sgp->slab);
>> }
>>
>> I'll keep the NULL check if no objection.
>
> I don't necessarily, but given that this is a code move I'd prefer
> to keep the code as similar as possible in the actual move patch..

So I'll just keep it.
And I can send a cleanup patch after this series applied.

Re: [PATCH v2 5/5] lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

2016-04-04 Thread Ming Lin

On Mon, Apr 4, 2016 at 1:17 PM, Christoph Hellwig  wrote:
> On Mon, Apr 04, 2016 at 01:15:45PM -0700, Ming Lin wrote:
>> cleanup_sdb:
>> for (i = 0; i < SG_MEMPOOL_NR; i++) {
>> struct sg_pool *sgp = sg_pools + i;
>> if (sgp->pool)
>> mempool_destroy(sgp->pool);
>> if (sgp->slab)
>> kmem_cache_destroy(sgp->slab);
>> }
>>
>> I'll keep the NULL check if no objection.
>
> I don't necessarily, but given that this is a code move I'd prefer
> to keep the code as similar as possible in the actual move patch..

So I'll just keep it.
And I can send a cleanup patch after this series applied.

Re: [PATCH v2 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-04 Thread Ming Lin

On Tue, Mar 22, 2016 at 3:03 PM, Ming Lin <m...@kernel.org> wrote:
> From: Ming Lin <min...@ssi.samsung.com>
>
> Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
> we fit into a single scatterlist chunk.
>
> Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.
>
> Will move these 2 generic definitions to scatterlist.h later.
>
> Signed-off-by: Ming Lin <min...@ssi.samsung.com>
> ---
>  drivers/ata/pata_icside.c   |  2 +-
>  drivers/infiniband/ulp/srp/ib_srp.c |  4 ++--
>  drivers/scsi/arm/cumana_2.c |  2 +-
>  drivers/scsi/arm/eesox.c|  2 +-
>  drivers/scsi/arm/powertec.c |  2 +-
>  drivers/scsi/esas2r/esas2r_main.c   |  4 ++--
>  drivers/scsi/hisi_sas/hisi_sas.h|  2 +-
>  drivers/scsi/mpt3sas/mpt3sas_base.c |  4 ++--
>  drivers/scsi/mpt3sas/mpt3sas_base.h |  2 +-
>  drivers/scsi/scsi_debug.c   |  2 +-
>  drivers/scsi/scsi_lib.c | 34 +-
>  drivers/usb/storage/scsiglue.c  |  2 +-
>  include/scsi/scsi.h |  8 
>  include/scsi/scsi_host.h|  2 +-
>  14 files changed, 36 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/ata/pata_icside.c b/drivers/ata/pata_icside.c
> index d7c7320..188f2f2 100644
> --- a/drivers/ata/pata_icside.c
> +++ b/drivers/ata/pata_icside.c
> @@ -294,7 +294,7 @@ static int icside_dma_init(struct pata_icside_info *info)
>
>  static struct scsi_host_template pata_icside_sht = {
> ATA_BASE_SHT(DRV_NAME),
> -   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
> +   .sg_tablesize   = SG_MAX_SEGMENTS,
> .dma_boundary   = IOMD_DMA_BOUNDARY,
>  };

Hi Tejun,

Could you help to review/ack this ATA part?
Thanks.

> diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
> index e0a3398..74dafa7 100644
> --- a/include/scsi/scsi.h
> +++ b/include/scsi/scsi.h
> @@ -24,16 +24,16 @@ enum scsi_timeouts {
>   * to SG_MAX_SINGLE_ALLOC to pack correctly at the highest order.  The
>   * minimum value is 32
>   */
> -#define SCSI_MAX_SG_SEGMENTS   128
> +#define SG_CHUNK_SIZE  128
>
>  /*
> - * Like SCSI_MAX_SG_SEGMENTS, but for archs that have sg chaining. This limit
> + * Like SG_CHUNK_SIZE, but for archs that have sg chaining. This limit
>   * is totally arbitrary, a setting of 2048 will get you at least 8mb ios.
>   */
>  #ifdef CONFIG_ARCH_HAS_SG_CHAIN
> -#define SCSI_MAX_SG_CHAIN_SEGMENTS 2048
> +#define SG_MAX_SEGMENTS2048
>  #else
> -#define SCSI_MAX_SG_CHAIN_SEGMENTS SCSI_MAX_SG_SEGMENTS
> +#define SG_MAX_SEGMENTSSG_CHUNK_SIZE
>  #endif
>
>  /*
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index fcfa3d7..76e9d27 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -37,7 +37,7 @@ struct blk_queue_tags;
>   *  used in one scatter-gather request.
>   */
>  #define SG_NONE 0
> -#define SG_ALL SCSI_MAX_SG_SEGMENTS
> +#define SG_ALL SG_CHUNK_SIZE
>
>  #define MODE_UNKNOWN 0x00
>  #define MODE_INITIATOR 0x01
> --
> 1.9.1

Re: [PATCH v2 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-04 Thread Ming Lin

On Tue, Mar 22, 2016 at 3:03 PM, Ming Lin  wrote:
> From: Ming Lin 
>
> Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
> we fit into a single scatterlist chunk.
>
> Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.
>
> Will move these 2 generic definitions to scatterlist.h later.
>
> Signed-off-by: Ming Lin 
> ---
>  drivers/ata/pata_icside.c   |  2 +-
>  drivers/infiniband/ulp/srp/ib_srp.c |  4 ++--
>  drivers/scsi/arm/cumana_2.c |  2 +-
>  drivers/scsi/arm/eesox.c|  2 +-
>  drivers/scsi/arm/powertec.c |  2 +-
>  drivers/scsi/esas2r/esas2r_main.c   |  4 ++--
>  drivers/scsi/hisi_sas/hisi_sas.h|  2 +-
>  drivers/scsi/mpt3sas/mpt3sas_base.c |  4 ++--
>  drivers/scsi/mpt3sas/mpt3sas_base.h |  2 +-
>  drivers/scsi/scsi_debug.c   |  2 +-
>  drivers/scsi/scsi_lib.c | 34 +-
>  drivers/usb/storage/scsiglue.c  |  2 +-
>  include/scsi/scsi.h |  8 
>  include/scsi/scsi_host.h|  2 +-
>  14 files changed, 36 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/ata/pata_icside.c b/drivers/ata/pata_icside.c
> index d7c7320..188f2f2 100644
> --- a/drivers/ata/pata_icside.c
> +++ b/drivers/ata/pata_icside.c
> @@ -294,7 +294,7 @@ static int icside_dma_init(struct pata_icside_info *info)
>
>  static struct scsi_host_template pata_icside_sht = {
> ATA_BASE_SHT(DRV_NAME),
> -   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
> +   .sg_tablesize   = SG_MAX_SEGMENTS,
> .dma_boundary   = IOMD_DMA_BOUNDARY,
>  };

Hi Tejun,

Could you help to review/ack this ATA part?
Thanks.

> diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
> index e0a3398..74dafa7 100644
> --- a/include/scsi/scsi.h
> +++ b/include/scsi/scsi.h
> @@ -24,16 +24,16 @@ enum scsi_timeouts {
>   * to SG_MAX_SINGLE_ALLOC to pack correctly at the highest order.  The
>   * minimum value is 32
>   */
> -#define SCSI_MAX_SG_SEGMENTS   128
> +#define SG_CHUNK_SIZE  128
>
>  /*
> - * Like SCSI_MAX_SG_SEGMENTS, but for archs that have sg chaining. This limit
> + * Like SG_CHUNK_SIZE, but for archs that have sg chaining. This limit
>   * is totally arbitrary, a setting of 2048 will get you at least 8mb ios.
>   */
>  #ifdef CONFIG_ARCH_HAS_SG_CHAIN
> -#define SCSI_MAX_SG_CHAIN_SEGMENTS 2048
> +#define SG_MAX_SEGMENTS2048
>  #else
> -#define SCSI_MAX_SG_CHAIN_SEGMENTS SCSI_MAX_SG_SEGMENTS
> +#define SG_MAX_SEGMENTSSG_CHUNK_SIZE
>  #endif
>
>  /*
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index fcfa3d7..76e9d27 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -37,7 +37,7 @@ struct blk_queue_tags;
>   *  used in one scatter-gather request.
>   */
>  #define SG_NONE 0
> -#define SG_ALL SCSI_MAX_SG_SEGMENTS
> +#define SG_ALL SG_CHUNK_SIZE
>
>  #define MODE_UNKNOWN 0x00
>  #define MODE_INITIATOR 0x01
> --
> 1.9.1

Re: [PATCH v2 5/5] lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

2016-04-04 Thread Ming Lin

On Tue, Mar 22, 2016 at 7:38 PM, kbuild test robot <l...@intel.com> wrote:
> Hi Ming,
>
> [auto build test WARNING on scsi/for-next]
> [also build test WARNING on v4.5 next-20160322]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improving the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Ming-Lin/mempool-based-chained-scatterlist-alloc-free-api/20160323-060710
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
>
>
> coccinelle warnings: (new ones prefixed by >>)
>
>>> lib/sg_pool.c:152:3-18: WARNING: NULL check before freeing functions like 
>>> kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not 
>>> needed. Maybe consider reorganizing relevant code to avoid passing NULL 
>>> values.
>lib/sg_pool.c:154:3-21: WARNING: NULL check before freeing functions like 
> kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not 
> needed. Maybe consider reorganizing relevant code to avoid passing NULL 
> values.

mempool_destroy()/kmem_cache_destroy() is OK to accept NULL pointer.
But the logic is more readable that we do NULL check when cleanup due to error.

cleanup_sdb:
for (i = 0; i < SG_MEMPOOL_NR; i++) {
struct sg_pool *sgp = sg_pools + i;
if (sgp->pool)
mempool_destroy(sgp->pool);
if (sgp->slab)
kmem_cache_destroy(sgp->slab);
}

I'll keep the NULL check if no objection.

Thanks.

Re: [PATCH v2 5/5] lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

2016-04-04 Thread Ming Lin

On Tue, Mar 22, 2016 at 7:38 PM, kbuild test robot  wrote:
> Hi Ming,
>
> [auto build test WARNING on scsi/for-next]
> [also build test WARNING on v4.5 next-20160322]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improving the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Ming-Lin/mempool-based-chained-scatterlist-alloc-free-api/20160323-060710
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
>
>
> coccinelle warnings: (new ones prefixed by >>)
>
>>> lib/sg_pool.c:152:3-18: WARNING: NULL check before freeing functions like 
>>> kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not 
>>> needed. Maybe consider reorganizing relevant code to avoid passing NULL 
>>> values.
>lib/sg_pool.c:154:3-21: WARNING: NULL check before freeing functions like 
> kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not 
> needed. Maybe consider reorganizing relevant code to avoid passing NULL 
> values.

mempool_destroy()/kmem_cache_destroy() is OK to accept NULL pointer.
But the logic is more readable that we do NULL check when cleanup due to error.

cleanup_sdb:
for (i = 0; i < SG_MEMPOOL_NR; i++) {
struct sg_pool *sgp = sg_pools + i;
if (sgp->pool)
mempool_destroy(sgp->pool);
if (sgp->slab)
kmem_cache_destroy(sgp->slab);
}

I'll keep the NULL check if no objection.

Thanks.

Re: [PATCH v2 0/5] mempool based chained scatterlist alloc/free api

2016-04-04 Thread Ming Lin

On Mon, Mar 28, 2016 at 7:48 AM, Ming Lin <m...@kernel.org> wrote:
> On Thu, Mar 24, 2016 at 8:46 AM, James Bottomley
> <james.bottom...@hansenpartnership.com> wrote:
>> On Thu, 2016-03-24 at 08:09 -0700, Ming Lin wrote:
>>> On Wed, Mar 23, 2016 at 12:40 AM, Christoph Hellwig <h...@lst.de>
>>> wrote:
>>> > On Tue, Mar 22, 2016 at 03:03:11PM -0700, Ming Lin wrote:
>>> > > From: Ming Lin <min...@ssi.samsung.com>
>>> > >
>>> > > The fist 4 patches make the SG related
>>> > > definitions/structs/functions
>>> > > in SCSI code generic and the last patch move it to lib/sg_pool.c.
>>> > >
>>> > > I still keep the macro "SG_MEMPOOL_NR" since it's used in 3
>>> > > places.
>>> >
>>> > I don't ѕee the point, but I'm not going to block the series over
>>> > it either.
>>> >
>>> > The new series looks really nice to me!
>>> >
>>> > Reviewed-by: Christoph Hellwig <h...@lst.de>
>>>
>>> Hi James,
>>>
>>> This series touches several sub-systems.
>>> What's the best way to merge it?
>>
>> It has a minor intrusion into
>>
>>  drivers/ata/pata_icside.c   |   2 +-
>>  drivers/infiniband/ulp/srp/ib_srp.c |   4 +-
>>  drivers/usb/storage/scsiglue.c  |   2 +-
>>
>> Apart from that, it's all SCSI, so the SCSI tree would seem to be the
>> best one.
>
> Are you OK to merge it?

Hi James,

Ping ...

Are you OK to apply this series?
Or any changes needed?

Thanks.

>
> Thanks.

Re: [PATCH v2 0/5] mempool based chained scatterlist alloc/free api

2016-04-04 Thread Ming Lin

On Mon, Mar 28, 2016 at 7:48 AM, Ming Lin  wrote:
> On Thu, Mar 24, 2016 at 8:46 AM, James Bottomley
>  wrote:
>> On Thu, 2016-03-24 at 08:09 -0700, Ming Lin wrote:
>>> On Wed, Mar 23, 2016 at 12:40 AM, Christoph Hellwig 
>>> wrote:
>>> > On Tue, Mar 22, 2016 at 03:03:11PM -0700, Ming Lin wrote:
>>> > > From: Ming Lin 
>>> > >
>>> > > The fist 4 patches make the SG related
>>> > > definitions/structs/functions
>>> > > in SCSI code generic and the last patch move it to lib/sg_pool.c.
>>> > >
>>> > > I still keep the macro "SG_MEMPOOL_NR" since it's used in 3
>>> > > places.
>>> >
>>> > I don't ѕee the point, but I'm not going to block the series over
>>> > it either.
>>> >
>>> > The new series looks really nice to me!
>>> >
>>> > Reviewed-by: Christoph Hellwig 
>>>
>>> Hi James,
>>>
>>> This series touches several sub-systems.
>>> What's the best way to merge it?
>>
>> It has a minor intrusion into
>>
>>  drivers/ata/pata_icside.c   |   2 +-
>>  drivers/infiniband/ulp/srp/ib_srp.c |   4 +-
>>  drivers/usb/storage/scsiglue.c  |   2 +-
>>
>> Apart from that, it's all SCSI, so the SCSI tree would seem to be the
>> best one.
>
> Are you OK to merge it?

Hi James,

Ping ...

Are you OK to apply this series?
Or any changes needed?

Thanks.

>
> Thanks.

Re: [PATCH v2 0/5] mempool based chained scatterlist alloc/free api

2016-03-28 Thread Ming Lin

On Thu, Mar 24, 2016 at 8:46 AM, James Bottomley
<james.bottom...@hansenpartnership.com> wrote:
> On Thu, 2016-03-24 at 08:09 -0700, Ming Lin wrote:
>> On Wed, Mar 23, 2016 at 12:40 AM, Christoph Hellwig <h...@lst.de>
>> wrote:
>> > On Tue, Mar 22, 2016 at 03:03:11PM -0700, Ming Lin wrote:
>> > > From: Ming Lin <min...@ssi.samsung.com>
>> > >
>> > > The fist 4 patches make the SG related
>> > > definitions/structs/functions
>> > > in SCSI code generic and the last patch move it to lib/sg_pool.c.
>> > >
>> > > I still keep the macro "SG_MEMPOOL_NR" since it's used in 3
>> > > places.
>> >
>> > I don't ѕee the point, but I'm not going to block the series over
>> > it either.
>> >
>> > The new series looks really nice to me!
>> >
>> > Reviewed-by: Christoph Hellwig <h...@lst.de>
>>
>> Hi James,
>>
>> This series touches several sub-systems.
>> What's the best way to merge it?
>
> It has a minor intrusion into
>
>  drivers/ata/pata_icside.c   |   2 +-
>  drivers/infiniband/ulp/srp/ib_srp.c |   4 +-
>  drivers/usb/storage/scsiglue.c  |   2 +-
>
> Apart from that, it's all SCSI, so the SCSI tree would seem to be the
> best one.

Are you OK to merge it?

Thanks.

Re: [PATCH v2 0/5] mempool based chained scatterlist alloc/free api

2016-03-28 Thread Ming Lin

On Thu, Mar 24, 2016 at 8:46 AM, James Bottomley
 wrote:
> On Thu, 2016-03-24 at 08:09 -0700, Ming Lin wrote:
>> On Wed, Mar 23, 2016 at 12:40 AM, Christoph Hellwig 
>> wrote:
>> > On Tue, Mar 22, 2016 at 03:03:11PM -0700, Ming Lin wrote:
>> > > From: Ming Lin 
>> > >
>> > > The fist 4 patches make the SG related
>> > > definitions/structs/functions
>> > > in SCSI code generic and the last patch move it to lib/sg_pool.c.
>> > >
>> > > I still keep the macro "SG_MEMPOOL_NR" since it's used in 3
>> > > places.
>> >
>> > I don't ѕee the point, but I'm not going to block the series over
>> > it either.
>> >
>> > The new series looks really nice to me!
>> >
>> > Reviewed-by: Christoph Hellwig 
>>
>> Hi James,
>>
>> This series touches several sub-systems.
>> What's the best way to merge it?
>
> It has a minor intrusion into
>
>  drivers/ata/pata_icside.c   |   2 +-
>  drivers/infiniband/ulp/srp/ib_srp.c |   4 +-
>  drivers/usb/storage/scsiglue.c  |   2 +-
>
> Apart from that, it's all SCSI, so the SCSI tree would seem to be the
> best one.

Are you OK to merge it?

Thanks.

Re: [PATCH v2 0/5] mempool based chained scatterlist alloc/free api

2016-03-24 Thread Ming Lin

On Wed, Mar 23, 2016 at 12:40 AM, Christoph Hellwig <h...@lst.de> wrote:
> On Tue, Mar 22, 2016 at 03:03:11PM -0700, Ming Lin wrote:
>> From: Ming Lin <min...@ssi.samsung.com>
>>
>> The fist 4 patches make the SG related definitions/structs/functions
>> in SCSI code generic and the last patch move it to lib/sg_pool.c.
>>
>> I still keep the macro "SG_MEMPOOL_NR" since it's used in 3 places.
>
> I don't ѕee the point, but I'm not going to block the series over
> it either.
>
> The new series looks really nice to me!
>
> Reviewed-by: Christoph Hellwig <h...@lst.de>

Hi James,

This series touches several sub-systems.
What's the best way to merge it?

Thanks.

Re: [PATCH v2 0/5] mempool based chained scatterlist alloc/free api

2016-03-24 Thread Ming Lin

On Wed, Mar 23, 2016 at 12:40 AM, Christoph Hellwig  wrote:
> On Tue, Mar 22, 2016 at 03:03:11PM -0700, Ming Lin wrote:
>> From: Ming Lin 
>>
>> The fist 4 patches make the SG related definitions/structs/functions
>> in SCSI code generic and the last patch move it to lib/sg_pool.c.
>>
>> I still keep the macro "SG_MEMPOOL_NR" since it's used in 3 places.
>
> I don't ѕee the point, but I'm not going to block the series over
> it either.
>
> The new series looks really nice to me!
>
> Reviewed-by: Christoph Hellwig 

Hi James,

This series touches several sub-systems.
What's the best way to merge it?

Thanks.

[PATCH v2 1/5] scsi: replace "scsi_data_buffer" with "sg_table" in SG functions

2016-03-22 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Replace parameter "struct scsi_data_buffer" with "struct sg_table" in
SG alloc/free functions to make them generic.

Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/scsi_lib.c | 41 +++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8c6e318..5eaddc7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -583,14 +583,14 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct scsi_data_buffer *sdb, bool mq)
+static void scsi_free_sgtable(struct sg_table *table, bool mq)
 {
-   if (mq && sdb->table.orig_nents <= SCSI_MAX_SG_SEGMENTS)
+   if (mq && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(>table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
 }
 
-static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents, bool mq)
+static int scsi_alloc_sgtable(struct sg_table *table, int nents, bool mq)
 {
struct scatterlist *first_chunk = NULL;
int ret;
@@ -599,17 +599,17 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer 
*sdb, int nents, bool mq)
 
if (mq) {
if (nents <= SCSI_MAX_SG_SEGMENTS) {
-   sdb->table.nents = sdb->table.orig_nents = nents;
-   sg_init_table(sdb->table.sgl, nents);
+   table->nents = table->orig_nents = nents;
+   sg_init_table(table->sgl, nents);
return 0;
}
-   first_chunk = sdb->table.sgl;
+   first_chunk = table->sgl;
}
 
-   ret = __sg_alloc_table(>table, nents, SCSI_MAX_SG_SEGMENTS,
+   ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
   first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(sdb, mq);
+   scsi_free_sgtable(table, mq);
return ret;
 }
 
@@ -625,12 +625,17 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
 {
+   struct scsi_data_buffer *sdb;
+
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb, true);
-   if (cmd->request->next_rq && cmd->request->next_rq->special)
-   scsi_free_sgtable(cmd->request->next_rq->special, true);
+   scsi_free_sgtable(>sdb.table, true);
+   if (cmd->request->next_rq) {
+   sdb = cmd->request->next_rq->special;
+   if (sdb)
+   scsi_free_sgtable(>table, true);
+   }
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(cmd->prot_sdb, true);
+   scsi_free_sgtable(>prot_sdb->table, true);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -669,19 +674,19 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb, false);
+   scsi_free_sgtable(>sdb.table, false);
 
memset(>sdb, 0, sizeof(cmd->sdb));
 
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(cmd->prot_sdb, false);
+   scsi_free_sgtable(>prot_sdb->table, false);
 }
 
 static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-   scsi_free_sgtable(bidi_sdb, false);
+   scsi_free_sgtable(_sdb->table, false);
kmem_cache_free(scsi_sdb_cache, bidi_sdb);
cmd->request->next_rq->special = NULL;
 }
@@ -1085,7 +1090,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
/*
 * If sg table allocation fails, requeue request later.
 */
-   if (unlikely(scsi_alloc_sgtable(sdb, req->nr_phys_segments,
+   if (unlikely(scsi_alloc_sgtable(>table, req->nr_phys_segments,
req->mq_ctx != NULL)))
return BLKPREP_DEFER;
 
@@ -1158,7 +1163,7 @@ int scsi_init_io(struct scsi_cmnd *cmd)
 
ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-   if (scsi_alloc_sgtable(prot_sdb, ivecs, is_mq)) {
+   if (scsi_alloc_sgtable(_sdb->table, ivecs, is_mq)) {
error = BLKPREP_DEFER;
goto err_exit;
}
-- 
1.9.1

[PATCH v2 0/5] mempool based chained scatterlist alloc/free api

2016-03-22 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

The fist 4 patches make the SG related definitions/structs/functions
in SCSI code generic and the last patch move it to lib/sg_pool.c.

I still keep the macro "SG_MEMPOOL_NR" since it's used in 3 places.

v2:
  - do modification in scsi code first then move to lib/sg_pool.c
  - address Christoph's comments

Ming Lin (5):
  scsi: replace "scsi_data_buffer" with "sg_table" in SG functions
  scsi: replace "mq" with "first_chunk" in SG functions
  scsi: rename SG related struct and functions
  scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS
  lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

 drivers/ata/pata_icside.c   |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |   4 +-
 drivers/scsi/Kconfig|   1 +
 drivers/scsi/arm/cumana_2.c |   2 +-
 drivers/scsi/arm/eesox.c|   2 +-
 drivers/scsi/arm/powertec.c |   2 +-
 drivers/scsi/esas2r/esas2r_main.c   |   4 +-
 drivers/scsi/hisi_sas/hisi_sas.h|   2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c |   4 +-
 drivers/scsi/mpt3sas/mpt3sas_base.h |   2 +-
 drivers/scsi/scsi_debug.c   |   2 +-
 drivers/scsi/scsi_lib.c | 172 +---
 drivers/usb/storage/scsiglue.c  |   2 +-
 include/linux/scatterlist.h |  25 ++
 include/scsi/scsi.h |  19 
 include/scsi/scsi_host.h|   2 +-
 lib/Kconfig |   7 ++
 lib/Makefile|   1 +
 lib/sg_pool.c   | 172 
 19 files changed, 241 insertions(+), 186 deletions(-)
 create mode 100644 lib/sg_pool.c

-- 
1.9.1

[PATCH v2 1/5] scsi: replace "scsi_data_buffer" with "sg_table" in SG functions

2016-03-22 Thread Ming Lin

From: Ming Lin 

Replace parameter "struct scsi_data_buffer" with "struct sg_table" in
SG alloc/free functions to make them generic.

Signed-off-by: Ming Lin 
---
 drivers/scsi/scsi_lib.c | 41 +++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8c6e318..5eaddc7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -583,14 +583,14 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct scsi_data_buffer *sdb, bool mq)
+static void scsi_free_sgtable(struct sg_table *table, bool mq)
 {
-   if (mq && sdb->table.orig_nents <= SCSI_MAX_SG_SEGMENTS)
+   if (mq && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(>table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
 }
 
-static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents, bool mq)
+static int scsi_alloc_sgtable(struct sg_table *table, int nents, bool mq)
 {
struct scatterlist *first_chunk = NULL;
int ret;
@@ -599,17 +599,17 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer 
*sdb, int nents, bool mq)
 
if (mq) {
if (nents <= SCSI_MAX_SG_SEGMENTS) {
-   sdb->table.nents = sdb->table.orig_nents = nents;
-   sg_init_table(sdb->table.sgl, nents);
+   table->nents = table->orig_nents = nents;
+   sg_init_table(table->sgl, nents);
return 0;
}
-   first_chunk = sdb->table.sgl;
+   first_chunk = table->sgl;
}
 
-   ret = __sg_alloc_table(>table, nents, SCSI_MAX_SG_SEGMENTS,
+   ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
   first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(sdb, mq);
+   scsi_free_sgtable(table, mq);
return ret;
 }
 
@@ -625,12 +625,17 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
 {
+   struct scsi_data_buffer *sdb;
+
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb, true);
-   if (cmd->request->next_rq && cmd->request->next_rq->special)
-   scsi_free_sgtable(cmd->request->next_rq->special, true);
+   scsi_free_sgtable(>sdb.table, true);
+   if (cmd->request->next_rq) {
+   sdb = cmd->request->next_rq->special;
+   if (sdb)
+   scsi_free_sgtable(>table, true);
+   }
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(cmd->prot_sdb, true);
+   scsi_free_sgtable(>prot_sdb->table, true);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -669,19 +674,19 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb, false);
+   scsi_free_sgtable(>sdb.table, false);
 
memset(>sdb, 0, sizeof(cmd->sdb));
 
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(cmd->prot_sdb, false);
+   scsi_free_sgtable(>prot_sdb->table, false);
 }
 
 static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-   scsi_free_sgtable(bidi_sdb, false);
+   scsi_free_sgtable(_sdb->table, false);
kmem_cache_free(scsi_sdb_cache, bidi_sdb);
cmd->request->next_rq->special = NULL;
 }
@@ -1085,7 +1090,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
/*
 * If sg table allocation fails, requeue request later.
 */
-   if (unlikely(scsi_alloc_sgtable(sdb, req->nr_phys_segments,
+   if (unlikely(scsi_alloc_sgtable(>table, req->nr_phys_segments,
req->mq_ctx != NULL)))
return BLKPREP_DEFER;
 
@@ -1158,7 +1163,7 @@ int scsi_init_io(struct scsi_cmnd *cmd)
 
ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-   if (scsi_alloc_sgtable(prot_sdb, ivecs, is_mq)) {
+   if (scsi_alloc_sgtable(_sdb->table, ivecs, is_mq)) {
error = BLKPREP_DEFER;
goto err_exit;
}
-- 
1.9.1

[PATCH v2 0/5] mempool based chained scatterlist alloc/free api

2016-03-22 Thread Ming Lin

From: Ming Lin 

The fist 4 patches make the SG related definitions/structs/functions
in SCSI code generic and the last patch move it to lib/sg_pool.c.

I still keep the macro "SG_MEMPOOL_NR" since it's used in 3 places.

v2:
  - do modification in scsi code first then move to lib/sg_pool.c
  - address Christoph's comments

Ming Lin (5):
  scsi: replace "scsi_data_buffer" with "sg_table" in SG functions
  scsi: replace "mq" with "first_chunk" in SG functions
  scsi: rename SG related struct and functions
  scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS
  lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

 drivers/ata/pata_icside.c   |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |   4 +-
 drivers/scsi/Kconfig|   1 +
 drivers/scsi/arm/cumana_2.c |   2 +-
 drivers/scsi/arm/eesox.c|   2 +-
 drivers/scsi/arm/powertec.c |   2 +-
 drivers/scsi/esas2r/esas2r_main.c   |   4 +-
 drivers/scsi/hisi_sas/hisi_sas.h|   2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c |   4 +-
 drivers/scsi/mpt3sas/mpt3sas_base.h |   2 +-
 drivers/scsi/scsi_debug.c   |   2 +-
 drivers/scsi/scsi_lib.c | 172 +---
 drivers/usb/storage/scsiglue.c  |   2 +-
 include/linux/scatterlist.h |  25 ++
 include/scsi/scsi.h |  19 
 include/scsi/scsi_host.h|   2 +-
 lib/Kconfig |   7 ++
 lib/Makefile|   1 +
 lib/sg_pool.c   | 172 
 19 files changed, 241 insertions(+), 186 deletions(-)
 create mode 100644 lib/sg_pool.c

-- 
1.9.1

[PATCH v2 5/5] lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

2016-03-22 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Now it's ready to move the mempool based SG chained allocator code from
SCSI driver to lib/sg_pool.c, which will be compiled only based on a Kconfig
symbol CONFIG_SG_POOL.

SCSI selects CONFIG_SG_POOL.

Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/Kconfig|   1 +
 drivers/scsi/scsi_lib.c | 137 ---
 include/linux/scatterlist.h |  25 +++
 include/scsi/scsi.h |  19 -
 lib/Kconfig |   7 ++
 lib/Makefile|   1 +
 lib/sg_pool.c   | 172 
 7 files changed, 206 insertions(+), 156 deletions(-)
 create mode 100644 lib/sg_pool.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index e2f31c9..ee9cf41 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -17,6 +17,7 @@ config SCSI
tristate "SCSI device support"
depends on BLOCK
select SCSI_DMA if HAS_DMA
+   select SG_POOL
---help---
  If you want to use a SCSI hard disk, SCSI tape drive, SCSI CD-ROM or
  any other SCSI device under Linux, say Y and make sure that you know
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 478b0e8..6d818e8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -14,8 +14,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -40,39 +38,6 @@
 #include "scsi_logging.h"
 
 
-#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
-#define SG_MEMPOOL_SIZE2
-
-struct sg_pool {
-   size_t  size;
-   char*name;
-   struct kmem_cache   *slab;
-   mempool_t   *pool;
-};
-
-#define SP(x) { .size = x, "sgpool-" __stringify(x) }
-#if (SG_CHUNK_SIZE < 32)
-#error SG_CHUNK_SIZE is too small (must be 32 or greater)
-#endif
-static struct sg_pool sg_pools[] = {
-   SP(8),
-   SP(16),
-#if (SG_CHUNK_SIZE > 32)
-   SP(32),
-#if (SG_CHUNK_SIZE > 64)
-   SP(64),
-#if (SG_CHUNK_SIZE > 128)
-   SP(128),
-#if (SG_CHUNK_SIZE > 256)
-#error SG_CHUNK_SIZE is too large (256 MAX)
-#endif
-#endif
-#endif
-#endif
-   SP(SG_CHUNK_SIZE)
-};
-#undef SP
-
 struct kmem_cache *scsi_sdb_cache;
 
 /*
@@ -553,65 +518,6 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int sg_pool_index(unsigned short nents)
-{
-   unsigned int index;
-
-   BUG_ON(nents > SG_CHUNK_SIZE);
-
-   if (nents <= 8)
-   index = 0;
-   else
-   index = get_count_order(nents) - 3;
-
-   return index;
-}
-
-static void sg_pool_free(struct scatterlist *sgl, unsigned int nents)
-{
-   struct sg_pool *sgp;
-
-   sgp = sg_pools + sg_pool_index(nents);
-   mempool_free(sgl, sgp->pool);
-}
-
-static struct scatterlist *sg_pool_alloc(unsigned int nents, gfp_t gfp_mask)
-{
-   struct sg_pool *sgp;
-
-   sgp = sg_pools + sg_pool_index(nents);
-   return mempool_alloc(sgp->pool, gfp_mask);
-}
-
-static void sg_free_table_chained(struct sg_table *table, bool first_chunk)
-{
-   if (first_chunk && table->orig_nents <= SG_CHUNK_SIZE)
-   return;
-   __sg_free_table(table, SG_CHUNK_SIZE, first_chunk, sg_pool_free);
-}
-
-static int sg_alloc_table_chained(struct sg_table *table, int nents,
-   struct scatterlist *first_chunk)
-{
-   int ret;
-
-   BUG_ON(!nents);
-
-   if (first_chunk) {
-   if (nents <= SG_CHUNK_SIZE) {
-   table->nents = table->orig_nents = nents;
-   sg_init_table(table->sgl, nents);
-   return 0;
-   }
-   }
-
-   ret = __sg_alloc_table(table, nents, SG_CHUNK_SIZE,
-  first_chunk, GFP_ATOMIC, sg_pool_alloc);
-   if (unlikely(ret))
-   sg_free_table_chained(table, (bool)first_chunk);
-   return ret;
-}
-
 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 {
if (cmd->request->cmd_type == REQ_TYPE_FS) {
@@ -2269,8 +2175,6 @@ EXPORT_SYMBOL(scsi_unblock_requests);
 
 int __init scsi_init_queue(void)
 {
-   int i;
-
scsi_sdb_cache = kmem_cache_create("scsi_data_buffer",
   sizeof(struct scsi_data_buffer),
   0, 0, NULL);
@@ -2279,53 +2183,12 @@ int __init scsi_init_queue(void)
return -ENOMEM;
}
 
-   for (i = 0; i < SG_MEMPOOL_NR; i++) {
-   struct sg_pool *sgp = sg_pools + i;
-   int size = sgp->size * sizeof(struct scatterlist);
-
-   sgp->slab = kmem_cache_create(sgp->name, size, 0,
-   SLAB_

[PATCH v2 3/5] scsi: rename SG related struct and functions

2016-03-22 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Rename SCSI specific struct and functions to more genenic names.

Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/scsi_lib.c | 52 -
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ab3dd09..5fd9dd7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -40,10 +40,10 @@
 #include "scsi_logging.h"
 
 
-#define SG_MEMPOOL_NR  ARRAY_SIZE(scsi_sg_pools)
+#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
 #define SG_MEMPOOL_SIZE2
 
-struct scsi_host_sg_pool {
+struct sg_pool {
size_t  size;
char*name;
struct kmem_cache   *slab;
@@ -54,7 +54,7 @@ struct scsi_host_sg_pool {
 #if (SCSI_MAX_SG_SEGMENTS < 32)
 #error SCSI_MAX_SG_SEGMENTS is too small (must be 32 or greater)
 #endif
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
+static struct sg_pool sg_pools[] = {
SP(8),
SP(16),
 #if (SCSI_MAX_SG_SEGMENTS > 32)
@@ -553,7 +553,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int scsi_sgtable_index(unsigned short nents)
+static inline unsigned int sg_pool_index(unsigned short nents)
 {
unsigned int index;
 
@@ -567,30 +567,30 @@ static inline unsigned int scsi_sgtable_index(unsigned 
short nents)
return index;
 }
 
-static void scsi_sg_free(struct scatterlist *sgl, unsigned int nents)
+static void sg_pool_free(struct scatterlist *sgl, unsigned int nents)
 {
-   struct scsi_host_sg_pool *sgp;
+   struct sg_pool *sgp;
 
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
+   sgp = sg_pools + sg_pool_index(nents);
mempool_free(sgl, sgp->pool);
 }
 
-static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
+static struct scatterlist *sg_pool_alloc(unsigned int nents, gfp_t gfp_mask)
 {
-   struct scsi_host_sg_pool *sgp;
+   struct sg_pool *sgp;
 
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
+   sgp = sg_pools + sg_pool_index(nents);
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct sg_table *table, bool first_chunk)
+static void sg_free_table_chained(struct sg_table *table, bool first_chunk)
 {
if (first_chunk && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, sg_pool_free);
 }
 
-static int scsi_alloc_sgtable(struct sg_table *table, int nents,
+static int sg_alloc_table_chained(struct sg_table *table, int nents,
struct scatterlist *first_chunk)
 {
int ret;
@@ -606,9 +606,9 @@ static int scsi_alloc_sgtable(struct sg_table *table, int 
nents,
}
 
ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
-  first_chunk, GFP_ATOMIC, scsi_sg_alloc);
+  first_chunk, GFP_ATOMIC, sg_pool_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(table, (bool)first_chunk);
+   sg_free_table_chained(table, (bool)first_chunk);
return ret;
 }
 
@@ -627,14 +627,14 @@ static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
struct scsi_data_buffer *sdb;
 
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb.table, true);
+   sg_free_table_chained(>sdb.table, true);
if (cmd->request->next_rq) {
sdb = cmd->request->next_rq->special;
if (sdb)
-   scsi_free_sgtable(>table, true);
+   sg_free_table_chained(>table, true);
}
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(>prot_sdb->table, true);
+   sg_free_table_chained(>prot_sdb->table, true);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -673,19 +673,19 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb.table, false);
+   sg_free_table_chained(>sdb.table, false);
 
memset(>sdb, 0, sizeof(cmd->sdb));
 
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(>prot_sdb->table, false);
+   sg_free_table_chained(>prot_sdb->table, false);
 }
 
 static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-   scsi_free_sgtable(_sdb->table, false);
+   sg_free_table_chained(_sdb->table, false);
kmem_cache_free

[PATCH v2 5/5] lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c

2016-03-22 Thread Ming Lin

From: Ming Lin 

Now it's ready to move the mempool based SG chained allocator code from
SCSI driver to lib/sg_pool.c, which will be compiled only based on a Kconfig
symbol CONFIG_SG_POOL.

SCSI selects CONFIG_SG_POOL.

Signed-off-by: Ming Lin 
---
 drivers/scsi/Kconfig|   1 +
 drivers/scsi/scsi_lib.c | 137 ---
 include/linux/scatterlist.h |  25 +++
 include/scsi/scsi.h |  19 -
 lib/Kconfig |   7 ++
 lib/Makefile|   1 +
 lib/sg_pool.c   | 172 
 7 files changed, 206 insertions(+), 156 deletions(-)
 create mode 100644 lib/sg_pool.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index e2f31c9..ee9cf41 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -17,6 +17,7 @@ config SCSI
tristate "SCSI device support"
depends on BLOCK
select SCSI_DMA if HAS_DMA
+   select SG_POOL
---help---
  If you want to use a SCSI hard disk, SCSI tape drive, SCSI CD-ROM or
  any other SCSI device under Linux, say Y and make sure that you know
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 478b0e8..6d818e8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -14,8 +14,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -40,39 +38,6 @@
 #include "scsi_logging.h"
 
 
-#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
-#define SG_MEMPOOL_SIZE2
-
-struct sg_pool {
-   size_t  size;
-   char*name;
-   struct kmem_cache   *slab;
-   mempool_t   *pool;
-};
-
-#define SP(x) { .size = x, "sgpool-" __stringify(x) }
-#if (SG_CHUNK_SIZE < 32)
-#error SG_CHUNK_SIZE is too small (must be 32 or greater)
-#endif
-static struct sg_pool sg_pools[] = {
-   SP(8),
-   SP(16),
-#if (SG_CHUNK_SIZE > 32)
-   SP(32),
-#if (SG_CHUNK_SIZE > 64)
-   SP(64),
-#if (SG_CHUNK_SIZE > 128)
-   SP(128),
-#if (SG_CHUNK_SIZE > 256)
-#error SG_CHUNK_SIZE is too large (256 MAX)
-#endif
-#endif
-#endif
-#endif
-   SP(SG_CHUNK_SIZE)
-};
-#undef SP
-
 struct kmem_cache *scsi_sdb_cache;
 
 /*
@@ -553,65 +518,6 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int sg_pool_index(unsigned short nents)
-{
-   unsigned int index;
-
-   BUG_ON(nents > SG_CHUNK_SIZE);
-
-   if (nents <= 8)
-   index = 0;
-   else
-   index = get_count_order(nents) - 3;
-
-   return index;
-}
-
-static void sg_pool_free(struct scatterlist *sgl, unsigned int nents)
-{
-   struct sg_pool *sgp;
-
-   sgp = sg_pools + sg_pool_index(nents);
-   mempool_free(sgl, sgp->pool);
-}
-
-static struct scatterlist *sg_pool_alloc(unsigned int nents, gfp_t gfp_mask)
-{
-   struct sg_pool *sgp;
-
-   sgp = sg_pools + sg_pool_index(nents);
-   return mempool_alloc(sgp->pool, gfp_mask);
-}
-
-static void sg_free_table_chained(struct sg_table *table, bool first_chunk)
-{
-   if (first_chunk && table->orig_nents <= SG_CHUNK_SIZE)
-   return;
-   __sg_free_table(table, SG_CHUNK_SIZE, first_chunk, sg_pool_free);
-}
-
-static int sg_alloc_table_chained(struct sg_table *table, int nents,
-   struct scatterlist *first_chunk)
-{
-   int ret;
-
-   BUG_ON(!nents);
-
-   if (first_chunk) {
-   if (nents <= SG_CHUNK_SIZE) {
-   table->nents = table->orig_nents = nents;
-   sg_init_table(table->sgl, nents);
-   return 0;
-   }
-   }
-
-   ret = __sg_alloc_table(table, nents, SG_CHUNK_SIZE,
-  first_chunk, GFP_ATOMIC, sg_pool_alloc);
-   if (unlikely(ret))
-   sg_free_table_chained(table, (bool)first_chunk);
-   return ret;
-}
-
 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 {
if (cmd->request->cmd_type == REQ_TYPE_FS) {
@@ -2269,8 +2175,6 @@ EXPORT_SYMBOL(scsi_unblock_requests);
 
 int __init scsi_init_queue(void)
 {
-   int i;
-
scsi_sdb_cache = kmem_cache_create("scsi_data_buffer",
   sizeof(struct scsi_data_buffer),
   0, 0, NULL);
@@ -2279,53 +2183,12 @@ int __init scsi_init_queue(void)
return -ENOMEM;
}
 
-   for (i = 0; i < SG_MEMPOOL_NR; i++) {
-   struct sg_pool *sgp = sg_pools + i;
-   int size = sgp->size * sizeof(struct scatterlist);
-
-   sgp->slab = kmem_cache_create(sgp->name, size, 0,
-   SLAB_HWCACHE_ALIGN, NULL);
-   if (!sgp->slab) {
-   printk(KERN_

[PATCH v2 3/5] scsi: rename SG related struct and functions

2016-03-22 Thread Ming Lin

From: Ming Lin 

Rename SCSI specific struct and functions to more genenic names.

Signed-off-by: Ming Lin 
---
 drivers/scsi/scsi_lib.c | 52 -
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ab3dd09..5fd9dd7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -40,10 +40,10 @@
 #include "scsi_logging.h"
 
 
-#define SG_MEMPOOL_NR  ARRAY_SIZE(scsi_sg_pools)
+#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
 #define SG_MEMPOOL_SIZE2
 
-struct scsi_host_sg_pool {
+struct sg_pool {
size_t  size;
char*name;
struct kmem_cache   *slab;
@@ -54,7 +54,7 @@ struct scsi_host_sg_pool {
 #if (SCSI_MAX_SG_SEGMENTS < 32)
 #error SCSI_MAX_SG_SEGMENTS is too small (must be 32 or greater)
 #endif
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
+static struct sg_pool sg_pools[] = {
SP(8),
SP(16),
 #if (SCSI_MAX_SG_SEGMENTS > 32)
@@ -553,7 +553,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int scsi_sgtable_index(unsigned short nents)
+static inline unsigned int sg_pool_index(unsigned short nents)
 {
unsigned int index;
 
@@ -567,30 +567,30 @@ static inline unsigned int scsi_sgtable_index(unsigned 
short nents)
return index;
 }
 
-static void scsi_sg_free(struct scatterlist *sgl, unsigned int nents)
+static void sg_pool_free(struct scatterlist *sgl, unsigned int nents)
 {
-   struct scsi_host_sg_pool *sgp;
+   struct sg_pool *sgp;
 
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
+   sgp = sg_pools + sg_pool_index(nents);
mempool_free(sgl, sgp->pool);
 }
 
-static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
+static struct scatterlist *sg_pool_alloc(unsigned int nents, gfp_t gfp_mask)
 {
-   struct scsi_host_sg_pool *sgp;
+   struct sg_pool *sgp;
 
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
+   sgp = sg_pools + sg_pool_index(nents);
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct sg_table *table, bool first_chunk)
+static void sg_free_table_chained(struct sg_table *table, bool first_chunk)
 {
if (first_chunk && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, sg_pool_free);
 }
 
-static int scsi_alloc_sgtable(struct sg_table *table, int nents,
+static int sg_alloc_table_chained(struct sg_table *table, int nents,
struct scatterlist *first_chunk)
 {
int ret;
@@ -606,9 +606,9 @@ static int scsi_alloc_sgtable(struct sg_table *table, int 
nents,
}
 
ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
-  first_chunk, GFP_ATOMIC, scsi_sg_alloc);
+  first_chunk, GFP_ATOMIC, sg_pool_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(table, (bool)first_chunk);
+   sg_free_table_chained(table, (bool)first_chunk);
return ret;
 }
 
@@ -627,14 +627,14 @@ static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
struct scsi_data_buffer *sdb;
 
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb.table, true);
+   sg_free_table_chained(>sdb.table, true);
if (cmd->request->next_rq) {
sdb = cmd->request->next_rq->special;
if (sdb)
-   scsi_free_sgtable(>table, true);
+   sg_free_table_chained(>table, true);
}
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(>prot_sdb->table, true);
+   sg_free_table_chained(>prot_sdb->table, true);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -673,19 +673,19 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
if (cmd->sdb.table.nents)
-   scsi_free_sgtable(>sdb.table, false);
+   sg_free_table_chained(>sdb.table, false);
 
memset(>sdb, 0, sizeof(cmd->sdb));
 
if (scsi_prot_sg_count(cmd))
-   scsi_free_sgtable(>prot_sdb->table, false);
+   sg_free_table_chained(>prot_sdb->table, false);
 }
 
 static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-   scsi_free_sgtable(_sdb->table, false);
+   sg_free_table_chained(_sdb->table, false);
kmem_cache_free(scsi_sdb_cache, bidi_sdb);
cmd->request->

[PATCH v2 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-03-22 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
we fit into a single scatterlist chunk.

Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.

Will move these 2 generic definitions to scatterlist.h later.

Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/ata/pata_icside.c   |  2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |  4 ++--
 drivers/scsi/arm/cumana_2.c |  2 +-
 drivers/scsi/arm/eesox.c|  2 +-
 drivers/scsi/arm/powertec.c |  2 +-
 drivers/scsi/esas2r/esas2r_main.c   |  4 ++--
 drivers/scsi/hisi_sas/hisi_sas.h|  2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c |  4 ++--
 drivers/scsi/mpt3sas/mpt3sas_base.h |  2 +-
 drivers/scsi/scsi_debug.c   |  2 +-
 drivers/scsi/scsi_lib.c | 34 +-
 drivers/usb/storage/scsiglue.c  |  2 +-
 include/scsi/scsi.h |  8 
 include/scsi/scsi_host.h|  2 +-
 14 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/ata/pata_icside.c b/drivers/ata/pata_icside.c
index d7c7320..188f2f2 100644
--- a/drivers/ata/pata_icside.c
+++ b/drivers/ata/pata_icside.c
@@ -294,7 +294,7 @@ static int icside_dma_init(struct pata_icside_info *info)
 
 static struct scsi_host_template pata_icside_sht = {
ATA_BASE_SHT(DRV_NAME),
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
 };
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 60b169a..b428b48 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -81,7 +81,7 @@ MODULE_PARM_DESC(cmd_sg_entries,
 
 module_param(indirect_sg_entries, uint, 0444);
 MODULE_PARM_DESC(indirect_sg_entries,
-"Default max number of gather/scatter entries (default is 12, 
max is " __stringify(SCSI_MAX_SG_CHAIN_SEGMENTS) ")");
+"Default max number of gather/scatter entries (default is 12, 
max is " __stringify(SG_MAX_SEGMENTS) ")");
 
 module_param(allow_ext_sg, bool, 0444);
 MODULE_PARM_DESC(allow_ext_sg,
@@ -3129,7 +3129,7 @@ static int srp_parse_options(const char *buf, struct 
srp_target_port *target)
 
case SRP_OPT_SG_TABLESIZE:
if (match_int(args, ) || token < 1 ||
-   token > SCSI_MAX_SG_CHAIN_SEGMENTS) {
+   token > SG_MAX_SEGMENTS) {
pr_warn("bad max sg_tablesize parameter '%s'\n",
p);
goto out;
diff --git a/drivers/scsi/arm/cumana_2.c b/drivers/scsi/arm/cumana_2.c
index faa1bee..edce5f3 100644
--- a/drivers/scsi/arm/cumana_2.c
+++ b/drivers/scsi/arm/cumana_2.c
@@ -365,7 +365,7 @@ static struct scsi_host_template cumanascsi2_template = {
.eh_abort_handler   = fas216_eh_abort,
.can_queue  = 1,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.use_clustering = DISABLE_CLUSTERING,
.proc_name  = "cumanascsi2",
diff --git a/drivers/scsi/arm/eesox.c b/drivers/scsi/arm/eesox.c
index a8ad688..e93e047 100644
--- a/drivers/scsi/arm/eesox.c
+++ b/drivers/scsi/arm/eesox.c
@@ -484,7 +484,7 @@ static struct scsi_host_template eesox_template = {
.eh_abort_handler   = fas216_eh_abort,
.can_queue  = 1,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.use_clustering = DISABLE_CLUSTERING,
.proc_name  = "eesox",
diff --git a/drivers/scsi/arm/powertec.c b/drivers/scsi/arm/powertec.c
index 5e1b73e..79aa889 100644
--- a/drivers/scsi/arm/powertec.c
+++ b/drivers/scsi/arm/powertec.c
@@ -291,7 +291,7 @@ static struct scsi_host_template powertecscsi_template = {
 
.can_queue  = 8,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.cmd_per_lun= 2,
.use_clustering = ENABLE_CLUSTERING,
diff --git a/drivers/scsi/esas2r/esas2r_main.c 
b/drivers/scsi/esas2r/esas2r_main.c
index 33581ba..2aca4d1 100644
--- a/drivers/scsi/esa

[PATCH v2 2/5] scsi: replace "mq" with "first_chunk" in SG functions

2016-03-22 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Parameter "bool mq" is block driver specific.
Change it to "first_chunk" to make it more generic.

Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/scsi_lib.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5eaddc7..ab3dd09 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -583,33 +583,32 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct sg_table *table, bool mq)
+static void scsi_free_sgtable(struct sg_table *table, bool first_chunk)
 {
-   if (mq && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
+   if (first_chunk && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, scsi_sg_free);
 }
 
-static int scsi_alloc_sgtable(struct sg_table *table, int nents, bool mq)
+static int scsi_alloc_sgtable(struct sg_table *table, int nents,
+   struct scatterlist *first_chunk)
 {
-   struct scatterlist *first_chunk = NULL;
int ret;
 
BUG_ON(!nents);
 
-   if (mq) {
+   if (first_chunk) {
if (nents <= SCSI_MAX_SG_SEGMENTS) {
table->nents = table->orig_nents = nents;
sg_init_table(table->sgl, nents);
return 0;
}
-   first_chunk = table->sgl;
}
 
ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
   first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(table, mq);
+   scsi_free_sgtable(table, (bool)first_chunk);
return ret;
 }
 
@@ -1091,7 +1090,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
 * If sg table allocation fails, requeue request later.
 */
if (unlikely(scsi_alloc_sgtable(>table, req->nr_phys_segments,
-   req->mq_ctx != NULL)))
+   sdb->table.sgl)))
return BLKPREP_DEFER;
 
/* 
@@ -1163,7 +1162,8 @@ int scsi_init_io(struct scsi_cmnd *cmd)
 
ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-   if (scsi_alloc_sgtable(_sdb->table, ivecs, is_mq)) {
+   if (scsi_alloc_sgtable(_sdb->table, ivecs,
+   prot_sdb->table.sgl)) {
error = BLKPREP_DEFER;
goto err_exit;
}
-- 
1.9.1

[PATCH v2 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-03-22 Thread Ming Lin

From: Ming Lin 

Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
we fit into a single scatterlist chunk.

Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.

Will move these 2 generic definitions to scatterlist.h later.

Signed-off-by: Ming Lin 
---
 drivers/ata/pata_icside.c   |  2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |  4 ++--
 drivers/scsi/arm/cumana_2.c |  2 +-
 drivers/scsi/arm/eesox.c|  2 +-
 drivers/scsi/arm/powertec.c |  2 +-
 drivers/scsi/esas2r/esas2r_main.c   |  4 ++--
 drivers/scsi/hisi_sas/hisi_sas.h|  2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c |  4 ++--
 drivers/scsi/mpt3sas/mpt3sas_base.h |  2 +-
 drivers/scsi/scsi_debug.c   |  2 +-
 drivers/scsi/scsi_lib.c | 34 +-
 drivers/usb/storage/scsiglue.c  |  2 +-
 include/scsi/scsi.h |  8 
 include/scsi/scsi_host.h|  2 +-
 14 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/ata/pata_icside.c b/drivers/ata/pata_icside.c
index d7c7320..188f2f2 100644
--- a/drivers/ata/pata_icside.c
+++ b/drivers/ata/pata_icside.c
@@ -294,7 +294,7 @@ static int icside_dma_init(struct pata_icside_info *info)
 
 static struct scsi_host_template pata_icside_sht = {
ATA_BASE_SHT(DRV_NAME),
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
 };
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 60b169a..b428b48 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -81,7 +81,7 @@ MODULE_PARM_DESC(cmd_sg_entries,
 
 module_param(indirect_sg_entries, uint, 0444);
 MODULE_PARM_DESC(indirect_sg_entries,
-"Default max number of gather/scatter entries (default is 12, 
max is " __stringify(SCSI_MAX_SG_CHAIN_SEGMENTS) ")");
+"Default max number of gather/scatter entries (default is 12, 
max is " __stringify(SG_MAX_SEGMENTS) ")");
 
 module_param(allow_ext_sg, bool, 0444);
 MODULE_PARM_DESC(allow_ext_sg,
@@ -3129,7 +3129,7 @@ static int srp_parse_options(const char *buf, struct 
srp_target_port *target)
 
case SRP_OPT_SG_TABLESIZE:
if (match_int(args, ) || token < 1 ||
-   token > SCSI_MAX_SG_CHAIN_SEGMENTS) {
+   token > SG_MAX_SEGMENTS) {
pr_warn("bad max sg_tablesize parameter '%s'\n",
p);
goto out;
diff --git a/drivers/scsi/arm/cumana_2.c b/drivers/scsi/arm/cumana_2.c
index faa1bee..edce5f3 100644
--- a/drivers/scsi/arm/cumana_2.c
+++ b/drivers/scsi/arm/cumana_2.c
@@ -365,7 +365,7 @@ static struct scsi_host_template cumanascsi2_template = {
.eh_abort_handler   = fas216_eh_abort,
.can_queue  = 1,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.use_clustering = DISABLE_CLUSTERING,
.proc_name  = "cumanascsi2",
diff --git a/drivers/scsi/arm/eesox.c b/drivers/scsi/arm/eesox.c
index a8ad688..e93e047 100644
--- a/drivers/scsi/arm/eesox.c
+++ b/drivers/scsi/arm/eesox.c
@@ -484,7 +484,7 @@ static struct scsi_host_template eesox_template = {
.eh_abort_handler   = fas216_eh_abort,
.can_queue  = 1,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.use_clustering = DISABLE_CLUSTERING,
.proc_name  = "eesox",
diff --git a/drivers/scsi/arm/powertec.c b/drivers/scsi/arm/powertec.c
index 5e1b73e..79aa889 100644
--- a/drivers/scsi/arm/powertec.c
+++ b/drivers/scsi/arm/powertec.c
@@ -291,7 +291,7 @@ static struct scsi_host_template powertecscsi_template = {
 
.can_queue  = 8,
.this_id= 7,
-   .sg_tablesize   = SCSI_MAX_SG_CHAIN_SEGMENTS,
+   .sg_tablesize   = SG_MAX_SEGMENTS,
.dma_boundary   = IOMD_DMA_BOUNDARY,
.cmd_per_lun= 2,
.use_clustering = ENABLE_CLUSTERING,
diff --git a/drivers/scsi/esas2r/esas2r_main.c 
b/drivers/scsi/esas2r/esas2r_main.c
index 33581ba..2aca4d1 100644
--- a/drivers/scsi/esas2r/esas2r_main.c
+++ b/drivers/scsi/esas2r/esa

[PATCH v2 2/5] scsi: replace "mq" with "first_chunk" in SG functions

2016-03-22 Thread Ming Lin

From: Ming Lin 

Parameter "bool mq" is block driver specific.
Change it to "first_chunk" to make it more generic.

Signed-off-by: Ming Lin 
---
 drivers/scsi/scsi_lib.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5eaddc7..ab3dd09 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -583,33 +583,32 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct sg_table *table, bool mq)
+static void scsi_free_sgtable(struct sg_table *table, bool first_chunk)
 {
-   if (mq && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
+   if (first_chunk && table->orig_nents <= SCSI_MAX_SG_SEGMENTS)
return;
-   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   __sg_free_table(table, SCSI_MAX_SG_SEGMENTS, first_chunk, scsi_sg_free);
 }
 
-static int scsi_alloc_sgtable(struct sg_table *table, int nents, bool mq)
+static int scsi_alloc_sgtable(struct sg_table *table, int nents,
+   struct scatterlist *first_chunk)
 {
-   struct scatterlist *first_chunk = NULL;
int ret;
 
BUG_ON(!nents);
 
-   if (mq) {
+   if (first_chunk) {
if (nents <= SCSI_MAX_SG_SEGMENTS) {
table->nents = table->orig_nents = nents;
sg_init_table(table->sgl, nents);
return 0;
}
-   first_chunk = table->sgl;
}
 
ret = __sg_alloc_table(table, nents, SCSI_MAX_SG_SEGMENTS,
   first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
-   scsi_free_sgtable(table, mq);
+   scsi_free_sgtable(table, (bool)first_chunk);
return ret;
 }
 
@@ -1091,7 +1090,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
 * If sg table allocation fails, requeue request later.
 */
if (unlikely(scsi_alloc_sgtable(>table, req->nr_phys_segments,
-   req->mq_ctx != NULL)))
+   sdb->table.sgl)))
return BLKPREP_DEFER;
 
/* 
@@ -1163,7 +1162,8 @@ int scsi_init_io(struct scsi_cmnd *cmd)
 
ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-   if (scsi_alloc_sgtable(_sdb->table, ivecs, is_mq)) {
+   if (scsi_alloc_sgtable(_sdb->table, ivecs,
+   prot_sdb->table.sgl)) {
error = BLKPREP_DEFER;
goto err_exit;
}
-- 
1.9.1

Re: [PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-03-21 Thread Ming Lin

On Wed, 2016-03-16 at 09:23 +0100, Christoph Hellwig wrote:
> > 
> We can defintively kill this one.

We want to support different size of pools.
How can we kill this one?

Or did you mean we just create a single pool with size SG_CHUNK_SIZE?

> 
> > +static __init int sg_mempool_init(void)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < SG_MEMPOOL_NR; i++) {
> > +   struct sg_mempool *sgp = sg_pools + i;
> > +   int size = sgp->size * sizeof(struct scatterlist);
> > +
> > +   sgp->slab = kmem_cache_create(sgp->name, size, 0,
> > +   SLAB_HWCACHE_ALIGN, NULL);
> 
> Having these mempoools around in every kernel will make some embedded
> developers rather unhappy.  We could either not create them at
> runtime, which would require either a check in the fast path, or
> an init call in every driver, or just move the functions you
> added into a separe file, which will be compiled only based on a
> Kconfig
> symbol, and could even be potentially modular.  I think that
> second option might be easier.

I created lib/sg_pool.c with CONFIG_SG_POOL.

Re: [PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-03-21 Thread Ming Lin

On Wed, 2016-03-16 at 09:23 +0100, Christoph Hellwig wrote:
> > 
> We can defintively kill this one.

We want to support different size of pools.
How can we kill this one?

Or did you mean we just create a single pool with size SG_CHUNK_SIZE?

> 
> > +static __init int sg_mempool_init(void)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < SG_MEMPOOL_NR; i++) {
> > +   struct sg_mempool *sgp = sg_pools + i;
> > +   int size = sgp->size * sizeof(struct scatterlist);
> > +
> > +   sgp->slab = kmem_cache_create(sgp->name, size, 0,
> > +   SLAB_HWCACHE_ALIGN, NULL);
> 
> Having these mempoools around in every kernel will make some embedded
> developers rather unhappy.  We could either not create them at
> runtime, which would require either a check in the fast path, or
> an init call in every driver, or just move the functions you
> added into a separe file, which will be compiled only based on a
> Kconfig
> symbol, and could even be potentially modular.  I think that
> second option might be easier.

I created lib/sg_pool.c with CONFIG_SG_POOL.

Re: [PATCH RFC 0/2] mempool based chained scatterlist alloc/free api api

2016-03-15 Thread Ming Lin

On Tue, 2016-03-15 at 16:12 -0700, James Bottomley wrote:
> On Tue, 2016-03-15 at 15:39 -0700, Ming Lin wrote:
> > From: Ming Lin <min...@ssi.samsung.com>
> > 
> > Hi list,
> > 
> > This moves the mempool based chained scatterlist alloc/free code
> > from
> > scsi_lib.c to lib/scatterlist.c.
> > 
> > So other drivers(for example, the under development NVMe over
> > fabric 
> > drivers) can also use it.
> > 
> > Ming Lin (2):
> >   scatterlist: add mempool based chained SG alloc/free api
> >   scsi: use the new chained SG api
> > 
> >  drivers/scsi/scsi_lib.c | 129 ++
> > --
> >  include/linux/scatterlist.h |  12 
> >  lib/scatterlist.c   | 156
> > 
> >  3 files changed, 175 insertions(+), 122 deletions(-)
> 
> I'd really rather this were a single patch so git can tell us the
> code
> motion.  If you add in one patch and remove in another the code
> motion
> trackers don't see it.
> 
> Secondly, you said "This copied code from scsi_lib.c to scatterlist.c
> and modified it a bit" could you move in one patch and modify in
> another, so we can see exactly what you're changing.

The modification is mostly about structure names and function names
changes.

I can do it in a single patch.

> 
> Thirdly, are you sure the pool structure for NVMe should be the same
> as
> for SCSI?  We don't do buddy pools for 1,2 or 4 entry transactions in
> SCSI just basically because of heuristics, but the packetised io
> characteristics of NVMe make single entry lists more likely for it,
> don't they?

Not sure about this, but the nvme-pci driver may not use this api,
because it also has a PRP lists except for the SG lists.

But nvme-over-rdma/nvme-over-fiber-channel driver is good to use this
api.

> 
> James
> 
>

Re: [PATCH RFC 0/2] mempool based chained scatterlist alloc/free api api

2016-03-15 Thread Ming Lin

On Tue, 2016-03-15 at 16:12 -0700, James Bottomley wrote:
> On Tue, 2016-03-15 at 15:39 -0700, Ming Lin wrote:
> > From: Ming Lin 
> > 
> > Hi list,
> > 
> > This moves the mempool based chained scatterlist alloc/free code
> > from
> > scsi_lib.c to lib/scatterlist.c.
> > 
> > So other drivers(for example, the under development NVMe over
> > fabric 
> > drivers) can also use it.
> > 
> > Ming Lin (2):
> >   scatterlist: add mempool based chained SG alloc/free api
> >   scsi: use the new chained SG api
> > 
> >  drivers/scsi/scsi_lib.c | 129 ++
> > --
> >  include/linux/scatterlist.h |  12 
> >  lib/scatterlist.c   | 156
> > 
> >  3 files changed, 175 insertions(+), 122 deletions(-)
> 
> I'd really rather this were a single patch so git can tell us the
> code
> motion.  If you add in one patch and remove in another the code
> motion
> trackers don't see it.
> 
> Secondly, you said "This copied code from scsi_lib.c to scatterlist.c
> and modified it a bit" could you move in one patch and modify in
> another, so we can see exactly what you're changing.

The modification is mostly about structure names and function names
changes.

I can do it in a single patch.

> 
> Thirdly, are you sure the pool structure for NVMe should be the same
> as
> for SCSI?  We don't do buddy pools for 1,2 or 4 entry transactions in
> SCSI just basically because of heuristics, but the packetised io
> characteristics of NVMe make single entry lists more likely for it,
> don't they?

Not sure about this, but the nvme-pci driver may not use this api,
because it also has a PRP lists except for the SG lists.

But nvme-over-rdma/nvme-over-fiber-channel driver is good to use this
api.

> 
> James
> 
>

[PATCH RFC 0/2] mempool based chained scatterlist alloc/free api api

2016-03-15 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

Hi list,

This moves the mempool based chained scatterlist alloc/free code from
scsi_lib.c to lib/scatterlist.c.

So other drivers(for example, the under development NVMe over fabric drivers)
can also use it.

Ming Lin (2):
  scatterlist: add mempool based chained SG alloc/free api
  scsi: use the new chained SG api

 drivers/scsi/scsi_lib.c | 129 ++--
 include/linux/scatterlist.h |  12 
 lib/scatterlist.c   | 156 
 3 files changed, 175 insertions(+), 122 deletions(-)

-- 
1.9.1

[PATCH RFC 2/2] scsi: use the new chained SG api

2016-03-15 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

This removes the old code and uses the new chained SG alloc/free api.

Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/scsi/scsi_lib.c | 129 +++-
 1 file changed, 7 insertions(+), 122 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8c6e318..97e283c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -39,40 +39,6 @@
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
-
-#define SG_MEMPOOL_NR  ARRAY_SIZE(scsi_sg_pools)
-#define SG_MEMPOOL_SIZE2
-
-struct scsi_host_sg_pool {
-   size_t  size;
-   char*name;
-   struct kmem_cache   *slab;
-   mempool_t   *pool;
-};
-
-#define SP(x) { .size = x, "sgpool-" __stringify(x) }
-#if (SCSI_MAX_SG_SEGMENTS < 32)
-#error SCSI_MAX_SG_SEGMENTS is too small (must be 32 or greater)
-#endif
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
-   SP(8),
-   SP(16),
-#if (SCSI_MAX_SG_SEGMENTS > 32)
-   SP(32),
-#if (SCSI_MAX_SG_SEGMENTS > 64)
-   SP(64),
-#if (SCSI_MAX_SG_SEGMENTS > 128)
-   SP(128),
-#if (SCSI_MAX_SG_SEGMENTS > 256)
-#error SCSI_MAX_SG_SEGMENTS is too large (256 MAX)
-#endif
-#endif
-#endif
-#endif
-   SP(SCSI_MAX_SG_SEGMENTS)
-};
-#undef SP
-
 struct kmem_cache *scsi_sdb_cache;
 
 /*
@@ -553,61 +519,23 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int scsi_sgtable_index(unsigned short nents)
-{
-   unsigned int index;
-
-   BUG_ON(nents > SCSI_MAX_SG_SEGMENTS);
-
-   if (nents <= 8)
-   index = 0;
-   else
-   index = get_count_order(nents) - 3;
-
-   return index;
-}
-
-static void scsi_sg_free(struct scatterlist *sgl, unsigned int nents)
-{
-   struct scsi_host_sg_pool *sgp;
-
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
-   mempool_free(sgl, sgp->pool);
-}
-
-static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
-{
-   struct scsi_host_sg_pool *sgp;
-
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
-   return mempool_alloc(sgp->pool, gfp_mask);
-}
-
 static void scsi_free_sgtable(struct scsi_data_buffer *sdb, bool mq)
 {
-   if (mq && sdb->table.orig_nents <= SCSI_MAX_SG_SEGMENTS)
-   return;
-   __sg_free_table(>table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   sg_free_chained(>table, mq);
 }
 
 static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents, bool mq)
 {
-   struct scatterlist *first_chunk = NULL;
+   struct scatterlist *first_chunk;
int ret;
 
-   BUG_ON(!nents);
-
-   if (mq) {
-   if (nents <= SCSI_MAX_SG_SEGMENTS) {
-   sdb->table.nents = sdb->table.orig_nents = nents;
-   sg_init_table(sdb->table.sgl, nents);
-   return 0;
-   }
+   if (mq)
first_chunk = sdb->table.sgl;
-   }
+   else
+   first_chunk = NULL;
+
+   ret = sg_alloc_chained(>table, nents, first_chunk, GFP_ATOMIC);
 
-   ret = __sg_alloc_table(>table, nents, SCSI_MAX_SG_SEGMENTS,
-  first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
scsi_free_sgtable(sdb, mq);
return ret;
@@ -2264,8 +2192,6 @@ EXPORT_SYMBOL(scsi_unblock_requests);
 
 int __init scsi_init_queue(void)
 {
-   int i;
-
scsi_sdb_cache = kmem_cache_create("scsi_data_buffer",
   sizeof(struct scsi_data_buffer),
   0, 0, NULL);
@@ -2274,53 +2200,12 @@ int __init scsi_init_queue(void)
return -ENOMEM;
}
 
-   for (i = 0; i < SG_MEMPOOL_NR; i++) {
-   struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-   int size = sgp->size * sizeof(struct scatterlist);
-
-   sgp->slab = kmem_cache_create(sgp->name, size, 0,
-   SLAB_HWCACHE_ALIGN, NULL);
-   if (!sgp->slab) {
-   printk(KERN_ERR "SCSI: can't init sg slab %s\n",
-   sgp->name);
-   goto cleanup_sdb;
-   }
-
-   sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
-sgp->slab);
-   if (!sgp->pool) {
-   printk(KERN_ERR "SCSI: can't init sg mempool %s\n",
-   sgp->name);
-   goto cleanup_sdb;
-   }
-   }
-
return 0;
-
-cleanup_sdb:
-   for (i = 0;

[PATCH RFC 0/2] mempool based chained scatterlist alloc/free api api

2016-03-15 Thread Ming Lin

From: Ming Lin 

Hi list,

This moves the mempool based chained scatterlist alloc/free code from
scsi_lib.c to lib/scatterlist.c.

So other drivers(for example, the under development NVMe over fabric drivers)
can also use it.

Ming Lin (2):
  scatterlist: add mempool based chained SG alloc/free api
  scsi: use the new chained SG api

 drivers/scsi/scsi_lib.c | 129 ++--
 include/linux/scatterlist.h |  12 
 lib/scatterlist.c   | 156 
 3 files changed, 175 insertions(+), 122 deletions(-)

-- 
1.9.1

[PATCH RFC 2/2] scsi: use the new chained SG api

2016-03-15 Thread Ming Lin

From: Ming Lin 

This removes the old code and uses the new chained SG alloc/free api.

Signed-off-by: Ming Lin 
---
 drivers/scsi/scsi_lib.c | 129 +++-
 1 file changed, 7 insertions(+), 122 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8c6e318..97e283c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -39,40 +39,6 @@
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
-
-#define SG_MEMPOOL_NR  ARRAY_SIZE(scsi_sg_pools)
-#define SG_MEMPOOL_SIZE2
-
-struct scsi_host_sg_pool {
-   size_t  size;
-   char*name;
-   struct kmem_cache   *slab;
-   mempool_t   *pool;
-};
-
-#define SP(x) { .size = x, "sgpool-" __stringify(x) }
-#if (SCSI_MAX_SG_SEGMENTS < 32)
-#error SCSI_MAX_SG_SEGMENTS is too small (must be 32 or greater)
-#endif
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
-   SP(8),
-   SP(16),
-#if (SCSI_MAX_SG_SEGMENTS > 32)
-   SP(32),
-#if (SCSI_MAX_SG_SEGMENTS > 64)
-   SP(64),
-#if (SCSI_MAX_SG_SEGMENTS > 128)
-   SP(128),
-#if (SCSI_MAX_SG_SEGMENTS > 256)
-#error SCSI_MAX_SG_SEGMENTS is too large (256 MAX)
-#endif
-#endif
-#endif
-#endif
-   SP(SCSI_MAX_SG_SEGMENTS)
-};
-#undef SP
-
 struct kmem_cache *scsi_sdb_cache;
 
 /*
@@ -553,61 +519,23 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
 }
 
-static inline unsigned int scsi_sgtable_index(unsigned short nents)
-{
-   unsigned int index;
-
-   BUG_ON(nents > SCSI_MAX_SG_SEGMENTS);
-
-   if (nents <= 8)
-   index = 0;
-   else
-   index = get_count_order(nents) - 3;
-
-   return index;
-}
-
-static void scsi_sg_free(struct scatterlist *sgl, unsigned int nents)
-{
-   struct scsi_host_sg_pool *sgp;
-
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
-   mempool_free(sgl, sgp->pool);
-}
-
-static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
-{
-   struct scsi_host_sg_pool *sgp;
-
-   sgp = scsi_sg_pools + scsi_sgtable_index(nents);
-   return mempool_alloc(sgp->pool, gfp_mask);
-}
-
 static void scsi_free_sgtable(struct scsi_data_buffer *sdb, bool mq)
 {
-   if (mq && sdb->table.orig_nents <= SCSI_MAX_SG_SEGMENTS)
-   return;
-   __sg_free_table(>table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
+   sg_free_chained(>table, mq);
 }
 
 static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents, bool mq)
 {
-   struct scatterlist *first_chunk = NULL;
+   struct scatterlist *first_chunk;
int ret;
 
-   BUG_ON(!nents);
-
-   if (mq) {
-   if (nents <= SCSI_MAX_SG_SEGMENTS) {
-   sdb->table.nents = sdb->table.orig_nents = nents;
-   sg_init_table(sdb->table.sgl, nents);
-   return 0;
-   }
+   if (mq)
first_chunk = sdb->table.sgl;
-   }
+   else
+   first_chunk = NULL;
+
+   ret = sg_alloc_chained(>table, nents, first_chunk, GFP_ATOMIC);
 
-   ret = __sg_alloc_table(>table, nents, SCSI_MAX_SG_SEGMENTS,
-  first_chunk, GFP_ATOMIC, scsi_sg_alloc);
if (unlikely(ret))
scsi_free_sgtable(sdb, mq);
return ret;
@@ -2264,8 +2192,6 @@ EXPORT_SYMBOL(scsi_unblock_requests);
 
 int __init scsi_init_queue(void)
 {
-   int i;
-
scsi_sdb_cache = kmem_cache_create("scsi_data_buffer",
   sizeof(struct scsi_data_buffer),
   0, 0, NULL);
@@ -2274,53 +2200,12 @@ int __init scsi_init_queue(void)
return -ENOMEM;
}
 
-   for (i = 0; i < SG_MEMPOOL_NR; i++) {
-   struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-   int size = sgp->size * sizeof(struct scatterlist);
-
-   sgp->slab = kmem_cache_create(sgp->name, size, 0,
-   SLAB_HWCACHE_ALIGN, NULL);
-   if (!sgp->slab) {
-   printk(KERN_ERR "SCSI: can't init sg slab %s\n",
-   sgp->name);
-   goto cleanup_sdb;
-   }
-
-   sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
-sgp->slab);
-   if (!sgp->pool) {
-   printk(KERN_ERR "SCSI: can't init sg mempool %s\n",
-   sgp->name);
-   goto cleanup_sdb;
-   }
-   }
-
return 0;
-
-cleanup_sdb:
-   for (i = 0; i < SG_MEMPOOL_NR; i++) {
-

[PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-03-15 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

This copied code from scsi_lib.c to scatterlist.c and
modified it a bit.

Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 include/linux/scatterlist.h |  12 
 lib/scatterlist.c   | 156 
 2 files changed, 168 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 556ec1e..888f2c3 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -266,6 +266,10 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
unsigned long offset, unsigned long size,
gfp_t gfp_mask);
 
+void sg_free_chained(struct sg_table *table, bool first_chunk);
+int sg_alloc_chained(struct sg_table *table, int nents,
+   struct scatterlist *first_chunk, gfp_t gfp);
+
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
  size_t buflen, off_t skip, bool to_buffer);
 
@@ -286,6 +290,14 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, 
unsigned int nents,
 #define SG_MAX_SINGLE_ALLOC(PAGE_SIZE / sizeof(struct scatterlist))
 
 /*
+ * The maximum number of SG segments that we will put inside a
+ * scatterlist.
+ *
+ * XXX: what's the best number?
+ */
+#define SG_MAX_SEGMENTS128
+
+/*
  * sg page iterator
  *
  * Iterates over sg entries page-by-page.  On each successful iteration,
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 004fc70..f97831e 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * sg_next - return the next scatterlist entry in a list
@@ -755,3 +756,158 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, 
unsigned int nents,
return sg_copy_buffer(sgl, nents, buf, buflen, skip, true);
 }
 EXPORT_SYMBOL(sg_pcopy_to_buffer);
+
+#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
+#define SG_MEMPOOL_SIZE2
+
+struct sg_mempool {
+   size_t  size;
+   char*name;
+   struct kmem_cache *slab;
+   mempool_t   *pool;
+};
+
+#define SP(x) { .size = x, "sgpool-" __stringify(x) }
+#if (SG_MAX_SEGMENTS < 32)
+#error SG_MAX_SEGMENTS is too small (must be 32 or greater)
+#endif
+static struct sg_mempool sg_pools[] = {
+   SP(8),
+   SP(16),
+#if (SG_MAX_SEGMENTS > 32)
+   SP(32),
+#if (SG_MAX_SEGMENTS > 64)
+   SP(64),
+#if (SG_MAX_SEGMENTS > 128)
+   SP(128),
+#if (SG_MAX_SEGMENTS > 256)
+#error SG_MAX_SEGMENTS is too large (256 MAX)
+#endif
+#endif
+#endif
+#endif
+   SP(SG_MAX_SEGMENTS)
+};
+#undef SP
+
+static inline unsigned int sg_pool_index(unsigned short nents)
+{
+   unsigned int index;
+
+   BUG_ON(nents > SG_MAX_SEGMENTS);
+
+   if (nents <= 8)
+   index = 0;
+   else
+   index = get_count_order(nents) - 3;
+
+   return index;
+}
+
+static void sg_mempoll_free(struct scatterlist *sgl, unsigned int nents)
+{
+   struct sg_mempool *sgp;
+
+   sgp = sg_pools + sg_pool_index(nents);
+   mempool_free(sgl, sgp->pool);
+}
+
+static struct scatterlist *sg_mempool_alloc(unsigned int nents, gfp_t gfp)
+{
+   struct sg_mempool *sgp;
+
+   sgp = sg_pools + sg_pool_index(nents);
+   return mempool_alloc(sgp->pool, gfp);
+}
+
+/**
+ * sg_free_chained - Free a previously mapped sg table
+ * @table: The sg table header to use
+ * @first_chunk: was first_chunk not NULL in sg_alloc_chained?
+ *
+ *  Description:
+ *Free an sg table previously allocated and setup with
+ *sg_alloc_chained().
+ *
+ **/
+void sg_free_chained(struct sg_table *table, bool first_chunk)
+{
+   if (first_chunk && table->orig_nents <= SG_MAX_SEGMENTS)
+   return;
+   __sg_free_table(table, SG_MAX_SEGMENTS, 1, sg_mempoll_free);
+}
+EXPORT_SYMBOL_GPL(sg_free_chained);
+
+/**
+ * sg_alloc_chained - Allocate and chain SGLs in an sg table
+ * @table: The sg table header to use
+ * @nents: Number of entries in sg list
+ * @first_chunk: first SGL
+ * @gfp:   GFP allocation mask
+ *
+ *  Description:
+ *Allocate and chain SGLs in an sg table. If @nents@ is larger than
+ *SG_MAX_SEGMENTS a chained sg table will be setup.
+ *
+ **/
+int sg_alloc_chained(struct sg_table *table, int nents,
+   struct scatterlist *first_chunk, gfp_t gfp)
+{
+   int ret;
+
+   BUG_ON(!nents);
+
+   if (first_chunk && nents <= SG_MAX_SEGMENTS) {
+   table->nents = table->orig_nents = nents;
+   sg_init_table(first_chunk, nents);
+   return 0;
+   }
+
+   ret = __sg_alloc_table(table, nents, SG_MAX_SEGMENTS,
+   first_chunk, gfp, sg_mempool_alloc);
+   if (unlikely(ret))
+   sg_free_chained(table, (bool)first_chunk);
+
+   return ret;
+}
+EXPORT_SYMBOL_GP

[PATCH RFC 1/2] scatterlist: add mempool based chained SG alloc/free api

2016-03-15 Thread Ming Lin

From: Ming Lin 

This copied code from scsi_lib.c to scatterlist.c and
modified it a bit.

Signed-off-by: Ming Lin 
---
 include/linux/scatterlist.h |  12 
 lib/scatterlist.c   | 156 
 2 files changed, 168 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 556ec1e..888f2c3 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -266,6 +266,10 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
unsigned long offset, unsigned long size,
gfp_t gfp_mask);
 
+void sg_free_chained(struct sg_table *table, bool first_chunk);
+int sg_alloc_chained(struct sg_table *table, int nents,
+   struct scatterlist *first_chunk, gfp_t gfp);
+
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
  size_t buflen, off_t skip, bool to_buffer);
 
@@ -286,6 +290,14 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, 
unsigned int nents,
 #define SG_MAX_SINGLE_ALLOC(PAGE_SIZE / sizeof(struct scatterlist))
 
 /*
+ * The maximum number of SG segments that we will put inside a
+ * scatterlist.
+ *
+ * XXX: what's the best number?
+ */
+#define SG_MAX_SEGMENTS128
+
+/*
  * sg page iterator
  *
  * Iterates over sg entries page-by-page.  On each successful iteration,
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 004fc70..f97831e 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * sg_next - return the next scatterlist entry in a list
@@ -755,3 +756,158 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, 
unsigned int nents,
return sg_copy_buffer(sgl, nents, buf, buflen, skip, true);
 }
 EXPORT_SYMBOL(sg_pcopy_to_buffer);
+
+#define SG_MEMPOOL_NR  ARRAY_SIZE(sg_pools)
+#define SG_MEMPOOL_SIZE2
+
+struct sg_mempool {
+   size_t  size;
+   char*name;
+   struct kmem_cache *slab;
+   mempool_t   *pool;
+};
+
+#define SP(x) { .size = x, "sgpool-" __stringify(x) }
+#if (SG_MAX_SEGMENTS < 32)
+#error SG_MAX_SEGMENTS is too small (must be 32 or greater)
+#endif
+static struct sg_mempool sg_pools[] = {
+   SP(8),
+   SP(16),
+#if (SG_MAX_SEGMENTS > 32)
+   SP(32),
+#if (SG_MAX_SEGMENTS > 64)
+   SP(64),
+#if (SG_MAX_SEGMENTS > 128)
+   SP(128),
+#if (SG_MAX_SEGMENTS > 256)
+#error SG_MAX_SEGMENTS is too large (256 MAX)
+#endif
+#endif
+#endif
+#endif
+   SP(SG_MAX_SEGMENTS)
+};
+#undef SP
+
+static inline unsigned int sg_pool_index(unsigned short nents)
+{
+   unsigned int index;
+
+   BUG_ON(nents > SG_MAX_SEGMENTS);
+
+   if (nents <= 8)
+   index = 0;
+   else
+   index = get_count_order(nents) - 3;
+
+   return index;
+}
+
+static void sg_mempoll_free(struct scatterlist *sgl, unsigned int nents)
+{
+   struct sg_mempool *sgp;
+
+   sgp = sg_pools + sg_pool_index(nents);
+   mempool_free(sgl, sgp->pool);
+}
+
+static struct scatterlist *sg_mempool_alloc(unsigned int nents, gfp_t gfp)
+{
+   struct sg_mempool *sgp;
+
+   sgp = sg_pools + sg_pool_index(nents);
+   return mempool_alloc(sgp->pool, gfp);
+}
+
+/**
+ * sg_free_chained - Free a previously mapped sg table
+ * @table: The sg table header to use
+ * @first_chunk: was first_chunk not NULL in sg_alloc_chained?
+ *
+ *  Description:
+ *Free an sg table previously allocated and setup with
+ *sg_alloc_chained().
+ *
+ **/
+void sg_free_chained(struct sg_table *table, bool first_chunk)
+{
+   if (first_chunk && table->orig_nents <= SG_MAX_SEGMENTS)
+   return;
+   __sg_free_table(table, SG_MAX_SEGMENTS, 1, sg_mempoll_free);
+}
+EXPORT_SYMBOL_GPL(sg_free_chained);
+
+/**
+ * sg_alloc_chained - Allocate and chain SGLs in an sg table
+ * @table: The sg table header to use
+ * @nents: Number of entries in sg list
+ * @first_chunk: first SGL
+ * @gfp:   GFP allocation mask
+ *
+ *  Description:
+ *Allocate and chain SGLs in an sg table. If @nents@ is larger than
+ *SG_MAX_SEGMENTS a chained sg table will be setup.
+ *
+ **/
+int sg_alloc_chained(struct sg_table *table, int nents,
+   struct scatterlist *first_chunk, gfp_t gfp)
+{
+   int ret;
+
+   BUG_ON(!nents);
+
+   if (first_chunk && nents <= SG_MAX_SEGMENTS) {
+   table->nents = table->orig_nents = nents;
+   sg_init_table(first_chunk, nents);
+   return 0;
+   }
+
+   ret = __sg_alloc_table(table, nents, SG_MAX_SEGMENTS,
+   first_chunk, gfp, sg_mempool_alloc);
+   if (unlikely(ret))
+   sg_free_chained(table, (bool)first_chunk);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(sg_alloc_chained);
+
+static __init int sg_mempool_ini

Re: e827091cb1 "block: merge: get the 1st and last bvec via helpers" broken

2016-03-12 Thread Ming Lin

On Fri, Mar 11, 2016 at 11:43 PM, Kent Overstreet
 wrote:
> I don't know exactly how it's broken, but with that patch segment counting is
> broken - I'm seeing blk_rq_map_sg() overrun the end of the sgtable.
>
> I suggest reverting it for 4.5...

+ Ming Lei

Re: e827091cb1 "block: merge: get the 1st and last bvec via helpers" broken

2016-03-12 Thread Ming Lin

On Fri, Mar 11, 2016 at 11:43 PM, Kent Overstreet
 wrote:
> I don't know exactly how it's broken, but with that patch segment counting is
> broken - I'm seeing blk_rq_map_sg() overrun the end of the sgtable.
>
> I suggest reverting it for 4.5...

+ Ming Lei

RE: 4.4-final: 28 bioset threads on small notebook

2016-02-21 Thread Ming Lin-SSI

>-Original Message-
>From: Kent Overstreet [mailto:kent.overstr...@gmail.com]
>
>On Sat, Feb 20, 2016 at 09:55:19PM +0100, Pavel Machek wrote:
>> Hi!
>>
>> > > > You're directing this concern to the wrong person.
>> > > >
>> > > > I already told you DM is _not_ contributing any extra "bioset" threads
>> > > > (ever since commit dbba42d8a).
>> > >
>> > > Well, sorry about that. Note that l-k is on the cc list, so hopefully
>> > > the right person sees it too.
>> > >
>> > > Ok, let me check... it seems that
>> > > 54efd50bfd873e2dbf784e0b21a8027ba4299a3e is responsible, thus Kent
>> > > Overstreet  is to blame.
>> > >
>> > > Um, and you acked the patch, so you are partly responsible.
>> >
>> > You still haven't shown you even understand the patch so don't try to
>> > blame me for one aspect you don't like.
>>
>> Well, I don't have to understand the patch to argue its wrong.
>>
>> > > > But in general, these "bioset" threads are a side-effect of the
>> > > > late-bio-splitting support.  So is your position on it: "I don't like
>> > > > that feature if it comes at the expense of adding resources I can _see_
>> > > > for something I (naively?) view as useless"?
>> > >
>> > > > Just seems... naive... but you could be trying to say something else
>> > > > entirely.
>> > >
>> > > > Anyway, if you don't like something: understand why it is there and
>then
>> > > > try to fix it to your liking (without compromising why it was there to
>> > > > begin with).
>> > >
>> > > Well, 28 kernel threads on a notebook is a bug, plain and simple. Do
>> > > you argue it is not?
>> >
>> > Just implies you have 28 request_queues right?  You clearly have
>> > something else going on on your notebook than the average notebook
>> > user.
>>
>> I'm not using the modules, but otherwise I'm not doing anything
>> special. How many request_queues should I expect? How many do you
>have
>> on your notebook?
>
>It's one rescuer thread per bio_set, not one per request queue, so 28 is more
>than I'd expect but there's lots of random bio_sets so it's not entirely
>unexpected.
>
>It'd be better to have the rescuers be per request_queue, just someone is
>going
>to have to write the code.

I boot a VM and it also has 28 bioset threads.

That's because I have 27 block devices.

root@wheezy:~# ls /sys/block/
loop0  loop2  loop4  loop6  ram0  ram10  ram12  ram14  ram2  ram4  ram6  ram8  
sr0  vdb
loop1  loop3  loop5  loop7  ram1  ram11  ram13  ram15  ram3  ram5  ram7  ram9  
vda

And the additional one comes from init_bio

[0.329627] Call Trace:
[0.329970]  [] dump_stack+0x63/0x87
[0.330531]  [] __bioset_create+0x29e/0x2b0
[0.331127]  [] ? ca_keys_setup+0xa6/0xa6
[0.331735]  [] init_bio+0xa1/0xd1
[0.332284]  [] do_one_initcall+0xcd/0x1f0
[0.332883]  [] ? parse_args+0x296/0x480
[0.333460]  [] kernel_init_freeable+0x16f/0x1fa
[0.334131]  [] ? initcall_blacklist+0xba/0xba
[0.334747]  [] ? rest_init+0x80/0x80
[0.335301]  [] kernel_init+0xe/0xf0
[0.335842]  [] ret_from_fork+0x3f/0x70
[0.336371]  [] ? rest_init+0x80/0x80

So it's almost already "per request_queue"

RE: 4.4-final: 28 bioset threads on small notebook

2016-02-21 Thread Ming Lin-SSI

>-Original Message-
>From: Kent Overstreet [mailto:kent.overstr...@gmail.com]
>
>On Sat, Feb 20, 2016 at 09:55:19PM +0100, Pavel Machek wrote:
>> Hi!
>>
>> > > > You're directing this concern to the wrong person.
>> > > >
>> > > > I already told you DM is _not_ contributing any extra "bioset" threads
>> > > > (ever since commit dbba42d8a).
>> > >
>> > > Well, sorry about that. Note that l-k is on the cc list, so hopefully
>> > > the right person sees it too.
>> > >
>> > > Ok, let me check... it seems that
>> > > 54efd50bfd873e2dbf784e0b21a8027ba4299a3e is responsible, thus Kent
>> > > Overstreet  is to blame.
>> > >
>> > > Um, and you acked the patch, so you are partly responsible.
>> >
>> > You still haven't shown you even understand the patch so don't try to
>> > blame me for one aspect you don't like.
>>
>> Well, I don't have to understand the patch to argue its wrong.
>>
>> > > > But in general, these "bioset" threads are a side-effect of the
>> > > > late-bio-splitting support.  So is your position on it: "I don't like
>> > > > that feature if it comes at the expense of adding resources I can _see_
>> > > > for something I (naively?) view as useless"?
>> > >
>> > > > Just seems... naive... but you could be trying to say something else
>> > > > entirely.
>> > >
>> > > > Anyway, if you don't like something: understand why it is there and
>then
>> > > > try to fix it to your liking (without compromising why it was there to
>> > > > begin with).
>> > >
>> > > Well, 28 kernel threads on a notebook is a bug, plain and simple. Do
>> > > you argue it is not?
>> >
>> > Just implies you have 28 request_queues right?  You clearly have
>> > something else going on on your notebook than the average notebook
>> > user.
>>
>> I'm not using the modules, but otherwise I'm not doing anything
>> special. How many request_queues should I expect? How many do you
>have
>> on your notebook?
>
>It's one rescuer thread per bio_set, not one per request queue, so 28 is more
>than I'd expect but there's lots of random bio_sets so it's not entirely
>unexpected.
>
>It'd be better to have the rescuers be per request_queue, just someone is
>going
>to have to write the code.

I boot a VM and it also has 28 bioset threads.

That's because I have 27 block devices.

root@wheezy:~# ls /sys/block/
loop0  loop2  loop4  loop6  ram0  ram10  ram12  ram14  ram2  ram4  ram6  ram8  
sr0  vdb
loop1  loop3  loop5  loop7  ram1  ram11  ram13  ram15  ram3  ram5  ram7  ram9  
vda

And the additional one comes from init_bio

[0.329627] Call Trace:
[0.329970]  [] dump_stack+0x63/0x87
[0.330531]  [] __bioset_create+0x29e/0x2b0
[0.331127]  [] ? ca_keys_setup+0xa6/0xa6
[0.331735]  [] init_bio+0xa1/0xd1
[0.332284]  [] do_one_initcall+0xcd/0x1f0
[0.332883]  [] ? parse_args+0x296/0x480
[0.333460]  [] kernel_init_freeable+0x16f/0x1fa
[0.334131]  [] ? initcall_blacklist+0xba/0xba
[0.334747]  [] ? rest_init+0x80/0x80
[0.335301]  [] kernel_init+0xe/0xf0
[0.335842]  [] ret_from_fork+0x3f/0x70
[0.336371]  [] ? rest_init+0x80/0x80

So it's almost already "per request_queue"

Re: block: re-add discard_granularity and alignment checks

2015-10-27 Thread Ming Lin

On Thu, Oct 22, 2015 at 10:03 AM, Mike Snitzer  wrote:
> On Thu, Oct 22 2015 at 12:59pm -0400,
> Ming Lin  wrote:
>
>> From: Ming Lin 
>>
>> In commit b49a087("block: remove split code in
>> blkdev_issue_{discard,write_same}"), discard_granularity and alignment
>> checks were removed. Ideally, with bio late splitting, the upper layers
>> shouldn't need to depend on device's limits.
>>
>> Christoph reported a discard regression on the HGST Ultrastar SN100 NVMe
>> device when mkfs.xfs. We have not found the root cause yet.
>>
>> This patch re-adds discard_granularity and alignment checks by reverting
>> the related changes in commit b49a087. The good thing is now we can
>> remove the 2G discard size cap and just use UINT_MAX to avoid bi_size
>> overflow.
>>
>> Reviewed-by: Christoph Hellwig 
>> Tested-by: Christoph Hellwig 
>> Signed-off-by: Ming Lin 
>
> Reviewed-by: Mike Snitzer 

Hi Jens,

Would you please take this one?

>
> Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: block: re-add discard_granularity and alignment checks

2015-10-27 Thread Ming Lin

On Thu, Oct 22, 2015 at 10:03 AM, Mike Snitzer <snit...@redhat.com> wrote:
> On Thu, Oct 22 2015 at 12:59pm -0400,
> Ming Lin <m...@kernel.org> wrote:
>
>> From: Ming Lin <min...@ssi.samsung.com>
>>
>> In commit b49a087("block: remove split code in
>> blkdev_issue_{discard,write_same}"), discard_granularity and alignment
>> checks were removed. Ideally, with bio late splitting, the upper layers
>> shouldn't need to depend on device's limits.
>>
>> Christoph reported a discard regression on the HGST Ultrastar SN100 NVMe
>> device when mkfs.xfs. We have not found the root cause yet.
>>
>> This patch re-adds discard_granularity and alignment checks by reverting
>> the related changes in commit b49a087. The good thing is now we can
>> remove the 2G discard size cap and just use UINT_MAX to avoid bi_size
>> overflow.
>>
>> Reviewed-by: Christoph Hellwig <h...@lst.de>
>> Tested-by: Christoph Hellwig <h...@lst.de>
>> Signed-off-by: Ming Lin <min...@ssi.samsung.com>
>
> Reviewed-by: Mike Snitzer <snit...@redhat.com>

Hi Jens,

Would you please take this one?

>
> Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] block: re-add discard_granularity and alignment checks

2015-10-22 Thread Ming Lin

From: Ming Lin 

In commit b49a087("block: remove split code in
blkdev_issue_{discard,write_same}"), discard_granularity and alignment
checks were removed. Ideally, with bio late splitting, the upper layers
shouldn't need to depend on device's limits.

Christoph reported a discard regression on the HGST Ultrastar SN100 NVMe
device when mkfs.xfs. We have not found the root cause yet.

This patch re-adds discard_granularity and alignment checks by reverting
the related changes in commit b49a087. The good thing is now we can
remove the 2G discard size cap and just use UINT_MAX to avoid bi_size
overflow.

Reviewed-by: Christoph Hellwig 
Tested-by: Christoph Hellwig 
Signed-off-by: Ming Lin 
---
 block/blk-lib.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index bd40292..9ebf653 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,13 +26,6 @@ static void bio_batch_end_io(struct bio *bio)
bio_put(bio);
 }
 
-/*
- * Ensure that max discard sectors doesn't overflow bi_size and hopefully
- * it is of the proper granularity as long as the granularity is a power
- * of two.
- */
-#define MAX_BIO_SECTORS ((1U << 31) >> 9)
-
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -50,6 +43,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
+   unsigned int granularity;
+   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -61,6 +56,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
+   /* Zero-sector (unknown) and one-sector granularities are the same.  */
+   granularity = max(q->limits.discard_granularity >> 9, 1U);
+   alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
+
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
@@ -74,7 +73,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug();
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect;
+   sector_t end_sect, tmp;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -82,8 +81,22 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
break;
}
 
-   req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
+   /* Make sure bi_size doesn't overflow */
+   req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
+
+   /*
+* If splitting a request, and the next starting sector would be
+* misaligned, stop the discard at the previous aligned sector.
+*/
end_sect = sector + req_sects;
+   tmp = end_sect;
+   if (req_sects < nr_sects &&
+   sector_div(tmp, granularity) != alignment) {
+   end_sect = end_sect - alignment;
+   sector_div(end_sect, granularity);
+   end_sect = end_sect * granularity + alignment;
+   req_sects = end_sect - sector;
+   }
 
bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] block: re-add discard_granularity and alignment checks

2015-10-22 Thread Ming Lin

From: Ming Lin <min...@ssi.samsung.com>

In commit b49a087("block: remove split code in
blkdev_issue_{discard,write_same}"), discard_granularity and alignment
checks were removed. Ideally, with bio late splitting, the upper layers
shouldn't need to depend on device's limits.

Christoph reported a discard regression on the HGST Ultrastar SN100 NVMe
device when mkfs.xfs. We have not found the root cause yet.

This patch re-adds discard_granularity and alignment checks by reverting
the related changes in commit b49a087. The good thing is now we can
remove the 2G discard size cap and just use UINT_MAX to avoid bi_size
overflow.

Reviewed-by: Christoph Hellwig <h...@lst.de>
Tested-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 block/blk-lib.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index bd40292..9ebf653 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,13 +26,6 @@ static void bio_batch_end_io(struct bio *bio)
bio_put(bio);
 }
 
-/*
- * Ensure that max discard sectors doesn't overflow bi_size and hopefully
- * it is of the proper granularity as long as the granularity is a power
- * of two.
- */
-#define MAX_BIO_SECTORS ((1U << 31) >> 9)
-
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -50,6 +43,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
+   unsigned int granularity;
+   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -61,6 +56,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
+   /* Zero-sector (unknown) and one-sector granularities are the same.  */
+   granularity = max(q->limits.discard_granularity >> 9, 1U);
+   alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
+
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
@@ -74,7 +73,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug();
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect;
+   sector_t end_sect, tmp;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -82,8 +81,22 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
break;
}
 
-   req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
+   /* Make sure bi_size doesn't overflow */
+   req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
+
+   /*
+* If splitting a request, and the next starting sector would be
+* misaligned, stop the discard at the previous aligned sector.
+*/
end_sect = sector + req_sects;
+   tmp = end_sect;
+   if (req_sects < nr_sects &&
+   sector_div(tmp, granularity) != alignment) {
+   end_sect = end_sect - alignment;
+   sector_div(end_sect, granularity);
+   end_sect = end_sect * granularity + alignment;
+   req_sects = end_sect - sector;
+   }
 
bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-21 Thread Ming Lin

On Wed, 2015-10-21 at 14:18 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at  1:33pm -0400,
> Ming Lin  wrote:
> 
> > On Wed, 2015-10-21 at 12:19 -0400, Mike Snitzer wrote:
> > > On Wed, Oct 21 2015 at 12:02pm -0400,
> > > Mike Snitzer  wrote:
> > > 
> > > > On Wed, Oct 14 2015 at  9:27am -0400,
> > > > Christoph Hellwig  wrote:
> > > > 
> > > > > On Tue, Oct 13, 2015 at 10:44:11AM -0700, Ming Lin wrote:
> > > > > > I just did a quick test with a Samsung 900G NVMe device.
> > > > > > mkfs.xfs is OK on 4.3-rc5.
> > > > > > 
> > > > > > What's your device model? I may find a similar one to try.
> > > > > 
> > > > > This is a HGST Ultrastar SN100
> > > > > 
> > > > > Analsys and tentativ fix below:
> > > > > 
> > > > > blktrace for before the commit:
> > > > > 
> > > > > 259,012 0.02543  2394  G   D 0 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,013 0.08230  2394  I   D 0 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,014 0.31090   207  D   D 0 + 8388607 
> > > > > [kworker/1:1H]
> > > > > 259,015 0.44869  2394  Q   D 8388607 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,016 0.45992  2394  G   D 8388607 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,017 0.49559  2394  I   D 8388607 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,018 0.61551   207  D   D 8388607 + 8388607 
> > > > > [kworker/1:1H]
> > > > > 
> > > > > .. and so on.
> > > > > 
> > > > > blktrace with the commit:
> > > > > 
> > > > > 259,021 0.0  1228  Q   D 0 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,022 0.02543  1228  G   D 0 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,023 0.10080  1228  I   D 0 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,024 0.82187   267  D   D 0 + 4194304 
> > > > > [kworker/2:1H]
> > > > > 259,025 0.000224869  1228  Q   D 4194304 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,026 0.000225835  1228  G   D 4194304 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,027 0.000229457  1228  I   D 4194304 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,028 0.000238507   267  D   D 4194304 + 4194304 
> > > > > [kworker/2:1H]
> > > > > 
> > > > > So discards are smaller, but better aligned.  Now if I tweak a single
> > > > > line in blk-lib.c to be able to use all of bi_size I get the old I/O
> > > > > pattern back and everything works fine again:
> > > > > 
> > > > > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > > > > index bd40292..65b61dc 100644
> > > > > --- a/block/blk-lib.c
> > > > > +++ b/block/blk-lib.c
> > > > > @@ -82,7 +82,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
> > > > > sector_t sector,
> > > > >   break;
> > > > >   }
> > > > >  
> > > > > - req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
> > > > > + req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> > > > >   end_sect = sector + req_sects;
> > > > >  
> > > > >   bio->bi_iter.bi_sector = sector;
> > > > 
> > > > Can we change UINT_MAX >> 9 to rounddown to the first factor of
> > > > minimum_io_size?
> > > > 
> > > > That should work for all devices and for dm-thinp (and dm-cache) in
> > > > particular will ensure that all discards that are issued will be a
> > > > multiple of the underlying device's blocksize.
> > > 
> > > Jeff Moyer pointed out having req_sects be a factor of
> > > discard_granularity makes more sense.  And I agree.  Same difference in
> > > the end (since dm-thinp sets discard_granularity to the thinp
> > > blocksize).
> > 
> > An old v

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-21 Thread Ming Lin

On Wed, 2015-10-21 at 12:19 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at 12:02pm -0400,
> Mike Snitzer  wrote:
> 
> > On Wed, Oct 14 2015 at  9:27am -0400,
> > Christoph Hellwig  wrote:
> > 
> > > On Tue, Oct 13, 2015 at 10:44:11AM -0700, Ming Lin wrote:
> > > > I just did a quick test with a Samsung 900G NVMe device.
> > > > mkfs.xfs is OK on 4.3-rc5.
> > > > 
> > > > What's your device model? I may find a similar one to try.
> > > 
> > > This is a HGST Ultrastar SN100
> > > 
> > > Analsys and tentativ fix below:
> > > 
> > > blktrace for before the commit:
> > > 
> > > 259,012 0.02543  2394  G   D 0 + 8388607 [mkfs.xfs]
> > > 259,013 0.08230  2394  I   D 0 + 8388607 [mkfs.xfs]
> > > 259,014 0.31090   207  D   D 0 + 8388607 
> > > [kworker/1:1H]
> > > 259,015 0.44869  2394  Q   D 8388607 + 8388607 
> > > [mkfs.xfs]
> > > 259,016 0.45992  2394  G   D 8388607 + 8388607 
> > > [mkfs.xfs]
> > > 259,017 0.49559  2394  I   D 8388607 + 8388607 
> > > [mkfs.xfs]
> > > 259,018 0.61551   207  D   D 8388607 + 8388607 
> > > [kworker/1:1H]
> > > 
> > > .. and so on.
> > > 
> > > blktrace with the commit:
> > > 
> > > 259,021 0.0  1228  Q   D 0 + 4194304 [mkfs.xfs]
> > > 259,022 0.02543  1228  G   D 0 + 4194304 [mkfs.xfs]
> > > 259,023 0.10080  1228  I   D 0 + 4194304 [mkfs.xfs]
> > > 259,024 0.82187   267  D   D 0 + 4194304 
> > > [kworker/2:1H]
> > > 259,025 0.000224869  1228  Q   D 4194304 + 4194304 
> > > [mkfs.xfs]
> > > 259,026 0.000225835  1228  G   D 4194304 + 4194304 
> > > [mkfs.xfs]
> > > 259,027 0.000229457  1228  I   D 4194304 + 4194304 
> > > [mkfs.xfs]
> > > 259,028 0.000238507   267  D   D 4194304 + 4194304 
> > > [kworker/2:1H]
> > > 
> > > So discards are smaller, but better aligned.  Now if I tweak a single
> > > line in blk-lib.c to be able to use all of bi_size I get the old I/O
> > > pattern back and everything works fine again:
> > > 
> > > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > > index bd40292..65b61dc 100644
> > > --- a/block/blk-lib.c
> > > +++ b/block/blk-lib.c
> > > @@ -82,7 +82,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
> > > sector_t sector,
> > >   break;
> > >   }
> > >  
> > > - req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
> > > + req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> > >   end_sect = sector + req_sects;
> > >  
> > >   bio->bi_iter.bi_sector = sector;
> > 
> > Can we change UINT_MAX >> 9 to rounddown to the first factor of
> > minimum_io_size?
> > 
> > That should work for all devices and for dm-thinp (and dm-cache) in
> > particular will ensure that all discards that are issued will be a
> > multiple of the underlying device's blocksize.
> 
> Jeff Moyer pointed out having req_sects be a factor of
> discard_granularity makes more sense.  And I agree.  Same difference in
> the end (since dm-thinp sets discard_granularity to the thinp
> blocksize).

An old version of this patch did use discard_granularity
https://www.redhat.com/archives/dm-devel/2015-August/msg0.html

But you didn't agree.
https://www.redhat.com/archives/dm-devel/2015-August/msg1.html

Maybe we can re-add discard_granularity now?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-21 Thread Ming Lin

On Wed, 2015-10-21 at 11:33 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at 11:01am -0400,
> Ming Lin  wrote:
> 
> > On Wed, 2015-10-21 at 09:39 -0400, Jeff Moyer wrote:
> > > Christoph Hellwig  writes:
> > > 
> > > > Jens, Ming:
> > > >
> > > > are you fine with the one liner change to get back to the old I/O
> > > > pattern?  While it looks like the cards fault I'd like to avoid this
> > > > annoying regression.
> > > 
> > > I'm not Jens or Ming, but your patch looks fine to me, though you'll
> > > want to remove the MAX_BIO_SECTORS definition since it's now unused.
> > > It's not clear to me why the limit was lowered in the first place.
> > 
> > UINT_MAX >> 9 is not power of 2 and it causes dm-thinp discard fails.
> > 
> > At the lengthy discussion:
> > [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized 
> > bios
> > We agreed to cap discard to 2G as an interim solution for 4.3 until the
> > dm-thinp discard code is rewritten.
> 
> But did Jens ever commit that change to cap at 2G?  I don't recall
> seeing it.

Yes, commit b49a0871

> 
> > Hi Mike,
> > 
> > Will the dm-thinp discard rewritten ready for 4.4?
> 
> No.  I'm not clear what needs changing in dm-thinp.  I'll have to
> revisit the thread to refresh my memory.
> 
> BTW, DM thinp can easily handle discards that aren't a power-of-2 so
> long as the requested discard is a factor of the thinp blocksize.

You are right. It's not about power-of-2.

Copy my old post here about why dm-thinp discard may fail with "UINT_MAX
>> 9".

  4G: 8388608 sectors
UINT_MAX: 8388607 sectors

dm-thinp block size = default discard granularity = 128 sectors

blkdev_issue_discard(sector=0, nr_sectors=8388608)

[start_sector, end_sector]
[0, 8388607]
[0, 8388606], then dm-thinp splits it to 2 bios
[0, 8388479]
[8388480, 8388606] ---> this has problem in process_discard_bio(),
because the discard size(7 sectors) covers less 
than a block(128 sectors)
[8388607, 8388607] ---> same problem 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-21 Thread Ming Lin

On Wed, 2015-10-21 at 09:39 -0400, Jeff Moyer wrote:
> Christoph Hellwig  writes:
> 
> > Jens, Ming:
> >
> > are you fine with the one liner change to get back to the old I/O
> > pattern?  While it looks like the cards fault I'd like to avoid this
> > annoying regression.
> 
> I'm not Jens or Ming, but your patch looks fine to me, though you'll
> want to remove the MAX_BIO_SECTORS definition since it's now unused.
> It's not clear to me why the limit was lowered in the first place.

UINT_MAX >> 9 is not power of 2 and it causes dm-thinp discard fails.

At the lengthy discussion:
[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
We agreed to cap discard to 2G as an interim solution for 4.3 until the
dm-thinp discard code is rewritten.

Hi Mike,

Will the dm-thinp discard rewritten ready for 4.4?

Thanks,
Ming

> 
> You can add my Reviewed-by: Jeff Moyer  if you resend
> the patch.
> 
> -Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-21 Thread Ming Lin

On Wed, 2015-10-21 at 09:39 -0400, Jeff Moyer wrote:
> Christoph Hellwig  writes:
> 
> > Jens, Ming:
> >
> > are you fine with the one liner change to get back to the old I/O
> > pattern?  While it looks like the cards fault I'd like to avoid this
> > annoying regression.
> 
> I'm not Jens or Ming, but your patch looks fine to me, though you'll
> want to remove the MAX_BIO_SECTORS definition since it's now unused.
> It's not clear to me why the limit was lowered in the first place.

UINT_MAX >> 9 is not power of 2 and it causes dm-thinp discard fails.

At the lengthy discussion:
[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
We agreed to cap discard to 2G as an interim solution for 4.3 until the
dm-thinp discard code is rewritten.

Hi Mike,

Will the dm-thinp discard rewritten ready for 4.4?

Thanks,
Ming

> 
> You can add my Reviewed-by: Jeff Moyer  if you resend
> the patch.
> 
> -Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-21 Thread Ming Lin

On Wed, 2015-10-21 at 11:33 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at 11:01am -0400,
> Ming Lin <m...@kernel.org> wrote:
> 
> > On Wed, 2015-10-21 at 09:39 -0400, Jeff Moyer wrote:
> > > Christoph Hellwig <h...@infradead.org> writes:
> > > 
> > > > Jens, Ming:
> > > >
> > > > are you fine with the one liner change to get back to the old I/O
> > > > pattern?  While it looks like the cards fault I'd like to avoid this
> > > > annoying regression.
> > > 
> > > I'm not Jens or Ming, but your patch looks fine to me, though you'll
> > > want to remove the MAX_BIO_SECTORS definition since it's now unused.
> > > It's not clear to me why the limit was lowered in the first place.
> > 
> > UINT_MAX >> 9 is not power of 2 and it causes dm-thinp discard fails.
> > 
> > At the lengthy discussion:
> > [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized 
> > bios
> > We agreed to cap discard to 2G as an interim solution for 4.3 until the
> > dm-thinp discard code is rewritten.
> 
> But did Jens ever commit that change to cap at 2G?  I don't recall
> seeing it.

Yes, commit b49a0871

> 
> > Hi Mike,
> > 
> > Will the dm-thinp discard rewritten ready for 4.4?
> 
> No.  I'm not clear what needs changing in dm-thinp.  I'll have to
> revisit the thread to refresh my memory.
> 
> BTW, DM thinp can easily handle discards that aren't a power-of-2 so
> long as the requested discard is a factor of the thinp blocksize.

You are right. It's not about power-of-2.

Copy my old post here about why dm-thinp discard may fail with "UINT_MAX
>> 9".

  4G: 8388608 sectors
UINT_MAX: 8388607 sectors

dm-thinp block size = default discard granularity = 128 sectors

blkdev_issue_discard(sector=0, nr_sectors=8388608)

[start_sector, end_sector]
[0, 8388607]
[0, 8388606], then dm-thinp splits it to 2 bios
[0, 8388479]
[8388480, 8388606] ---> this has problem in process_discard_bio(),
because the discard size(7 sectors) covers less 
than a block(128 sectors)
[8388607, 8388607] ---> same problem 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-21 Thread Ming Lin

On Wed, 2015-10-21 at 12:19 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at 12:02pm -0400,
> Mike Snitzer <snit...@redhat.com> wrote:
> 
> > On Wed, Oct 14 2015 at  9:27am -0400,
> > Christoph Hellwig <h...@infradead.org> wrote:
> > 
> > > On Tue, Oct 13, 2015 at 10:44:11AM -0700, Ming Lin wrote:
> > > > I just did a quick test with a Samsung 900G NVMe device.
> > > > mkfs.xfs is OK on 4.3-rc5.
> > > > 
> > > > What's your device model? I may find a similar one to try.
> > > 
> > > This is a HGST Ultrastar SN100
> > > 
> > > Analsys and tentativ fix below:
> > > 
> > > blktrace for before the commit:
> > > 
> > > 259,012 0.02543  2394  G   D 0 + 8388607 [mkfs.xfs]
> > > 259,013 0.08230  2394  I   D 0 + 8388607 [mkfs.xfs]
> > > 259,014 0.31090   207  D   D 0 + 8388607 
> > > [kworker/1:1H]
> > > 259,015 0.44869  2394  Q   D 8388607 + 8388607 
> > > [mkfs.xfs]
> > > 259,016 0.45992  2394  G   D 8388607 + 8388607 
> > > [mkfs.xfs]
> > > 259,017 0.49559  2394  I   D 8388607 + 8388607 
> > > [mkfs.xfs]
> > > 259,018 0.61551   207  D   D 8388607 + 8388607 
> > > [kworker/1:1H]
> > > 
> > > .. and so on.
> > > 
> > > blktrace with the commit:
> > > 
> > > 259,021 0.0  1228  Q   D 0 + 4194304 [mkfs.xfs]
> > > 259,022 0.02543  1228  G   D 0 + 4194304 [mkfs.xfs]
> > > 259,023 0.10080  1228  I   D 0 + 4194304 [mkfs.xfs]
> > > 259,024 0.82187   267  D   D 0 + 4194304 
> > > [kworker/2:1H]
> > > 259,025 0.000224869  1228  Q   D 4194304 + 4194304 
> > > [mkfs.xfs]
> > > 259,026 0.000225835  1228  G   D 4194304 + 4194304 
> > > [mkfs.xfs]
> > > 259,027 0.000229457  1228  I   D 4194304 + 4194304 
> > > [mkfs.xfs]
> > > 259,028 0.000238507   267  D   D 4194304 + 4194304 
> > > [kworker/2:1H]
> > > 
> > > So discards are smaller, but better aligned.  Now if I tweak a single
> > > line in blk-lib.c to be able to use all of bi_size I get the old I/O
> > > pattern back and everything works fine again:
> > > 
> > > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > > index bd40292..65b61dc 100644
> > > --- a/block/blk-lib.c
> > > +++ b/block/blk-lib.c
> > > @@ -82,7 +82,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
> > > sector_t sector,
> > >   break;
> > >   }
> > >  
> > > - req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
> > > + req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> > >   end_sect = sector + req_sects;
> > >  
> > >   bio->bi_iter.bi_sector = sector;
> > 
> > Can we change UINT_MAX >> 9 to rounddown to the first factor of
> > minimum_io_size?
> > 
> > That should work for all devices and for dm-thinp (and dm-cache) in
> > particular will ensure that all discards that are issued will be a
> > multiple of the underlying device's blocksize.
> 
> Jeff Moyer pointed out having req_sects be a factor of
> discard_granularity makes more sense.  And I agree.  Same difference in
> the end (since dm-thinp sets discard_granularity to the thinp
> blocksize).

An old version of this patch did use discard_granularity
https://www.redhat.com/archives/dm-devel/2015-August/msg0.html

But you didn't agree.
https://www.redhat.com/archives/dm-devel/2015-August/msg1.html

Maybe we can re-add discard_granularity now?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-21 Thread Ming Lin

On Wed, 2015-10-21 at 14:18 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at  1:33pm -0400,
> Ming Lin <m...@kernel.org> wrote:
> 
> > On Wed, 2015-10-21 at 12:19 -0400, Mike Snitzer wrote:
> > > On Wed, Oct 21 2015 at 12:02pm -0400,
> > > Mike Snitzer <snit...@redhat.com> wrote:
> > > 
> > > > On Wed, Oct 14 2015 at  9:27am -0400,
> > > > Christoph Hellwig <h...@infradead.org> wrote:
> > > > 
> > > > > On Tue, Oct 13, 2015 at 10:44:11AM -0700, Ming Lin wrote:
> > > > > > I just did a quick test with a Samsung 900G NVMe device.
> > > > > > mkfs.xfs is OK on 4.3-rc5.
> > > > > > 
> > > > > > What's your device model? I may find a similar one to try.
> > > > > 
> > > > > This is a HGST Ultrastar SN100
> > > > > 
> > > > > Analsys and tentativ fix below:
> > > > > 
> > > > > blktrace for before the commit:
> > > > > 
> > > > > 259,012 0.02543  2394  G   D 0 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,013 0.08230  2394  I   D 0 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,014 0.31090   207  D   D 0 + 8388607 
> > > > > [kworker/1:1H]
> > > > > 259,015 0.44869  2394  Q   D 8388607 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,016 0.45992  2394  G   D 8388607 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,017 0.49559  2394  I   D 8388607 + 8388607 
> > > > > [mkfs.xfs]
> > > > > 259,018 0.61551   207  D   D 8388607 + 8388607 
> > > > > [kworker/1:1H]
> > > > > 
> > > > > .. and so on.
> > > > > 
> > > > > blktrace with the commit:
> > > > > 
> > > > > 259,021 0.0  1228  Q   D 0 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,022 0.02543  1228  G   D 0 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,023 0.10080  1228  I   D 0 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,024 0.82187   267  D   D 0 + 4194304 
> > > > > [kworker/2:1H]
> > > > > 259,025 0.000224869  1228  Q   D 4194304 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,026 0.000225835  1228  G   D 4194304 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,027 0.000229457  1228  I   D 4194304 + 4194304 
> > > > > [mkfs.xfs]
> > > > > 259,028 0.000238507   267  D   D 4194304 + 4194304 
> > > > > [kworker/2:1H]
> > > > > 
> > > > > So discards are smaller, but better aligned.  Now if I tweak a single
> > > > > line in blk-lib.c to be able to use all of bi_size I get the old I/O
> > > > > pattern back and everything works fine again:
> > > > > 
> > > > > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > > > > index bd40292..65b61dc 100644
> > > > > --- a/block/blk-lib.c
> > > > > +++ b/block/blk-lib.c
> > > > > @@ -82,7 +82,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
> > > > > sector_t sector,
> > > > >   break;
> > > > >   }
> > > > >  
> > > > > - req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
> > > > > + req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> > > > >   end_sect = sector + req_sects;
> > > > >  
> > > > >   bio->bi_iter.bi_sector = sector;
> > > > 
> > > > Can we change UINT_MAX >> 9 to rounddown to the first factor of
> > > > minimum_io_size?
> > > > 
> > > > That should work for all devices and for dm-thinp (and dm-cache) in
> > > > particular will ensure that all discards that are issued will be a
> > > > multiple of the underlying device's blocksize.
> > > 
> > > Jeff Moyer pointed out having req_sects be a factor of
> > > discard_granularity makes more sense.  And I agree.  Same difference in
> > > the end (since dm-thinp sets discard_granulari

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-13 Thread Ming Lin

On Tue, Oct 13, 2015 at 4:50 AM, Christoph Hellwig  wrote:
> On Wed, Aug 12, 2015 at 12:07:15AM -0700, Ming Lin wrote:
>> From: Ming Lin 
>>
>> The split code in blkdev_issue_{discard,write_same} can go away
>> now that any driver that cares does the split. We have to make
>> sure bio size doesn't overflow.
>>
>> For discard, we set max discard sectors to (1<<31)>>9 to ensure
>> it doesn't overflow bi_size and hopefully it is of the proper
>> granularity as long as the granularity is a power of two.
>
> This ends up breaking discard on NVMe devices for a me.  An mkfs.xfs
> which does a discard of the whole device now hangs the system.
> Something in here makes it send discard command that the device doesn't
> like and the aborts don't seem to help either, although that might be
> an issue with the abort handling in the driver.
>
> Just a heads up for now, once I get a bit more time I'll try to collect
> a blktrace to figure out how the commands sent to the driver look
> different before and after the patch.

I just did a quick test with a Samsung 900G NVMe device.
mkfs.xfs is OK on 4.3-rc5.

What's your device model? I may find a similar one to try.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}

2015-10-13 Thread Ming Lin

On Tue, Oct 13, 2015 at 4:50 AM, Christoph Hellwig <h...@infradead.org> wrote:
> On Wed, Aug 12, 2015 at 12:07:15AM -0700, Ming Lin wrote:
>> From: Ming Lin <min...@ssi.samsung.com>
>>
>> The split code in blkdev_issue_{discard,write_same} can go away
>> now that any driver that cares does the split. We have to make
>> sure bio size doesn't overflow.
>>
>> For discard, we set max discard sectors to (1<<31)>>9 to ensure
>> it doesn't overflow bi_size and hopefully it is of the proper
>> granularity as long as the granularity is a power of two.
>
> This ends up breaking discard on NVMe devices for a me.  An mkfs.xfs
> which does a discard of the whole device now hangs the system.
> Something in here makes it send discard command that the device doesn't
> like and the aborts don't seem to help either, although that might be
> an issue with the abort handling in the driver.
>
> Just a heads up for now, once I get a bit more time I'll try to collect
> a blktrace to figure out how the commands sent to the driver look
> different before and after the patch.

I just did a quick test with a Samsung 900G NVMe device.
mkfs.xfs is OK on 4.3-rc5.

What's your device model? I may find a similar one to try.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]

2015-09-12 Thread Ming Lin

On Sat, Sep 12, 2015 at 12:34 AM, Ming Lin  wrote:
> On Fri, 2015-09-11 at 21:43 -0700, Ming Lin wrote:
>> On Fri, Sep 11, 2015 at 2:43 PM, Mike Snitzer  wrote:
>> > Ming, Jens, others:
>> >
>> > Please see this BZ comment that speaks to a 4.3 regression due to the
>> > late bio splitting changes:
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
>> >
>> > But inlined here so we can continue on list:
>> > (In reply to Josh Boyer from comment #40)
>> >> The function that was fixed in 4.2 doesn't exist any longer in
>> >> 4.3.0-0.rc0.git6.1.fc24.  That kernel corresponds to Linux
>> >> v4.2-6105-gdd5cdb48edfd which contains commit
>> >> 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely.  So
>> >> whatever fix was made in dm_merge_bvec doesn't seem to have made it to
>> >> whatever replaced it.
>> >
>> > The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
>> > dm_merge_bvec regression on 32 bit systems").  But I'm not sure there is
>> > a clear equivalent in the late bio splitting code that replaced block
>> > core's merge_bvec logic.
>> >
>> > merge_bvec was all about limiting bios (by asking "can/should this page
>> > be added to this bio?") whereas the late bio splitting is more "build
>> > the bios as large as possible and worry about splitting later".
>> >
>> > Regardless, this regression needs to be reported to Ming Lin
>> > , Jens Axboe and the others involved in
>> > maintaining the late bio splitting changes in block core.
>>
>> I'm looking at it now.
>
> I tried rawhide-20150903 boot.iso and rawhide-20150904 boot.iso.
> 0903 boot.iso is OK, but 0904 boot.iso just stuck at "Reached target
> Basic System". So I can't see the panic.
> http://www.minggr.net/pub/20150912/rawhide-20150904-boot.iso.png
>
> I'll run test on 32bit VM, see if I can reproduce the bug.
>
> Adam,
>
> Could you also help to confirm that commit 7140aaf is OK and commit
> 8ae1266 is bad?

I mean to confirm "commit 7140aaf + git cherry-pick bd4aaf8" is OK
and commit 8ae1266 is bad.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]

2015-09-12 Thread Ming Lin

On Fri, 2015-09-11 at 21:43 -0700, Ming Lin wrote:
> On Fri, Sep 11, 2015 at 2:43 PM, Mike Snitzer  wrote:
> > Ming, Jens, others:
> >
> > Please see this BZ comment that speaks to a 4.3 regression due to the
> > late bio splitting changes:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
> >
> > But inlined here so we can continue on list:
> > (In reply to Josh Boyer from comment #40)
> >> The function that was fixed in 4.2 doesn't exist any longer in
> >> 4.3.0-0.rc0.git6.1.fc24.  That kernel corresponds to Linux
> >> v4.2-6105-gdd5cdb48edfd which contains commit
> >> 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely.  So
> >> whatever fix was made in dm_merge_bvec doesn't seem to have made it to
> >> whatever replaced it.
> >
> > The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
> > dm_merge_bvec regression on 32 bit systems").  But I'm not sure there is
> > a clear equivalent in the late bio splitting code that replaced block
> > core's merge_bvec logic.
> >
> > merge_bvec was all about limiting bios (by asking "can/should this page
> > be added to this bio?") whereas the late bio splitting is more "build
> > the bios as large as possible and worry about splitting later".
> >
> > Regardless, this regression needs to be reported to Ming Lin
> > , Jens Axboe and the others involved in
> > maintaining the late bio splitting changes in block core.
> 
> I'm looking at it now.

I tried rawhide-20150903 boot.iso and rawhide-20150904 boot.iso.
0903 boot.iso is OK, but 0904 boot.iso just stuck at "Reached target
Basic System". So I can't see the panic.
http://www.minggr.net/pub/20150912/rawhide-20150904-boot.iso.png

I'll run test on 32bit VM, see if I can reproduce the bug.

Adam,

Could you also help to confirm that commit 7140aaf is OK and commit
8ae1266 is bad?

Thanks,
Ming


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 >

1 - 100 of 406 matches

Mail list logo