Re: [PATCH v1 12/54] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE

2017-01-05 Thread Ming Lei
On Wed, Jan 4, 2017 at 12:43 AM, Mike Snitzer  wrote:
> On Tue, Dec 27 2016 at 10:56am -0500,
> Ming Lei  wrote:
>
>> For BIO based DM, some targets aren't ready for dealing with
>> bigger incoming bio than 1Mbyte, such as crypt target.
>>
>> Signed-off-by: Ming Lei 
>> ---
>>  drivers/md/dm.c | 11 ++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 3086da5664f3..6139bf7623f7 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -899,7 +899,16 @@ int dm_set_target_max_io_len(struct dm_target *ti, 
>> sector_t len)
>>   return -EINVAL;
>>   }
>>
>> - ti->max_io_len = (uint32_t) len;
>> + /*
>> +  * BIO based queue uses its own splitting. When multipage bvecs
>> +  * is switched on, size of the incoming bio may be too big to
>> +  * be handled in some targets, such as crypt.
>> +  *
>> +  * When these targets are ready for the big bio, we can remove
>> +  * the limit.
>> +  */
>> + ti->max_io_len = min_t(uint32_t, len,
>> +(BIO_MAX_PAGES * PAGE_SIZE));
>>
>>   return 0;
>>  }
>> --
>> 2.7.4
>
> dm_set_target_max_io_len() is already meant to be called by the .ctr
> hook for each DM target.  So why not just have the dm-crypt target (and
> other targets if needed) pass your reduced $len?
>
> That way only targets that need to be fixed (e.g. dm-crypt) impose this
> limit.

Looks better way, and I will do it in V2.


Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [LSF/MM TOPIC][LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os

2017-01-05 Thread Jaegeuk Kim
Hello,

On 01/04, Damien Le Moal wrote:

...
> 
> > Finally, if I really like to develop SMR- or NAND flash oriented file
> > system then I would like to play with peculiarities of concrete
> > technologies. And any unified interface will destroy the opportunity 
> > to create the really efficient solution. Finally, if my software
> > solution is unable to provide some fancy and efficient features then
> > guys will prefer to use the regular stack (ext4, xfs + block layer).
> 
> Not necessarily. Again think in terms of device "model" and associated
> feature set. An FS implementation may decide to support all possible
> models, with likely a resulting incredible complexity. More likely,
> similarly with what is happening with SMR, only models that make sense
> will be supported by FS implementation that can be easily modified.
> Example again here of f2fs: changes to support SMR were rather simple,
> whereas the initial effort to support SMR with ext4 was pretty much
> abandoned as it was too complex to integrate in the existing code while
> keeping the existing on-disk format.

>From the f2fs viewpoint, now we support single host-managed SMR drive having
a portion of conventional zones. In addition, f2fs supports multiple devices
[1], which enables us to use pure host-managed SMR which has no conventional
zone, working with another small conventional partition.

I think current lightNVM with OCSSD aims towards a drive-managed device for
generic filesystems. Depending on FTL, however, OCSSD can report conventional
or sequential zones. 1) If FTL handles random 4K writes pretty well, it would
be better to report converntional zones. Otherwise, 2) if FTL has almost nothing
to map bettwen LBA to PBA, it is able to report sequential zones likewise pure
host-managed SMR.

Interestingly, for 1) host-aware model, there is no need to change f2fs at all.
In order to explore 2) pure host-managed model, I introduced aligned write IO
[2] to make FTL more simple by eliminating partial page write. IMHO, it'd be
funny to evaluate several zoned models of SMR and OCSSD accordingly.

[1] https://lkml.org/lkml/2016/11/9/727
[2] https://lkml.org/lkml/2016/12/30/242

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block: protect iterate_bdevs() against concurrent close

2017-01-05 Thread Dan Williams
On Thu, Jan 5, 2017 at 4:03 PM, Dan Williams  wrote:
> On Thu, Dec 1, 2016 at 12:18 AM, Jan Kara  wrote:
>> From: Rabin Vincent 
>>
>> If a block device is closed while iterate_bdevs() is handling it, the
>> following NULL pointer dereference occurs because bdev->b_disk is NULL
>> in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
>> turn called by the mapping_cap_writeback_dirty() call in
>> __filemap_fdatawrite_range()):
>>
>>  BUG: unable to handle kernel NULL pointer dereference at 0508
>>  IP: [] blk_get_backing_dev_info+0x10/0x20
>>  PGD 9e62067 PUD 9ee8067 PMD 0
>>  Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>  Modules linked in:
>>  CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
>>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>>  task: 880009f4d700 ti: 880009f5c000 task.ti: 880009f5c000
>>  RIP: 0010:[]  [] 
>> blk_get_backing_dev_info+0x10/0x20
>>  RSP: 0018:880009f5fe68  EFLAGS: 00010246
>>  RAX:  RBX: 88000ec17a38 RCX: 81a4e940
>>  RDX: 7fff RSI:  RDI: 88000ec176c0
>>  RBP: 880009f5fe68 R08:  R09: 
>>  R10: 0001 R11:  R12: 88000ec17860
>>  R13: 811b25c0 R14: 88000ec178e0 R15: 88000ec17a38
>>  FS:  7faee505d700() GS:88000fb0() knlGS:
>>  CS:  0010 DS:  ES:  CR0: 8005003b
>>  CR2: 0508 CR3: 09e8a000 CR4: 06e0
>>  Stack:
>>   880009f5feb8 8112e7f5  7fff
>>     7fff 0001
>>   88000ec178e0 88000ec17860 880009f5fec8 8112e81f
>>  Call Trace:
>>   [] __filemap_fdatawrite_range+0x85/0x90
>>   [] filemap_fdatawrite+0x1f/0x30
>>   [] fdatawrite_one_bdev+0x16/0x20
>>   [] iterate_bdevs+0xf2/0x130
>>   [] sys_sync+0x63/0x90
>>   [] entry_SYSCALL_64_fastpath+0x12/0x76
>>  Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 
>> 00 5d
>>  RIP  [] blk_get_backing_dev_info+0x10/0x20
>>   RSP 
>>  CR2: 0508
>>  ---[ end trace 2487336ceb3de62d ]---
>>
>> The crash is easily reproducible by running the following command, if an
>> msleep(100) is inserted before the call to func() in iterate_devs():
>>
>>  while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
>>
>> Fix it by holding the bd_mutex across the func() call and only calling
>> func() if the bdev is opened.
>>
>> Cc: sta...@vger.kernel.org
>> Fixes: 5c0d6b60a0ba46d45020547eacf7199171920935
>> Reported-and-tested-by: Wei Fang 
>> Signed-off-by: Rabin Vincent 
>> Signed-off-by: Jan Kara 
>> ---
>>  fs/block_dev.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 05b553368bb4..899fa8ccc347 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1950,6 +1950,7 @@ void iterate_bdevs(void (*func)(struct block_device *, 
>> void *), void *arg)
>> spin_lock(_superblock->s_inode_list_lock);
>> list_for_each_entry(inode, _superblock->s_inodes, 
>> i_sb_list) {
>> struct address_space *mapping = inode->i_mapping;
>> +   struct block_device *bdev;
>>
>> spin_lock(>i_lock);
>> if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
>> @@ -1970,8 +1971,12 @@ void iterate_bdevs(void (*func)(struct block_device 
>> *, void *), void *arg)
>>  */
>> iput(old_inode);
>> old_inode = inode;
>> +   bdev = I_BDEV(inode);
>>
>> -   func(I_BDEV(inode), arg);
>> +   mutex_lock(>bd_mutex);
>> +   if (bdev->bd_openers)
>> +   func(bdev, arg);
>> +   mutex_unlock(>bd_mutex);
>>
>> spin_lock(_superblock->s_inode_list_lock);
>> }
>> --
>
>
> Hi,
>
> I hit a bug with a similar signature back in October:
>
> http://marc.info/?l=linux-block=147769594003740=2
>
> ...and still see it in 4.10-rc2.
>
> My reproducer is not very reliable. I'm thinking this fix doesn't work
> because it assumes the only race is close vs sync, but my case is
> device-shutdown vs sync. In fact iterate_bdevs() is not in my
> backtrace:
>
> [ 5750.941063] BUG: unable to handle kernel NULL pointer dereference
> at 0568
> [..]
> [ 5750.959449] CPU: 1 PID: 8910 Comm: lt-libndctl Tainted: G
> O4.10.0-rc2+ #672
> [ 5750.961283] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
> [ 5750.963538] task: 88033173c600 task.stack: c90006a8
> [ 5750.964694] RIP: 0010:blk_get_backing_dev_info+0x10/0x20
> [ 5750.965774] RSP: 0018:c90006a83b00 EFLAGS: 00010246

Re: [PATCH] block: protect iterate_bdevs() against concurrent close

2017-01-05 Thread Dan Williams
On Thu, Dec 1, 2016 at 12:18 AM, Jan Kara  wrote:
> From: Rabin Vincent 
>
> If a block device is closed while iterate_bdevs() is handling it, the
> following NULL pointer dereference occurs because bdev->b_disk is NULL
> in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
> turn called by the mapping_cap_writeback_dirty() call in
> __filemap_fdatawrite_range()):
>
>  BUG: unable to handle kernel NULL pointer dereference at 0508
>  IP: [] blk_get_backing_dev_info+0x10/0x20
>  PGD 9e62067 PUD 9ee8067 PMD 0
>  Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>  Modules linked in:
>  CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>  task: 880009f4d700 ti: 880009f5c000 task.ti: 880009f5c000
>  RIP: 0010:[]  [] 
> blk_get_backing_dev_info+0x10/0x20
>  RSP: 0018:880009f5fe68  EFLAGS: 00010246
>  RAX:  RBX: 88000ec17a38 RCX: 81a4e940
>  RDX: 7fff RSI:  RDI: 88000ec176c0
>  RBP: 880009f5fe68 R08:  R09: 
>  R10: 0001 R11:  R12: 88000ec17860
>  R13: 811b25c0 R14: 88000ec178e0 R15: 88000ec17a38
>  FS:  7faee505d700() GS:88000fb0() knlGS:
>  CS:  0010 DS:  ES:  CR0: 8005003b
>  CR2: 0508 CR3: 09e8a000 CR4: 06e0
>  Stack:
>   880009f5feb8 8112e7f5  7fff
>     7fff 0001
>   88000ec178e0 88000ec17860 880009f5fec8 8112e81f
>  Call Trace:
>   [] __filemap_fdatawrite_range+0x85/0x90
>   [] filemap_fdatawrite+0x1f/0x30
>   [] fdatawrite_one_bdev+0x16/0x20
>   [] iterate_bdevs+0xf2/0x130
>   [] sys_sync+0x63/0x90
>   [] entry_SYSCALL_64_fastpath+0x12/0x76
>  Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 
> 5d
>  RIP  [] blk_get_backing_dev_info+0x10/0x20
>   RSP 
>  CR2: 0508
>  ---[ end trace 2487336ceb3de62d ]---
>
> The crash is easily reproducible by running the following command, if an
> msleep(100) is inserted before the call to func() in iterate_devs():
>
>  while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
>
> Fix it by holding the bd_mutex across the func() call and only calling
> func() if the bdev is opened.
>
> Cc: sta...@vger.kernel.org
> Fixes: 5c0d6b60a0ba46d45020547eacf7199171920935
> Reported-and-tested-by: Wei Fang 
> Signed-off-by: Rabin Vincent 
> Signed-off-by: Jan Kara 
> ---
>  fs/block_dev.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 05b553368bb4..899fa8ccc347 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1950,6 +1950,7 @@ void iterate_bdevs(void (*func)(struct block_device *, 
> void *), void *arg)
> spin_lock(_superblock->s_inode_list_lock);
> list_for_each_entry(inode, _superblock->s_inodes, i_sb_list) 
> {
> struct address_space *mapping = inode->i_mapping;
> +   struct block_device *bdev;
>
> spin_lock(>i_lock);
> if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
> @@ -1970,8 +1971,12 @@ void iterate_bdevs(void (*func)(struct block_device *, 
> void *), void *arg)
>  */
> iput(old_inode);
> old_inode = inode;
> +   bdev = I_BDEV(inode);
>
> -   func(I_BDEV(inode), arg);
> +   mutex_lock(>bd_mutex);
> +   if (bdev->bd_openers)
> +   func(bdev, arg);
> +   mutex_unlock(>bd_mutex);
>
> spin_lock(_superblock->s_inode_list_lock);
> }
> --


Hi,

I hit a bug with a similar signature back in October:

http://marc.info/?l=linux-block=147769594003740=2

...and still see it in 4.10-rc2.

My reproducer is not very reliable. I'm thinking this fix doesn't work
because it assumes the only race is close vs sync, but my case is
device-shutdown vs sync. In fact iterate_bdevs() is not in my
backtrace:

[ 5750.941063] BUG: unable to handle kernel NULL pointer dereference
at 0568
[..]
[ 5750.959449] CPU: 1 PID: 8910 Comm: lt-libndctl Tainted: G
O4.10.0-rc2+ #672
[ 5750.961283] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[ 5750.963538] task: 88033173c600 task.stack: c90006a8
[ 5750.964694] RIP: 0010:blk_get_backing_dev_info+0x10/0x20
[ 5750.965774] RSP: 0018:c90006a83b00 EFLAGS: 00010246
[ 5750.966842] RAX:  RBX: c90006a83b60 RCX: 
[ 5750.968136] RDX: 0001 RSI:  RDI: 88031b3a8040
[ 5750.969429] RBP: c90006a83b00 R08: 

Re: [PATCH] virtio_blk: avoid DMA to stack for the sense buffer

2017-01-05 Thread Christoph Hellwig
On Thu, Jan 05, 2017 at 11:37:46AM +0100, 王金浦 wrote:
> Thanks, so it's only relevant to kernel > 4.9, as  CONFIG_VMAP_STACK
> only introduced in 4.9 kernel.

kernel >= 4.9, but otherwise, yes.

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Two small fixes that address issues introduced in kernel v4.10-rc1

2017-01-05 Thread Bart Van Assche
On Mon, 2017-01-02 at 09:49 -0700, Jens Axboe wrote:
> I hand applied both of these, came through mangled for some reason.

Hello Jens,

Sorry that the patches came through mangled. Since my employer is
migrating mailboxes from the @sandisk.com domain to the @wdc.com
domain the only option I have for sending patches from my work
address without using a VPN is by using the Evolution e-mail client.
I will double check the configuration of this e-mail client.

Bart.--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio_blk: avoid DMA to stack for the sense buffer

2017-01-05 Thread Christoph Hellwig
On Wed, Jan 04, 2017 at 04:47:03PM +0100, 王金浦 wrote:
> This sounds scary.
> Could you share how to reproduce it, this should go into stable if
> it's the case.

Step 1: Build your kernel with CONFIG_VMAP_STACK=y
Step 2: issue a SG_IO ioctl, e.g. sg_inq /dev/vda

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html