Re: dm-crypt: Fix error with too large bios

2016-08-29 Thread Mikulas Patocka


On Mon, 29 Aug 2016, Mike Snitzer wrote:

> On Sat, Aug 27 2016 at 11:09am -0400,
> Mikulas Patocka  wrote:
> 
> > 
> > 
> > On Fri, 26 Aug 2016, Mike Snitzer wrote:
> > 
> > > On Thu, Aug 25 2016 at  4:13pm -0400,
> > > Jens Axboe  wrote:
> > > 
> > > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> > > > >
> > > > >Device mapper can't split the bio in generic_make_request - it frees 
> > > > >the
> > > > >md->queue->bio_split bioset, to save one kernel thread per device. 
> > > > >Device
> > > > >mapper uses its own splitting mechanism.
> > > > >
> > > > >So what is the final decision? - should device mapper split the big 
> > > > >bio or
> > > > >should bcache not submit big bios?
> > > > >
> > > > >I think splitting big bios in the device mapper is better - simply 
> > > > >because
> > > > >it is much less code than reworking bcache to split bios internally.
> > > > >
> > > > >BTW. In the device mapper, we have a layer dm-io, that was created to 
> > > > >work
> > > > >around bio size limitations - it accepts unlimited I/O request and 
> > > > >splits
> > > > >it to several bios. When bio size limitations are gone, we could 
> > > > >simplify
> > > > >dm-io too.
> > > > 
> > > > The patch from Ming Lei was applied for 4.8 the other day.
> > > 
> > > See linux-block commit:
> > > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=4d70dca4eadf2f95abe389116ac02b8439c2d16c
> > 
> > But this patch won't work for device mapper, blk_bio_segment_split is 
> > called from blk_queue_split and device mapper doesn't use blk_queue_split 
> > (it can't because it frees queue->bio_split).
> > 
> > We still need these two patches:
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> 
> I'm left wondering whether it'd be better to revert commit dbba42d8a9e
> ("dm: eliminate unused "bioset" process for each bio-based DM device")

That would create one more process per dm device.

There is no need to create another process if the bio can be split in the 
device mapper.

> And also look for other areas of DM that could leverage the block core's
> relatively new splitting capabilities.
> 
> Otherwise, we'll just be sprinking more BIO_MAX_PAGES-based splitting
> code in DM targets because DM is so "special" that it doesn't need the
> block core's splitting (even though it clearly would now benefit).

It would be possible to split bios on BIO_MAX_PAGES boundary in the dm 
core. But I checked all the dm targets and I found problem only in 
dm-crypt and dm-log-writes. So there is no need to split bio bios for the 
other targets.

> Mike

Mikulas
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race

2016-08-29 Thread Tejun Heo
Hello,

On Sat, Aug 27, 2016 at 11:30:22AM +0200, Vegard Nossum wrote:
> > Don't know what's the right fix, but I posted a slightly different one
> > for the same crash some months ago:
> > https://patchwork.kernel.org/patch/8556941/
> > 
> 
> Ah, I'm sorry, I didn't see that.
> 
> Your patch is 100% identical to my first attempt at a fix and I can
> confirm that it also fixes the problem for me.
> 
> If people who are more savvy in block/fs code could ack the locking bits
> I think we should apply the patch ASAP because it's an easy local DOS if
> you have (open/read) access to any block device.

I think the right thing to do there is doing blkdev_get() /
blkdev_put() around func() invocation in iterate_bdevs() rather than
holding bd_mutex across the callback.  Can you please verify whether
that works?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops when completing request on the wrong queue

2016-08-29 Thread Jens Axboe

On 08/29/2016 12:06 PM, Gabriel Krisman Bertazi wrote:

Jens Axboe  writes:

Can you try this patch? It's not perfect, but I'll be interested if it
makes a difference for you.




Hi Jens,

Sorry for the delay.  I just got back to this and have been running your
patch on top of 4.8 without a crash for over 1 hour.  I wanna give it
more time to make sure it's running properly, though.

Let me get back to you after a few more rounds of test.


Thanks, sounds good. The patches have landed in mainline too.


This one should handle the WARN_ON() for running the hw queue on the
wrong CPU as well.


On the workaround you added to prevent WARN_ON, we surely need to
prevent blk_mq_hctx_next_cpu from scheduling dead cpus in the first
place, right..  How do you feel about the following RFC?  I know it's
not a complete fix, but it feels like a good improvement to me.

http://www.spinics.net/lists/linux-scsi/msg98608.html


But we can't completely prevent it, and I don't think we have to. I just
don't want to trigger a warning for something that's a valid condition.
I want the warning to trigger if this happens without the CPU going
offline, since then it's indicative of a real bug in the mapping. Your
patch isn't going to prevent it either - it'll shrink the window, at the
expense of making blk_mq_hctx_next_cpu() more expensive. So I don't
think it's worthwhile.


--
Jens Axboe
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dm-crypt: Fix error with too large bios

2016-08-29 Thread Mike Snitzer
On Sat, Aug 27 2016 at 11:09am -0400,
Mikulas Patocka  wrote:

> 
> 
> On Fri, 26 Aug 2016, Mike Snitzer wrote:
> 
> > On Thu, Aug 25 2016 at  4:13pm -0400,
> > Jens Axboe  wrote:
> > 
> > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> > > >
> > > >Device mapper can't split the bio in generic_make_request - it frees the
> > > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> > > >mapper uses its own splitting mechanism.
> > > >
> > > >So what is the final decision? - should device mapper split the big bio 
> > > >or
> > > >should bcache not submit big bios?
> > > >
> > > >I think splitting big bios in the device mapper is better - simply 
> > > >because
> > > >it is much less code than reworking bcache to split bios internally.
> > > >
> > > >BTW. In the device mapper, we have a layer dm-io, that was created to 
> > > >work
> > > >around bio size limitations - it accepts unlimited I/O request and splits
> > > >it to several bios. When bio size limitations are gone, we could simplify
> > > >dm-io too.
> > > 
> > > The patch from Ming Lei was applied for 4.8 the other day.
> > 
> > See linux-block commit:
> > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=4d70dca4eadf2f95abe389116ac02b8439c2d16c
> 
> But this patch won't work for device mapper, blk_bio_segment_split is 
> called from blk_queue_split and device mapper doesn't use blk_queue_split 
> (it can't because it frees queue->bio_split).
> 
> We still need these two patches:
> https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html

I'm left wondering whether it'd be better to revert commit dbba42d8a9e
("dm: eliminate unused "bioset" process for each bio-based DM device")

And also look for other areas of DM that could leverage the block core's
relatively new splitting capabilities.

Otherwise, we'll just be sprinking more BIO_MAX_PAGES-based splitting
code in DM targets because DM is so "special" that it doesn't need the
block core's splitting (even though it clearly would now benefit).

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops when completing request on the wrong queue

2016-08-29 Thread Gabriel Krisman Bertazi
Jens Axboe  writes:
>> Can you try this patch? It's not perfect, but I'll be interested if it
>> makes a difference for you.
>

Hi Jens,

Sorry for the delay.  I just got back to this and have been running your
patch on top of 4.8 without a crash for over 1 hour.  I wanna give it
more time to make sure it's running properly, though.

Let me get back to you after a few more rounds of test.

> This one should handle the WARN_ON() for running the hw queue on the
> wrong CPU as well.

On the workaround you added to prevent WARN_ON, we surely need to
prevent blk_mq_hctx_next_cpu from scheduling dead cpus in the first
place, right..  How do you feel about the following RFC?  I know it's
not a complete fix, but it feels like a good improvement to me.

http://www.spinics.net/lists/linux-scsi/msg98608.html

-- 
Gabriel Krisman Bertazi

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] nvme: remove the post_scan callout

2016-08-29 Thread Christoph Hellwig
No need now that we don't have to reverse engineer the irq affinity.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 3 ---
 drivers/nvme/host/nvme.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2feacc7..b245616 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1826,9 +1826,6 @@ static void nvme_scan_work(struct work_struct *work)
list_sort(NULL, >namespaces, ns_cmp);
mutex_unlock(>namespaces_mutex);
kfree(id);
-
-   if (ctrl->ops->post_scan)
-   ctrl->ops->post_scan(ctrl);
 }
 
 void nvme_queue_scan(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ab18b78..99e4c16 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -184,7 +184,6 @@ struct nvme_ctrl_ops {
int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
int (*reset_ctrl)(struct nvme_ctrl *ctrl);
void (*free_ctrl)(struct nvme_ctrl *ctrl);
-   void (*post_scan)(struct nvme_ctrl *ctrl);
void (*submit_async_event)(struct nvme_ctrl *ctrl, int aer_idx);
int (*delete_ctrl)(struct nvme_ctrl *ctrl);
const char *(*get_subsysnqn)(struct nvme_ctrl *ctrl);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html