Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-11 Thread Sagi Grimberg
diff --git a/block/blk-mq.c b/block/blk-mq.c index 75336848f7a7..81ced3096433 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -444,6 +444,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, return ERR_PTR(-EXDEV); } cpu =

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 11:31:37AM +0300, Sagi Grimberg wrote: > > > > My device exposes nr_hw_queues which is not higher than num_online_cpus > > > so I want to connect all hctxs with hope that they will be used. > > > > The issue is that CPU online & offline can happen any time, and after > >

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Sagi Grimberg
Hi Sagi Sorry for the late response, below patch works, here is the full log: Thanks for testing! Now that we isolated the issue, the question is if this fix is correct given that we are guaranteed that the connect context will run on an online cpu? another reference to the patch (we can

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Yi Zhang
On 04/09/2018 04:54 PM, Yi Zhang wrote: On 04/09/2018 04:31 PM, Sagi Grimberg wrote: My device exposes nr_hw_queues which is not higher than num_online_cpus so I want to connect all hctxs with hope that they will be used. The issue is that CPU online & offline can happen any time, and

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Yi Zhang
On 04/09/2018 04:31 PM, Sagi Grimberg wrote: My device exposes nr_hw_queues which is not higher than num_online_cpus so I want to connect all hctxs with hope that they will be used. The issue is that CPU online & offline can happen any time, and after blk-mq removes CPU hotplug handler,

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Sagi Grimberg
My device exposes nr_hw_queues which is not higher than num_online_cpus so I want to connect all hctxs with hope that they will be used. The issue is that CPU online & offline can happen any time, and after blk-mq removes CPU hotplug handler, there is no way to remap queue when CPU topo is
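
For readers following the thread, here is a minimal userspace sketch of the hazard being discussed, not the patch itself: if every CPU mapped to a hardware context has gone offline, a "first CPU present in both masks" lookup finds nothing and returns an out-of-range index, which has to be checked before it is used. In the kernel the lookup is `cpumask_first_and()`, which returns a value >= `nr_cpu_ids` when the masks do not intersect. Everything named below (`NR_CPU_IDS`, `first_cpu_in_both`, `pick_online_cpu`, the 64-bit masks) is a hypothetical stand-in for illustration, and returning -EXDEV simply mirrors the error code visible in the quoted diff context.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical CPU count for this sketch; the kernel uses nr_cpu_ids. */
#define NR_CPU_IDS 8u

/*
 * Userspace stand-in for cpumask_first_and(): index of the lowest bit
 * set in both masks, or NR_CPU_IDS when the masks do not intersect.
 */
static unsigned int first_cpu_in_both(uint64_t a, uint64_t b)
{
	uint64_t both = a & b;
	unsigned int cpu;

	assert(NR_CPU_IDS <= 64);	/* masks are 64 bits wide */

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Pick an online CPU for a hardware context (hypothetical helper).
 * Returns the CPU index, or -EXDEV when none of the context's CPUs
 * is online, instead of handing back an out-of-range index.
 */
static int pick_online_cpu(uint64_t hctx_mask, uint64_t online_mask)
{
	unsigned int cpu = first_cpu_in_both(hctx_mask, online_mask);

	if (cpu >= NR_CPU_IDS)
		return -EXDEV;
	return (int)cpu;
}
```

With a context mapped to CPUs 2 and 3 (mask 0x0C), the helper returns 2 while CPUs 0-3 are online (0x0F) and -EXDEV once only CPUs 0 and 1 remain online (0x03). Whether failing the allocation is the right behaviour for the connect path is exactly what the thread goes on to debate.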

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 04:35:59PM +0300, Sagi Grimberg wrote: > > > On 04/08/2018 03:57 PM, Ming Lei wrote: > > On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote: > > > > > > > > > > > > Hi Sagi > > > > > > > > > > > > > > > > > > Still can reproduce this issue with the change: >

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Sagi Grimberg
On 04/08/2018 03:57 PM, Ming Lei wrote: On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote: Hi Sagi Still can reproduce this issue with the change: Thanks for validating Yi, Would it be possible to test the following: -- diff --git a/block/blk-mq.c b/block/blk-mq.c index

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote: > > > > > > > > Hi Sagi > > > > > > > > > > > > > > Still can reproduce this issue with the change: > > > > > > > > > > > > Thanks for validating Yi, > > > > > > > > > > > > Would it be possible to test the following: > > > > > >

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Sagi Grimberg
Hi Sagi Still can reproduce this issue with the change: Thanks for validating Yi, Would it be possible to test the following: -- diff --git a/block/blk-mq.c b/block/blk-mq.c index 75336848f7a7..81ced3096433 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -444,6 +444,10 @@ struct request

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 01:58:49PM +0300, Sagi Grimberg wrote: > > > > > > Hi Sagi > > > > > > > > > > Still can reproduce this issue with the change: > > > > > > > > Thanks for validating Yi, > > > > > > > > Would it be possible to test the following: > > > > -- > > > > diff --git

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Sagi Grimberg
Hi Sagi Still can reproduce this issue with the change: Thanks for validating Yi, Would it be possible to test the following: -- diff --git a/block/blk-mq.c b/block/blk-mq.c index 75336848f7a7..81ced3096433 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -444,6 +444,10 @@ struct request

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 06:44:33PM +0800, Ming Lei wrote: > On Sun, Apr 08, 2018 at 01:36:27PM +0300, Sagi Grimberg wrote: > > > > > Hi Sagi > > > > > > Still can reproduce this issue with the change: > > > > Thanks for validating Yi, > > > > Would it be possible to test the following: > > --

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 01:36:27PM +0300, Sagi Grimberg wrote: > > > Hi Sagi > > > > Still can reproduce this issue with the change: > > Thanks for validating Yi, > > Would it be possible to test the following: > -- > diff --git a/block/blk-mq.c b/block/blk-mq.c > index

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Sagi Grimberg
Hi Sagi Still can reproduce this issue with the change: Thanks for validating Yi, Would it be possible to test the following: -- diff --git a/block/blk-mq.c b/block/blk-mq.c index 75336848f7a7..81ced3096433 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -444,6 +444,10 @@ struct request

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-05 Thread Yi Zhang
On 04/04/2018 09:22 PM, Sagi Grimberg wrote: On 03/30/2018 12:32 PM, Yi Zhang wrote: Hello I got this kernel BUG on 4.16.0-rc7, here is the reproducer and log, let me know if you need more info, thanks. Reproducer: 1. setup target #nvmetcli restore /etc/rdma.json 2. connect target on

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-04 Thread Sagi Grimberg
On 03/30/2018 12:32 PM, Yi Zhang wrote: Hello I got this kernel BUG on 4.16.0-rc7, here is the reproducer and log, let me know if you need more info, thanks. Reproducer: 1. setup target #nvmetcli restore /etc/rdma.json 2. connect target on host #nvme connect-all -t rdma -a $IP -s 4420 during