Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-10 Thread Ming Lei
Hi Laurence,

Many thanks for your quick test!

On Fri, May 11, 2018 at 5:59 AM, Laurence Oberman  wrote:
> On Thu, 2018-05-10 at 18:28 +0800, Ming Lei wrote:
>> On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
>> > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
>> > > Hi,
>> > >
>> > > The 1st patch introduces blk_quiesce_timeout() and
>> > > blk_unquiesce_timeout() for NVMe, and meanwhile fixes blk_sync_queue().
>> > >
>> > > The 2nd patch covers timeouts for the admin commands used to recover
>> > > the controller, avoiding a possible deadlock.
>> > >
>> > > The 3rd and 4th patches avoid waiting for freeze on queues which
>> > > aren't frozen.
>> > >
>> > > The last 4 patches fix several races w.r.t. the NVMe timeout handler,
>> > > and finally make blktests block/011 pass. Meanwhile the NVMe PCI
>> > > timeout mechanism becomes much more robust than before.
>> > >
>> > > gitweb:
>> > >   https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
>> > >
>> > > V4:
>> > >   - fix nvme_init_set_host_mem_cmd()
>> > >   - use the nested EH model, and run both nvme_dev_disable() and the
>> > >     reset in the same context
>> > >
>> > > V3:
>> > >   - fix one new race related to freezing in patch 4; nvme_reset_work()
>> > >     may hang forever without this patch
>> > >   - rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()
>> > >
>> > > V2:
>> > >   - fix draining timeout work, so no need to change return value
>> > > from
>> > >   .timeout()
>> > >   - fix race between nvme_start_freeze() and nvme_unfreeze()
>> > >   - cover timeout for admin commands running in EH
>> > >
>> > > Ming Lei (7):
>> > >   block: introduce blk_quiesce_timeout() and
>> > > blk_unquiesce_timeout()
>> > >   nvme: pci: cover timeout for admin commands running in EH
>> > >   nvme: pci: only wait freezing if queue is frozen
>> > >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
>> > > recovery
>> > >   nvme: core: introduce 'reset_lock' for sync reset state and
>> > > reset
>> > > activities
>> > >   nvme: pci: prepare for supporting error recovery from resetting
>> > > context
>> > >   nvme: pci: support nested EH
>> > >
>> > >  block/blk-core.c |  21 +++-
>> > >  block/blk-mq.c   |   9 ++
>> > >  block/blk-timeout.c  |   5 +-
>> > >  drivers/nvme/host/core.c |  46 ++-
>> > >  drivers/nvme/host/nvme.h |   5 +
>> > >  drivers/nvme/host/pci.c  | 304
>> > > ---
>> > >  include/linux/blkdev.h   |  13 ++
>> > >  7 files changed, 356 insertions(+), 47 deletions(-)
>> > >
>> > > Cc: Jianchao Wang 
>> > > Cc: Christoph Hellwig 
>> > > Cc: Sagi Grimberg 
>> > > Cc: linux-n...@lists.infradead.org
>> > > Cc: Laurence Oberman 
>> >
>> > Hello Ming
>> >
>> > I have a two node NUMA system here running your kernel tree
>> > 4.17.0-rc3.ming.nvme+
>> >
>> > [root@segstorage1 ~]# numactl --hardware
>> > available: 2 nodes (0-1)
>> > node 0 cpus: 0 3 5 6 8 11 13 14
>> > node 0 size: 63922 MB
>> > node 0 free: 61310 MB
>> > node 1 cpus: 1 2 4 7 9 10 12 15
>> > node 1 size: 64422 MB
>> > node 1 free: 62372 MB
>> > node distances:
>> > node   0   1
>> >   0:  10  20
>> >   1:  20  10
>> >
>> > I ran block/011
>> >
>> > [root@segstorage1 blktests]# ./check block/011
>> > block/011 => nvme0n1 (disable PCI device while doing
>> > I/O)[failed]
>> > runtime...  106.936s
>> > --- tests/block/011.out 2018-05-05 18:01:14.268414752
>> > -0400
>> > +++ results/nvme0n1/block/011.out.bad   2018-05-05
>> > 19:07:21.028634858 -0400
>> > @@ -1,2 +1,36 @@
>> >  Running block/011
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> > ...
>> > (Run 'diff -u tests/block/011.out
>> > results/nvme0n1/block/011.out.bad' to see the entire diff)
>> >
>> > [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
>> > [ 1452.676351] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.718221] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.718239] nvme nvme0: EH 0: before shutdown
>> > [ 1452.760890] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760894] nvme nvme0: controller is down; 

Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-10 Thread Laurence Oberman
On Thu, 2018-05-10 at 18:28 +0800, Ming Lei wrote:
> On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
> > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > > Hi,
> > > 
> > > The 1st patch introduces blk_quiesce_timeout() and
> > > blk_unquiesce_timeout() for NVMe, and meanwhile fixes blk_sync_queue().
> > > 
> > > The 2nd patch covers timeouts for the admin commands used to recover
> > > the controller, avoiding a possible deadlock.
> > > 
> > > The 3rd and 4th patches avoid waiting for freeze on queues which
> > > aren't frozen.
> > > 
> > > The last 4 patches fix several races w.r.t. the NVMe timeout handler,
> > > and finally make blktests block/011 pass. Meanwhile the NVMe PCI
> > > timeout mechanism becomes much more robust than before.
> > > 
> > > gitweb:
> > >   https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > > 
> > > V4:
> > >   - fix nvme_init_set_host_mem_cmd()
> > >   - use the nested EH model, and run both nvme_dev_disable() and the
> > >     reset in the same context
> > > 
> > > V3:
> > >   - fix one new race related to freezing in patch 4; nvme_reset_work()
> > >     may hang forever without this patch
> > >   - rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()
> > > 
> > > V2:
> > >   - fix draining timeout work, so no need to change return value
> > > from
> > >   .timeout()
> > >   - fix race between nvme_start_freeze() and nvme_unfreeze()
> > >   - cover timeout for admin commands running in EH
> > > 
> > > Ming Lei (7):
> > >   block: introduce blk_quiesce_timeout() and
> > > blk_unquiesce_timeout()
> > >   nvme: pci: cover timeout for admin commands running in EH
> > >   nvme: pci: only wait freezing if queue is frozen
> > >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> > > recovery
> > >   nvme: core: introduce 'reset_lock' for sync reset state and
> > > reset
> > > activities
> > >   nvme: pci: prepare for supporting error recovery from resetting
> > > context
> > >   nvme: pci: support nested EH
> > > 
> > >  block/blk-core.c |  21 +++-
> > >  block/blk-mq.c   |   9 ++
> > >  block/blk-timeout.c  |   5 +-
> > >  drivers/nvme/host/core.c |  46 ++-
> > >  drivers/nvme/host/nvme.h |   5 +
> > >  drivers/nvme/host/pci.c  | 304
> > > ---
> > >  include/linux/blkdev.h   |  13 ++
> > >  7 files changed, 356 insertions(+), 47 deletions(-)
> > > 
> > > Cc: Jianchao Wang 
> > > Cc: Christoph Hellwig 
> > > Cc: Sagi Grimberg 
> > > Cc: linux-n...@lists.infradead.org
> > > Cc: Laurence Oberman 
> > 
> > Hello Ming
> > 
> > I have a two node NUMA system here running your kernel tree
> > 4.17.0-rc3.ming.nvme+
> > 
> > [root@segstorage1 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 3 5 6 8 11 13 14
> > node 0 size: 63922 MB
> > node 0 free: 61310 MB
> > node 1 cpus: 1 2 4 7 9 10 12 15
> > node 1 size: 64422 MB
> > node 1 free: 62372 MB
> > node distances:
> > node   0   1 
> >   0:  10  20 
> >   1:  20  10 
> > 
> > I ran block/011
> > 
> > [root@segstorage1 blktests]# ./check block/011
> > block/011 => nvme0n1 (disable PCI device while doing
> > I/O)[failed]
> > runtime...  106.936s
> > --- tests/block/011.out 2018-05-05 18:01:14.268414752
> > -0400
> > +++ results/nvme0n1/block/011.out.bad   2018-05-05
> > 19:07:21.028634858 -0400
> > @@ -1,2 +1,36 @@
> >  Running block/011
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > ...
> > (Run 'diff -u tests/block/011.out
> > results/nvme0n1/block/011.out.bad' to see the entire diff)
> > 
> > [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> > [ 1452.676351] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718221] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718239] nvme nvme0: EH 0: before shutdown
> > [ 1452.760890] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760894] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760897] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760900] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 

Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-10 Thread Ming Lei
On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > Hi,
> > 
> > The 1st patch introduces blk_quiesce_timeout() and
> > blk_unquiesce_timeout() for NVMe, and meanwhile fixes blk_sync_queue().
> > 
> > The 2nd patch covers timeouts for the admin commands used to recover
> > the controller, avoiding a possible deadlock.
> > 
> > The 3rd and 4th patches avoid waiting for freeze on queues which
> > aren't frozen.
> > 
> > The last 4 patches fix several races w.r.t. the NVMe timeout handler,
> > and finally make blktests block/011 pass. Meanwhile the NVMe PCI
> > timeout mechanism becomes much more robust than before.
> > 
> > gitweb:
> > https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > 
> > V4:
> > - fix nvme_init_set_host_mem_cmd()
> > - use the nested EH model, and run both nvme_dev_disable() and the
> >   reset in the same context
> > 
> > V3:
> > - fix one new race related to freezing in patch 4; nvme_reset_work()
> >   may hang forever without this patch
> > - rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()
> > 
> > V2:
> > - fix draining timeout work, so no need to change return value
> > from
> > .timeout()
> > - fix race between nvme_start_freeze() and nvme_unfreeze()
> > - cover timeout for admin commands running in EH
> > 
> > Ming Lei (7):
> >   block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
> >   nvme: pci: cover timeout for admin commands running in EH
> >   nvme: pci: only wait freezing if queue is frozen
> >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> > recovery
> >   nvme: core: introduce 'reset_lock' for sync reset state and reset
> > activities
> >   nvme: pci: prepare for supporting error recovery from resetting
> > context
> >   nvme: pci: support nested EH
> > 
> >  block/blk-core.c |  21 +++-
> >  block/blk-mq.c   |   9 ++
> >  block/blk-timeout.c  |   5 +-
> >  drivers/nvme/host/core.c |  46 ++-
> >  drivers/nvme/host/nvme.h |   5 +
> >  drivers/nvme/host/pci.c  | 304
> > ---
> >  include/linux/blkdev.h   |  13 ++
> >  7 files changed, 356 insertions(+), 47 deletions(-)
> > 
> > Cc: Jianchao Wang 
> > Cc: Christoph Hellwig 
> > Cc: Sagi Grimberg 
> > Cc: linux-n...@lists.infradead.org
> > Cc: Laurence Oberman 
> 
> Hello Ming
> 
> I have a two node NUMA system here running your kernel tree
> 4.17.0-rc3.ming.nvme+
> 
> [root@segstorage1 ~]# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 3 5 6 8 11 13 14
> node 0 size: 63922 MB
> node 0 free: 61310 MB
> node 1 cpus: 1 2 4 7 9 10 12 15
> node 1 size: 64422 MB
> node 1 free: 62372 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> 
> I ran block/011
> 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)[failed]
> runtime...  106.936s
> --- tests/block/011.out   2018-05-05 18:01:14.268414752 -0400
> +++ results/nvme0n1/block/011.out.bad 2018-05-05
> 19:07:21.028634858 -0400
> @@ -1,2 +1,36 @@
>  Running block/011
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> ...
> (Run 'diff -u tests/block/011.out
> results/nvme0n1/block/011.out.bad' to see the entire diff)
> 
> [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> [ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718239] nvme nvme0: EH 0: before shutdown
> [ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760897] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760900] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760903] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760906] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760909] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760912] nvme nvme0: controller is down; will reset: CSTS=0x3,
> 

Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-09 Thread Ming Lei
On Wed, May 09, 2018 at 01:46:09PM +0800, jianchao.wang wrote:
> Hi ming
> 
> I did some tests on my local machine.
> 
> [  598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller
> 
> This should be a timeout on nvme_reset_dev->nvme_wait_freeze.
> 
> [  598.828743] nvme nvme0: EH 1: before shutdown
> [  599.013586] nvme nvme0: EH 1: after shutdown
> [  599.137197] nvme nvme0: EH 1: after recovery
> 
> EH 1 has marked the controller state as LIVE.
> 
> [  599.137241] nvme nvme0: failed to mark controller state 1
> 
> So EH 0 failed to mark the state as LIVE, and the card was removed.
> This is not what nested EH is expected to do.

Right.

> 
> [  599.137322] nvme nvme0: Removing after probe failure status: 0
> [  599.326539] nvme nvme0: EH 0: after recovery
> [  599.326760] nvme0n1: detected capacity change from 128035676160 to 0
> [  599.457208] nvme nvme0: failed to set APST feature (-19)
> 
> nvme_reset_dev should identify whether it is nested.

The above should be caused by a race in updating the controller state;
I hope I can find some time this week to investigate it further.
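
To make the race concrete, here is a tiny userspace illustration (not the
driver source) of why the outer EH instance's transition to LIVE fails once a
nested EH instance has already marked the controller LIVE, in the spirit of
nvme_change_ctrl_state():

	#include <stdbool.h>
	#include <stdio.h>

	enum demo_state { DEMO_RESETTING, DEMO_LIVE };

	/* Accept only RESETTING -> LIVE, mirroring the kind of transition
	 * check nvme_change_ctrl_state() performs. */
	static bool demo_mark_live(enum demo_state *state)
	{
		if (*state != DEMO_RESETTING)
			return false;
		*state = DEMO_LIVE;
		return true;
	}

	int main(void)
	{
		enum demo_state ctrl = DEMO_RESETTING;

		/* Nested EH 1 finishes recovery first and marks the
		 * controller LIVE. */
		printf("EH 1 mark LIVE: %s\n",
		       demo_mark_live(&ctrl) ? "ok" : "failed");

		/* Outer EH 0 then tries the same transition: the state is
		 * already LIVE, the transition is rejected, and that is the
		 * "failed to mark controller state 1" message in the log. */
		printf("EH 0 mark LIVE: %s\n",
		       demo_mark_live(&ctrl) ? "ok" : "failed");
		return 0;
	}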

Also, maybe we can change it so that the controller is only removed after
nested EH has been tried enough times.

Thanks,
Ming


Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-08 Thread jianchao.wang
Hi ming

I did some tests on my local machine.

[  598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller

This should be a timeout on nvme_reset_dev->nvme_wait_freeze.

[  598.828743] nvme nvme0: EH 1: before shutdown
[  599.013586] nvme nvme0: EH 1: after shutdown
[  599.137197] nvme nvme0: EH 1: after recovery

EH 1 has marked the controller state as LIVE.

[  599.137241] nvme nvme0: failed to mark controller state 1

So EH 0 failed to mark the state as LIVE, and the card was removed.
This is not what nested EH is expected to do.

[  599.137322] nvme nvme0: Removing after probe failure status: 0
[  599.326539] nvme nvme0: EH 0: after recovery
[  599.326760] nvme0n1: detected capacity change from 128035676160 to 0
[  599.457208] nvme nvme0: failed to set APST feature (-19)

nvme_reset_dev should identify whether it is nested.

Thanks
Jianchao


Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-08 Thread Keith Busch
On Sat, May 05, 2018 at 07:51:22PM -0400, Laurence Oberman wrote:
> The 3rd and 4th attempts were slightly better, but clearly not dependable.
> 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)[failed]
> runtime...  81.188s
> --- tests/block/011.out   2018-05-05 18:01:14.268414752 -0400
> +++ results/nvme0n1/block/011.out.bad 2018-05-05
> 19:44:48.848568687 -0400
> @@ -1,2 +1,3 @@
>  Running block/011
> +tests/block/011: line 47: echo: write error: Input/output error
>  Test complete
> 
> This one passed 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)[passed]
> runtime  81.188s  ...  43.400s
> 
> I will capture a vmcore next time it panics and give some information
> after analyzing the core

We definitely should never panic, but I am not sure this blktest can be
reliable on IO errors: the test clears the memory space enable and bus
master bits without the driver's knowledge, and it does this repeatedly
in a tight loop. If the test happens to disable the device while the
driver is trying to recover from the previous iteration, the recovery
will surely fail, so I think IO errors are probably to be expected.

As far as I can tell, the only way you'll actually get it to succeed is
if the test's subsequent "enable" happens to hit in conjunction with the
driver's reset calling pci_enable_device_mem(), such that the pci_dev's
enable_cnt is > 1, which prevents the disabling for the remainder of the
test's looping.
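
To illustrate that behaviour, here is a toy userspace model of the enable_cnt
refcount semantics of pci_enable_device()/pci_disable_device(); it is a
simplification for illustration only, not the drivers/pci code:

	#include <stdio.h>

	/* Toy model of pci_dev->enable_cnt refcounting: the device is really
	 * enabled only on the 0 -> 1 transition and really disabled only
	 * when the count drops back to 0. */
	static int enable_cnt;

	static void demo_enable(const char *who)
	{
		if (++enable_cnt == 1)
			printf("%s: device actually enabled\n", who);
		else
			printf("%s: nested enable, enable_cnt=%d\n",
			       who, enable_cnt);
	}

	static void demo_disable(const char *who)
	{
		if (--enable_cnt == 0)
			printf("%s: device actually disabled\n", who);
		else
			printf("%s: device stays enabled, enable_cnt=%d\n",
			       who, enable_cnt);
	}

	int main(void)
	{
		demo_enable("test");    /* block/011 re-enables the device */
		demo_enable("driver");  /* reset calls pci_enable_device_mem() */
		demo_disable("test");   /* count is still 1: disable is a no-op */
		return 0;
	}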

I still think this is a very good test, but we might be able to make it
more deterministic about what actually happens to the pci device.


Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-05 Thread Laurence Oberman
On Sat, 2018-05-05 at 19:31 -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 19:11 -0400, Laurence Oberman wrote:
> > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > > Hi,
> > > 
> > > The 1st patch introduces blk_quiesce_timeout() and
> > > blk_unquiesce_timeout() for NVMe, and meanwhile fixes blk_sync_queue().
> > > 
> > > The 2nd patch covers timeouts for the admin commands used to recover
> > > the controller, avoiding a possible deadlock.
> > > 
> > > The 3rd and 4th patches avoid waiting for freeze on queues which
> > > aren't frozen.
> > > 
> > > The last 4 patches fix several races w.r.t. the NVMe timeout handler,
> > > and finally make blktests block/011 pass. Meanwhile the NVMe PCI
> > > timeout mechanism becomes much more robust than before.
> > > 
> > > gitweb:
> > >   https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > > 
> > > V4:
> > >   - fix nvme_init_set_host_mem_cmd()
> > >   - use the nested EH model, and run both nvme_dev_disable() and the
> > >     reset in the same context
> > > 
> > > V3:
> > >   - fix one new race related to freezing in patch 4; nvme_reset_work()
> > >     may hang forever without this patch
> > >   - rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()
> > > 
> > > V2:
> > >   - fix draining timeout work, so no need to change return value
> > > from
> > >   .timeout()
> > >   - fix race between nvme_start_freeze() and nvme_unfreeze()
> > >   - cover timeout for admin commands running in EH
> > > 
> > > Ming Lei (7):
> > >   block: introduce blk_quiesce_timeout() and
> > > blk_unquiesce_timeout()
> > >   nvme: pci: cover timeout for admin commands running in EH
> > >   nvme: pci: only wait freezing if queue is frozen
> > >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> > > recovery
> > >   nvme: core: introduce 'reset_lock' for sync reset state and
> > > reset
> > > activities
> > >   nvme: pci: prepare for supporting error recovery from resetting
> > > context
> > >   nvme: pci: support nested EH
> > > 
> > >  block/blk-core.c |  21 +++-
> > >  block/blk-mq.c   |   9 ++
> > >  block/blk-timeout.c  |   5 +-
> > >  drivers/nvme/host/core.c |  46 ++-
> > >  drivers/nvme/host/nvme.h |   5 +
> > >  drivers/nvme/host/pci.c  | 304
> > > ---
> > >  include/linux/blkdev.h   |  13 ++
> > >  7 files changed, 356 insertions(+), 47 deletions(-)
> > > 
> > > Cc: Jianchao Wang 
> > > Cc: Christoph Hellwig 
> > > Cc: Sagi Grimberg 
> > > Cc: linux-n...@lists.infradead.org
> > > Cc: Laurence Oberman 
> > 
> > Hello Ming
> > 
> > I have a two node NUMA system here running your kernel tree
> > 4.17.0-rc3.ming.nvme+
> > 
> > [root@segstorage1 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 3 5 6 8 11 13 14
> > node 0 size: 63922 MB
> > node 0 free: 61310 MB
> > node 1 cpus: 1 2 4 7 9 10 12 15
> > node 1 size: 64422 MB
> > node 1 free: 62372 MB
> > node distances:
> > node   0   1 
> >   0:  10  20 
> >   1:  20  10 
> > 
> > I ran block/011
> > 
> > [root@segstorage1 blktests]# ./check block/011
> > block/011 => nvme0n1 (disable PCI device while doing
> > I/O)[failed]
> > runtime...  106.936s
> > --- tests/block/011.out 2018-05-05 18:01:14.268414752
> > -0400
> > +++ results/nvme0n1/block/011.out.bad   2018-05-05
> > 19:07:21.028634858 -0400
> > @@ -1,2 +1,36 @@
> >  Running block/011
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> > ...
> > (Run 'diff -u tests/block/011.out
> > results/nvme0n1/block/011.out.bad' to see the entire diff)
> > 
> > [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> > [ 1452.676351] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718221] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718239] nvme nvme0: EH 0: before shutdown
> > [ 1452.760890] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760894] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760897] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760900] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 

Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-05 Thread Laurence Oberman
On Sat, 2018-05-05 at 19:11 -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > Hi,
> > 
> > The 1st patch introduces blk_quiesce_timeout() and
> > blk_unquiesce_timeout() for NVMe, and meanwhile fixes blk_sync_queue().
> > 
> > The 2nd patch covers timeouts for the admin commands used to recover
> > the controller, avoiding a possible deadlock.
> > 
> > The 3rd and 4th patches avoid waiting for freeze on queues which
> > aren't frozen.
> > 
> > The last 4 patches fix several races w.r.t. the NVMe timeout handler,
> > and finally make blktests block/011 pass. Meanwhile the NVMe PCI
> > timeout mechanism becomes much more robust than before.
> > 
> > gitweb:
> > https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > 
> > V4:
> > - fix nvme_init_set_host_mem_cmd()
> > - use the nested EH model, and run both nvme_dev_disable() and the
> >   reset in the same context
> > 
> > V3:
> > - fix one new race related to freezing in patch 4; nvme_reset_work()
> >   may hang forever without this patch
> > - rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()
> > 
> > V2:
> > - fix draining timeout work, so no need to change return value
> > from
> > .timeout()
> > - fix race between nvme_start_freeze() and nvme_unfreeze()
> > - cover timeout for admin commands running in EH
> > 
> > Ming Lei (7):
> >   block: introduce blk_quiesce_timeout() and
> > blk_unquiesce_timeout()
> >   nvme: pci: cover timeout for admin commands running in EH
> >   nvme: pci: only wait freezing if queue is frozen
> >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> > recovery
> >   nvme: core: introduce 'reset_lock' for sync reset state and reset
> > activities
> >   nvme: pci: prepare for supporting error recovery from resetting
> > context
> >   nvme: pci: support nested EH
> > 
> >  block/blk-core.c |  21 +++-
> >  block/blk-mq.c   |   9 ++
> >  block/blk-timeout.c  |   5 +-
> >  drivers/nvme/host/core.c |  46 ++-
> >  drivers/nvme/host/nvme.h |   5 +
> >  drivers/nvme/host/pci.c  | 304
> > ---
> >  include/linux/blkdev.h   |  13 ++
> >  7 files changed, 356 insertions(+), 47 deletions(-)
> > 
> > Cc: Jianchao Wang 
> > Cc: Christoph Hellwig 
> > Cc: Sagi Grimberg 
> > Cc: linux-n...@lists.infradead.org
> > Cc: Laurence Oberman 
> 
> Hello Ming
> 
> I have a two node NUMA system here running your kernel tree
> 4.17.0-rc3.ming.nvme+
> 
> [root@segstorage1 ~]# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 3 5 6 8 11 13 14
> node 0 size: 63922 MB
> node 0 free: 61310 MB
> node 1 cpus: 1 2 4 7 9 10 12 15
> node 1 size: 64422 MB
> node 1 free: 62372 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> 
> I ran block/011
> 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)[failed]
> runtime...  106.936s
> --- tests/block/011.out   2018-05-05 18:01:14.268414752
> -0400
> +++ results/nvme0n1/block/011.out.bad 2018-05-05
> 19:07:21.028634858 -0400
> @@ -1,2 +1,36 @@
>  Running block/011
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> ...
> (Run 'diff -u tests/block/011.out
> results/nvme0n1/block/011.out.bad' to see the entire diff)
> 
> [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> [ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718239] nvme nvme0: EH 0: before shutdown
> [ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760897] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760900] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760903] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760906] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760909] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760912] nvme nvme0: controller is down; will reset: CSTS=0x3,
> 

Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-05 Thread Laurence Oberman
On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> Hi,
> 
> The 1st patch introduces blk_quiesce_timeout() and
> blk_unquiesce_timeout() for NVMe, and meanwhile fixes blk_sync_queue().
> 
> The 2nd patch covers timeouts for the admin commands used to recover
> the controller, avoiding a possible deadlock.
> 
> The 3rd and 4th patches avoid waiting for freeze on queues which
> aren't frozen.
> 
> The last 4 patches fix several races w.r.t. the NVMe timeout handler,
> and finally make blktests block/011 pass. Meanwhile the NVMe PCI
> timeout mechanism becomes much more robust than before.
> 
> gitweb:
>   https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> 
> V4:
>   - fix nvme_init_set_host_mem_cmd()
>   - use the nested EH model, and run both nvme_dev_disable() and the
>     reset in the same context
> 
> V3:
>   - fix one new race related to freezing in patch 4; nvme_reset_work()
>     may hang forever without this patch
>   - rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()
> 
> V2:
>   - fix draining timeout work, so no need to change return value
> from
>   .timeout()
>   - fix race between nvme_start_freeze() and nvme_unfreeze()
>   - cover timeout for admin commands running in EH
> 
> Ming Lei (7):
>   block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
>   nvme: pci: cover timeout for admin commands running in EH
>   nvme: pci: only wait freezing if queue is frozen
>   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> recovery
>   nvme: core: introduce 'reset_lock' for sync reset state and reset
> activities
>   nvme: pci: prepare for supporting error recovery from resetting
> context
>   nvme: pci: support nested EH
> 
>  block/blk-core.c |  21 +++-
>  block/blk-mq.c   |   9 ++
>  block/blk-timeout.c  |   5 +-
>  drivers/nvme/host/core.c |  46 ++-
>  drivers/nvme/host/nvme.h |   5 +
>  drivers/nvme/host/pci.c  | 304
> ---
>  include/linux/blkdev.h   |  13 ++
>  7 files changed, 356 insertions(+), 47 deletions(-)
> 
> Cc: Jianchao Wang 
> Cc: Christoph Hellwig 
> Cc: Sagi Grimberg 
> Cc: linux-n...@lists.infradead.org
> Cc: Laurence Oberman 

Hello Ming

I have a two node NUMA system here running your kernel tree
4.17.0-rc3.ming.nvme+

[root@segstorage1 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 3 5 6 8 11 13 14
node 0 size: 63922 MB
node 0 free: 61310 MB
node 1 cpus: 1 2 4 7 9 10 12 15
node 1 size: 64422 MB
node 1 free: 62372 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

I ran block/011

[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)[failed]
runtime...  106.936s
--- tests/block/011.out 2018-05-05 18:01:14.268414752 -0400
+++ results/nvme0n1/block/011.out.bad   2018-05-05
19:07:21.028634858 -0400
@@ -1,2 +1,36 @@
 Running block/011
+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
...
(Run 'diff -u tests/block/011.out
results/nvme0n1/block/011.out.bad' to see the entire diff)

[ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
[ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.718239] nvme nvme0: EH 0: before shutdown
[ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760897] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760900] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760903] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760906] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760909] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760912] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760915] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760918] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760921] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760923] nvme nvme0: controller is 

[PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-05 Thread Ming Lei
Hi,

The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout()
for NVMe, and meanwhile fixes blk_sync_queue().

The 2nd patch covers timeouts for the admin commands used to recover the
controller, avoiding a possible deadlock.

The 3rd and 4th patches avoid waiting for freeze on queues which aren't frozen.

The last 4 patches fix several races w.r.t. the NVMe timeout handler, and
finally make blktests block/011 pass. Meanwhile the NVMe PCI timeout
mechanism becomes much more robust than before.
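
For readers who have not looked at patch 1 yet, below is a minimal sketch of
what a quiesce/unquiesce pair for timeout handling could look like. It is an
illustration only, not the patch itself: the QUEUE_FLAG_TIMEOUT_QUIESCED flag
name and the exact locking are assumptions made for this sketch.

	#include <linux/blkdev.h>
	#include <linux/timer.h>
	#include <linux/workqueue.h>

	/* Hypothetical helpers that stop and restart block-layer timeout
	 * handling for a queue; QUEUE_FLAG_TIMEOUT_QUIESCED is an assumed
	 * flag name, not necessarily the one used in the patch. */
	void blk_quiesce_timeout(struct request_queue *q)
	{
		set_bit(QUEUE_FLAG_TIMEOUT_QUIESCED, &q->queue_flags);
		/* make sure no timeout handler runs after this returns */
		del_timer_sync(&q->timeout);
		cancel_work_sync(&q->timeout_work);
	}

	void blk_unquiesce_timeout(struct request_queue *q)
	{
		clear_bit(QUEUE_FLAG_TIMEOUT_QUIESCED, &q->queue_flags);
		/* re-arm the timer so in-flight requests are checked again */
		mod_timer(&q->timeout, jiffies + q->rq_timeout);
	}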

gitweb:
https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4

V4:
- fix nvme_init_set_host_mem_cmd()
- use the nested EH model, and run both nvme_dev_disable() and the reset
  in the same context (see the sketch after this changelog)

V3:
- fix one new race related to freezing in patch 4; nvme_reset_work()
  may hang forever without this patch
- rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()

V2:
- fix draining timeout work, so no need to change return value from
.timeout()
- fix race between nvme_start_freeze() and nvme_unfreeze()
- cover timeout for admin commands running in EH
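
As a rough illustration of the nested EH flow mentioned in the V4 notes above
(a toy userspace model, not the patch): shutdown and reset run in one context,
and if that recovery times out again, a deeper EH instance repeats the same
sequence, which is where the "EH 0", "EH 1", ... prefixes in the logs below
come from.

	#include <stdbool.h>
	#include <stdio.h>

	/* Toy model of nested EH: each level shuts the controller down and
	 * resets it; if the reset "times out", a deeper EH level is entered. */
	struct demo_dev { int eh_depth; int failed_resets_left; };

	static bool demo_reset(struct demo_dev *d)
	{
		/* pretend the first few resets time out */
		return d->failed_resets_left-- <= 0;
	}

	static void demo_eh(struct demo_dev *d)
	{
		printf("EH %d: before shutdown\n", d->eh_depth);
		/* nvme_dev_disable() would run here, in this same context */
		printf("EH %d: after shutdown\n", d->eh_depth);

		if (demo_reset(d)) {
			printf("EH %d: after recovery\n", d->eh_depth);
			return;
		}

		/* the reset itself timed out: enter a nested EH instance */
		d->eh_depth++;
		demo_eh(d);
	}

	int main(void)
	{
		struct demo_dev d = { .eh_depth = 0, .failed_resets_left = 2 };

		demo_eh(&d);
		return 0;
	}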

Ming Lei (7):
  block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
  nvme: pci: cover timeout for admin commands running in EH
  nvme: pci: only wait freezing if queue is frozen
  nvme: pci: freeze queue in nvme_dev_disable() in case of error
recovery
  nvme: core: introduce 'reset_lock' for sync reset state and reset
activities
  nvme: pci: prepare for supporting error recovery from resetting
context
  nvme: pci: support nested EH

 block/blk-core.c |  21 +++-
 block/blk-mq.c   |   9 ++
 block/blk-timeout.c  |   5 +-
 drivers/nvme/host/core.c |  46 ++-
 drivers/nvme/host/nvme.h |   5 +
 drivers/nvme/host/pci.c  | 304 ---
 include/linux/blkdev.h   |  13 ++
 7 files changed, 356 insertions(+), 47 deletions(-)

Cc: Jianchao Wang 
Cc: Christoph Hellwig 
Cc: Sagi Grimberg 
Cc: linux-n...@lists.infradead.org
Cc: Laurence Oberman 
-- 
2.9.5