Re: PATCH V4 0/5 nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-04-18 Thread Ming Lei
On Thu, Apr 19, 2018 at 09:51:16AM +0800, jianchao.wang wrote:
> Hi Ming
> 
> Thanks for your kind response.
> 
> On 04/18/2018 11:40 PM, Ming Lei wrote:
> >> Regarding this patchset, it is mainly to fix the dependency between
> >> nvme_timeout and nvme_dev_disable, as you can see:
> >> nvme_timeout will invoke nvme_dev_disable, and nvme_dev_disable has to
> >> depend on nvme_timeout when the controller gives no response.
> > Do you mean nvme_disable_io_queues()? If yes, this one has been handled
> > by wait_for_completion_io_timeout() already, and it looks like the block
> > timeout can simply be disabled. Or are there others?
> > 
> Here is one possible scenario currently:
> 
> nvme_dev_disable // holds shutdown_lock      nvme_timeout
>   -> nvme_set_host_mem                         -> nvme_dev_disable
>     -> nvme_submit_sync_cmd                       -> try to acquire shutdown_lock
>       -> __nvme_submit_sync_cmd
>         -> blk_execute_rq
>           // if sysctl_hung_task_timeout_secs == 0
>           -> wait_for_completion_io
> And maybe nvme_dev_disable will need to issue other commands in the future.

OK, thanks for sharing this one. For now, I think it might need to be
handled by wait_for_completion_io_timeout() to work around this issue.
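
For reference, a minimal sketch of what such a bounded wait could look like
(the helper names and the end_io wiring are my assumptions, not code from
this patchset):

/*
 * Issue a sync command but cap the wait, instead of relying on the
 * unbounded wait_for_completion_io() inside blk_execute_rq(), so that
 * nvme_dev_disable() cannot block forever on a dead controller.
 */
static void nvme_sync_cmd_end_io(struct request *rq, blk_status_t status)
{
	complete(rq->end_io_data);	/* wake the waiter below */
}

static int nvme_execute_rq_timeout(struct request_queue *q,
				   struct request *rq)
{
	DECLARE_COMPLETION_ONSTACK(wait);

	rq->end_io_data = &wait;
	blk_execute_rq_nowait(q, NULL, rq, false, nvme_sync_cmd_end_io);

	/* NB: on timeout the request is still in flight and must be
	 * reaped by the caller before 'wait' goes out of scope. */
	if (!wait_for_completion_io_timeout(&wait, ADMIN_TIMEOUT))
		return -ETIMEDOUT;
	return 0;
}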

> 
> Even if we could fix these kinds of issues the way nvme_disable_io_queues
> is handled, it is still a risk, I think.

Yeah, I can't agree more; that is why I think the nvme timeout/EH code
should be refactored, solving the current issues in a cleaner and more
maintainable way.

Thanks,
Ming


Re: PATCH V4 0/5 nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-04-18 Thread jianchao.wang
Hi Ming

Thanks for your kind response.

On 04/18/2018 11:40 PM, Ming Lei wrote:
>> Regarding this patchset, it is mainly to fix the dependency between
>> nvme_timeout and nvme_dev_disable, as you can see:
>> nvme_timeout will invoke nvme_dev_disable, and nvme_dev_disable has to
>> depend on nvme_timeout when the controller gives no response.
> Do you mean nvme_disable_io_queues()? If yes, this one has been handled
> by wait_for_completion_io_timeout() already, and it looks like the block
> timeout can simply be disabled. Or are there others?
> 
Here is one possible scenario currently:

nvme_dev_disable // holds shutdown_lock      nvme_timeout
  -> nvme_set_host_mem                         -> nvme_dev_disable
    -> nvme_submit_sync_cmd                       -> try to acquire shutdown_lock
      -> __nvme_submit_sync_cmd
        -> blk_execute_rq
          // if sysctl_hung_task_timeout_secs == 0
          -> wait_for_completion_io
And maybe nvme_dev_disable will need to issue other commands in the future.
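
The wait_for_completion_io() at the bottom of the chain above comes from
blk_execute_rq(); roughly, paraphrased from memory of the v4.16 block layer
(block/blk-exec.c), it does:

void blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
		    struct request *rq, int at_head)
{
	DECLARE_COMPLETION_ONSTACK(wait);
	unsigned long hang_check;

	rq->end_io_data = &wait;
	blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);

	/* prevent the hung task timer from firing during very long I/O */
	hang_check = sysctl_hung_task_timeout_secs;
	if (hang_check)
		while (!wait_for_completion_io_timeout(&wait,
						       hang_check * (HZ/2)));
	else
		wait_for_completion_io(&wait);
}

Note that both branches wait until the completion actually fires; for a dead
controller the only thing that can fire it is the timeout path, which is in
turn blocked on shutdown_lock, hence the deadlock.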

Even if we could fix these kinds of issues the way nvme_disable_io_queues
is handled, it is still a risk, I think.

Thanks
Jianchao


Re: PATCH V4 0/5 nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-04-18 Thread Ming Lei
On Wed, Apr 18, 2018 at 10:24:28PM +0800, jianchao.wang wrote:
> Hi Ming
> 
> On 04/17/2018 11:17 PM, Ming Lei wrote:
> > Looks like blktests (block/011) can easily trigger an IO hang on an NVMe
> > PCI device, and all instances are related to nvme_dev_disable():
> > 
> > 1) the admin queue may be disabled by nvme_dev_disable() from the timeout
> > path during resetting, so the reset can't move on
> > 
> > 2) the nvme_dev_disable() called from nvme_reset_work() may cause a double
> > completion of a timed-out request
> > 
> > So could you share with us what your plan is for this patchset?
> 
> Regarding this patchset, it is mainly to fix the dependency between
> nvme_timeout and nvme_dev_disable, as you can see:
> nvme_timeout will invoke nvme_dev_disable, and nvme_dev_disable has to
> depend on nvme_timeout when the controller gives no response.

Do you mean nvme_disable_io_queues()? If yes, this one has been handled
by wait_for_completion_io_timeout() already, and it looks like the block
timeout can simply be disabled. Or are there others?
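
(For context, the pattern referred to above, simplified from my reading of
drivers/nvme/host/pci.c of that era; details trimmed and from memory:)

static void nvme_disable_io_queues(struct nvme_dev *dev, int queues)
{
	u8 opcode = nvme_admin_delete_sq;
	int pass;

	for (pass = 0; pass < 2; pass++) {	/* delete SQs, then CQs */
		int i, sent = 0;

		reinit_completion(&dev->ioq_wait);
		for (i = queues; i > 0; i--, sent++)
			if (nvme_delete_queue(&dev->queues[i], opcode))
				break;
		while (sent--)
			/* bounded wait: a dead controller can't wedge us */
			if (!wait_for_completion_io_timeout(&dev->ioq_wait,
							    ADMIN_TIMEOUT))
				return;
		opcode = nvme_admin_delete_cq;
	}
}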

> Till now, some parts
> of the patchset look bad and seem to need a lot more work.
> :)

Yeah, this part is much more complicated than I thought.

I think it is a good topic to discuss at the coming LSF/MM, and the NVMe
timeout (EH) code may need to be refactored/cleaned up, with the current
issues addressed in a clean way.

Guys, are there other issues wrt. NVMe timeout & reset besides the
above?

Thanks,
Ming


Re: PATCH V4 0/5 nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-04-18 Thread jianchao.wang
Hi Ming

On 04/17/2018 11:17 PM, Ming Lei wrote:
> Looks like blktests (block/011) can easily trigger an IO hang on an NVMe
> PCI device, and all instances are related to nvme_dev_disable():
> 
> 1) the admin queue may be disabled by nvme_dev_disable() from the timeout
> path during resetting, so the reset can't move on
> 
> 2) the nvme_dev_disable() called from nvme_reset_work() may cause a double
> completion of a timed-out request
> 
> So could you share with us what your plan is for this patchset?

Regarding this patchset, it is mainly to fix the dependency between
nvme_timeout and nvme_dev_disable, as you can see:
nvme_timeout will invoke nvme_dev_disable, and nvme_dev_disable has to
depend on nvme_timeout when the controller gives no response. Till now,
some parts of the patchset look bad and seem to need a lot more work.
:)

Thanks
Jianchao


Re: PATCH V4 0/5 nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-04-17 Thread Ming Lei
On Thu, Mar 08, 2018 at 02:19:26PM +0800, Jianchao Wang wrote:
> Firstly, I really appreciate Keith's and Sagi's precious advice on
> previous versions. This is version 4.
> 
> [rest of the cover letter snipped; see the original posting below]

Hi Jianchao,

Looks like blktests (block/011) can easily trigger an IO hang on an NVMe
PCI device, and all instances are related to nvme_dev_disable():

1) the admin queue may be disabled by nvme_dev_disable() from the timeout
path during resetting, so the reset can't move on

2) the nvme_dev_disable() called from nvme_reset_work() may cause a double
completion of a timed-out request
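
A rough sketch of how 2) can happen, as I read it (with the BLK_EH_HANDLED
semantics of that time; this trace is mine, not from the patchset):

nvme_reset_work                        nvme_timeout(req)
  -> nvme_dev_disable                    -> ...
    -> nvme_cancel_request(req)          -> return BLK_EH_HANDLED
      -> blk_mq_complete_request(req)       // blk-mq now completes req
                                            // a second time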

So could you share with us what your plan is for this patchset?

Thanks,
Ming


PATCH V4 0/5 nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-03-07 Thread Jianchao Wang
Firstly, I really appreciate Keith's and Sagi's precious advice on previous
versions. This is version 4.

Some patches of the previous patchset have already been submitted; what is
left is this patchset, which has been refactored. Please consider it for 4.17.

The target of this patchset is to avoid having nvme_dev_disable invoked by
nvme_timeout. As we know, nvme_dev_disable will issue commands on the adminq;
if the controller gives no response, it has to depend on the timeout path.
However, nvme_timeout will also need to invoke nvme_dev_disable. This
introduces a dangerous circular dependency. Moreover, nvme_dev_disable holds
the shutdown_lock, even when it goes to sleep, which makes things worse.

The basic idea of this patchset is:
 - When reset_work needs to be scheduled, hand over expired requests to
   nvme_dev_disable. They will be completed after the controller is
   disabled/shut down.

 - When a request from nvme_dev_disable or nvme_reset_work expires, disable
   the controller directly; then the request can be completed to wake up
   the waiter.

'Disable the controller directly' here means that it doesn't send commands
on the adminq. A new interface, nvme_pci_disable_ctrl_directly, is introduced
for this. For more details, please refer to the comment of the function.

Then nvme_timeout doesn't depend on nvme_dev_disable anymore.
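
To illustrate the direction, here is a hypothetical sketch of what
nvme_pci_disable_ctrl_directly could look like; apart from the interface name
and the bus-master step mentioned in the V1->V2 changelog, the body is my
assumption, and the patch itself is authoritative:

/*
 * Quiesce the controller without sending a single admin command, so
 * this is safe to call from the timeout path even when the adminq is
 * dead.
 */
static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev)
{
	struct pci_dev *pdev = to_pci_dev(dev->dev);

	/* no nvme_shutdown_ctrl()/nvme_set_host_mem(): no adminq traffic */
	if (pci_is_enabled(pdev)) {
		nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap); /* clear CC.EN */
		pci_clear_master(pdev);	/* stop DMA from a wedged device */
		pci_disable_device(pdev);
	}
}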

Because there is a big difference from the previous version, and some
relatively independent patches have been submitted separately, I only keep
the key part of the previous versions' changelogs below.

Changes V3->V4:
 - refactor the interfaces for flushing in-flight requests and add them to
   the nvme core
 - refactor nvme_timeout to make it clearer

Changes V2->V3:
 - discard the patch which unfreezes the queue after nvme_dev_disable

Changes V1->V2:
 - disable PCI controller bus master in nvme_pci_disable_ctrl_directly

There are 5 patches:
The 1st one changes the operations on nvme_request->flags to atomic
operations, so that we can introduce another flag, NVME_REQ_ABORTED, next
(see the sketch below).
The 2nd patch introduces two new interfaces to flush in-flight requests in
the nvme core.
The 3rd patch avoids nvme_dev_disable being invoked in nvme_timeout; it
introduces the new interface nvme_pci_disable_ctrl_directly and refactors
nvme_timeout.
The 4th~5th fix issues introduced by the 3rd patch.
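
(A hypothetical sketch of the 1st patch's direction; the bit layout and the
call sites are my assumptions, the patch itself is authoritative:)

enum {
	NVME_REQ_CANCELLED = 0,	/* existing flags become bit numbers */
	NVME_REQ_USERCMD,
	NVME_REQ_ABORTED,	/* new bit introduced by this series */
};

/* nvme_request.flags becomes an unsigned long so that the atomic bitops
 * set_bit()/test_bit()/test_and_clear_bit() can operate on it. */

/* e.g. the timeout path marks a request it hands over ... */
set_bit(NVME_REQ_ABORTED, &nvme_req(req)->flags);

/* ... and another context can observe that without racing: */
if (test_bit(NVME_REQ_ABORTED, &nvme_req(req)->flags))
	return;		/* already handed over to nvme_dev_disable() */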

Jianchao Wang (5)
0001-nvme-do-atomically-bit-operations-on-nvme_request.fl.patch
0002-nvme-add-helper-interface-to-flush-in-flight-request.patch
0003-nvme-pci-avoid-nvme_dev_disable-to-be-invoked-in-nvm.patch
0004-nvme-pci-discard-wait-timeout-when-delete-cq-sq.patch
0005-nvme-pci-add-the-timeout-case-for-DELETEING-state.patch

diff stat
 drivers/nvme/host/core.c |  96 +++
 drivers/nvme/host/nvme.h |   4 +-
 drivers/nvme/host/pci.c  | 224 +++---

Thanks
Jianchao

