Re: [PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable
Hi Keith,

Thanks for your kind response and comments. That's really appreciated.

On 02/03/2018 02:31 AM, Keith Busch wrote:
> On Fri, Feb 02, 2018 at 03:00:47PM +0800, Jianchao Wang wrote:
>> Currently, the complicated relationship between nvme_dev_disable
>> and nvme_timeout has become a devil that introduces many circular
>> patterns which may trigger deadlocks or IO hangs. Let's enumerate
>> the tangles between them:
>> - nvme_timeout has to invoke nvme_dev_disable to stop the
>>   controller doing DMA access before freeing the request.
>> - nvme_dev_disable has to depend on nvme_timeout to complete
>>   adminq requests that set the HMB or delete sq/cq when the
>>   controller has no response.
>> - nvme_dev_disable will race with nvme_timeout when it cancels
>>   the outstanding requests.
>
> Your patch is releasing a command back to the OS with the PCI
> controller bus master still enabled. This could lead to data or
> memory corruption.

There are two cases in which nvme_timeout will return:
BLK_EH_HANDLED and BLK_EH_NOT_HANDLED.

For the first case, the patch disables the controller. The controller
then stops processing any outstanding commands and deletes the sq/cq
queues, as the protocol requires. That still looks insufficient, so I
will also disable the _pci_ in nvme_pci_disable_ctrl_directly in the
next version. Really, thanks for your guidance here.

For the second case, nvme_timeout returns BLK_EH_NOT_HANDLED and
blk_mq_rq_timed_out does nothing. All such commands are handled after
everything has been disabled.

> In any case, it's not as complicated as you're making it out to be.
> It'd be easier to just enforce the existing rule that commands
> issued in the disabling path not depend on completions or timeout
> handling. All of the commands issued in this path already do this
> except for HMB disabling. Let's just fix that command, right?

We would still be left with nvme_timeout invoking nvme_dev_disable,
and we still could not synchronize on the outstanding requests.
This is really a devil and will block other improvements. This patch
does just two things:

1. Grab all the previously outstanding requests with
   blk_abort_request, then release them after the controller is
   totally disabled/shutdown. Consequently, during the
   disable/shutdown and initialization procedure, the nvme_timeout
   path only needs to serve them. This also ensures there are _no_
   outstanding requests left after nvme_dev_disable.

2. Fail the adminq commands issued during the disable/shutdown and
   initialization procedure when the controller does not respond. We
   need two steps for this: disable the controller/PCI, then complete
   the command. After that, nvme_timeout no longer needs to invoke
   nvme_dev_disable, and nvme_dev_disable becomes independent.

Please consider this.

Many thanks,
Jianchao
Re: [PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable
On Fri, Feb 02, 2018 at 03:00:47PM +0800, Jianchao Wang wrote:
> Currently, the complicated relationship between nvme_dev_disable
> and nvme_timeout has become a devil that will introduce many
> circular patterns which may trigger deadlock or IO hang. Let's
> enumerate the tangles between them:
> - nvme_timeout has to invoke nvme_dev_disable to stop the
>   controller doing DMA access before freeing the request.
> - nvme_dev_disable has to depend on nvme_timeout to complete
>   adminq requests to set HMB or delete sq/cq when the controller
>   has no response.
> - nvme_dev_disable will race with nvme_timeout when it cancels the
>   outstanding requests.

Your patch is releasing a command back to the OS with the PCI
controller bus master still enabled. This could lead to data or
memory corruption.

In any case, it's not as complicated as you're making it out to be.
It'd be easier to just enforce the existing rule that commands issued
in the disabling path not depend on completions or timeout handling.
All of the commands issued in this path already do this except for
HMB disabling. Let's just fix that command, right?
[PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable
Currently, the complicated relationship between nvme_dev_disable and
nvme_timeout has become a devil that introduces many circular
patterns which may trigger deadlocks or IO hangs. Let's enumerate the
tangles between them:

 - nvme_timeout has to invoke nvme_dev_disable to stop the controller
   doing DMA access before freeing the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete adminq
   requests that set the HMB or delete sq/cq when the controller has
   no response.
 - nvme_dev_disable will race with nvme_timeout when it cancels the
   outstanding requests.

To break them up, let's first look at what kinds of requests
nvme_timeout has to handle:

  RESETTING     previous adminq/IOq requests, or shutdown adminq
                requests from nvme_dev_disable
  RECONNECTING  adminq requests from nvme_reset_work

nvme_timeout has to invoke nvme_dev_disable before completing all the
expired requests above. We avoid this as follows.

For the previous adminq/IOq requests: use blk_abort_request to force
all the outstanding requests to expire in nvme_dev_disable. In
nvme_timeout, set NVME_REQ_CANCELLED and return BLK_EH_NOT_HANDLED,
so the request will not be completed and freed, and we needn't invoke
nvme_dev_disable any more. blk_abort_request is safe when it races
with the irq completion path, and at that point we have grabbed all
the outstanding requests, which eliminates the race between
nvme_timeout and nvme_dev_disable. We use NVME_REQ_CANCELLED to
identify them. After the controller is totally disabled/shutdown, we
invoke blk_mq_rq_update_aborted_gstate to clear the requests and
blk_mq_complete_request to complete them. In addition, to distinguish
the previous adminq/IOq requests from the adminq requests issued by
nvme_dev_disable, we introduce NVME_PCI_OUTSTANDING_GRABBING and
NVME_PCI_OUTSTANDING_GRABBED so that nvme_timeout can tell them
apart.

For the adminq requests from nvme_dev_disable/nvme_reset_work: invoke
nvme_disable_ctrl directly, then set NVME_REQ_CANCELLED and return
BLK_EH_HANDLED.
nvme_dev_disable/nvme_reset_work will see the error. With this patch,
we avoid having nvme_dev_disable invoked by nvme_timeout and
eliminate the race between nvme_timeout and nvme_dev_disable on
outstanding requests.

Signed-off-by: Jianchao Wang
---
 drivers/nvme/host/pci.c | 146 ++++++++++++++++++++++++++++++--------
 1 file changed, 123 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7fa397..5b192b0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -70,6 +70,8 @@ struct nvme_queue;
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+#define NVME_PCI_OUTSTANDING_GRABBING 1
+#define NVME_PCI_OUTSTANDING_GRABBED 2
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -80,6 +82,7 @@ struct nvme_dev {
 	struct blk_mq_tag_set admin_tagset;
 	u32 __iomem *dbs;
 	struct device *dev;
+	int grab_flag;
 	struct dma_pool *prp_page_pool;
 	struct dma_pool *prp_small_pool;
 	unsigned online_queues;
@@ -1130,6 +1133,23 @@ static void abort_endio(struct request *req, blk_status_t error)
 	blk_mq_free_request(req);
 }
 
+static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev->dev);
+	u32 csts;
+	bool dead;
+
+	if (!pci_is_enabled(pdev))
+		return;
+
+	csts = readl(dev->bar + NVME_REG_CSTS);
+	dead = !!((csts & NVME_CSTS_CFS) ||
+			!(csts & NVME_CSTS_RDY) ||
+			pdev->error_state != pci_channel_io_normal);
+	if (!dead)
+		nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
+}
+
 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 {
@@ -1191,12 +1211,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	/*
 	 * Reset immediately if the controller is failed.
+	 * nvme_dev_disable will take over the expired requests.
 	 */
 	if (nvme_should_reset(dev, csts)) {
+		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		nvme_warn_reset(dev, csts);
-		nvme_dev_disable(dev, false);
 		nvme_reset_ctrl(&dev->ctrl);
-		return BLK_EH_HANDLED;
+		return BLK_EH_NOT_HANDLED;
 	}
 
 	/*
@@ -1210,38 +1231,51 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	}
 
 	/*
-	 * Shutdown immediately if controller times out while starting. The
-	 * reset work will see the pci device disabled when it gets the forced
-	 * cancellation error. All outstanding requests are completed on
-	 * shutdown, so we return BLK_EH_HANDLED.
+	 * The previous outstanding requests on adminq and ioq have been
+	 * grabbed or drained for RECONNECTING