Re: [Intel-gfx] [PATCH] Revert "nvme-pci: use host managed power state for suspend"
On Fri, Aug 16, 2019 at 6:38 AM Christoph Hellwig wrote: > On Fri, Aug 16, 2019 at 01:30:29PM +0100, Chris Wilson wrote: > > Quoting Christoph Hellwig (2019-08-16 13:26:44) > > > Please, report the actual problem. Blindly reverting a patch without > > > even an explanation of your regressions is not the way to do it. > > > > As stated, the system doesn't suspend. > > > > If you would like to wait, you will get test results from our CI > > giving the current failed state and the outcome of the patch. > > Platform type, SSD vendor and type, firmware version? Also not a fan of knee-jerk reverts. Even if it may turn out to be necessary, let's at least start with a bug report for an opportunity to fix first! Could you please try Rafael's solution? These two commits here: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=xps13-9380-20190812=accd2dd72c8f087441d725dd916688171519e4e6 https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=xps13-9380-20190812=4eaefe8c621c6195c91044396ed8060c179f7aae ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] Revert "nvme-pci: use host managed power state for suspend"
On Fri, Aug 16, 2019 at 07:12:11AM -0700, Chris Wilson wrote: > Quoting Keith Busch (2019-08-16 13:46:41) > > Could you please try Rafael's solution? These two commits here: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=xps13-9380-20190812=accd2dd72c8f087441d725dd916688171519e4e6 > > https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=xps13-9380-20190812=4eaefe8c621c6195c91044396ed8060c179f7aae > > Which indeed work, thank you. Great! A pull request inlcuding these fixes was sent to Linus earlier today. We should expect to see its inclusion in the next -rc. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] Revert "nvme-pci: use host managed power state for suspend"
Quoting Keith Busch (2019-08-16 13:46:41) > On Fri, Aug 16, 2019 at 6:38 AM Christoph Hellwig wrote: > > On Fri, Aug 16, 2019 at 01:30:29PM +0100, Chris Wilson wrote: > > > Quoting Christoph Hellwig (2019-08-16 13:26:44) > > > > Please, report the actual problem. Blindly reverting a patch without > > > > even an explanation of your regressions is not the way to do it. > > > > > > As stated, the system doesn't suspend. > > > > > > If you would like to wait, you will get test results from our CI > > > giving the current failed state and the outcome of the patch. > > > > Platform type, SSD vendor and type, firmware version? Which platform were you interested in, and is that information not present in the debug log? The issue is observed across multiple different SSD and vendors. > Also not a fan of knee-jerk reverts. Even if it may turn out to be > necessary, let's at least start with a bug report for an opportunity > to fix first! I just did report that we successfully bisected the earlier bug we reported, and was testing a revert for our CI. > Could you please try Rafael's solution? These two commits here: > > https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=xps13-9380-20190812=accd2dd72c8f087441d725dd916688171519e4e6 > https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=xps13-9380-20190812=4eaefe8c621c6195c91044396ed8060c179f7aae Which indeed work, thank you. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] Revert "nvme-pci: use host managed power state for suspend"
On Fri, Aug 16, 2019 at 01:30:29PM +0100, Chris Wilson wrote: > Quoting Christoph Hellwig (2019-08-16 13:26:44) > > Please, report the actual problem. Blindly reverting a patch without > > even an explanation of your regressions is not the way to do it. > > As stated, the system doesn't suspend. > > If you would like to wait, you will get test results from our CI > giving the current failed state and the outcome of the patch. Platform type, SSD vendor and type, firmware version? ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] Revert "nvme-pci: use host managed power state for suspend"
Quoting Christoph Hellwig (2019-08-16 13:26:44) > Please, report the actual problem. Blindly reverting a patch without > even an explanation of your regressions is not the way to do it. As stated, the system doesn't suspend. If you would like to wait, you will get test results from our CI giving the current failed state and the outcome of the patch. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] Revert "nvme-pci: use host managed power state for suspend"
Please, report the actual problem. Blindly reverting a patch without even an explanation of your regressions is not the way to do it. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] Revert "nvme-pci: use host managed power state for suspend"
commit d916b1be94b6dc8d293abed2451f3062f6af7551 Author: Keith Busch Date: Thu May 23 09:27:35 2019 -0600 nvme-pci: use host managed power state for suspend Bisected as cause of suspend failure for gem_eio/suspend on multiple kbl hosts. Cc: Rafael J. Wysocki Cc: Keith Busch Cc: Sagi Grimberg Cc: Christoph Hellwig Cc: Keith Busch Cc: Jens Axboe Cc: linux-n...@lists.infradead.org --- drivers/nvme/host/pci.c | 95 ++--- 1 file changed, 3 insertions(+), 92 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index db160cee42ad..bdc9e0625bb7 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -18,7 +18,6 @@ #include #include #include -#include #include #include #include @@ -111,7 +110,6 @@ struct nvme_dev { u32 cmbsz; u32 cmbloc; struct nvme_ctrl ctrl; - u32 last_ps; mempool_t *iod_mempool; @@ -2831,94 +2829,16 @@ static void nvme_remove(struct pci_dev *pdev) } #ifdef CONFIG_PM_SLEEP -static int nvme_get_power_state(struct nvme_ctrl *ctrl, u32 *ps) -{ - return nvme_get_features(ctrl, NVME_FEAT_POWER_MGMT, 0, NULL, 0, ps); -} - -static int nvme_set_power_state(struct nvme_ctrl *ctrl, u32 ps) -{ - return nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, ps, NULL, 0, NULL); -} - -static int nvme_resume(struct device *dev) -{ - struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev)); - struct nvme_ctrl *ctrl = >ctrl; - - if (pm_resume_via_firmware() || !ctrl->npss || - nvme_set_power_state(ctrl, ndev->last_ps) != 0) - nvme_reset_ctrl(ctrl); - return 0; -} - static int nvme_suspend(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); struct nvme_dev *ndev = pci_get_drvdata(pdev); - struct nvme_ctrl *ctrl = >ctrl; - int ret = -EBUSY; - - /* -* The platform does not remove power for a kernel managed suspend so -* use host managed nvme power settings for lowest idle power if -* possible. This should have quicker resume latency than a full device -* shutdown. But if the firmware is involved after the suspend or the -* device does not support any non-default power states, shut down the -* device fully. -*/ - if (pm_suspend_via_firmware() || !ctrl->npss) { - nvme_dev_disable(ndev, true); - return 0; - } - - nvme_start_freeze(ctrl); - nvme_wait_freeze(ctrl); - nvme_sync_queues(ctrl); - - if (ctrl->state != NVME_CTRL_LIVE && - ctrl->state != NVME_CTRL_ADMIN_ONLY) - goto unfreeze; - - ndev->last_ps = 0; - ret = nvme_get_power_state(ctrl, >last_ps); - if (ret < 0) - goto unfreeze; - - ret = nvme_set_power_state(ctrl, ctrl->npss); - if (ret < 0) - goto unfreeze; - - if (ret) { - /* -* Clearing npss forces a controller reset on resume. The -* correct value will be resdicovered then. -*/ - nvme_dev_disable(ndev, true); - ctrl->npss = 0; - ret = 0; - goto unfreeze; - } - /* -* A saved state prevents pci pm from generically controlling the -* device's power. If we're using protocol specific settings, we don't -* want pci interfering. -*/ - pci_save_state(pdev); -unfreeze: - nvme_unfreeze(ctrl); - return ret; -} - -static int nvme_simple_suspend(struct device *dev) -{ - struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev)); nvme_dev_disable(ndev, true); return 0; } -static int nvme_simple_resume(struct device *dev) +static int nvme_resume(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); struct nvme_dev *ndev = pci_get_drvdata(pdev); @@ -2926,16 +2846,9 @@ static int nvme_simple_resume(struct device *dev) nvme_reset_ctrl(>ctrl); return 0; } +#endif -static const struct dev_pm_ops nvme_dev_pm_ops = { - .suspend= nvme_suspend, - .resume = nvme_resume, - .freeze = nvme_simple_suspend, - .thaw = nvme_simple_resume, - .poweroff = nvme_simple_suspend, - .restore= nvme_simple_resume, -}; -#endif /* CONFIG_PM_SLEEP */ +static SIMPLE_DEV_PM_OPS(nvme_dev_pm_ops, nvme_suspend, nvme_resume); static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev, pci_channel_state_t state) @@ -3042,11 +2955,9 @@ static struct pci_driver nvme_driver = { .probe = nvme_probe, .remove = nvme_remove, .shutdown = nvme_shutdown, -#ifdef CONFIG_PM_SLEEP .driver = { .pm = _dev_pm_ops, }, -#endif .sriov_configure =