Re: [PATCH v4 1/1] PCI/ERR: don't clobber status after reset_link()
On Mon, Jan 25, 2021 at 09:34 Kuppuswamy, Sathyanarayanan wrote: On 1/8/21 2:30 PM, Bjorn Helgaas wrote: Can we push this forward now? There are several pending patches in this area from Keith and Sathyanarayanan; I haven't gotten to them yet, so not sure whether they help address any of this. Following two patches should also address the same issue. My patch: https://patchwork.kernel.org/project/linux-pci/patch/6f63321637fef86b6cf0beebf98b987062f9e811.1610153755.git.sathyanarayanan.kuppusw...@linux.intel.com/ This series does *not* fix the problem for me. Keith's patch: https://patchwork.kernel.org/project/linux-pci/patch/20210104230300.1277180-4-kbu...@kernel.org/ Keith's series *does* fix the problem for me: Acked-by: Hedi Berriche Tested-by: Hedi Berriche Cheers, Hedi. -- Sathyanarayanan Kuppuswamy Linux Kernel Developer -- Be careful of reading health books, you might die of a misprint. -- Mark Twain
Re: [PATCH v4 1/1] PCI/ERR: don't clobber status after reset_link()
On Mon, Nov 02, 2020 at 15:10 Hedi Berriche wrote: Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") broke pcie_do_recovery(): updating status after reset_link() has the ill side effect of causing recovery to fail if the error status is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following code will *never* run in the case of a successful reset_link() 177 if (status == PCI_ERS_RESULT_CAN_RECOVER) { ... 181 } 183 if (status == PCI_ERS_RESULT_NEED_RESET) { ... 192 } For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not calling ->slot_reset() (because we skip report_slot_reset()) thus breaking driver (re)initialisation. Don't clobber status with the return value of reset_link(); set status to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Reviewed-by: Sinan Kaya Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..2730826cfd8a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -165,10 +165,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; + } else { + if (status == PCI_ERS_RESULT_DISCONNECT || + status == PCI_ERS_RESULT_NO_AER_DRIVER) + status = PCI_ERS_RESULT_RECOVERED; } } else { pci_walk_bus(bus, report_normal_detected, ); -- 2.28.0 Bjorn, Sorry to bug you, but could you please cast your eyes on this patch and let me know whether you have any concerns that might be barring it from inclusion. Cheers, Hedi. -- Be careful of reading health books, you might die of a misprint. -- Mark Twain
[PATCH v4 0/1] PCI/ERR: fix regression introduced by 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
This is essentially a resend of v3 as it failed to get enough traction; no code change, I only added Sinan Kaya's Reviewed-by. - Changes since v3: * added Sinan Kaya Reviewed-by - Changes since v2: * set status to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. - Changes since v1: * changed the commit message to clarify what broke post commit 6d2c89441571 * dropped the misnomer post_reset_status variable in favour of a more natural approach that relies on a boolean to keep track of the outcome of reset_link() After commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") pcie_do_recovery() no longer calls ->slot_reset() in the case of a successful reset which breaks error recovery by breaking driver (re)initialisation. Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: Sinan Kaya Cc: sta...@kernel.org # v5.7+ --- Hedi Berriche (1): PCI/ERR: don't clobber status after reset_link() drivers/pci/pcie/err.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) -- 2.28.0
[PATCH v4 1/1] PCI/ERR: don't clobber status after reset_link()
Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") broke pcie_do_recovery(): updating status after reset_link() has the ill side effect of causing recovery to fail if the error status is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following code will *never* run in the case of a successful reset_link() 177 if (status == PCI_ERS_RESULT_CAN_RECOVER) { ... 181 } 183 if (status == PCI_ERS_RESULT_NEED_RESET) { ... 192 } For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not calling ->slot_reset() (because we skip report_slot_reset()) thus breaking driver (re)initialisation. Don't clobber status with the return value of reset_link(); set status to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Reviewed-by: Sinan Kaya Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..2730826cfd8a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -165,10 +165,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; + } else { + if (status == PCI_ERS_RESULT_DISCONNECT || + status == PCI_ERS_RESULT_NO_AER_DRIVER) + status = PCI_ERS_RESULT_RECOVERED; } } else { pci_walk_bus(bus, report_normal_detected, ); -- 2.28.0
Re: [RESEND PATCH v3 1/1] PCI/ERR: don't clobber status after reset_link()
On Sun, Oct 11, 2020 at 18:56 Sinan Kaya wrote: On 10/10/2020 6:16 PM, Hedi Berriche wrote: Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") broke pcie_do_recovery(): updating status after reset_link() has the ill side effect of causing recovery to fail if the error status is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following code will *never* run in the case of a successful reset_link() 177 if (status == PCI_ERS_RESULT_CAN_RECOVER) { ... 181 } 183 if (status == PCI_ERS_RESULT_NEED_RESET) { ... 192 } For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not calling ->slot_reset() (because we skip report_slot_reset()) thus breaking driver (re)initialisation. Don't clobber status with the return value of reset_link(); set status to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..2730826cfd8a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -165,10 +165,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; + } else { + if (status == PCI_ERS_RESULT_DISCONNECT || + status == PCI_ERS_RESULT_NO_AER_DRIVER) + status = PCI_ERS_RESULT_RECOVERED; } } else { pci_walk_bus(bus, report_normal_detected, ); Reviewed-by: Sinan Kaya Thanks for the Reviewed-by, Sinan. Folks, Any further comments or is this ripe for acceptance? Cheers, Hedi. -- Be careful of reading health books, you might die of a misprint. -- Mark Twain
[RESEND PATCH v3 0/1] PCI/ERR: fix regression introduced by 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
This is a resend of v3 as the the original, sent over 6 hours ago, is yet to make it to LKML. - Changes since v2: * set status to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. - Changes since v1: * changed the commit message to clarify what broke post commit 6d2c89441571 * dropped the misnomer post_reset_status variable in favour of a more natural approach that relies on a boolean to keep track of the outcome of reset_link() After commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") pcie_do_recovery() no longer calls ->slot_reset() in the case of a successful reset which breaks error recovery by breaking driver (re)initialisation. Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- Hedi Berriche (1): PCI/ERR: don't clobber status after reset_link() drivers/pci/pcie/err.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) -- 2.28.0
[RESEND PATCH v3 1/1] PCI/ERR: don't clobber status after reset_link()
Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") broke pcie_do_recovery(): updating status after reset_link() has the ill side effect of causing recovery to fail if the error status is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following code will *never* run in the case of a successful reset_link() 177 if (status == PCI_ERS_RESULT_CAN_RECOVER) { ... 181 } 183 if (status == PCI_ERS_RESULT_NEED_RESET) { ... 192 } For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not calling ->slot_reset() (because we skip report_slot_reset()) thus breaking driver (re)initialisation. Don't clobber status with the return value of reset_link(); set status to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..2730826cfd8a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -165,10 +165,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; + } else { + if (status == PCI_ERS_RESULT_DISCONNECT || + status == PCI_ERS_RESULT_NO_AER_DRIVER) + status = PCI_ERS_RESULT_RECOVERED; } } else { pci_walk_bus(bus, report_normal_detected, ); -- 2.28.0
[PATCH v3 0/1] PCI/ERR: fix regression introduced by 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
- Changes since v2: * set status to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. - Changes since v1: * changed the commit message to clarify what broke post commit 6d2c89441571 * dropped the misnomer post_reset_status variable in favour of a more natural approach that relies on a boolean to keep track of the outcome of reset_link() After commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") pcie_do_recovery() no longer calls ->slot_reset() in the case of a successful reset which breaks error recovery by breaking driver (re)initialisation. Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- Hedi Berriche (1): PCI/ERR: don't clobber status after reset_link() drivers/pci/pcie/err.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) -- 2.28.0
[PATCH v3 1/1] PCI/ERR: don't clobber status after reset_link()
Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") broke pcie_do_recovery(): updating status after reset_link() has the ill side effect of causing recovery to fail if the error status is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following code will *never* run in the case of a successful reset_link() 177 if (status == PCI_ERS_RESULT_CAN_RECOVER) { ... 181 } 183 if (status == PCI_ERS_RESULT_NEED_RESET) { ... 192 } For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not calling ->slot_reset() (because we skip report_slot_reset()) thus breaking driver (re)initialisation. Don't clobber status with the return value of reset_link(); set status to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..2730826cfd8a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -165,10 +165,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; + } else { + if (status == PCI_ERS_RESULT_DISCONNECT || + status == PCI_ERS_RESULT_NO_AER_DRIVER) + status = PCI_ERS_RESULT_RECOVERED; } } else { pci_walk_bus(bus, report_normal_detected, ); -- 2.28.0
[PATCH v2 1/1] PCI/ERR: don't clobber status after reset_link()
Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") broke pcie_do_recovery(): updating status after reset_link() has the ill side effect of causing recovery to fail if the error status is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following code will *never* run in the case of a successful reset_link() 177 if (status == PCI_ERS_RESULT_CAN_RECOVER) { ... 181 } 183 if (status == PCI_ERS_RESULT_NEED_RESET) { ... 192 } For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not calling ->slot_reset() (because we skip report_slot_reset()) thus breaking driver (re)initialisation. Don't clobber status after reset_link(), use a boolean instead to track the outcome of reset_link(). Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..b4bfa87fc49d 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -152,6 +152,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, { pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; struct pci_bus *bus; + bool reset_failed = false; /* * Error recovery runs on all subordinates of the first downstream port. @@ -165,8 +166,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + reset_failed = (reset_link(dev) != PCI_ERS_RESULT_RECOVERED); + if (reset_failed) { pci_warn(dev, "link reset failed\n"); goto failed; } @@ -174,6 +175,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_walk_bus(bus, report_normal_detected, ); } + if ((status == PCI_ERS_RESULT_DISCONNECT || +status == PCI_ERS_RESULT_NO_AER_DRIVER) && +!reset_failed) { + status = PCI_ERS_RESULT_RECOVERED; + } + if (status == PCI_ERS_RESULT_CAN_RECOVER) { status = PCI_ERS_RESULT_RECOVERED; pci_dbg(dev, "broadcast mmio_enabled message\n"); -- 2.28.0
[PATCH v2 0/1] PCI/ERR: fix regression introduced by 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
- Changes since v1: * changed the commit message to clarify what broke post commit 6d2c89441571 * dropped the misnomer post_reset_status variable in favour of a more natural approach that relies on a boolean to keep track of the outcome of reset_link() After commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") pcie_do_recovery() no longer calls ->slot_reset() in the case of a successful reset which breaks error recovery by breaking driver (re)initialisation. Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- Hedi Berriche (1): PCI/ERR: don't clobber status after reset_link() drivers/pci/pcie/err.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) -- 2.28.0
Re: [PATCH v1 1/1] PCI/ERR: don't clobber status after reset_link()
On Fri, Oct 09, 2020 at 05:09 Hedi Berriche wrote: On Fri, Oct 09, 2020 at 04:46 Raj, Ashok wrote: Hi Ashok, Thanks for looking into this. On Fri, Oct 09, 2020 at 03:52:51AM +0100, Hedi Berriche wrote: Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") changed pcie_do_recovery() so that status is updated with the return value from reset_link(); this was to fix the problem where we would wrongly report recovery failure, despite a successful reset_link(), whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Unfortunately this breaks the flow of pcie_do_recovery() as it prevents What is the reference to "this breaks" above? The code change introduced by commit 6d2c89441571; would "this code change" instead of "this breaks" work better? If not, I can also rephrase the whole paragraph along the following lines: Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") breaks the flow of pcie_do_recovery() as it prevents the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET from taking place which causes error recovery to fail. ... and do away with the first paragraph. the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET from taking place which causes error recovery to fail. Don't clobber status after reset_link() to restore the intended flow in pcie_do_recovery(). Fix the original problem by saving the return value from reset_link() and use it later on to decide whether error recovery should be deemed successful in the scenarios where the initial error status is PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}. I would rather rephrase the above to make it clear what is being proposed. Since the description seems to talk about the old problem and new solution all mixed up. OK; will do that to clarify that what's being proposed here is: 1. fix the regression introduced by commit 6d2c89441571 2. address the problem that commit 6d2c89441571 aimed to fix Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Keith Busch Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..dbd0b56bd6c1 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_channel_state_t state, pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) { - pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; + pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER; why call it post_reset_status? Perhaps post_reset_status is not a great choice; would reset_result or reset_link_result be better? ... or just do this with a boolean instead as I had it in an earlier iteration of the patch before I eventually opted to use an pci_ers_result_t. Cheers, Hedi. Cheers, Hedi. struct pci_bus *bus; /* @@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + post_reset_status = reset_link(dev); + if (post_reset_status != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; } @@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_walk_bus(bus, report_normal_detected, ); } + if ((status == PCI_ERS_RESULT_DISCONNECT || +status == PCI_ERS_RESULT_NO_AER_DRIVER) && +post_reset_status == PCI_ERS_RESULT_RECOVERED) { + /* error recovery succeeded thanks to reset_link() */ + status = PCI_ERS_RESULT_RECOVERED; + } + if (status == PCI_ERS_RESULT_CAN_RECOVER) { status = PCI_ERS_RESULT_RECOVERED; pci_dbg(dev, "broadcast mmio_enabled message\n"); -- 2.28.0 -- Be careful of reading health books, you might die of a misprint. -- Mark Twain -- Be careful of reading health books, you might die of a misprint. -- Mark Twain
Re: [PATCH v1 1/1] PCI/ERR: don't clobber status after reset_link()
On Fri, Oct 09, 2020 at 04:46 Raj, Ashok wrote: Hi Ashok, Thanks for looking into this. On Fri, Oct 09, 2020 at 03:52:51AM +0100, Hedi Berriche wrote: Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") changed pcie_do_recovery() so that status is updated with the return value from reset_link(); this was to fix the problem where we would wrongly report recovery failure, despite a successful reset_link(), whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Unfortunately this breaks the flow of pcie_do_recovery() as it prevents What is the reference to "this breaks" above? The code change introduced by commit 6d2c89441571; would "this code change" instead of "this breaks" work better? If not, I can also rephrase the whole paragraph along the following lines: Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") breaks the flow of pcie_do_recovery() as it prevents the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET from taking place which causes error recovery to fail. ... and do away with the first paragraph. the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET from taking place which causes error recovery to fail. Don't clobber status after reset_link() to restore the intended flow in pcie_do_recovery(). Fix the original problem by saving the return value from reset_link() and use it later on to decide whether error recovery should be deemed successful in the scenarios where the initial error status is PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}. I would rather rephrase the above to make it clear what is being proposed. Since the description seems to talk about the old problem and new solution all mixed up. OK; will do that to clarify that what's being proposed here is: 1. fix the regression introduced by commit 6d2c89441571 2. address the problem that commit 6d2c89441571 aimed to fix Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Keith Busch Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..dbd0b56bd6c1 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_channel_state_t state, pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) { - pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; + pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER; why call it post_reset_status? Perhaps post_reset_status is not a great choice; would reset_result or reset_link_result be better? Cheers, Hedi. struct pci_bus *bus; /* @@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + post_reset_status = reset_link(dev); + if (post_reset_status != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; } @@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_walk_bus(bus, report_normal_detected, ); } + if ((status == PCI_ERS_RESULT_DISCONNECT || +status == PCI_ERS_RESULT_NO_AER_DRIVER) && +post_reset_status == PCI_ERS_RESULT_RECOVERED) { + /* error recovery succeeded thanks to reset_link() */ + status = PCI_ERS_RESULT_RECOVERED; + } + if (status == PCI_ERS_RESULT_CAN_RECOVER) { status = PCI_ERS_RESULT_RECOVERED; pci_dbg(dev, "broadcast mmio_enabled message\n"); -- 2.28.0 -- Be careful of reading health books, you might die of a misprint. -- Mark Twain
[PATCH v1 1/1] PCI/ERR: don't clobber status after reset_link()
Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") changed pcie_do_recovery() so that status is updated with the return value from reset_link(); this was to fix the problem where we would wrongly report recovery failure, despite a successful reset_link(), whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Unfortunately this breaks the flow of pcie_do_recovery() as it prevents the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET from taking place which causes error recovery to fail. Don't clobber status after reset_link() to restore the intended flow in pcie_do_recovery(). Fix the original problem by saving the return value from reset_link() and use it later on to decide whether error recovery should be deemed successful in the scenarios where the initial error status is PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}. Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche Cc: Russ Anderson Cc: Kuppuswamy Sathyanarayanan Cc: Bjorn Helgaas Cc: Ashok Raj Cc: Keith Busch Cc: Joerg Roedel Cc: sta...@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..dbd0b56bd6c1 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_channel_state_t state, pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) { - pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; + pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER; struct pci_bus *bus; /* @@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, ); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + post_reset_status = reset_link(dev); + if (post_reset_status != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; } @@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_walk_bus(bus, report_normal_detected, ); } + if ((status == PCI_ERS_RESULT_DISCONNECT || +status == PCI_ERS_RESULT_NO_AER_DRIVER) && +post_reset_status == PCI_ERS_RESULT_RECOVERED) { + /* error recovery succeeded thanks to reset_link() */ + status = PCI_ERS_RESULT_RECOVERED; + } + if (status == PCI_ERS_RESULT_CAN_RECOVER) { status = PCI_ERS_RESULT_RECOVERED; pci_dbg(dev, "broadcast mmio_enabled message\n"); -- 2.28.0
[Patch] ACPICA: Increase AE_OWNER_ID_LIMIT to 2047
32 sockets systems with 192 NVDIMMs run into the ACPI_OWNER_ID_MAX limit which is currently set to 255, and nfit kernel module initialisation fails with the following representative error messages: ACPI Error: Could not allocate new OwnerId (255 max), AE_OWNER_ID_LIMIT (20170303/utownerid-149 ACPI Error: Method parse/execution failed [\_SB.NVDR.N031.PCDR] (Node 9e2fffd8e280), AE_OWNER_ID_LIMIT (20170303/psparse-543) ACPI Error: Method parse/execution failed [\_SB.NVDR.N031.CR05] (Node 9547ffd91bb8), AE_OWNER_ID_LIMIT (20170303/psparse-543) ACPI Error: Method parse/execution failed [\_SB.NVDR.N031.CRLD] (Node 8e99ffd92550), AE_OWNER_ID_LIMIT (20170303/psparse-543) ACPI Error: Method parse/execution failed [\_SB.NVDR.N031._DSM] (Node adc5ffd90e88), AE_OWNER_ID_LIMIT (20170303/psparse-543) ACPI: \_SB_.NVDR.N031: failed to evaluate _DSM (0x1b) Further debugging shows that, on such a system, we end up using 1020 owner IDs, hence I'm suggesting that we bump ACPI_OWNER_ID_MAX up to 2047. Signed-off-by: Hedi Berriche Cc: Russ Anderson Cc: Mike Travis Cc: Frank Ramsay Cc: Robert Moore Cc: Erik Schmauss Cc: Rafael J. Wysocki --- drivers/acpi/acpica/utownerid.c | 6 +++--- include/acpi/acconfig.h | 4 ++-- include/acpi/actypes.h | 4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/acpi/acpica/utownerid.c b/drivers/acpi/acpica/utownerid.c index 5eb8b76ce9d8..c015a2c147d9 100644 --- a/drivers/acpi/acpica/utownerid.c +++ b/drivers/acpi/acpica/utownerid.c @@ -88,7 +88,7 @@ acpi_status acpi_ut_allocate_owner_id(acpi_owner_id *owner_id) /* * Construct encoded ID from the index and bit position * -* Note: Last [j].k (bit 255) is never used and is marked +* Note: Last [j].k (bit 2047) is never used and is marked * permanently allocated (prevents +1 overflow) */ *owner_id = @@ -116,7 +116,7 @@ acpi_status acpi_ut_allocate_owner_id(acpi_owner_id *owner_id) */ status = AE_OWNER_ID_LIMIT; ACPI_ERROR((AE_INFO, - "Could not allocate new OwnerId (255 max), AE_OWNER_ID_LIMIT")); + "Could not allocate new OwnerId (2047 max), AE_OWNER_ID_LIMIT")); exit: (void)acpi_ut_release_mutex(ACPI_MTX_CACHES); @@ -133,7 +133,7 @@ acpi_status acpi_ut_allocate_owner_id(acpi_owner_id *owner_id) * control method or unloading a table. Either way, we would * ignore any error anyway. * - * DESCRIPTION: Release a table or method owner ID. Valid IDs are 1 - 255 + * DESCRIPTION: Release a table or method owner ID. Valid IDs are 1 - 2047 * **/ diff --git a/include/acpi/acconfig.h b/include/acpi/acconfig.h index 16a83959e616..536fe9a81cb7 100644 --- a/include/acpi/acconfig.h +++ b/include/acpi/acconfig.h @@ -95,9 +95,9 @@ #define ACPI_DEFAULT_PAGE_SIZE 4096 /* Must be power of 2 */ -/* owner_id tracking. 8 entries allows for 255 owner_ids */ +/* owner_id tracking. 64 entries allow for 2047 owner_ids */ -#define ACPI_NUM_OWNERID_MASKS 8 +#define ACPI_NUM_OWNERID_MASKS 64 /* Size of the root table array is increased by this increment */ diff --git a/include/acpi/actypes.h b/include/acpi/actypes.h index ad6892a24015..f32a4d49ea13 100644 --- a/include/acpi/actypes.h +++ b/include/acpi/actypes.h @@ -442,8 +442,8 @@ typedef void *acpi_handle; /* Actually a ptr to a NS Node */ /* Owner IDs are used to track namespace nodes for selective deletion */ -typedef u8 acpi_owner_id; -#define ACPI_OWNER_ID_MAX 0xFF +typedef u16 acpi_owner_id; +#define ACPI_OWNER_ID_MAX 0x7FF #define ACPI_INTEGER_BIT_SIZE 64 #define ACPI_MAX_DECIMAL_DIGITS 20 /* 2^64 = 18,446,744,073,709,551,616 */ -- 2.20.1
[tip:x86/uv] x86/platform/UV: Use efi_enabled() instead of test_bit()
Commit-ID: 8945d96f7b3ead56e053ac79b8f7b0de98a30bfe Gitweb: https://git.kernel.org/tip/8945d96f7b3ead56e053ac79b8f7b0de98a30bfe Author: Hedi Berriche AuthorDate: Wed, 13 Feb 2019 19:34:12 + Committer: Borislav Petkov CommitDate: Fri, 15 Feb 2019 15:15:18 +0100 x86/platform/UV: Use efi_enabled() instead of test_bit() Use ad hoc efi_enabled() instead of fiddling with test_bit(). Cleanup, no functional changes. Signed-off-by: Hedi Berriche Signed-off-by: Borislav Petkov Reviewed-by: Ard Biesheuvel Reviewed-by: Russ Anderson Reviewed-by: Dimitri Sivanich Reviewed-by: Mike Travis Cc: Andy Shevchenko Cc: Bhupesh Sharma Cc: Darren Hart Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-efi Cc: platform-driver-...@vger.kernel.org Cc: Steve Wahl Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190213193413.25560-4-hedi.berri...@hpe.com --- arch/x86/platform/uv/bios_uv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 91e3d5285836..38a2e3431fc6 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -44,7 +44,7 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) * If EFI_OLD_MEMMAP is set, we need to fall back to using our old EFI * callback method, which uses efi_call() directly, with the kernel page tables: */ - if (unlikely(test_bit(EFI_OLD_MEMMAP, ))) + if (unlikely(efi_enabled(EFI_OLD_MEMMAP))) ret = efi_call((void *)__va(tab->function), (u64)which, a1, a2, a3, a4, a5); else ret = efi_call_virt_pointer(tab, function, (u64)which, a1, a2, a3, a4, a5);
[tip:x86/uv] x86/platform/UV: Remove unnecessary #ifdef CONFIG_EFI
Commit-ID: 30ad3e031d2feae075cd5fd2c443baa2d86c0195 Gitweb: https://git.kernel.org/tip/30ad3e031d2feae075cd5fd2c443baa2d86c0195 Author: Hedi Berriche AuthorDate: Wed, 13 Feb 2019 19:34:10 + Committer: Borislav Petkov CommitDate: Fri, 15 Feb 2019 15:05:15 +0100 x86/platform/UV: Remove unnecessary #ifdef CONFIG_EFI CONFIG_EFI is implied by CONFIG_X86_UV and x86/platform/uv/bios_uv.c requires the latter, get rid of the redundant #ifdef CONFIG_EFI directives. Cleanup, no functional changes. Signed-off-by: Hedi Berriche Signed-off-by: Borislav Petkov Reviewed-by: Ard Biesheuvel Reviewed-by: Russ Anderson Reviewed-by: Dimitri Sivanich Reviewed-by: Mike Travis Cc: Andy Shevchenko Cc: Bhupesh Sharma Cc: Darren Hart Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-efi Cc: platform-driver-...@vger.kernel.org Cc: Steve Wahl Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190213193413.25560-2-hedi.berri...@hpe.com --- arch/x86/include/asm/uv/bios.h | 4 arch/x86/platform/uv/bios_uv.c | 2 -- 2 files changed, 6 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index e652a7cc6186..00d862cfbcbe 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -151,11 +151,7 @@ extern s64 uv_bios_change_memprotect(u64, u64, enum uv_memprotect); extern s64 uv_bios_reserved_page_pa(u64, u64 *, u64 *, u64 *); extern int uv_bios_set_legacy_vga_target(bool decode, int domain, int bus); -#ifdef CONFIG_EFI extern void uv_bios_init(void); -#else -void uv_bios_init(void) { } -#endif extern unsigned long sn_rtc_cycles_per_second; extern int uv_type; diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 4a6a5a26c582..4a61ed2a7bb8 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -188,7 +188,6 @@ int uv_bios_set_legacy_vga_target(bool decode, int domain, int bus) } EXPORT_SYMBOL_GPL(uv_bios_set_legacy_vga_target); -#ifdef CONFIG_EFI void uv_bios_init(void) { uv_systab = NULL; @@ -218,4 +217,3 @@ void uv_bios_init(void) } pr_info("UV: UVsystab: Revision:%x\n", uv_systab->revision); } -#endif
[tip:x86/uv] x86/platform/UV: Remove uv_bios_call_reentrant()
Commit-ID: f816525d615fff0336d0c881e5d960bbec5ea016 Gitweb: https://git.kernel.org/tip/f816525d615fff0336d0c881e5d960bbec5ea016 Author: Hedi Berriche AuthorDate: Wed, 13 Feb 2019 19:34:11 + Committer: Borislav Petkov CommitDate: Fri, 15 Feb 2019 15:13:48 +0100 x86/platform/UV: Remove uv_bios_call_reentrant() uv_bios_call_reentrant() has no callers nor is it exported, remove it. Cleanup, no functional changes. Signed-off-by: Hedi Berriche Signed-off-by: Borislav Petkov Reviewed-by: Ard Biesheuvel Reviewed-by: Russ Anderson Reviewed-by: Dimitri Sivanich Reviewed-by: Mike Travis Cc: Andy Shevchenko Cc: Bhupesh Sharma Cc: Darren Hart Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-efi Cc: platform-driver-...@vger.kernel.org Cc: Steve Wahl Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190213193413.25560-3-hedi.berri...@hpe.com --- arch/x86/include/asm/uv/bios.h | 1 - arch/x86/platform/uv/bios_uv.c | 12 2 files changed, 13 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 00d862cfbcbe..8c6ac271b5b3 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -140,7 +140,6 @@ enum uv_memprotect { */ extern s64 uv_bios_call(enum uv_bios_cmd, u64, u64, u64, u64, u64); extern s64 uv_bios_call_irqsave(enum uv_bios_cmd, u64, u64, u64, u64, u64); -extern s64 uv_bios_call_reentrant(enum uv_bios_cmd, u64, u64, u64, u64, u64); extern s64 uv_bios_get_sn_info(int, int *, long *, long *, long *, long *); extern s64 uv_bios_freq_base(u64, u64 *); diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 4a61ed2a7bb8..91e3d5285836 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -66,18 +66,6 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, return ret; } -s64 uv_bios_call_reentrant(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, - u64 a4, u64 a5) -{ - s64 ret; - - preempt_disable(); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); - preempt_enable(); - - return ret; -} - long sn_partition_id; EXPORT_SYMBOL_GPL(sn_partition_id);
[tip:x86/urgent] x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls
Commit-ID: f331e766c4be33f4338574f3c9f7f77e98ab4571 Gitweb: https://git.kernel.org/tip/f331e766c4be33f4338574f3c9f7f77e98ab4571 Author: Hedi Berriche AuthorDate: Wed, 13 Feb 2019 19:34:13 + Committer: Borislav Petkov CommitDate: Fri, 15 Feb 2019 15:19:56 +0100 x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls Calls into UV firmware must be protected against concurrency, expose the efi_runtime_lock to the UV platform, and use it to serialise UV BIOS calls. Signed-off-by: Hedi Berriche Signed-off-by: Borislav Petkov Reviewed-by: Ard Biesheuvel Reviewed-by: Russ Anderson Reviewed-by: Dimitri Sivanich Reviewed-by: Mike Travis Cc: Andy Shevchenko Cc: Bhupesh Sharma Cc: Darren Hart Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-efi Cc: platform-driver-...@vger.kernel.org Cc: sta...@vger.kernel.org # v4.9+ Cc: Steve Wahl Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190213193413.25560-5-hedi.berri...@hpe.com --- arch/x86/include/asm/uv/bios.h | 8 +++- arch/x86/platform/uv/bios_uv.c | 23 +-- drivers/firmware/efi/runtime-wrappers.c | 7 +++ 3 files changed, 35 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index e652a7cc6186..3f697a9e3f59 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -48,7 +48,8 @@ enum { BIOS_STATUS_SUCCESS = 0, BIOS_STATUS_UNIMPLEMENTED = -ENOSYS, BIOS_STATUS_EINVAL = -EINVAL, - BIOS_STATUS_UNAVAIL = -EBUSY + BIOS_STATUS_UNAVAIL = -EBUSY, + BIOS_STATUS_ABORT = -EINTR, }; /* Address map parameters */ @@ -167,4 +168,9 @@ extern long system_serial_number; extern struct kobject *sgi_uv_kobj;/* /sys/firmware/sgi_uv */ +/* + * EFI runtime lock; cf. firmware/efi/runtime-wrappers.c for details + */ +extern struct semaphore __efi_uv_runtime_lock; + #endif /* _ASM_X86_UV_BIOS_H */ diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 4a6a5a26c582..eb33432f2f24 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -29,7 +29,8 @@ struct uv_systab *uv_systab; -s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +static s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, + u64 a4, u64 a5) { struct uv_systab *tab = uv_systab; s64 ret; @@ -51,6 +52,19 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) return ret; } + +s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +{ + s64 ret; + + if (down_interruptible(&__efi_uv_runtime_lock)) + return BIOS_STATUS_ABORT; + + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); + up(&__efi_uv_runtime_lock); + + return ret; +} EXPORT_SYMBOL_GPL(uv_bios_call); s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, @@ -59,10 +73,15 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, unsigned long bios_flags; s64 ret; + if (down_interruptible(&__efi_uv_runtime_lock)) + return BIOS_STATUS_ABORT; + local_irq_save(bios_flags); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); local_irq_restore(bios_flags); + up(&__efi_uv_runtime_lock); + return ret; } diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 8903b9ccfc2b..e2abfdb5cee6 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -146,6 +146,13 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call) */ static DEFINE_SEMAPHORE(efi_runtime_lock); +/* + * Expose the EFI runtime lock to the UV platform + */ +#ifdef CONFIG_X86_UV +extern struct semaphore __efi_uv_runtime_lock __alias(efi_runtime_lock); +#endif + /* * Calls the appropriate efi_runtime_service() with the appropriate * arguments.
[Patch v3 4/4] x86/platform/UV: use efi_runtime_lock to serialise BIOS calls
Calls into UV firmware must be protected against concurrency, expose the efi_runtime_lock to the UV platform, and use it to serialise UV BIOS calls. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org # v4.9+ Signed-off-by: Hedi Berriche --- arch/x86/include/asm/uv/bios.h | 8 +++- arch/x86/platform/uv/bios_uv.c | 23 +-- drivers/firmware/efi/runtime-wrappers.c | 7 +++ 3 files changed, 35 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 8c6ac271b5b3..8cfccc3cbbf4 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -48,7 +48,8 @@ enum { BIOS_STATUS_SUCCESS = 0, BIOS_STATUS_UNIMPLEMENTED = -ENOSYS, BIOS_STATUS_EINVAL = -EINVAL, - BIOS_STATUS_UNAVAIL = -EBUSY + BIOS_STATUS_UNAVAIL = -EBUSY, + BIOS_STATUS_ABORT = -EINTR, }; /* Address map parameters */ @@ -162,4 +163,9 @@ extern long system_serial_number; extern struct kobject *sgi_uv_kobj;/* /sys/firmware/sgi_uv */ +/* + * EFI runtime lock; cf. firmware/efi/runtime-wrappers.c for details + */ +extern struct semaphore __efi_uv_runtime_lock; + #endif /* _ASM_X86_UV_BIOS_H */ diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 38a2e3431fc6..ef60d789c76e 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -29,7 +29,8 @@ struct uv_systab *uv_systab; -s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +static s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, + u64 a4, u64 a5) { struct uv_systab *tab = uv_systab; s64 ret; @@ -51,6 +52,19 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) return ret; } + +s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +{ + s64 ret; + + if (down_interruptible(&__efi_uv_runtime_lock)) + return BIOS_STATUS_ABORT; + + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); + up(&__efi_uv_runtime_lock); + + return ret; +} EXPORT_SYMBOL_GPL(uv_bios_call); s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, @@ -59,10 +73,15 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, unsigned long bios_flags; s64 ret; + if (down_interruptible(&__efi_uv_runtime_lock)) + return BIOS_STATUS_ABORT; + local_irq_save(bios_flags); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); local_irq_restore(bios_flags); + up(&__efi_uv_runtime_lock); + return ret; } diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 8903b9ccfc2b..e2abfdb5cee6 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -146,6 +146,13 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call) */ static DEFINE_SEMAPHORE(efi_runtime_lock); +/* + * Expose the EFI runtime lock to the UV platform + */ +#ifdef CONFIG_X86_UV +extern struct semaphore __efi_uv_runtime_lock __alias(efi_runtime_lock); +#endif + /* * Calls the appropriate efi_runtime_service() with the appropriate * arguments. -- 2.20.1
[Patch v3 2/4] x86/platform/UV: kill uv_bios_call_reentrant()
uv_bios_call_reentrant() has no callers nor is it exported, kill it. Cleanup, no functional changes. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org # v4.9+ Signed-off-by: Hedi Berriche --- arch/x86/include/asm/uv/bios.h | 1 - arch/x86/platform/uv/bios_uv.c | 12 2 files changed, 13 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 00d862cfbcbe..8c6ac271b5b3 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -140,7 +140,6 @@ enum uv_memprotect { */ extern s64 uv_bios_call(enum uv_bios_cmd, u64, u64, u64, u64, u64); extern s64 uv_bios_call_irqsave(enum uv_bios_cmd, u64, u64, u64, u64, u64); -extern s64 uv_bios_call_reentrant(enum uv_bios_cmd, u64, u64, u64, u64, u64); extern s64 uv_bios_get_sn_info(int, int *, long *, long *, long *, long *); extern s64 uv_bios_freq_base(u64, u64 *); diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 4a61ed2a7bb8..91e3d5285836 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -66,18 +66,6 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, return ret; } -s64 uv_bios_call_reentrant(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, - u64 a4, u64 a5) -{ - s64 ret; - - preempt_disable(); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); - preempt_enable(); - - return ret; -} - long sn_partition_id; EXPORT_SYMBOL_GPL(sn_partition_id); -- 2.20.1
[Patch v3 1/4] x86/platform/UV: remove unnecessary #ifdef CONFIG_EFI
CONFIG_EFI is implied by CONFIG_X86_UV and x86/platform/uv/bios_uv.c requires the latter, get rid of the redundant #ifdef CONFIG_EFI directives. Cleanup, no functional changes. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org # v4.9+ Signed-off-by: Hedi Berriche --- arch/x86/include/asm/uv/bios.h | 4 arch/x86/platform/uv/bios_uv.c | 2 -- 2 files changed, 6 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index e652a7cc6186..00d862cfbcbe 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -151,11 +151,7 @@ extern s64 uv_bios_change_memprotect(u64, u64, enum uv_memprotect); extern s64 uv_bios_reserved_page_pa(u64, u64 *, u64 *, u64 *); extern int uv_bios_set_legacy_vga_target(bool decode, int domain, int bus); -#ifdef CONFIG_EFI extern void uv_bios_init(void); -#else -void uv_bios_init(void) { } -#endif extern unsigned long sn_rtc_cycles_per_second; extern int uv_type; diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 4a6a5a26c582..4a61ed2a7bb8 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -188,7 +188,6 @@ int uv_bios_set_legacy_vga_target(bool decode, int domain, int bus) } EXPORT_SYMBOL_GPL(uv_bios_set_legacy_vga_target); -#ifdef CONFIG_EFI void uv_bios_init(void) { uv_systab = NULL; @@ -218,4 +217,3 @@ void uv_bios_init(void) } pr_info("UV: UVsystab: Revision:%x\n", uv_systab->revision); } -#endif -- 2.20.1
[Patch v3 0/4] Protect against concurrent calls into UV BIOS
- Changes since v2 Addressed comments from Ard Biesheuvel: * expose efi_runtime_lock to UV platform only instead of globally * remove unnecessary #ifdef CONFIG_EFI from bios_uv.c - Changes since v1: Addressed comments from Bhupesh Sharma, Thomas Gleixner, and Ard Biesheuvel: * made __uv_bios_call() static * moved the efi_enabled() cleanup to its own patchlet * explained the reason for renaming the efi_runtime_lock semaphore * dropped the reviewed-bys as they should be given on the mailing list * Cc'ng sta...@vger.kernel.org given the nature of the problem addressed by the patches --- Calls into UV BIOS were not being serialised which is wrong as it violates EFI runtime rules, and bad as it does result in all sorts of potentially hard to track down hangs and panics when efi_scratch.prev_mm gets clobbered whenever efi_switch_mm() gets called concurrently from two different CPUs. Patch #1 removes an unnecessary #ifdef CONFIG_EFI guard from bios_uv.c. Patch #2 removes uv_bios_call_reentrant() because it's dead code. Patch #3 is a cleanup that drops test_bit() in favour of the ad hoc efi_enabled(). Patch #4 makes uv_bios_call() variants use the efi_runtime_lock semaphore to protect against concurrency. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org # v4.9+ Hedi Berriche (4): x86/platform/UV: remove unnecessary #ifdef CONFIG_EFI x86/platform/UV: kill uv_bios_call_reentrant() as it has no callers x86/platform/UV: use efi_enabled() instead of test_bit() x86/platform/UV: use efi_runtime_lock to serialise BIOS calls arch/x86/include/asm/uv/bios.h | 13 - arch/x86/platform/uv/bios_uv.c | 35 ++--- drivers/firmware/efi/runtime-wrappers.c | 7 + 3 files changed, 34 insertions(+), 21 deletions(-) -- 2.20.1
[Patch v3 3/4] x86/platform/UV: use efi_enabled() instead of test_bit()
Use ad hoc efi_enabled() instead of fiddling with test_bit(). Cleanup, no functional changes. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org # v4.9+ Signed-off-by: Hedi Berriche --- arch/x86/platform/uv/bios_uv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 91e3d5285836..38a2e3431fc6 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -44,7 +44,7 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) * If EFI_OLD_MEMMAP is set, we need to fall back to using our old EFI * callback method, which uses efi_call() directly, with the kernel page tables: */ - if (unlikely(test_bit(EFI_OLD_MEMMAP, ))) + if (unlikely(efi_enabled(EFI_OLD_MEMMAP))) ret = efi_call((void *)__va(tab->function), (u64)which, a1, a2, a3, a4, a5); else ret = efi_call_virt_pointer(tab, function, (u64)which, a1, a2, a3, a4, a5); -- 2.20.1
Re: [Patch v2 1/4] efi/x86: turn EFI runtime semaphore into a global lock
On Tue, Feb 12, 2019 at 17:25 Hedi Berriche wrote: On Thu, Feb 07, 2019 at 17:38 Hedi Berriche wrote: On Thu, Feb 07, 2019 at 16:05 Ard Biesheuvel wrote: On Thu, 7 Feb 2019 at 05:23, Hedi Berriche wrote: Make efi_runtime_lock semaphore global so that it can be used by EFI runtime callers that may be defined outside efi/runtime-wrappers.c. Also now that efi_runtime_lock semaphore is no longer static, rename it to efi_runtime_sem so it doesn't clash with the efi_runtime_lock spinlock defined in arch/x86/platform/efi/efi_64.c when CONFIG_EFI_MIXED is enabled. The immediate motivation of this change is to serialise UV platform BIOS calls using the efi_runtime_sem semaphore. No functional changes. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org Signed-off-by: Hedi Berriche Hello Hedi, Hi Ard, Same feedback as on v1: please don't rename the lock. With the efi_runtime_lock semaphore being no longer static, not renaming it will cause a compile failure as it will clash with the declaration of the efi_runtime_lock *spinlock* in arch/x86/platform/efi/efi_64.c in the CONFIG_EFI_MIXED case. Ard, Any comments on whether resolving the conflict between {efi_runtime_lock, drivers/firmware/efi/runtime-wrappers.c} and {efi_runtime_lock, arch/x86/platform/efi/efi_64.c} provides the required justification for renaming the efi_runtime_lock semaphore? Ard, Apologies for sending this email which must have come across as absurd given the *second* email you sent on 2019-02-07 19:38:42. The trouble is that I *never* received that email (nor the one you sent today 2019-02-12 17:26:06 as reply to this one) because for some reason my email address was dropped from the distribution list. It's only about 30 minutes ago that a colleague brought it up to my attention and forwarded both emails: Thu, 7 Feb 2019 20:38:42 +0100 Tue, 12 Feb 2019 18:26:06 +0100 No idea how my email address got dropped but I wanted to make sure to explain the seemingly absurd email I sent today as well as the lack of comment on your earlier email. Cheers, Hedi. --- drivers/firmware/efi/runtime-wrappers.c | 60 - include/linux/efi.h | 3 ++ 2 files changed, 33 insertions(+), 30 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 8903b9ccfc2b..ec60d6227925 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -53,7 +53,7 @@ struct efi_runtime_work efi_rts_work; * @rts_arg<1-5>: efi_runtime_service() function arguments * * Accesses to efi_runtime_services() are serialized by a binary - * semaphore (efi_runtime_lock) and caller waits until the work is + * semaphore (efi_runtime_sem) and caller waits until the work is * finished, hence _only_ one work is queued at a time and the caller * thread waits for completion. */ @@ -144,7 +144,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call) * none of the remaining functions are actually ever called at runtime. * So let's just use a single lock to serialize all Runtime Services calls. */ -static DEFINE_SEMAPHORE(efi_runtime_lock); +DEFINE_SEMAPHORE(efi_runtime_sem); /* * Calls the appropriate efi_runtime_service() with the appropriate @@ -233,10 +233,10 @@ static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -244,10 +244,10 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -257,11 +257,11 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -269,11 +269,11 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_WAKEUP_TIME, , tm, NULL, NULL, NULL); - up(_runtime_lock); +
Re: [Patch v2 1/4] efi/x86: turn EFI runtime semaphore into a global lock
On Thu, Feb 07, 2019 at 17:38 Hedi Berriche wrote: On Thu, Feb 07, 2019 at 16:05 Ard Biesheuvel wrote: On Thu, 7 Feb 2019 at 05:23, Hedi Berriche wrote: Make efi_runtime_lock semaphore global so that it can be used by EFI runtime callers that may be defined outside efi/runtime-wrappers.c. Also now that efi_runtime_lock semaphore is no longer static, rename it to efi_runtime_sem so it doesn't clash with the efi_runtime_lock spinlock defined in arch/x86/platform/efi/efi_64.c when CONFIG_EFI_MIXED is enabled. The immediate motivation of this change is to serialise UV platform BIOS calls using the efi_runtime_sem semaphore. No functional changes. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org Signed-off-by: Hedi Berriche Hello Hedi, Hi Ard, Same feedback as on v1: please don't rename the lock. With the efi_runtime_lock semaphore being no longer static, not renaming it will cause a compile failure as it will clash with the declaration of the efi_runtime_lock *spinlock* in arch/x86/platform/efi/efi_64.c in the CONFIG_EFI_MIXED case. Ard, Any comments on whether resolving the conflict between {efi_runtime_lock, drivers/firmware/efi/runtime-wrappers.c} and {efi_runtime_lock, arch/x86/platform/efi/efi_64.c} provides the required justification for renaming the efi_runtime_lock semaphore? Cheers, Hedi. --- drivers/firmware/efi/runtime-wrappers.c | 60 - include/linux/efi.h | 3 ++ 2 files changed, 33 insertions(+), 30 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 8903b9ccfc2b..ec60d6227925 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -53,7 +53,7 @@ struct efi_runtime_work efi_rts_work; * @rts_arg<1-5>: efi_runtime_service() function arguments * * Accesses to efi_runtime_services() are serialized by a binary - * semaphore (efi_runtime_lock) and caller waits until the work is + * semaphore (efi_runtime_sem) and caller waits until the work is * finished, hence _only_ one work is queued at a time and the caller * thread waits for completion. */ @@ -144,7 +144,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call) * none of the remaining functions are actually ever called at runtime. * So let's just use a single lock to serialize all Runtime Services calls. */ -static DEFINE_SEMAPHORE(efi_runtime_lock); +DEFINE_SEMAPHORE(efi_runtime_sem); /* * Calls the appropriate efi_runtime_service() with the appropriate @@ -233,10 +233,10 @@ static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -244,10 +244,10 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -257,11 +257,11 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -269,11 +269,11 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_WAKEUP_TIME, , tm, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -285,11 +285,11 @@ static efi_status_t virt_efi_get_variable(efi_char16_t *name, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size, data); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -299,11 +299,11 @@ static efi_status_t virt_efi_get_next_variable(unsigned long *name_size, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_A
Re: [Patch v2 1/4] efi/x86: turn EFI runtime semaphore into a global lock
On Thu, Feb 07, 2019 at 16:05 Ard Biesheuvel wrote: On Thu, 7 Feb 2019 at 05:23, Hedi Berriche wrote: Make efi_runtime_lock semaphore global so that it can be used by EFI runtime callers that may be defined outside efi/runtime-wrappers.c. Also now that efi_runtime_lock semaphore is no longer static, rename it to efi_runtime_sem so it doesn't clash with the efi_runtime_lock spinlock defined in arch/x86/platform/efi/efi_64.c when CONFIG_EFI_MIXED is enabled. The immediate motivation of this change is to serialise UV platform BIOS calls using the efi_runtime_sem semaphore. No functional changes. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org Signed-off-by: Hedi Berriche Hello Hedi, Hi Ard, Same feedback as on v1: please don't rename the lock. With the efi_runtime_lock semaphore being no longer static, not renaming it will cause a compile failure as it will clash with the declaration of the efi_runtime_lock *spinlock* in arch/x86/platform/efi/efi_64.c in the CONFIG_EFI_MIXED case. --- drivers/firmware/efi/runtime-wrappers.c | 60 - include/linux/efi.h | 3 ++ 2 files changed, 33 insertions(+), 30 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 8903b9ccfc2b..ec60d6227925 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -53,7 +53,7 @@ struct efi_runtime_work efi_rts_work; * @rts_arg<1-5>: efi_runtime_service() function arguments * * Accesses to efi_runtime_services() are serialized by a binary - * semaphore (efi_runtime_lock) and caller waits until the work is + * semaphore (efi_runtime_sem) and caller waits until the work is * finished, hence _only_ one work is queued at a time and the caller * thread waits for completion. */ @@ -144,7 +144,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call) * none of the remaining functions are actually ever called at runtime. * So let's just use a single lock to serialize all Runtime Services calls. */ -static DEFINE_SEMAPHORE(efi_runtime_lock); +DEFINE_SEMAPHORE(efi_runtime_sem); /* * Calls the appropriate efi_runtime_service() with the appropriate @@ -233,10 +233,10 @@ static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -244,10 +244,10 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -257,11 +257,11 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -269,11 +269,11 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_WAKEUP_TIME, , tm, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -285,11 +285,11 @@ static efi_status_t virt_efi_get_variable(efi_char16_t *name, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size, data); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -299,11 +299,11 @@ static efi_status_t virt_efi_get_next_variable(unsigned long *name_size, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_NEXT_VARIABLE, name_size, name, vendor, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -315,11 +315,11 @@ static efi_status_t virt_efi_set_variable(efi_char16_t
[Patch v2 3/4] x86/platform/UV: use efi_enabled() instead of test_bit()
Use ad hoc efi_enabled() instead of fiddling with test_bit(). Cleanup, no functional changes. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org Signed-off-by: Hedi Berriche --- arch/x86/platform/uv/bios_uv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index cd05af157763..8bace0ca9e57 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -44,7 +44,7 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) * If EFI_OLD_MEMMAP is set, we need to fall back to using our old EFI * callback method, which uses efi_call() directly, with the kernel page tables: */ - if (unlikely(test_bit(EFI_OLD_MEMMAP, ))) + if (unlikely(efi_enabled(EFI_OLD_MEMMAP))) ret = efi_call((void *)__va(tab->function), (u64)which, a1, a2, a3, a4, a5); else ret = efi_call_virt_pointer(tab, function, (u64)which, a1, a2, a3, a4, a5); -- 2.20.1
[Patch v2 2/4] x86/platform/UV: kill uv_bios_call_reentrant() as it has no callers
uv_bios_call_reentrant() has no callers nor is it exported, kill it. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org Signed-off-by: Hedi Berriche --- arch/x86/include/asm/uv/bios.h | 1 - arch/x86/platform/uv/bios_uv.c | 12 2 files changed, 13 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index e652a7cc6186..4eee646544b2 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -140,7 +140,6 @@ enum uv_memprotect { */ extern s64 uv_bios_call(enum uv_bios_cmd, u64, u64, u64, u64, u64); extern s64 uv_bios_call_irqsave(enum uv_bios_cmd, u64, u64, u64, u64, u64); -extern s64 uv_bios_call_reentrant(enum uv_bios_cmd, u64, u64, u64, u64, u64); extern s64 uv_bios_get_sn_info(int, int *, long *, long *, long *, long *); extern s64 uv_bios_freq_base(u64, u64 *); diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 4a6a5a26c582..cd05af157763 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -66,18 +66,6 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, return ret; } -s64 uv_bios_call_reentrant(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, - u64 a4, u64 a5) -{ - s64 ret; - - preempt_disable(); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); - preempt_enable(); - - return ret; -} - long sn_partition_id; EXPORT_SYMBOL_GPL(sn_partition_id); -- 2.20.1
[Patch v2 0/4] Protect against concurrent calls into UV BIOS
Changes since v1: Addressed comments from Bhupesh Sharma, Thomas Gleixner, and Ard Biesheuvel: * made __uv_bios_call() static * moved the efi_enabled() cleanup to its own patchlet * explained the reason for renaming the efi_runtime_lock semaphore * dropped the reviewed-bys as they should be given on the mailing list * Cc'ng sta...@vger.kernel.org given the nature of the problem addressed by the patches --- Calls into UV BIOS were not being serialised which is wrong as it violates EFI runtime rules, and bad as it does result in all sorts of potentially hard to track down hangs and panics when efi_scratch.prev_mm gets clobbered whenever efi_switch_mm() gets called concurrently from two different CPUs. Patch #1 makes the efi_runtime_lock semaphore visible to EFI runtime callers defined outside drivers/firmware/efi/runtime-wrappers.c in preparation for using it to serialise calls into UV BIOS; the lock is also renamed to efi_runtime_sem so that it can coexist with the efi_runtime_lock spinlock defined in arch/x86/platform/efi/efi_64.c when CONFIG_EFI_MIXED is enabled. Patch #2 removes uv_bios_call_reentrant() because it's dead code. Patch #3 is a cleanup that drops test_bit() in favour of the ad hoc efi_enabled(). Patch #4 makes uv_bios_call() variants use efi_runtime_sem to protect against concurrency. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org Hedi Berriche (4): efi/x86: turn EFI runtime semaphore into a global lock x86/platform/UV: kill uv_bios_call_reentrant() as it has no callers x86/platform/UV: use efi_enabled() instead of test_bit() x86/platform/UV: use efi_runtime_sem to serialise BIOS calls arch/x86/include/asm/uv/bios.h | 4 +- arch/x86/platform/uv/bios_uv.c | 33 -- drivers/firmware/efi/runtime-wrappers.c | 60 - include/linux/efi.h | 3 ++ 4 files changed, 55 insertions(+), 45 deletions(-) -- 2.20.1
[Patch v2 4/4] x86/platform/UV: use efi_runtime_sem to serialise BIOS calls
Calls into UV firmware must be protected against concurrency, use the now visible efi_runtime_sem lock to serialise them. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org Signed-off-by: Hedi Berriche --- arch/x86/include/asm/uv/bios.h | 3 ++- arch/x86/platform/uv/bios_uv.c | 23 +-- 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 4eee646544b2..581e2978a16c 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -48,7 +48,8 @@ enum { BIOS_STATUS_SUCCESS = 0, BIOS_STATUS_UNIMPLEMENTED = -ENOSYS, BIOS_STATUS_EINVAL = -EINVAL, - BIOS_STATUS_UNAVAIL = -EBUSY + BIOS_STATUS_UNAVAIL = -EBUSY, + BIOS_STATUS_ABORT = -EINTR, }; /* Address map parameters */ diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 8bace0ca9e57..e779b2a128ea 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -29,7 +29,8 @@ struct uv_systab *uv_systab; -s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +static s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, + u64 a4, u64 a5) { struct uv_systab *tab = uv_systab; s64 ret; @@ -51,6 +52,19 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) return ret; } + +s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +{ + s64 ret; + + if (down_interruptible(_runtime_sem)) + return BIOS_STATUS_ABORT; + + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); + up(_runtime_sem); + + return ret; +} EXPORT_SYMBOL_GPL(uv_bios_call); s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, @@ -59,10 +73,15 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, unsigned long bios_flags; s64 ret; + if (down_interruptible(_runtime_sem)) + return BIOS_STATUS_ABORT; + local_irq_save(bios_flags); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); local_irq_restore(bios_flags); + up(_runtime_sem); + return ret; } -- 2.20.1
[Patch v2 1/4] efi/x86: turn EFI runtime semaphore into a global lock
Make efi_runtime_lock semaphore global so that it can be used by EFI runtime callers that may be defined outside efi/runtime-wrappers.c. Also now that efi_runtime_lock semaphore is no longer static, rename it to efi_runtime_sem so it doesn't clash with the efi_runtime_lock spinlock defined in arch/x86/platform/efi/efi_64.c when CONFIG_EFI_MIXED is enabled. The immediate motivation of this change is to serialise UV platform BIOS calls using the efi_runtime_sem semaphore. No functional changes. Cc: Russ Anderson Cc: Mike Travis Cc: Dimitri Sivanich Cc: Steve Wahl Cc: sta...@vger.kernel.org Signed-off-by: Hedi Berriche --- drivers/firmware/efi/runtime-wrappers.c | 60 - include/linux/efi.h | 3 ++ 2 files changed, 33 insertions(+), 30 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 8903b9ccfc2b..ec60d6227925 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -53,7 +53,7 @@ struct efi_runtime_work efi_rts_work; * @rts_arg<1-5>: efi_runtime_service() function arguments * * Accesses to efi_runtime_services() are serialized by a binary - * semaphore (efi_runtime_lock) and caller waits until the work is + * semaphore (efi_runtime_sem) and caller waits until the work is * finished, hence _only_ one work is queued at a time and the caller * thread waits for completion. */ @@ -144,7 +144,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call) * none of the remaining functions are actually ever called at runtime. * So let's just use a single lock to serialize all Runtime Services calls. */ -static DEFINE_SEMAPHORE(efi_runtime_lock); +DEFINE_SEMAPHORE(efi_runtime_sem); /* * Calls the appropriate efi_runtime_service() with the appropriate @@ -233,10 +233,10 @@ static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -244,10 +244,10 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -257,11 +257,11 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -269,11 +269,11 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_WAKEUP_TIME, , tm, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -285,11 +285,11 @@ static efi_status_t virt_efi_get_variable(efi_char16_t *name, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size, data); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -299,11 +299,11 @@ static efi_status_t virt_efi_get_next_variable(unsigned long *name_size, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_NEXT_VARIABLE, name_size, name, vendor, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -315,11 +315,11 @@ static efi_status_t virt_efi_set_variable(efi_char16_t *name, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_VARIABLE, name, vendor, , _size, data); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -330,12 +330,12 @@ virt_efi_set_variable_nonblocking(efi_char16_t
Re: [PATCH 1/3] efi/x86: turn EFI runtime semaphore into a global lock
On Tue, Jan 15, 2019 at 18:55 Thomas Gleixner wrote: On Wed, 9 Jan 2019, Hedi Berriche wrote: Make efi_runtime_lock semaphore global so that it can be used by EFI runtime callers that may be defined outside efi/runtime-wrappers.c. The changelog should mention why the lock is renamed. OK; will add to V2 that it was renamed so that it doesn't clash with the efi_runtime_lock spinlock defined by CONFIG_EFI_MIXED. I have no strong opinion, but to apply that I need an Ack from Ard. OK; thanks for the feedback. Cheers, Hedi. -- Be careful of reading health books, you might die of a misprint. -- Mark Twain
Re: [PATCH 3/3] x86/platform/UV: use efi_runtime_sem to serialise BIOS calls
On Thu, Jan 10, 2019 at 07:14 Bhupesh Sharma wrote: Hi Hedi, Hi Bhupesh, Thanks for the patchset. Thanks for looking at it. I will give this a go on my sgi-uv300 machine and come back with more detailed inputs, and for testing it. but I wanted to ask about the hang/panic you mentioned in the cover letter when efi_scratch gets clobbered. Can you describe the same (for e.g. how to reproduce this). When efi_switch_mm() gets called concurrently from two different CPUs --via arch_efi_call_virt_setup()-- due to lack of serialisation in uv_bios_call(), efi_scratch.prev_mm is overwritten and that's how all hell breaks loose, and that's when you see either a hang (the more frequent failure mode) or a panic. In order to reproduce the problem you'd need, for example, a kernel module that makes use of uv_bios_call(), in which case a test case would be a loop with: - 2 concurrent tasks both invoking uv_bios_call() or - 2 concurrent tasks - one invoking uv_bios_call() - one, for example, accessing an EFI vars via efivars Nitpicks below: On 01/09/2019 04:15 PM, Hedi Berriche wrote: Calls into UV firmware must be protected against concurrency, use the now visible efi_runtime_sem lock to serialise them. Signed-off-by: Hedi Berriche Reviewed-by: Russ Anderson Reviewed-by: Mike Travis Reviewed-by: Dimitri Sivanich Reviewed-by: Steve Wahl --- arch/x86/include/asm/uv/bios.h | 3 ++- arch/x86/platform/uv/bios_uv.c | 25 ++--- 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 4eee646544b2..33e94aa0b1ff 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -48,7 +48,8 @@ enum { BIOS_STATUS_SUCCESS = 0, BIOS_STATUS_UNIMPLEMENTED = -ENOSYS, BIOS_STATUS_EINVAL = -EINVAL, - BIOS_STATUS_UNAVAIL = -EBUSY + BIOS_STATUS_UNAVAIL = -EBUSY, + BIOS_STATUS_ABORT = -EINTR }; /* Address map parameters */ diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index cd05af157763..92f960798e20 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -29,7 +29,8 @@ struct uv_systab *uv_systab; -s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, + u64 a4, u64 a5) Can we make this static? Will do. { struct uv_systab *tab = uv_systab; s64 ret; @@ -44,13 +45,26 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) * If EFI_OLD_MEMMAP is set, we need to fall back to using our old EFI * callback method, which uses efi_call() directly, with the kernel page tables: */ - if (unlikely(test_bit(EFI_OLD_MEMMAP, ))) + if (unlikely(efi_enabled(EFI_OLD_MEMMAP))) ret = efi_call((void *)__va(tab->function), (u64)which, a1, a2, a3, a4, a5); else ret = efi_call_virt_pointer(tab, function, (u64)which, a1, a2, a3, a4, a5); return ret; } + +s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +{ + s64 ret; + + if (down_interruptible(_runtime_sem)) + return BIOS_STATUS_ABORT; + + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); + up(_runtime_sem); + + return ret; +} EXPORT_SYMBOL_GPL(uv_bios_call); s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, @@ -59,10 +73,15 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, unsigned long bios_flags; s64 ret; + if (down_interruptible(_runtime_sem)) + return BIOS_STATUS_ABORT; + local_irq_save(bios_flags); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); local_irq_restore(bios_flags); + up(_runtime_sem); + return ret; } Thanks, Bhupesh Cheers, Hedi. -- Be careful of reading health books, you might die of a misprint. -- Mark Twain
[PATCH 3/3] x86/platform/UV: use efi_runtime_sem to serialise BIOS calls
Calls into UV firmware must be protected against concurrency, use the now visible efi_runtime_sem lock to serialise them. Signed-off-by: Hedi Berriche Reviewed-by: Russ Anderson Reviewed-by: Mike Travis Reviewed-by: Dimitri Sivanich Reviewed-by: Steve Wahl --- arch/x86/include/asm/uv/bios.h | 3 ++- arch/x86/platform/uv/bios_uv.c | 25 ++--- 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 4eee646544b2..33e94aa0b1ff 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -48,7 +48,8 @@ enum { BIOS_STATUS_SUCCESS = 0, BIOS_STATUS_UNIMPLEMENTED = -ENOSYS, BIOS_STATUS_EINVAL = -EINVAL, - BIOS_STATUS_UNAVAIL = -EBUSY + BIOS_STATUS_UNAVAIL = -EBUSY, + BIOS_STATUS_ABORT = -EINTR }; /* Address map parameters */ diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index cd05af157763..92f960798e20 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -29,7 +29,8 @@ struct uv_systab *uv_systab; -s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, + u64 a4, u64 a5) { struct uv_systab *tab = uv_systab; s64 ret; @@ -44,13 +45,26 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) * If EFI_OLD_MEMMAP is set, we need to fall back to using our old EFI * callback method, which uses efi_call() directly, with the kernel page tables: */ - if (unlikely(test_bit(EFI_OLD_MEMMAP, ))) + if (unlikely(efi_enabled(EFI_OLD_MEMMAP))) ret = efi_call((void *)__va(tab->function), (u64)which, a1, a2, a3, a4, a5); else ret = efi_call_virt_pointer(tab, function, (u64)which, a1, a2, a3, a4, a5); return ret; } + +s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5) +{ + s64 ret; + + if (down_interruptible(_runtime_sem)) + return BIOS_STATUS_ABORT; + + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); + up(_runtime_sem); + + return ret; +} EXPORT_SYMBOL_GPL(uv_bios_call); s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, @@ -59,10 +73,15 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, unsigned long bios_flags; s64 ret; + if (down_interruptible(_runtime_sem)) + return BIOS_STATUS_ABORT; + local_irq_save(bios_flags); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); + ret = __uv_bios_call(which, a1, a2, a3, a4, a5); local_irq_restore(bios_flags); + up(_runtime_sem); + return ret; } -- 2.20.0
[PATCH 1/3] efi/x86: turn EFI runtime semaphore into a global lock
Make efi_runtime_lock semaphore global so that it can be used by EFI runtime callers that may be defined outside efi/runtime-wrappers.c. The immediate motivation is to piggy-back it to serialise UV platform BIOS calls. No functional changes. Signed-off-by: Hedi Berriche Reviewed-by: Russ Anderson Reviewed-by: Mike Travis Reviewed-by: Dimitri Sivanich Reviewed-by: Steve Wahl --- drivers/firmware/efi/runtime-wrappers.c | 60 include/linux/efi.h |3 + 2 files changed, 33 insertions(+), 30 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 8903b9ccfc2b..ec60d6227925 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -53,7 +53,7 @@ struct efi_runtime_work efi_rts_work; * @rts_arg<1-5>: efi_runtime_service() function arguments * * Accesses to efi_runtime_services() are serialized by a binary - * semaphore (efi_runtime_lock) and caller waits until the work is + * semaphore (efi_runtime_sem) and caller waits until the work is * finished, hence _only_ one work is queued at a time and the caller * thread waits for completion. */ @@ -144,7 +144,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call) * none of the remaining functions are actually ever called at runtime. * So let's just use a single lock to serialize all Runtime Services calls. */ -static DEFINE_SEMAPHORE(efi_runtime_lock); +DEFINE_SEMAPHORE(efi_runtime_sem); /* * Calls the appropriate efi_runtime_service() with the appropriate @@ -233,10 +233,10 @@ static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -244,10 +244,10 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -257,11 +257,11 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -269,11 +269,11 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_WAKEUP_TIME, , tm, NULL, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -285,11 +285,11 @@ static efi_status_t virt_efi_get_variable(efi_char16_t *name, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size, data); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -299,11 +299,11 @@ static efi_status_t virt_efi_get_next_variable(unsigned long *name_size, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(GET_NEXT_VARIABLE, name_size, name, vendor, NULL, NULL); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -315,11 +315,11 @@ static efi_status_t virt_efi_set_variable(efi_char16_t *name, { efi_status_t status; - if (down_interruptible(_runtime_lock)) + if (down_interruptible(_runtime_sem)) return EFI_ABORTED; status = efi_queue_work(SET_VARIABLE, name, vendor, , _size, data); - up(_runtime_lock); + up(_runtime_sem); return status; } @@ -330,12 +330,12 @@ virt_efi_set_variable_nonblocking(efi_char16_t *name, efi_guid_t *vendor, { efi_status_t status; - if (down_trylock(_runtime_lock)) + if (down_trylock(_runtime_sem)) return EFI_NOT_READY; status = efi_call_virt(set_variable, name,
[PATCH 0/3] Protect against concurrent calls into UV BIOS
Calls into UV BIOS were not being serialised which is wrong as it violates EFI runtime rules, and bad as it does result in all sorts of potentially hard to track down hangs and panics when efi_scratch gets clobbered. Patch #1 makes the efi_runtime_lock semaphore visible to EFI runtime callers defined outside drivers/firmware/efi/runtime-wrappers.c in preparation for using it to serialise calls into UV BIOS. Patch #2 removes uv_bios_call_reentrant() because it's dead code. Patch #3 makes uv_bios_call() variants use efi_runtime_sem to protect against concurrency. Hedi Berriche (3): efi/x86: turn EFI runtime semaphore into a global lock x86/platform/UV: kill uv_bios_call_reentrant() as it has no callers x86/platform/UV: use efi_runtime_sem to serialise BIOS calls arch/x86/include/asm/uv/bios.h |4 +- arch/x86/platform/uv/bios_uv.c | 37 +++ drivers/firmware/efi/runtime-wrappers.c | 60 include/linux/efi.h |3 + 4 files changed, 57 insertions(+), 47 deletions(-) -- 2.20.0
[PATCH 2/3] x86/platform/UV: kill uv_bios_call_reentrant() as it has no callers
uv_bios_call_reentrant() has no callers nor is it exported, kill it. Signed-off-by: Hedi Berriche Reviewed-by: Russ Anderson Reviewed-by: Mike Travis Reviewed-by: Dimitri Sivanich Reviewed-by: Steve Wahl --- arch/x86/include/asm/uv/bios.h | 1 - arch/x86/platform/uv/bios_uv.c | 12 2 files changed, 13 deletions(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index e652a7cc6186..4eee646544b2 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -140,7 +140,6 @@ enum uv_memprotect { */ extern s64 uv_bios_call(enum uv_bios_cmd, u64, u64, u64, u64, u64); extern s64 uv_bios_call_irqsave(enum uv_bios_cmd, u64, u64, u64, u64, u64); -extern s64 uv_bios_call_reentrant(enum uv_bios_cmd, u64, u64, u64, u64, u64); extern s64 uv_bios_get_sn_info(int, int *, long *, long *, long *, long *); extern s64 uv_bios_freq_base(u64, u64 *); diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c index 4a6a5a26c582..cd05af157763 100644 --- a/arch/x86/platform/uv/bios_uv.c +++ b/arch/x86/platform/uv/bios_uv.c @@ -66,18 +66,6 @@ s64 uv_bios_call_irqsave(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, return ret; } -s64 uv_bios_call_reentrant(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3, - u64 a4, u64 a5) -{ - s64 ret; - - preempt_disable(); - ret = uv_bios_call(which, a1, a2, a3, a4, a5); - preempt_enable(); - - return ret; -} - long sn_partition_id; EXPORT_SYMBOL_GPL(sn_partition_id); -- 2.20.0
Re: [PATCH] kernel, add panic_on_warn
On Thu, Oct 30, 2014 at 17:06 Prarit Bhargava wrote: | There have been several times where I have had to rebuild a kernel to | cause a panic when hitting a WARN() in the code in order to get a crash | dump from a system. Sometimes this is easy to do, other times (such as | in the case of a remote admin) it is not trivial to send new images to the | user. | | A much easier method would be a switch to change the WARN() over to a | panic. This makes debugging easier in that I can now test the actual | image the WARN() was seen on and I do not have to engage in remote | debugging. Do we want to leave it to usersspace[1] to ensure panic_on_warn is out of the way in when the kdump kernel boots? or would a self-contained approach be more preferable i.e. test whether we're a kdump kernel before bothering with panic_on_warn? Cheers, Hedi. [1] kexec-tools in the case of the boot param by filtering it out of the kdump kernel cmdline. In the case of sysctl.conf, it would depend on whether there are distros out there that include it in the kdump initrd. -- Hedi Berriche Linux Kernel Engineer http://www.sgi.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel, add panic_on_warn
On Thu, Oct 30, 2014 at 17:06 Prarit Bhargava wrote: | There have been several times where I have had to rebuild a kernel to | cause a panic when hitting a WARN() in the code in order to get a crash | dump from a system. Sometimes this is easy to do, other times (such as | in the case of a remote admin) it is not trivial to send new images to the | user. | | A much easier method would be a switch to change the WARN() over to a | panic. This makes debugging easier in that I can now test the actual | image the WARN() was seen on and I do not have to engage in remote | debugging. Do we want to leave it to usersspace[1] to ensure panic_on_warn is out of the way in when the kdump kernel boots? or would a self-contained approach be more preferable i.e. test whether we're a kdump kernel before bothering with panic_on_warn? Cheers, Hedi. [1] kexec-tools in the case of the boot param by filtering it out of the kdump kernel cmdline. In the case of sysctl.conf, it would depend on whether there are distros out there that include it in the kdump initrd. -- Hedi Berriche Linux Kernel Engineer http://www.sgi.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Regression] sched: division by zero in find_busiest_group()
On Mon, Dec 09, 2013 at 18:10 Hedi Berriche wrote: | Folks, | | The following panic occurs *early* at boot time on high *enough* CPU count | machines: | | divide error: [#1] SMP | Modules linked in: | CPU: 22 PID: 1146 Comm: kworker/22:0 Not tainted 3.13.0-rc2-00122-gdea4f48 #8 | Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20 | task: 8827d49f31c0 ti: 8827d4a18000 task.ti: 8827d4a18000 | RIP: 0010:[] [] find_busiest_group+0x26b/0x890 | RSP: :8827d4a19b68 EFLAGS: 00010006 | RAX: 7fff RBX: 8000 RCX: 0200 | RDX: RSI: 8000 RDI: 0020 | RBP: 8827d4a19cc0 R08: R09: | R10: R11: R12: | R13: 8827d4a19d28 R14: 8827d4a19b98 R15: | FS: () GS:8827dfd8() knlGS: | CS: 0010 DS: ES: CR0: 8005003b | CR2: 00b8 CR3: 018da000 CR4: 07e0 | Stack: | 8827d4b35800 00014600 00014600 | 8827d4b35818 | 8000 | Call Trace: | [] load_balance+0x166/0x7f0 | [] idle_balance+0x10e/0x1b0 | [] __schedule+0x723/0x780 | [] schedule+0x29/0x70 | [] worker_thread+0x1c9/0x400 | [] ? rescuer_thread+0x3e0/0x3e0 | [] kthread+0xd2/0xf0 | [] ? kthread_create_on_node+0x180/0x180 | [] ret_from_fork+0x7c/0xb0 | [] ? kthread_create_on_node+0x180/0x180 Hmm...had time to dig into this a bit deeper and looking at build_overlap_sched_groups(), specifically this bit of code: kernel/sched/core.c: 5066 static int 5067 build_overlap_sched_groups(struct sched_domain *sd, int cpu) 5068 { ... 5109 /* 5110 * Initialize sgp->power such that even if we mess up the 5111 * domains and no possible iteration will get us here, we won't 5112 * die on a /0 trap. 5113 */ 5114 sg->sgp->power = SCHED_POWER_SCALE * cpumask_weight(sg_span); I'm wondering whether the same precaution should be used when it comes to sg->sgp->power_orig. Cheers, Hedi. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Regression] sched: division by zero in find_busiest_group()
On Mon, Dec 09, 2013 at 18:10 Hedi Berriche wrote: | Folks, | | The following panic occurs *early* at boot time on high *enough* CPU count | machines: | | divide error: [#1] SMP | Modules linked in: | CPU: 22 PID: 1146 Comm: kworker/22:0 Not tainted 3.13.0-rc2-00122-gdea4f48 #8 | Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20 | task: 8827d49f31c0 ti: 8827d4a18000 task.ti: 8827d4a18000 | RIP: 0010:[810a345b] [810a345b] find_busiest_group+0x26b/0x890 | RSP: :8827d4a19b68 EFLAGS: 00010006 | RAX: 7fff RBX: 8000 RCX: 0200 | RDX: RSI: 8000 RDI: 0020 | RBP: 8827d4a19cc0 R08: R09: | R10: R11: R12: | R13: 8827d4a19d28 R14: 8827d4a19b98 R15: | FS: () GS:8827dfd8() knlGS: | CS: 0010 DS: ES: CR0: 8005003b | CR2: 00b8 CR3: 018da000 CR4: 07e0 | Stack: | 8827d4b35800 00014600 00014600 | 8827d4b35818 | 8000 | Call Trace: | [810a3be6] load_balance+0x166/0x7f0 | [810a477e] idle_balance+0x10e/0x1b0 | [815d83d3] __schedule+0x723/0x780 | [815d8459] schedule+0x29/0x70 | [810818b9] worker_thread+0x1c9/0x400 | [810816f0] ? rescuer_thread+0x3e0/0x3e0 | [81088562] kthread+0xd2/0xf0 | [81088490] ? kthread_create_on_node+0x180/0x180 | [815e437c] ret_from_fork+0x7c/0xb0 | [81088490] ? kthread_create_on_node+0x180/0x180 Hmm...had time to dig into this a bit deeper and looking at build_overlap_sched_groups(), specifically this bit of code: kernel/sched/core.c: 5066 static int 5067 build_overlap_sched_groups(struct sched_domain *sd, int cpu) 5068 { ... 5109 /* 5110 * Initialize sgp-power such that even if we mess up the 5111 * domains and no possible iteration will get us here, we won't 5112 * die on a /0 trap. 5113 */ 5114 sg-sgp-power = SCHED_POWER_SCALE * cpumask_weight(sg_span); I'm wondering whether the same precaution should be used when it comes to sg-sgp-power_orig. Cheers, Hedi. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Regression] sched: division by zero in find_busiest_group()
Folks, The following panic occurs *early* at boot time on high *enough* CPU count machines: divide error: [#1] SMP Modules linked in: CPU: 22 PID: 1146 Comm: kworker/22:0 Not tainted 3.13.0-rc2-00122-gdea4f48 #8 Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20 task: 8827d49f31c0 ti: 8827d4a18000 task.ti: 8827d4a18000 RIP: 0010:[] [] find_busiest_group+0x26b/0x890 RSP: :8827d4a19b68 EFLAGS: 00010006 RAX: 7fff RBX: 8000 RCX: 0200 RDX: RSI: 8000 RDI: 0020 RBP: 8827d4a19cc0 R08: R09: R10: R11: R12: R13: 8827d4a19d28 R14: 8827d4a19b98 R15: FS: () GS:8827dfd8() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 00b8 CR3: 018da000 CR4: 07e0 Stack: 8827d4b35800 00014600 00014600 8827d4b35818 8000 Call Trace: [] load_balance+0x166/0x7f0 [] idle_balance+0x10e/0x1b0 [] __schedule+0x723/0x780 [] schedule+0x29/0x70 [] worker_thread+0x1c9/0x400 [] ? rescuer_thread+0x3e0/0x3e0 [] kthread+0xd2/0xf0 [] ? kthread_create_on_node+0x180/0x180 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x180/0x180 Bisection points to 9abf24d sched: Check sched_domain before computing group power but without it (as clearly indicated in the changelog) the kernel panics thus: BUG: unable to handle kernel NULL pointer dereference at 0010 IP: [] update_group_power+0xa2/0x250 PGD 0 Oops: [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-00122-gdea4f48 #10 Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20 task: 881054528000 ti: 88105453 task.ti: 88105453 RIP: 0010:[] [] update_group_power+0xa2/0x250 RSP: :881054531d48 EFLAGS: 00010287 RAX: RBX: RCX: RDX: 00f0 RSI: 0100 RDI: 00c0 RBP: 881054531d70 R08: 89e7d4ae6018 R09: 0004 R10: 89e7d4ae6818 R11: 81098d5d R12: R13: 000146c0 R14: 89e7d4ae6000 R15: 89e7d4ae6018 FS: () GS:88105fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0010 CR3: 018c2000 CR4: 07f0 Stack: 89e7d4ae6000 89e7d4ac 00f0 00f0 898fd3a68c00 881054531e30 8109933f 0100 00ff 00010448 00ff 01000100 Call Trace: [] build_sched_domains+0xbff/0xc80 [] sched_init_smp+0x3ad/0x469 [] kernel_init_freeable+0xfa/0x207 [] ? rest_init+0x80/0x80 [] kernel_init+0xe/0x120 [] ret_from_fork+0x7c/0xb0 [] ? rest_init+0x80/0x80 and this is because of: 863bffc sched/fair: Fix group power_orig computation IOW, 9abf24d can't be blamed for it all, and this is not a case of a straightforward revert of a single commit. Back to the division by zero itself, it's taking place in the inlined sg_capacity(): find_busiest_group update_sd_lb_stats update_sg_lb_stats sg_capacity 5492 static inline int sg_capacity(struct lb_env *env, struct sched_group *group) 5493 { 5494 unsigned int capacity, smt, cpus; 5495 unsigned int power, power_orig; 5496 5497 power = group->sgp->power; 5498 power_orig = group->sgp->power_orig; 5499 cpus = group->group_weight; 5500 5501 /* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */ 5502 smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig); <-- HERE so we're arriving here with group->sgp->power_orig == 0. Cheers, Hedi. P.S. The following *works* around the panic: diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e85cda2..48c8d0b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5735,6 +5735,9 @@ static int __sdt_alloc(const struct cpumask *cpu_map) if (!sgp) return -ENOMEM; + /* WAR: avoid a divison by zero in sg_capacity() */ + sgp->power_orig = 1; + *per_cpu_ptr(sdd->sgp, j) = sgp; } } and I wonder whether the following --on its own-- would make sense as a fix: diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index fd773ad..57578b3 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5495,7 +5495,7 @@ static inline int sg_capacity(struct lb_env *env, struct sched_group *group) unsigned int power, power_orig; power = group->sgp->power; -
[Regression] sched: division by zero in find_busiest_group()
Folks, The following panic occurs *early* at boot time on high *enough* CPU count machines: divide error: [#1] SMP Modules linked in: CPU: 22 PID: 1146 Comm: kworker/22:0 Not tainted 3.13.0-rc2-00122-gdea4f48 #8 Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20 task: 8827d49f31c0 ti: 8827d4a18000 task.ti: 8827d4a18000 RIP: 0010:[810a345b] [810a345b] find_busiest_group+0x26b/0x890 RSP: :8827d4a19b68 EFLAGS: 00010006 RAX: 7fff RBX: 8000 RCX: 0200 RDX: RSI: 8000 RDI: 0020 RBP: 8827d4a19cc0 R08: R09: R10: R11: R12: R13: 8827d4a19d28 R14: 8827d4a19b98 R15: FS: () GS:8827dfd8() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 00b8 CR3: 018da000 CR4: 07e0 Stack: 8827d4b35800 00014600 00014600 8827d4b35818 8000 Call Trace: [810a3be6] load_balance+0x166/0x7f0 [810a477e] idle_balance+0x10e/0x1b0 [815d83d3] __schedule+0x723/0x780 [815d8459] schedule+0x29/0x70 [810818b9] worker_thread+0x1c9/0x400 [810816f0] ? rescuer_thread+0x3e0/0x3e0 [81088562] kthread+0xd2/0xf0 [81088490] ? kthread_create_on_node+0x180/0x180 [815e437c] ret_from_fork+0x7c/0xb0 [81088490] ? kthread_create_on_node+0x180/0x180 Bisection points to 9abf24d sched: Check sched_domain before computing group power but without it (as clearly indicated in the changelog) the kernel panics thus: BUG: unable to handle kernel NULL pointer dereference at 0010 IP: [810a3542] update_group_power+0xa2/0x250 PGD 0 Oops: [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-00122-gdea4f48 #10 Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20 task: 881054528000 ti: 88105453 task.ti: 88105453 RIP: 0010:[810a3542] [810a3542] update_group_power+0xa2/0x250 RSP: :881054531d48 EFLAGS: 00010287 RAX: RBX: RCX: RDX: 00f0 RSI: 0100 RDI: 00c0 RBP: 881054531d70 R08: 89e7d4ae6018 R09: 0004 R10: 89e7d4ae6818 R11: 81098d5d R12: R13: 000146c0 R14: 89e7d4ae6000 R15: 89e7d4ae6018 FS: () GS:88105fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0010 CR3: 018c2000 CR4: 07f0 Stack: 89e7d4ae6000 89e7d4ac 00f0 00f0 898fd3a68c00 881054531e30 8109933f 0100 00ff 00010448 00ff 01000100 Call Trace: [8109933f] build_sched_domains+0xbff/0xc80 [81a3c89e] sched_init_smp+0x3ad/0x469 [81a1c00c] kernel_init_freeable+0xfa/0x207 [815b3ea0] ? rest_init+0x80/0x80 [815b3eae] kernel_init+0xe/0x120 [815d547c] ret_from_fork+0x7c/0xb0 [815b3ea0] ? rest_init+0x80/0x80 and this is because of: 863bffc sched/fair: Fix group power_orig computation IOW, 9abf24d can't be blamed for it all, and this is not a case of a straightforward revert of a single commit. Back to the division by zero itself, it's taking place in the inlined sg_capacity(): find_busiest_group update_sd_lb_stats update_sg_lb_stats sg_capacity 5492 static inline int sg_capacity(struct lb_env *env, struct sched_group *group) 5493 { 5494 unsigned int capacity, smt, cpus; 5495 unsigned int power, power_orig; 5496 5497 power = group-sgp-power; 5498 power_orig = group-sgp-power_orig; 5499 cpus = group-group_weight; 5500 5501 /* smt := ceil(cpus / power), assumes: 1 smt_power 2 */ 5502 smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig); -- HERE so we're arriving here with group-sgp-power_orig == 0. Cheers, Hedi. P.S. The following *works* around the panic: diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e85cda2..48c8d0b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5735,6 +5735,9 @@ static int __sdt_alloc(const struct cpumask *cpu_map) if (!sgp) return -ENOMEM; + /* WAR: avoid a divison by zero in sg_capacity() */ + sgp-power_orig = 1; + *per_cpu_ptr(sdd-sgp, j) = sgp; } } and I wonder whether the following --on its own-- would make
Commit 07f9b61 breaks systems that don't implement a _CBA method
Chaps, The following failure was encountered on hardware that does *not* implement a _CBA method which is AFAICT (and confirmed to me by BIOS chaps) optional. [1.230647] PCI: MMCONFIG for domain [bus 00-0c] at [mem 0x8000-0x80cf] (base 0x8000) [1.241046] PCI: MMCONFIG for domain 0001 [bus 00-02] at [mem 0xff8400-0xff842f] (base 0xff8400) [1.252025] PCI: MMCONFIG for domain 1000 [bus 3f-3f] at [mem 0xff83f0-0xff83ff] (base 0xff8000) [1.263006] PCI: MMCONFIG for domain 1001 [bus 3f-3f] at [mem 0xff87f0-0xff87ff] (base 0xff8400) [1.273984] PCI: MMCONFIG at [mem 0x8000-0x80cf] reserved in E820 [1.281564] PCI: MMCONFIG at [mem 0xff8400-0xff842f] reserved in E820 [1.289535] PCI: MMCONFIG at [mem 0xff83f0-0xff83ff] reserved in E820 [1.297505] PCI: MMCONFIG at [mem 0xff87f0-0xff87ff] reserved in E820 [1.427926] ACPI: PCI Root Bridge [I001] (domain 0001 [bus 00-3d]) [1.434968] acpi PNP0A08:00: fail to add MMCONFIG information, can't access PCI configuration space under this host bridge. [1.447405] acpi PNP0A08:00: Bus 0001:00 not present in PCI namespace ... [1.454608] ACPI: PCI Root Bridge [S000] (domain 1000 [bus 3f]) [1.461300] acpi PNP0A08:01: fail to add MMCONFIG information, can't access PCI configuration space under this host bridge. [1.473735] acpi PNP0A08:01: Bus 1000:3f not present in PCI namespace ... [1.480935] ACPI: PCI Root Bridge [S001] (domain 1001 [bus 3f]) [1.487630] acpi PNP0A08:02: fail to add MMCONFIG information, can't access PCI configuration space under this host bridge. [1.500066] acpi PNP0A08:02: Bus 1001:3f not present in PCI namespace Bisection points to this commit (included in full given its brevity): commit 07f9b61c3915e8eb156cb4461b3946736356ad02 Author: ethan.zhao Date: Fri Jul 26 11:21:24 2013 -0600 x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero We can check for addr being zero earlier and thus avoid the mutex_unlock() cleanup path. [bhelgaas: drop warning printk] Signed-off-by: ethan.zhao Signed-off-by: Bjorn Helgaas Acked-by: Yinghai Lu diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index 082e881..5596c7b 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -700,7 +700,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed) return -ENODEV; - if (start > end) + if (start > end || !addr) return -EINVAL; mutex_lock(_mmcfg_lock); @@ -716,11 +716,6 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, return -EEXIST; } - if (!addr) { - mutex_unlock(_mmcfg_lock); - return -EINVAL; - } - rc = -EBUSY; cfg = pci_mmconfig_alloc(seg, start, end, addr); if (cfg == NULL) { With this change in place, pci_mmconfig_insert(), when called with addr==0 bails out (too) early with -EINVAL and no longer calls into pci_mmconfig_lookup(): 693 int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, 694 phys_addr_t addr) 695 { 696 int rc; 697 struct resource *tmp = NULL; 698 struct pci_mmcfg_region *cfg; 699 700 if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed) 701 return -ENODEV; 702 703 if (start > end || !addr) 704 return -EINVAL; 705 706 mutex_lock(_mmcfg_lock); 707 cfg = pci_mmconfig_lookup(seg, start); 708 if (cfg) { 709 if (cfg->end_bus < end) 710 dev_info(dev, FW_INFO 711 "MMCONFIG for " 712 "domain %04x [bus %02x-%02x] " 713 "only partially covers this bridge\n", 714 cfg->segment, cfg->start_bus, cfg->end_bus); 715 mutex_unlock(_mmcfg_lock); 716 return -EEXIST; 717 } And given the code path reads: pci_acpi_scan_root() setup_mcfg_map() pci_mmconfig_insert() this causes setup_mcfg_map() to fail and go down the check_segment() error path: 171 static int setup_mcfg_map(struct pci_root_info *info, u16 seg, u8 start, 172 u8 end, phys_addr_t addr) 173 { ... 188 result = pci_mmconfig_insert(dev, seg, start, end, addr); 189 if (result == 0) { ... 194 } else if (result != -EEXIST) 195 return check_segment(seg, dev, 196 "fail to add MMCONFIG information,"); 197 198 return 0; 199 } and this finally causes pci_acpi_scan_root() to skip calling pci_create_root_bus() and all is doom and gloom: 473
Commit 07f9b61 breaks systems that don't implement a _CBA method
Chaps, The following failure was encountered on hardware that does *not* implement a _CBA method which is AFAICT (and confirmed to me by BIOS chaps) optional. [1.230647] PCI: MMCONFIG for domain [bus 00-0c] at [mem 0x8000-0x80cf] (base 0x8000) [1.241046] PCI: MMCONFIG for domain 0001 [bus 00-02] at [mem 0xff8400-0xff842f] (base 0xff8400) [1.252025] PCI: MMCONFIG for domain 1000 [bus 3f-3f] at [mem 0xff83f0-0xff83ff] (base 0xff8000) [1.263006] PCI: MMCONFIG for domain 1001 [bus 3f-3f] at [mem 0xff87f0-0xff87ff] (base 0xff8400) [1.273984] PCI: MMCONFIG at [mem 0x8000-0x80cf] reserved in E820 [1.281564] PCI: MMCONFIG at [mem 0xff8400-0xff842f] reserved in E820 [1.289535] PCI: MMCONFIG at [mem 0xff83f0-0xff83ff] reserved in E820 [1.297505] PCI: MMCONFIG at [mem 0xff87f0-0xff87ff] reserved in E820 snip [1.427926] ACPI: PCI Root Bridge [I001] (domain 0001 [bus 00-3d]) [1.434968] acpi PNP0A08:00: fail to add MMCONFIG information, can't access PCI configuration space under this host bridge. [1.447405] acpi PNP0A08:00: Bus 0001:00 not present in PCI namespace ... [1.454608] ACPI: PCI Root Bridge [S000] (domain 1000 [bus 3f]) [1.461300] acpi PNP0A08:01: fail to add MMCONFIG information, can't access PCI configuration space under this host bridge. [1.473735] acpi PNP0A08:01: Bus 1000:3f not present in PCI namespace ... [1.480935] ACPI: PCI Root Bridge [S001] (domain 1001 [bus 3f]) [1.487630] acpi PNP0A08:02: fail to add MMCONFIG information, can't access PCI configuration space under this host bridge. [1.500066] acpi PNP0A08:02: Bus 1001:3f not present in PCI namespace Bisection points to this commit (included in full given its brevity): commit 07f9b61c3915e8eb156cb4461b3946736356ad02 Author: ethan.zhao ethan.z...@oracle.com Date: Fri Jul 26 11:21:24 2013 -0600 x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero We can check for addr being zero earlier and thus avoid the mutex_unlock() cleanup path. [bhelgaas: drop warning printk] Signed-off-by: ethan.zhao ethan.z...@oracle.com Signed-off-by: Bjorn Helgaas bhelg...@google.com Acked-by: Yinghai Lu ying...@kernel.org diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index 082e881..5596c7b 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -700,7 +700,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, if (!(pci_probe PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed) return -ENODEV; - if (start end) + if (start end || !addr) return -EINVAL; mutex_lock(pci_mmcfg_lock); @@ -716,11 +716,6 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, return -EEXIST; } - if (!addr) { - mutex_unlock(pci_mmcfg_lock); - return -EINVAL; - } - rc = -EBUSY; cfg = pci_mmconfig_alloc(seg, start, end, addr); if (cfg == NULL) { With this change in place, pci_mmconfig_insert(), when called with addr==0 bails out (too) early with -EINVAL and no longer calls into pci_mmconfig_lookup(): 693 int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, 694 phys_addr_t addr) 695 { 696 int rc; 697 struct resource *tmp = NULL; 698 struct pci_mmcfg_region *cfg; 699 700 if (!(pci_probe PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed) 701 return -ENODEV; 702 703 if (start end || !addr) 704 return -EINVAL; 705 706 mutex_lock(pci_mmcfg_lock); 707 cfg = pci_mmconfig_lookup(seg, start); 708 if (cfg) { 709 if (cfg-end_bus end) 710 dev_info(dev, FW_INFO 711 MMCONFIG for 712 domain %04x [bus %02x-%02x] 713 only partially covers this bridge\n, 714 cfg-segment, cfg-start_bus, cfg-end_bus); 715 mutex_unlock(pci_mmcfg_lock); 716 return -EEXIST; 717 } And given the code path reads: pci_acpi_scan_root() setup_mcfg_map() pci_mmconfig_insert() this causes setup_mcfg_map() to fail and go down the check_segment() error path: 171 static int setup_mcfg_map(struct pci_root_info *info, u16 seg, u8 start, 172 u8 end, phys_addr_t addr) 173 { ... 188 result = pci_mmconfig_insert(dev, seg, start, end, addr); 189 if (result == 0) { ... 194 } else if (result != -EEXIST) 195 return check_segment(seg, dev, 196 fail to add MMCONFIG information,); 197 198 return 0; 199 } and this finally causes
Re: [PATCH TRIVIAL] kexec: Typo s/the/then/
On Sun, Sep 15, 2013 at 10:36 Geert Uytterhoeven wrote: | Signed-off-by: Geert Uytterhoeven | --- | kernel/kexec.c |2 +- | 1 file changed, 1 insertion(+), 1 deletion(-) | | diff --git a/kernel/kexec.c b/kernel/kexec.c | index 2a74f30..490afc0 100644 | --- a/kernel/kexec.c | +++ b/kernel/kexec.c | @@ -921,7 +921,7 @@ static int kimage_load_segment(struct kimage *image, | * reinitialize them. | * | * - A machine specific part that includes the syscall number | - * and the copies the image to it's final destination. And | + * and then copies the image to it's final destination. And | * jumps into the image at entry. Well, if you're fixing typos, might as well get rid of the wrong "it's" and turn into "its". Cheers, Hedi. -- Be careful of reading health books, you might die of a misprint. -- Mark Twain -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH TRIVIAL] kexec: Typo s/the/then/
On Sun, Sep 15, 2013 at 10:36 Geert Uytterhoeven wrote: | Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org | --- | kernel/kexec.c |2 +- | 1 file changed, 1 insertion(+), 1 deletion(-) | | diff --git a/kernel/kexec.c b/kernel/kexec.c | index 2a74f30..490afc0 100644 | --- a/kernel/kexec.c | +++ b/kernel/kexec.c | @@ -921,7 +921,7 @@ static int kimage_load_segment(struct kimage *image, | * reinitialize them. | * | * - A machine specific part that includes the syscall number | - * and the copies the image to it's final destination. And | + * and then copies the image to it's final destination. And | * jumps into the image at entry. Well, if you're fixing typos, might as well get rid of the wrong it's and turn into its. Cheers, Hedi. -- Be careful of reading health books, you might die of a misprint. -- Mark Twain -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
On Thu, Sep 12, 2013 at 19:59 Mike Travis wrote: | On 9/12/2013 10:27 AM, Paul E. McKenney wrote: | | > But what is it that you are looking for? If you want to silence it | > completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what | > you want to use. | | We have by default rcutree.rcu_cpu_stall_suppress=1 on the kernel | cmdline. I'll double check if it was set during my testing. FWIW, for recent enough kernels the correct boot parameter is rcupdate.rcu_cpu_stall_suppress. It used to be rcutree.rcu_cpu_stall_suppress, but that has changed after commit 6bfc09e. Cheers, Hedi. -- Be careful of reading health books, you might die of a misprint. -- Mark Twain -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
On Thu, Sep 12, 2013 at 19:59 Mike Travis wrote: | On 9/12/2013 10:27 AM, Paul E. McKenney wrote: | | But what is it that you are looking for? If you want to silence it | completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what | you want to use. | | We have by default rcutree.rcu_cpu_stall_suppress=1 on the kernel | cmdline. I'll double check if it was set during my testing. FWIW, for recent enough kernels the correct boot parameter is rcupdate.rcu_cpu_stall_suppress. It used to be rcutree.rcu_cpu_stall_suppress, but that has changed after commit 6bfc09e. Cheers, Hedi. -- Be careful of reading health books, you might die of a misprint. -- Mark Twain -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/