Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-July/thread.html#112154
Updating status to 'In Progress'.
** Changed in: linux (Ubuntu Focal)
Status: New => In Progress
** Changed in: ubuntu-z-systems
Status: Triaged => In Progress
** Description changed:
+ SRU Justification:
+ ==================
+
+ [Impact]
+
+ * Linux kernel panics due to kernel page fault in IRQ context when
+ running zfcp_erp_timeout_handler() calling zfcp_erp_notify().
+
+ [Fix]
+
+ * 936e6b85da0476dd2edac7c51c68072da9fb4ba2 936e6b85da04 "scsi: zfcp: Fix
+ panic on ERP timeout for previously dismissed ERP action"
+
+ [Test Case]
+
+ * Requires an IBM z13/z13s or LinuxONE Rockhopper/Emperor system (or
+ newer) connected to a zfcp-capable storage subsystem.
+
+ * Trigger an ERP timeout (for example by error injection or by
+ otherwise causing a slow recovery).
+
+ * Monitor the system log for any kernel panics.
+
+ [Regression Potential]
+
+ * The regression potential can be considered medium since the
+ modification is platform specific / limited to s390x and again limited
+ to the zfcp layer.
+
+ * Within zfcp it's further limited to the error recovery procedure (ERP)
+ of fcp and only touches zfcp_erp.c, which means the code path is mainly
+ active under error conditions.
+
+ [Other]
+
+ * The above fix was accepted upstream with v5.8-rc3, hence will make its
+ way to groovy with kernel 5.8.
+
+ * Therefore this SRU request was submitted for bionic and focal only and
+ not for groovy.
+
+ __________
+
Description: zfcp: Fix panic on ERP timeout for previously dismissed ERP
Symptom: Linux kernel panic due to kernel page fault in IRQ context
- when running zfcp_erp_timeout_handler() calling
- zfcp_erp_notify().
+ when running zfcp_erp_timeout_handler() calling
+ zfcp_erp_notify().
Problem: Suppose that, for unrelated reasons, FSF requests on behalf
- of recovery are very slow and can run into the ERP timeout.
- In the case at hand, we did adapter recovery to a large
- degree. However due to the slowness a LUN open is pending so
- the corresponding fc_rport remains blocked. After
- fast_io_fail_tmo we trigger close physical port recovery for
- the port under which the LUN should have been opened. The
- new higher order port recovery dismisses the pending LUN
- open ERP action and dismisses the pending LUN open FSF
- request. Such dismissal decouples the ERP action from the
- pending corresponding FSF request by setting
- zfcp_fsf_req->erp_action to NULL (among other things)
- [zfcp_erp_strategy_check_fsfreq()].
- If now the ERP timeout for the pending open LUN request runs
- out, we must not use zfcp_fsf_req->erp_action in the ERP
- timeout handler. This is a problem since v4.15 commit
- 75492a51568b ("s390/scsi: Convert timers to use
- timer_setup()"). Before that we intentionally only passed
- zfcp_erp_action as context argument to
- zfcp_erp_timeout_handler().
- Note: The lifetime of the corresponding zfcp_fsf_req object
- continues until a (late) response or an (unrelated) adapter
- recovery.
+ of recovery are very slow and can run into the ERP timeout.
+ In the case at hand, we did adapter recovery to a large
+ degree. However due to the slowness a LUN open is pending so
+ the corresponding fc_rport remains blocked. After
+ fast_io_fail_tmo we trigger close physical port recovery for
+ the port under which the LUN should have been opened. The
+ new higher order port recovery dismisses the pending LUN
+ open ERP action and dismisses the pending LUN open FSF
+ request. Such dismissal decouples the ERP action from the
+ pending corresponding FSF request by setting
+ zfcp_fsf_req->erp_action to NULL (among other things)
+ [zfcp_erp_strategy_check_fsfreq()].
+ If now the ERP timeout for the pending open LUN request runs
+ out, we must not use zfcp_fsf_req->erp_action in the ERP
+ timeout handler. This is a problem since v4.15 commit
+ 75492a51568b ("s390/scsi: Convert timers to use
+ timer_setup()"). Before that we intentionally only passed
+ zfcp_erp_action as context argument to
+ zfcp_erp_timeout_handler().
+ Note: The lifetime of the corresponding zfcp_fsf_req object
+ continues until a (late) response or an (unrelated) adapter
+ recovery.
Solution: Just like the regular response path ignores dismissed
- requests [zfcp_fsf_req_complete() =>
- zfcp_fsf_protstatus_eval() => return early] the ERP timeout
- handler now needs to ignore dismissed requests. So simply
- return early in the ERP timeout handler if the FSF request
- is marked as dismissed in its status flags. To protect
- against the race where zfcp_erp_strategy_check_fsfreq()
- dismisses and sets zfcp_fsf_req->erp_action to NULL after
- our previous status flag check, return early if
- zfcp_fsf_req->erp_action is NULL. After all, the former ERP
- action does not need to be woken up as that was already done
- as part of the dismissal above [zfcp_erp_action_dismiss()].
+ requests [zfcp_fsf_req_complete() =>
+ zfcp_fsf_protstatus_eval() => return early] the ERP timeout
+ handler now needs to ignore dismissed requests. So simply
+ return early in the ERP timeout handler if the FSF request
+ is marked as dismissed in its status flags. To protect
+ against the race where zfcp_erp_strategy_check_fsfreq()
+ dismisses and sets zfcp_fsf_req->erp_action to NULL after
+ our previous status flag check, return early if
+ zfcp_fsf_req->erp_action is NULL. After all, the former ERP
+ action does not need to be woken up as that was already done
+ as part of the dismissal above [zfcp_erp_action_dismiss()].
Upstream-ID: 936e6b85da0476dd2edac7c51c68072da9fb4ba2 -> kernel 5.8
Will be integrated into groovy with kernel 5.8.
Please check that this is also integrated into 20.04.
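
The two early returns described under "Solution" can be illustrated with a
small userspace model. This is only a sketch, not the kernel code: the types,
the REQ_DISMISSED flag and the helpers dismiss(), erp_timeout_handler() and
run_demo() are hypothetical stand-ins for zfcp_fsf_req, zfcp_erp_action, the
dismissed status flag and the real handler. It is single-threaded, so it only
shows the order of the checks, not the concurrency itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real zfcp definitions. */
#define REQ_DISMISSED 0x1u              /* models the "dismissed" status flag */

struct erp_action {
    int timed_out;                      /* set when the action sees a timeout */
};

struct fsf_req {
    unsigned int status;
    struct erp_action *erp_action;      /* set to NULL on dismissal */
};

/* Models dismissal: flag the request and decouple it from its ERP action. */
static void dismiss(struct fsf_req *req)
{
    req->status |= REQ_DISMISSED;
    req->erp_action = NULL;
}

/* Models the ERP timeout handler. Returns 1 if it notified, 0 if it ignored. */
static int erp_timeout_handler(struct fsf_req *req)
{
    struct erp_action *act;

    /* First check: ignore requests already marked as dismissed. */
    if (req->status & REQ_DISMISSED)
        return 0;

    /*
     * Second check: read the pointer once into a local and test it, to cover
     * a dismissal that happens after the flag check. Kernel code would need a
     * single-read primitive such as READ_ONCE() here (assumption in this
     * sketch; a plain load is enough for a single-threaded model).
     */
    act = req->erp_action;
    if (act == NULL)
        return 0;

    act->timed_out = 1;                 /* stands in for the notify call */
    return 1;
}

/* Exercises the three cases; returns 0 when all of them behave as described. */
static int run_demo(void)
{
    struct erp_action act = { 0 };
    struct fsf_req req = { 0, &act };

    /* Normal timeout: the pending action is notified. */
    assert(erp_timeout_handler(&req) == 1 && act.timed_out == 1);

    /* Timeout after dismissal: ignored, no NULL pointer dereference. */
    act.timed_out = 0;
    dismiss(&req);
    assert(erp_timeout_handler(&req) == 0 && act.timed_out == 0);

    /* Race window: pointer already cleared although the flag check passed. */
    req.status = 0;
    assert(erp_timeout_handler(&req) == 0);
    return 0;
}
```

Calling run_demo() walks through a normal timeout, a timeout after dismissal,
and the race window where only the pointer has been cleared. In the last two
cases the handler returns without touching the former ERP action, which
matches the behaviour the fix is described as introducing.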
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1887774
Title:
[UBUNTU 20.04] zfcp: Fix panic on ERP timeout for previously dismissed
ERP
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887774/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs