Public bug reported:

Description:   s390/pci: Fix stale function handles in error handling

Symptom:       
In some error scenarios, Linux initially recovers successfully, but automatic 
recovery ultimately fails when it then tries to handle another error event.

Problem:       
In some error scenarios, multiple error events may be generated for the same PCI 
function before Linux has even started its automatic recovery process. In this 
case Linux may recover successfully based on the first event but then fail 
recovery when handling a subsequent event. This is because each event captures 
the function handle at the time it is created. By the time a subsequent event 
is handled, the handle stored with that event is stale, and using it to reset 
the function fails.

Solution:      
Fix this by retrieving a fresh function handle using the CLP List PCI Functions 
command and only processing events where the stored handle matches this handle. 
This effectively ignores error events which were captured before the latest 
disable/enable cycle. Relatedly, if the current handle is already disabled, do 
not attempt to simply reset the error state, as a re-enable is necessary and 
clearing the error state would fail.
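
The decision logic described above can be sketched roughly as follows. This is 
an illustrative, user-space sketch only and not the code from the upstream 
commits: the names decide_recovery, recovery_action and FH_ENABLED_BIT are 
hypothetical, and the sketch assumes that the most significant bit of a 32-bit 
function handle marks the function as enabled. In the kernel the fresh handle 
would come from CLP List PCI Functions; here it is simply passed in.

```c
#include <stdint.h>

/* Assumption for this sketch: the top bit of a handle means "enabled". */
#define FH_ENABLED_BIT 0x80000000u

/* Hypothetical outcomes of looking at one queued error event. */
enum recovery_action {
    ACTION_IGNORE_STALE,      /* event predates the latest disable/enable cycle */
    ACTION_REENABLE,          /* handle is current but disabled: re-enable needed */
    ACTION_RESET_ERROR_STATE  /* handle is current and enabled: reset error state */
};

/*
 * event_fh:   handle captured when the error event was created
 * current_fh: fresh handle, as it would be returned by CLP List PCI Functions
 */
enum recovery_action decide_recovery(uint32_t event_fh, uint32_t current_fh)
{
    /* A mismatch means the event was captured with an older handle. */
    if (event_fh != current_fh)
        return ACTION_IGNORE_STALE;

    /* Clearing the error state would fail on a disabled function. */
    if (!(current_fh & FH_ENABLED_BIT))
        return ACTION_REENABLE;

    return ACTION_RESET_ERROR_STATE;
}
```

With this shape, a second event that still carries the handle from before the 
first recovery no longer matches the fresh handle and is skipped instead of 
triggering a failing reset.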

Reproduction:  
This may be reproduced in an artificial error scenario by issuing multiple 
zpcictl --reset-fw <dev> commands in quick succession, which generates multiple 
PEC 0x3A events with the same handle.

Required Fixes / Upstream-IDs:   
45537926dd2aaa9190ac0fac5a0fbeefcadfea95
b97a7972b1f4f81417840b9a2ab0c19722b577d5

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Skipper Bug Screeners (skipper-screen-team)
         Status: New


** Tags: architecture-s39064 bugnameltc-214779 severity-high 
targetmilestone-inin---

** Tags added: architecture-s39064 bugnameltc-214779 severity-high
targetmilestone-inin---

** Changed in: ubuntu
     Assignee: (unassigned) => Skipper Bug Screeners (skipper-screen-team)

** Package changed: ubuntu => linux (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2121149

Title:
  [UBUNTU 24.04] s390/pci: Fix stale function handles in error handling

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2121149/+subscriptions

