** Description changed:

- --Problem Description---
+ [Impact]
+ This impacts the opal-prd userspace command from the skiboot package.
+ The memory_error() hservice interface expects the memory_error() call to
+ just accept the offline request and return without actually offlining the
+ memory. Currently we will attempt to offline the marked pages before
+ returning to HBRT which can result in an excessively long time spent in the
+ memory_error() hservice call which blocks HBRT from processing other
+ errors.
+ 
+ [Test Case]
+ Unfortunately, due to the specific hardware requirement, I wasn't able to
+ reproduce this problem and provide a test case for it. However, I was able to
+ build this package into a PPA and got the IBM team to confirm this problem was
+ resolved for groovy, focal, bionic, and xenial; see comments #4 and #6
+ 
+ [What could go wrong]
+ Hopefully not much. The initial fix was prepared back in September and I
+ would think any regression would have been discovered by now
+ 
+ [Original Description]
  
  https://github.com/open-power/skiboot/commit/8cbd0de88d162e387f11569eee1bdecef8fad2e3
  
  opal-prd: Have a worker process handle page offlining
  
  The memory_error() hservice interface expects the memory_error() call to
  just accept the offline request and return without actually offlining the
  memory. Currently we will attempt to offline the marked pages before
  returning to HBRT which can result in an excessively long time spent in the
  memory_error() hservice call which blocks HBRT from processing other
  errors. Fix this by adding a worker process which performs the page
  offlining via the sysfs memory error interfaces.
  
  Reviewed-by: Vasant Hegde <[email protected]>
  Signed-off-by: Oliver O'Halloran <[email protected]>
  
  Thanks in advance for your support.
-  
- Machine Type = Power8 and Power9 OPAL systems 
-  
+ 
+ Machine Type = Power8 and Power9 OPAL systems
+ 
  ---Steps to Reproduce---
- * Inject memory error (UE) 
+ * Inject memory error (UE)
  * Verify that opal-prd doesn't return asynchronously to the platform after
  requesting the memory offlining operation
-  
- Userspace tool common name: opal-prd 
-  
+ 
+ Userspace tool common name: opal-prd
+ 
  We need this fix for 16.04.x and 18.04.x LTS releases.
  
  The fix is also needed for 20.04 and 20.10.

** Description changed:

  [Impact]
+ 
  This impacts the opal-prd userspace command from the skiboot package.
  The memory_error() hservice interface expects the memory_error() call to
  just accept the offline request and return without actually offlining the
  memory. Currently we will attempt to offline the marked pages before
  returning to HBRT which can result in an excessively long time spent in the
  memory_error() hservice call which blocks HBRT from processing other
  errors.
  
  [Test Case]
- Unfortunately, due to the specific hardware requirement, I wasn't able to
- reproduce this problem and provide a test case for it. However, I was able to
- build this package into a PPA and got the IBM team to confirm this problem was
- resolved for groovy, focal, bionic, and xenial; see comments #4 and #6
+ 
+ Unfortunately, due to the specific hardware requirement, I wasn't able to
+ reproduce this problem and provide a test case for it. However, I was
+ able to build this package into a PPA and got the IBM team to confirm
+ this problem was resolved for groovy, focal, bionic, and xenial; see
+ comments #4 and #6
  
  [What could go wrong]
- Hopefully not much. The initial fix was prepared back in September and I
- would think any regression would have been discovered by now
+ 
+ Hopefully not much. The initial fix was prepared back in September and I
+ would think any regression would have been discovered by now.
+ 
  
  [Original Description]
  
  https://github.com/open-
  power/skiboot/commit/8cbd0de88d162e387f11569eee1bdecef8fad2e3
  
  opal-prd: Have a worker process handle page offlining
  
  The memory_error() hservice interface expects the memory_error() call to
  just accept the offline request and return without actually offlining the
  memory. Currently we will attempt to offline the marked pages before
  returning to HBRT which can result in an excessively long time spent in the
  memory_error() hservice call which blocks HBRT from processing other
  errors. Fix this by adding a worker process which performs the page
  offlining via the sysfs memory error interfaces.
  
  Reviewed-by: Vasant Hegde <[email protected]>
  Signed-off-by: Oliver O'Halloran <[email protected]>
  
  Thanks in advance for your support.
  
  Machine Type = Power8 and Power9 OPAL systems
  
  ---Steps to Reproduce---
  * Inject memory error (UE)
  * Verify that opal-prd doesn't return asynchronously to the platform after
  requesting the memory offlining operation
  
  Userspace tool common name: opal-prd
  
  We need this fix for 16.04.x and 18.04.x LTS releases.
  
  The fix is also needed for 20.04 and 20.10.
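
For readers unfamiliar with the pattern described in the original description, the following is a minimal Python sketch of the idea, not the actual opal-prd code (which is written in C). It assumes the kernel exposes the sysfs files /sys/devices/system/memory/soft_offline_page and hard_offline_page; the function names, the queue-based hand-off, and the request format are hypothetical and chosen only for illustration. The point it shows is that memory_error() merely records the request and returns, while a separate worker does the slow per-page writes.

```python
import multiprocessing as mp
import os

# Assumed sysfs interfaces for page offlining (illustrative, not taken
# from the opal-prd source).
SOFT_OFFLINE = "/sys/devices/system/memory/soft_offline_page"
HARD_OFFLINE = "/sys/devices/system/memory/hard_offline_page"


def page_addresses(start, end, page_size):
    """Return the page-aligned addresses covering the inclusive range [start, end]."""
    first = start - (start % page_size)
    return list(range(first, end + 1, page_size))


def write_sysfs(path, addr):
    """Write one physical address to a sysfs offline file."""
    with open(path, "w") as f:
        f.write(f"{addr:#x}")


def offline_worker(requests, sysfs_path, page_size=None, write=write_sysfs):
    """Worker loop: drain queued (start, end) ranges and offline each page.

    A None item is a hypothetical shutdown sentinel.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    while True:
        req = requests.get()
        if req is None:
            return
        start, end = req
        for addr in page_addresses(start, end, page_size):
            try:
                write(sysfs_path, addr)
            except OSError as exc:
                # Keep going: one failed page should not stall the rest.
                print(f"offline of {addr:#x} failed: {exc}")


def memory_error(requests, start, end):
    """Accept the offline request and return at once; the worker does the work."""
    requests.put((start, end))
    return 0


def start_worker(sysfs_path=SOFT_OFFLINE):
    """Spawn the worker process and return (process, request queue)."""
    requests = mp.Queue()
    proc = mp.Process(target=offline_worker, args=(requests, sysfs_path), daemon=True)
    proc.start()
    return proc, requests
```

With this split, the time spent in memory_error() no longer depends on how many pages are in the range or on how long the kernel takes to offline each one, so the caller is free to report further errors while the worker is still busy.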

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1904585

Title:
  opal-prd: Have a worker process handle page offlining (Fixes
  "PlatServices: dyndealloc memory_error() failed" is getting reported
  in error log (opal-prd))

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1904585/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
