** Description changed:

  == SRU Justification, ARTFUL ==
  
  Bug fix #1747069 causes an issue for NVIDIA drivers on ppc64el
  platforms.  According to Will Davis at NVIDIA:
  
  "- The original patch 3d79a728f9b2e6ddcce4e02c91c4de1076548a4c changed
  the call to arch_add_memory in mm/memory_hotplug.c to call with the
  boolean argument set to true instead of false, and inverted the
  semantics of that argument in the arch layers.
  
  - The revert patch 4fe85d5a7c50f003fe4863a1a87f5d8cc121c75c reverted the
  semantic change in the arch layers, but didn't revert the change to the
  arch_add_memory call in mm/memory_hotplug.c"
  
  And also:
  
  "It looks like the problem here is that the online_type is _MOVABLE but
  can_online_high_movable(nid=255) is returning false:
  
-         if ((zone_idx(zone) > ZONE_NORMAL ||
-             online_type == MMOP_ONLINE_MOVABLE) &&
-             !can_online_high_movable(pfn_to_nid(pfn)))
+         if ((zone_idx(zone) > ZONE_NORMAL ||
+             online_type == MMOP_ONLINE_MOVABLE) &&
+             !can_online_high_movable(pfn_to_nid(pfn)))
  
  This check was removed by upstream commit
  57c0a17238e22395428248c53f8e390c051c88b8, and I've verified that if I apply
  that commit (partially) to the 4.13.0-37.42 tree along with the previous
  arch_add_memory patch to make the probe work, I can fully online the GPU
  device memory as expected.
  
  Commit 57c0a172.. implies that the can_online_high_movable() checks weren't
  useful anyway, so in addition to the arch_add_memory fix, does it make sense to
  revert the pieces of 4fe85d5a7c50f003fe4863a1a87f5d8cc121c75c that added back
  the can_online_high_movable() check?"
  
  == Fix ==
  
  Fix the partial backport from bug #1747069: remove the
  can_online_high_movable() check and correct the boolean argument passed
  to arch_add_memory().
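
The boolean mismatch described above can be illustrated with a small standalone model. This is only a sketch, not kernel code: the enum, the helper wants_memblock(), and the names for_device / want_memblock are assumptions, based on the quoted statement that the original patch "inverted the semantics" of the argument. The values passed in each case are inferred from the description, not copied from the kernel tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: how the callee interprets the boolean it receives. */
enum arg_meaning {
    ARG_FOR_DEVICE,    /* assumed older meaning: true = device memory  */
    ARG_WANT_MEMBLOCK  /* assumed inverted meaning: true = create block */
};

/* Returns whether the modelled arch layer would treat the range as
 * ordinary hotplug memory that gets a memory block. */
static bool wants_memblock(enum arg_meaning meaning, bool arg)
{
    return meaning == ARG_WANT_MEMBLOCK ? arg : !arg;
}

/* Walks the four states described in the bug report. */
static void self_check(void)
{
    /* Before the original patch: caller passes false, old meaning. */
    assert(wants_memblock(ARG_FOR_DEVICE, false));
    /* After the original patch: caller passes true, inverted meaning. */
    assert(wants_memblock(ARG_WANT_MEMBLOCK, true));
    /* After the partial revert: caller still passes true, old meaning. */
    assert(!wants_memblock(ARG_FOR_DEVICE, true));
    /* After this fix: caller passes false again, old meaning. */
    assert(wants_memblock(ARG_FOR_DEVICE, false));
}
```

In this model only the partially reverted state gives a different answer from the other three. That state corresponds to the case described above, where the caller in mm/memory_hotplug.c and the arch layers disagree about what the argument means.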
  
+ == Testing ==
+ 
+ Run the ADT memory hotplug test; it should not regress. Without the fix,
+ the NVIDIA driver on powerpc will not load because it cannot map memory
+ for the device. With the fix, it loads.
+ 
  == Regression Potential ==
  
  This fixes a regression in the original fix and hence the regression
  potential is the same as the previously SRU'd bug fix for #1747069,
  namely:
  
  "Reverting this commit does remove some functionality, however this does
  not regress the kernel compared to previous releases and having a
  working reliable memory hotplug is the preferred option. This fix does
  touch some memory hotplug, so there is a risk that this may break this
  functionality that is not covered by the kernel regression testing."

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1761104

Title:
  fix regression in mm/hotplug, allows NVIDIA driver to work

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1761104/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
