** Description changed:

  == SRU Justification, ARTFUL ==

  Bug fix #1747069 causes an issue for NVIDIA drivers on ppc64el
  platforms. According to Will Davis at NVIDIA:

  "- The original patch 3d79a728f9b2e6ddcce4e02c91c4de1076548a4c changed
  the call to arch_add_memory in mm/memory_hotplug.c to call with the
  boolean argument set to true instead of false, and inverted the
  semantics of that argument in the arch layers.

  - The revert patch 4fe85d5a7c50f003fe4863a1a87f5d8cc121c75c reverted
  the semantic change in the arch layers, but didn't revert the change
  to the arch_add_memory call in mm/memory_hotplug.c"

  And also:

  "It looks like the problem here is that the online_type is _MOVABLE but
  can_online_high_movable(nid=255) is returning false:

  - if ((zone_idx(zone) > ZONE_NORMAL ||
  -     online_type == MMOP_ONLINE_MOVABLE) &&
  -     !can_online_high_movable(pfn_to_nid(pfn)))
  + if ((zone_idx(zone) > ZONE_NORMAL ||
  +     online_type == MMOP_ONLINE_MOVABLE) &&
  +     !can_online_high_movable(pfn_to_nid(pfn)))

  This check was removed by upstream commit
  57c0a17238e22395428248c53f8e390c051c88b8, and I've verified that if I
  apply that commit (partially) to the 4.13.0-37.42 tree along with the
  previous arch_add_memory patch to make the probe work, I can fully
  online the GPU device memory as expected.

  Commit 57c0a172.. implies that the can_online_high_movable() checks
  weren't useful anyway, so in addition to the arch_add_memory fix, does
  it make sense to revert the pieces of
  4fe85d5a7c50f003fe4863a1a87f5d8cc121c75c that added back the
  can_online_high_movable() check?"

  == Fix ==

  Fix partial backport from bug #1747069, remove can_online_high_movable
  and fix the incorrectly set boolean argument to arch_add_memory().

+ == Testing ==
+
+ Run the ADT memory hotplug test; it should not regress. Without the
+ fix, the NVIDIA driver on powerpc will not load because it cannot map
+ memory for the device. With the fix it loads.
+
  == Regression Potential ==

  This fixes a regression in the original fix and hence the regression
  potential is the same as the previously SRU'd bug fix for #1747069,
  namely:

  "Reverting this commit does remove some functionality, however this
  does not regress the kernel compared to previous releases and having a
  working reliable memory hotplug is the preferred option. This fix does
  touch some memory hotplug, so there is a risk that this may break this
  functionality that is not covered by the kernel regression testing."
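
The mismatch Will Davis describes, where the caller's boolean and the arch layer's reading of it fall out of step after a partial revert, can be sketched as a toy model. This is only an illustration: arch_original(), arch_inverted(), enum hotplug_path and check_scenarios() are hypothetical names, not kernel functions, and the model assumes only what the report states (the caller's argument went from false to true, and the arch-side meaning was inverted and later restored).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum hotplug_path { PATH_REGULAR, PATH_OTHER };

/* Arch layer with the original meaning: false selects the regular path. */
static enum hotplug_path arch_original(bool flag)
{
    return flag ? PATH_OTHER : PATH_REGULAR;
}

/* Arch layer with the inverted meaning: true selects the regular path. */
static enum hotplug_path arch_inverted(bool flag)
{
    return flag ? PATH_REGULAR : PATH_OTHER;
}

/* Walk through the four states described in the bug report. */
static void check_scenarios(void)
{
    /* Before either patch: caller passes false, original meaning. */
    assert(arch_original(false) == PATH_REGULAR);

    /* After the original patch: caller passes true, inverted meaning. */
    assert(arch_inverted(true) == PATH_REGULAR);

    /* After the partial revert: caller still passes true, but the arch
     * layer is back to the original meaning, so the wrong path runs. */
    assert(arch_original(true) == PATH_OTHER);

    /* After this fix: caller passes false again, original meaning. */
    assert(arch_original(false) == PATH_REGULAR);
}
```

Calling check_scenarios() passes all four assertions. The third one captures the regression: the call site and the arch layer each look plausible on their own, but together they take the wrong path.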
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1761104

Title:
  fix regression in mm/hotplug, allows NVIDIA driver to work
