Re: [PATCH v1 0/3] mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE
AFAICT we're in decent shape to move this series into mm-stable. I've tagged the following issues: https://lkml.kernel.org/r/80532f73e52e2c21fdc9aac7bce24aefb76d11b0.ca...@linux.intel.com https://lkml.kernel.org/r/30b5d493-b7c2-4e63-86c1-dcc73d21d...@redhat.com Have these been addressed, and are we ready to send this series into the world? Thanks.
Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()
On Tue, 11 Jun 2024 11:42:56 +0200 David Hildenbrand wrote: > > We'll leave the ZONE_DEVICE case alone for now. > > > > @Andrew, can we add here: > > "Note that self-hosted vmemmap pages will no longer be marked as > reserved. This matches ordinary vmemmap pages allocated from the buddy > during memory hotplug. Now, really only vmemmap pages allocated from > memblock during early boot will be marked reserved. Existing > PageReserved() checks seem to be handling all relevant cases correctly > even after this change." Done, thanks.
Re: [PATCH v1 1/3] mm: pass meminit_context to __free_pages_core()
On Tue, 11 Jun 2024 12:06:56 +0200 David Hildenbrand wrote: > On 07.06.24 11:09, David Hildenbrand wrote: > > In preparation for further changes, let's teach __free_pages_core() > > about the differences of memory hotplug handling. > > > > Move the memory hotplug specific handling from generic_online_page() to > > __free_pages_core(), use adjust_managed_page_count() on the memory > > hotplug path, and spell out why memory freed via memblock > > cannot currently use adjust_managed_page_count(). > > > > Signed-off-by: David Hildenbrand > > --- > > @Andrew, can you squash the following? Sure. I queued it against "mm: pass meminit_context to __free_pages_core()", not against > Subject: [PATCH] fixup: mm/highmem: make nr_free_highpages() return "unsigned long"
Re: [PATCH v2] mm, page_alloc: fix build_zonerefs_node()
On Thu, 7 Apr 2022 14:06:37 +0200 Juergen Gross wrote: > Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from > zones with pages managed by the buddy allocator") Six years ago! > only zones with free > memory are included in a built zonelist. This is problematic when e.g. > all memory of a zone has been ballooned out when zonelists are being > rebuilt. > > The decision whether to rebuild the zonelists when onlining new memory > is done based on populated_zone() returning 0 for the zone the memory > will be added to. The new zone is added to the zonelists only, if it > has free memory pages (managed_zone() returns a non-zero value) after > the memory has been onlined. This implies, that onlining memory will > always free the added pages to the allocator immediately, but this is > not true in all cases: when e.g. running as a Xen guest the onlined > new memory will be added only to the ballooned memory list, it will be > freed only when the guest is being ballooned up afterwards. > > Another problem with using managed_zone() for the decision whether a > zone is being added to the zonelists is, that a zone with all memory > used will in fact be removed from all zonelists in case the zonelists > happen to be rebuilt. > > Use populated_zone() when building a zonelist as it has been done > before that commit. > > Cc: sta...@vger.kernel.org Some details, please. Is this really serious enough to warrant backporting? Is some new workload/usage pattern causing people to hit this?
Re: remove alloc_vm_area v2
On Thu, 24 Sep 2020 15:58:42 +0200 Christoph Hellwig wrote: > this series removes alloc_vm_area, which was left over from the big > vmalloc interface rework. It is a rather arcane interface, basically > the equivalent of get_vm_area + actually faulting in all PTEs in > the allocated area. It was originally added for Xen (which isn't > modular to start with), and then grew users in zsmalloc and i915 > which seems to mostly qualify as abuses of the interface, especially > for i915 as a random driver should not set up PTE bits directly. > > Note that the i915 patches apply to the drm-tip branch of the drm-tip > tree, as that tree has recent conflicting commits in the same area. Is the drm-tip material in linux-next yet? I'm still seeing a non-trivial reject in there at present.
Re: [PATCH v4 1/2] memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
On Tue, 11 Aug 2020 11:44:46 +0200 Roger Pau Monne wrote: > This is in preparation for the logic behind MEMORY_DEVICE_DEVDAX also > being used by non DAX devices. Acked-by: Andrew Morton . Please add it to the Xen tree when appropriate. (I'm not sure what David means by "separate type", but we can do that later if desired. Dan is taking a bit of downtime).
Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand wrote: > > > > Why does the firmware map support hotplug entries? > > I assume: > > The firmware memmap was added primarily for x86-64 kexec (and still, is > mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs > get hotplugged on real HW, they get added to e820. Same applies to > memory added via HyperV balloon (unless memory is unplugged via > ballooning and you reboot ... then the e820 is changed as well). I assume > we wanted to be able to reflect that, to make kexec look like a real reboot. > > This worked for a while. Then came dax/kmem. Now comes virtio-mem. > > > But I assume only Andrew can enlighten us. > > @Andrew, any guidance here? Should we really add all memory to the > firmware memmap, even if this contradicts with the existing > documentation? (especially, if the actual firmware memmap will *not* > contain that memory after a reboot) For some reason that patch is misattributed - it was authored by Shaohui Zheng , who hasn't been heard from in a decade. I looked through the email discussion from that time and I'm not seeing anything useful. But I wasn't able to locate Dave Hansen's review comments.
Re: [Xen-devel] [PATCH v2 0/8] mm/kdump: allow to exclude pages that are logically offline
On Wed, 27 Feb 2019 13:32:14 +0800 Dave Young wrote: > This series have been in -next for some days, could we get this in > mainline? It's been in -next for two months? > Andrew, do you have plan about them, maybe next release? They're all reviewed except for "xen/balloon: mark inflated pages PG_offline". (https://ozlabs.org/~akpm/mmotm/broken-out/xen-balloon-mark-inflated-pages-pg_offline.patch). Yes, I plan on sending these to Linus during the merge window for 5.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 1/2] memory_hotplug: Free pages as higher order
On Mon, 05 Nov 2018 15:12:27 +0530 Arun KS wrote: > On 2018-10-22 16:03, Arun KS wrote: > > On 2018-10-19 13:37, Michal Hocko wrote: > >> On Thu 18-10-18 19:18:25, Andrew Morton wrote: > >> [...] > >>> So this patch needs more work, yes? > >> > >> Yes, I've talked to Arun (he is offline until next week) offlist and > >> he > >> will play with this some more. > > > > Converted totalhigh_pages, totalram_pages and zone->managed_page to > > atomic and tested hot add. Latency is not effected with this change. > > Will send out a separate patch on top of this one. > Hello Andrew/Michal, > > Will this be going in subsequent -rcs? I thought we were awaiting a new version? "Will send out a separate patch on top of this one"? I do think a resend would be useful, please. Ensure the changelog is updated to capture the above info and any other worthy issues which arose during review.
Re: [Xen-devel] [PATCH v5 1/2] memory_hotplug: Free pages as higher order
On Thu, 11 Oct 2018 09:55:03 +0200 Michal Hocko wrote: > > > > > This is now not called anymore, although the xen/hv variants still do > > > > > it. The function seems empty these days, maybe remove it as a followup > > > > > cleanup? > > > > > > > > > > > - __online_page_increment_counters(page); > > > > > > - __online_page_free(page); > > > > > > + __free_pages_core(page, order); > > > > > > + totalram_pages += (1UL << order); > > > > > > +#ifdef CONFIG_HIGHMEM > > > > > > + if (PageHighMem(page)) > > > > > > + totalhigh_pages += (1UL << order); > > > > > > +#endif > > > > > > > > > > __online_page_increment_counters() would have used > > > > > adjust_managed_page_count() which would do the changes under > > > > > managed_page_count_lock. Are we safe without the lock? If yes, there > > > > > should perhaps be a comment explaining why. > > > > > > > > Looks unsafe without managed_page_count_lock. > > > > > > Why does it matter actually? We cannot online/offline memory in > > > parallel. This is not the case for the boot where we initialize memory > > > in parallel on multiple nodes. So this seems to be safe currently unless > > > I am missing something. A comment explaining that would be helpful > > > though. > > > > Other main callers of adjust_manage_page_count(), > > > > static inline void free_reserved_page(struct page *page) > > { > > __free_reserved_page(page); > > adjust_managed_page_count(page, 1); > > } > > > > static inline void mark_page_reserved(struct page *page) > > { > > SetPageReserved(page); > > adjust_managed_page_count(page, -1); > > } > > > > Won't they race with memory hotplug? > > > > Few more, > > ./drivers/xen/balloon.c:519:adjust_managed_page_count(page, -1); > > ./drivers/virtio/virtio_balloon.c:175: adjust_managed_page_count(page, -1); > > ./drivers/virtio/virtio_balloon.c:196: adjust_managed_page_count(page, 1); > > ./mm/hugetlb.c:2158:adjust_managed_page_count(page, 1 << > > h->order); > > They can, and I have missed those. 
So this patch needs more work, yes?
Re: [Xen-devel] [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Tue, 24 Jul 2018 16:17:47 +0200 Michal Hocko wrote: > On Fri 20-07-18 17:09:02, Andrew Morton wrote: > [...] > > - Undocumented return value. > > > > - comment "failed to reap part..." is misleading - sounds like it's > > referring to something which happened in the past, is in fact > > referring to something which might happen in the future. > > > > - fails to call trace_finish_task_reaping() in one case > > > > - code duplication. > > > > - Increases mmap_sem hold time a little by moving > > trace_finish_task_reaping() inside the locked region. So sue me ;) > > > > - Sharing the finish: path means that the trace event won't > > distinguish between the two sources of finishing. > > > > Please take a look? > > oom_reap_task_mm should return false when __oom_reap_task_mm return > false. This is what my patch did but it seems this changed by > http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-oom-remove-oom_lock-from-oom_reaper.patch > so that one should be fixed. > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 104ef4a01a55..88657e018714 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -565,7 +565,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, > struct mm_struct *mm) > /* failed to reap part of the address space. Try again later */ > if (!__oom_reap_task_mm(mm)) { > up_read(&mm->mmap_sem); > - return true; > + return false; > } > > pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, > file-rss:%lukB, shmem-rss:%lukB\n", OK, thanks, I added that. > > On top of that the proposed cleanup looks as follows: > Looks good to me. Seems a bit strange that we omit the pr_info() output if the mm was partially reaped - people would still want to know this? Not very important though.
Re: [Xen-devel] [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Mon, 16 Jul 2018 13:50:58 +0200 Michal Hocko wrote: > From: Michal Hocko > > There are several blockable mmu notifiers which might sleep in > mmu_notifier_invalidate_range_start and that is a problem for the > oom_reaper because it needs to guarantee a forward progress so it cannot > depend on any sleepable locks. > > ... > > @@ -571,7 +565,12 @@ static bool oom_reap_task_mm(struct task_struct *tsk, > struct mm_struct *mm) > > trace_start_task_reaping(tsk->pid); > > - __oom_reap_task_mm(mm); > + /* failed to reap part of the address space. Try again later */ > + if (!__oom_reap_task_mm(mm)) { > + up_read(&mm->mmap_sem); > + ret = false; > + goto unlock_oom; > + } This function is starting to look a bit screwy. : static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) : { : if (!down_read_trylock(&mm->mmap_sem)) { : trace_skip_task_reaping(tsk->pid); : return false; : } : : /* :* MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't :* work on the mm anymore. The check for MMF_OOM_SKIP must run :* under mmap_sem for reading because it serializes against the :* down_write();up_write() cycle in exit_mmap(). :*/ : if (test_bit(MMF_OOM_SKIP, &mm->flags)) { : up_read(&mm->mmap_sem); : trace_skip_task_reaping(tsk->pid); : return true; : } : : trace_start_task_reaping(tsk->pid); : : /* failed to reap part of the address space. Try again later */ : if (!__oom_reap_task_mm(mm)) { : up_read(&mm->mmap_sem); : return true; : } : : pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", : task_pid_nr(tsk), tsk->comm, : K(get_mm_counter(mm, MM_ANONPAGES)), : K(get_mm_counter(mm, MM_FILEPAGES)), : K(get_mm_counter(mm, MM_SHMEMPAGES))); : up_read(&mm->mmap_sem); : : trace_finish_task_reaping(tsk->pid); : return true; : } - Undocumented return value. - comment "failed to reap part..." 
is misleading - sounds like it's referring to something which happened in the past, is in fact referring to something which might happen in the future. - fails to call trace_finish_task_reaping() in one case - code duplication. I'm thinking it wants to be something like this? : /* : * Return true if we successfully acquired (then released) mmap_sem : */ : static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) : { : if (!down_read_trylock(&mm->mmap_sem)) { : trace_skip_task_reaping(tsk->pid); : return false; : } : : /* :* MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't :* work on the mm anymore. The check for MMF_OOM_SKIP must run :* under mmap_sem for reading because it serializes against the :* down_write();up_write() cycle in exit_mmap(). :*/ : if (test_bit(MMF_OOM_SKIP, &mm->flags)) { : trace_skip_task_reaping(tsk->pid); : goto out; : } : : trace_start_task_reaping(tsk->pid); : : if (!__oom_reap_task_mm(mm)) { : /* Failed to reap part of the address space. Try again later */ : goto finish; : } : : pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", : task_pid_nr(tsk), tsk->comm, : K(get_mm_counter(mm, MM_ANONPAGES)), : K(get_mm_counter(mm, MM_FILEPAGES)), : K(get_mm_counter(mm, MM_SHMEMPAGES))); : finish: : trace_finish_task_reaping(tsk->pid); : out: : up_read(&mm->mmap_sem); : return true; : } - Increases mmap_sem hold time a little by moving trace_finish_task_reaping() inside the locked region. So sue me ;) - Sharing the finish: path means that the trace event won't distinguish between the two sources of finishing. Please take a look?
Re: [Xen-devel] [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Tue, 17 Jul 2018 10:12:01 +0200 Michal Hocko wrote: > > Any suggestions regarding how the driver developers can test this code > > path? I don't think we presently have a way to fake an oom-killing > > event? Perhaps we should add such a thing, given the problems we're > > having with that feature. > > The simplest way is to wrap an userspace code which uses these notifiers > into a memcg and set the hard limit to hit the oom. This can be done > e.g. after the test faults in all the mmu notifier managed memory and > set the hard limit to something really small. Then we are looking for a > proper process tear down. Chances are, some of the intended audience don't know how to do this and will either have to hunt down a lot of documentation or will just not test it. But we want them to test it, so a little worked step-by-step example would help things along please.
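Something like the following sketch turns Michal's recipe into concrete steps. It assumes cgroup v2 mounted at /sys/fs/cgroup, root privileges, and a placeholder program "./notifier-test" standing in for whatever userspace code exercises the driver's mmu notifiers; the cgroup path, limit, and timings are illustrative guesses. It is guarded by an environment variable so it only performs the privileged steps when explicitly asked to.

```shell
# Drive a memcg OOM kill against a program that holds mmu-notifier-managed
# memory, then look for oom_reaper activity in the kernel log.
if [ -z "$RUN_OOM_TEST" ]; then
	echo "set RUN_OOM_TEST=1 (as root, with cgroup v2) to run; skipping"
	exit 0
fi

CG=/sys/fs/cgroup/oom-reap-test
mkdir -p "$CG"

./notifier-test &			# must fault in all notifier-managed memory
PID=$!
echo "$PID" > "$CG/cgroup.procs"	# move the victim into the test memcg

sleep 5					# give it time to finish faulting everything in
echo $((8 * 1024 * 1024)) > "$CG/memory.max"	# tiny hard limit -> memcg OOM

wait "$PID"				# expect the OOM killer to SIGKILL it
dmesg | tail -n 30			# look for "oom_reaper: reaped process ..."
rmdir "$CG"
```

The thing to verify is a proper teardown: the victim exits, the reaper logs that it reaped the process, and the driver's notifier callbacks neither block nor leak.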
Re: [Xen-devel] [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Mon, 16 Jul 2018 13:50:58 +0200 Michal Hocko wrote: > From: Michal Hocko > > There are several blockable mmu notifiers which might sleep in > mmu_notifier_invalidate_range_start and that is a problem for the > oom_reaper because it needs to guarantee a forward progress so it cannot > depend on any sleepable locks. > > Currently we simply back off and mark an oom victim with blockable mmu > notifiers as done after a short sleep. That can result in selecting a > new oom victim prematurely because the previous one still hasn't torn > its memory down yet. > > We can do much better though. Even if mmu notifiers use sleepable locks > there is no reason to automatically assume those locks are held. > Moreover majority of notifiers only care about a portion of the address > space and there is absolutely zero reason to fail when we are unmapping an > unrelated range. Many notifiers do really block and wait for HW which is > harder to handle and we have to bail out though. > > This patch handles the low hanging fruid. > __mmu_notifier_invalidate_range_start > gets a blockable flag and callbacks are not allowed to sleep if the > flag is set to false. This is achieved by using trylock instead of the > sleepable lock for most callbacks and continue as long as we do not > block down the call chain. I assume device driver developers are wondering "what does this mean for me". As I understand it, the only time they will see blockable==false is when their driver is being called in response to an out-of-memory condition, yes? So it is a very rare thing. Any suggestions regarding how the driver developers can test this code path? I don't think we presently have a way to fake an oom-killing event? Perhaps we should add such a thing, given the problems we're having with that feature. > I think we can improve that even further because there is a common > pattern to do a range lookup first and then do something about that. 
> The first part can be done without a sleeping lock in most cases AFAICS. > > The oom_reaper end then simply retries if there is at least one notifier > which couldn't make any progress in !blockable mode. A retry loop is > already implemented to wait for the mmap_sem and this is basically the > same thing. > > ... > > +static inline int mmu_notifier_invalidate_range_start_nonblock(struct > mm_struct *mm, > + unsigned long start, unsigned long end) > +{ > + int ret = 0; > + if (mm_has_notifiers(mm)) > + ret = __mmu_notifier_invalidate_range_start(mm, start, end, > false); > + > + return ret; > } nit, { if (mm_has_notifiers(mm)) return __mmu_notifier_invalidate_range_start(mm, start, end, false); return 0; } would suffice. > > ... > > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -3074,7 +3074,7 @@ void exit_mmap(struct mm_struct *mm) >* reliably test it. >*/ > mutex_lock(&oom_lock); > - __oom_reap_task_mm(mm); > + (void)__oom_reap_task_mm(mm); > mutex_unlock(&oom_lock); What does this do? > set_bit(MMF_OOM_SKIP, &mm->flags); > > ... >
Re: [Xen-devel] [PATCH] mm: don't defer struct page initialization for Xen pv guests
On Mon, 19 Feb 2018 02:45:27 +0800 kbuild test robot wrote: > [auto build test ERROR on mmotm/master] > [also build test ERROR on v4.16-rc1 next-20180216] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Juergen-Gross/mm-don-t-defer-struct-page-initialization-for-Xen-pv-guests/20180218-233657 > base: git://git.cmpxchg.org/linux-mmotm.git master > config: i386-randconfig-x010-201807 (attached as .config) > compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 > reproduce: > # save the attached .config to linux build tree > make ARCH=i386 > > All errors (new ones prefixed by >>): > >mm/page_alloc.c: In function 'update_defer_init': > >> mm/page_alloc.c:352:6: error: implicit declaration of function > >> 'xen_pv_domain' [-Werror=implicit-function-declaration] > if (xen_pv_domain()) > ^ I think I already fixed this. From: Andrew Morton Subject: mm-dont-defer-struct-page-initialization-for-xen-pv-guests-fix explicitly include xen.h Cc: Juergen Gross Signed-off-by: Andrew Morton --- mm/page_alloc.c |1 + 1 file changed, 1 insertion(+) diff -puN mm/page_alloc.c~mm-dont-defer-struct-page-initialization-for-xen-pv-guests-fix mm/page_alloc.c --- a/mm/page_alloc.c~mm-dont-defer-struct-page-initialization-for-xen-pv-guests-fix +++ a/mm/page_alloc.c @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include _
Re: [Xen-devel] [RESEND v2] mm: don't defer struct page initialization for Xen pv guests
On Fri, 16 Feb 2018 16:41:01 +0100 Juergen Gross wrote: > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -347,6 +347,9 @@ static inline bool update_defer_init(pg_data_t *pgdat, > /* Always populate low zones for address-constrained allocations */ > if (zone_end < pgdat_end_pfn(pgdat)) > return true; > + /* Xen PV domains need page structures early */ > + if (xen_pv_domain()) > + return true; I'll do this: --- a/mm/page_alloc.c~mm-dont-defer-struct-page-initialization-for-xen-pv-guests-fix +++ a/mm/page_alloc.c @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include So we're not relying on dumb luck ;)
Re: [Xen-devel] [RESEND v2] mm: don't defer struct page initialization for Xen pv guests
On Fri, 16 Feb 2018 16:41:01 +0100 Juergen Gross wrote: > Commit f7f99100d8d95dbcf09e0216a143211e79418b9f ("mm: stop zeroing > memory during allocation in vmemmap") broke Xen pv domains in some > configurations, as the "Pinned" information in struct page of early > page tables could get lost. This will lead to the kernel trying to > write directly into the page tables instead of asking the hypervisor > to do so. The result is a crash like the following: Let's cc Pavel, who authored f7f99100d8d95d. > [0.004000] BUG: unable to handle kernel paging request at 8801ead19008 > [0.004000] IP: xen_set_pud+0x4e/0xd0 > [0.004000] PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE > 8011ead19065 > [0.004000] Oops: 0003 [#1] PREEMPT SMP > [0.004000] Modules linked in: > [0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271 > [0.004000] Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 > 06/26/2014 > [0.004000] task: 81c10480 task.stack: 81c0 > [0.004000] RIP: e030:xen_set_pud+0x4e/0xd0 > [0.004000] RSP: e02b:81c03cd8 EFLAGS: 00010246 > [0.004000] RAX: 00280800 RBX: 88020fd31000 RCX: > > [0.004000] RDX: ea00 RSI: 0001b8308067 RDI: > 8801ead19008 > [0.004000] RBP: 8801ead19008 R08: R09: > 063f4c80 > [0.004000] R10: R11: 0720072007200720 R12: > 0001b8308067 > [0.004000] R13: 81c8a9cc R14: 88018fd31000 R15: > 77ff8000 > [0.004000] FS: () GS:88020f60() > knlGS: > [0.004000] CS: e033 DS: ES: CR0: 80050033 > [0.004000] CR2: 8801ead19008 CR3: 01c09000 CR4: > 00042660 > [0.004000] Call Trace: > [0.004000] __pmd_alloc+0x128/0x140 > [0.004000] ? acpi_os_map_iomem+0x175/0x1b0 > [0.004000] ioremap_page_range+0x3f4/0x410 > [0.004000] ? acpi_os_map_iomem+0x175/0x1b0 > [0.004000] __ioremap_caller+0x1c3/0x2e0 > [0.004000] acpi_os_map_iomem+0x175/0x1b0 > [0.004000] acpi_tb_acquire_table+0x39/0x66 > [0.004000] acpi_tb_validate_table+0x44/0x7c > [0.004000] acpi_tb_verify_temp_table+0x45/0x304 > [0.004000] ? 
acpi_ut_acquire_mutex+0x12a/0x1c2 > [0.004000] acpi_reallocate_root_table+0x12d/0x141 > [0.004000] acpi_early_init+0x4d/0x10a > [0.004000] start_kernel+0x3eb/0x4a1 > [0.004000] ? set_init_arg+0x55/0x55 > [0.004000] xen_start_kernel+0x528/0x532 > [0.004000] Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 > 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> > 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3 > [0.004000] RIP: xen_set_pud+0x4e/0xd0 RSP: 81c03cd8 > [0.004000] CR2: 8801ead19008 > [0.004000] ---[ end trace 38eca2e56f1b642e ]--- > > Avoid this problem by not deferring struct page initialization when > running as Xen pv guest. > > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -347,6 +347,9 @@ static inline bool update_defer_init(pg_data_t *pgdat, > /* Always populate low zones for address-constrained allocations */ > if (zone_end < pgdat_end_pfn(pgdat)) > return true; > + /* Xen PV domains need page structures early */ > + if (xen_pv_domain()) > + return true; > (*nr_initialised)++; > if ((*nr_initialised > pgdat->static_init_pgcnt) && > (pfn & (PAGES_PER_SECTION - 1)) == 0) { I'm OK with applying the patch as a short-term regression fix but I do wonder whether it's the correct fix. What is special about Xen (in some configurations!) that causes it to find a hole in deferred initialization? I'd like us to delve further please. Because if Xen found a hole in the implementation, others might do so. Or perhaps Xen is doing something naughty.