On 25.07.2022 20:54, Andrew Cooper wrote:
> On 25/07/2022 14:10, Jan Beulich wrote:
>> Quite obviously, to determine the split condition, successive pages'
>> attributes need to be evaluated, not always those of the initial page.
>>
>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>> ---
>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>> Split init_heap_pages() in two"), but there it was still benign.
> 
> This also fixes the crash that XenRT found on loads of hardware, which
> looks something like:
> 
> (XEN) NUMA: Allocated memnodemap from 105bc81000 - 105bc92000
> (XEN) NUMA: Using 8 for the hash shift.
> (XEN) Early fatal page fault at e008:ffff82d04022ae1e (cr2=00000000000000b8, ec=0002)
> (XEN) ----[ Xen-4.17.0  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d04022ae1e>] common/page_alloc.c#free_heap_pages+0x2dd/0x850
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04022ae1e>] R common/page_alloc.c#free_heap_pages+0x2dd/0x850
> (XEN)    [<ffff82d04022dd64>] F common/page_alloc.c#init_heap_pages+0x55f/0x720
> (XEN)    [<ffff82d040415234>] F end_boot_allocator+0x187/0x1e7
> (XEN)    [<ffff82d040452337>] F __start_xen+0x1a06/0x2779
> (XEN)    [<ffff82d040204344>] F __high_start+0x94/0xa0
> 
> Debugging shows that it's always a block which crosses nodes 0 and 1,
> where avail[1] has yet to be initialised.
> 
> What I'm confused by is how this manages to manifest as broken swiotlb
> issues without Xen crashing.

I didn't debug this in detail, since I had managed to spot the issue
by staring at the offending patch, but from the observations, some
of node 1's memory was actually accounted to node 0 (including
off-by-65535 node_need_scrub[] values for both nodes), so I would
guess avail[1] simply wasn't accessed before being set up in my case.
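
A minimal toy model of the class of bug described in the patch, assuming a page's NUMA node is the only attribute that matters for the split. This is not Xen's init_heap_pages(); struct page, same_node_run(), account_pages() and MAX_NODES are all hypothetical names used for illustration only. With the initial page evaluated on every iteration, a block straddling nodes 0 and 1 is never split and is credited entirely to node 0, which matches the misaccounting observed above. The crash path through node 1's uninitialised avail[] is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 2 /* hypothetical: two NUMA nodes, as in the reports */

/* Hypothetical, heavily simplified page descriptor: only the node is kept. */
struct page {
    unsigned int node;
};

/* Length of the run starting at pg[0] whose pages share pg[0]'s node.
 * evaluate_successive == false mimics the bug: the initial page is
 * inspected every time, so the split condition can never trigger. */
static size_t same_node_run(const struct page *pg, size_t nr,
                            bool evaluate_successive)
{
    unsigned int nid = pg[0].node;
    size_t i;

    for (i = 1; i < nr; i++) {
        const struct page *p = evaluate_successive ? &pg[i] : &pg[0];

        if (p->node != nid)
            break; /* split: this page lives on a different node */
    }
    return i;
}

/* Split [pg, pg + nr) into per-node chunks and credit each chunk to the
 * node of its first page (a stand-in for per-node free accounting). */
static void account_pages(const struct page *pg, size_t nr,
                          bool evaluate_successive,
                          unsigned long avail[MAX_NODES])
{
    size_t done = 0;

    while (done < nr) {
        size_t len = same_node_run(pg + done, nr - done, evaluate_successive);

        avail[pg[done].node] += len;
        done += len;
    }
}
```

For eight pages where the first four sit on node 0 and the last four on node 1, the buggy variant credits all eight to node 0, while evaluating successive pages yields four per node.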

Jan
