Hi,

When executing linux on arm64 with KPTI activated (in Dom0 or in a DomU), I 
have a lot of walk page table errors like this:
(XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0

After implementing a call trace, I found that the problem was coming from the 
update_runstate_area when linux has KPTI activated.

I have the following call trace:
(XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
(XEN) backtrace.c:29: Stacktrace start at 0x8007638efbb0 depth 10
(XEN)    [<000000000027780c>] get_page_from_gva+0x180/0x35c
(XEN)    [<00000000002700c8>] guestcopy.c#copy_guest+0x1b0/0x2e4
(XEN)    [<0000000000270228>] raw_copy_to_guest+0x2c/0x34
(XEN)    [<0000000000268dd0>] domain.c#update_runstate_area+0x90/0xc8
(XEN)    [<000000000026909c>] domain.c#schedule_tail+0x294/0x2d8
(XEN)    [<0000000000269524>] context_switch+0x58/0x70
(XEN)    [<00000000002479c4>] core.c#sched_context_switch+0x88/0x1e4
(XEN)    [<000000000024845c>] core.c#schedule+0x224/0x2ec
(XEN)    [<0000000000224018>] softirq.c#__do_softirq+0xe4/0x128
(XEN)    [<00000000002240d4>] do_softirq+0x14/0x1c

Discussing this subject with Stefano, he pointed me to a discussion started a 
year ago on this subject here:
https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03053.html

And a patch was submitted:
https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg02320.html

I rebased this patch on current master and it is solving the problem I have 
seen.

It sounds to me like a good solution to introduce a 
VCPUOP_register_runstate_phys_memory_area to not depend on the area actually 
being mapped in the guest when a context switch is being done (which is 
actually the problem happening when a context switch is trigger while a guest 
is running in EL0).

Is there any reason why this was not merged at the end ?

Thanks
Bertrand


Reply via email to