On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru wrote:
> On 04/16/2018 08:47 PM, George Dunlap wrote:
>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>>> Debugging continues.
>>> Finally, the attached patch seems to get the display unstuck in my
>>> scenario, although for one guest I get:
>>> (XEN) d2v0 Unexpected vmexit: reason 49
>>> (XEN) domain_crash called from vmx.c:4120
>>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>>> (XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
>>> (XEN) CPU: 1
>>> (XEN) RIP: 0010:[<fffff96000842354>]
>>> (XEN) RFLAGS: 0000000000010246 CONTEXT: hvm guest (d2v0)
>>> (XEN) rax: fffff88003000000 rbx: fffff900c0083db0 rcx: 00000000aa55aa55
>>> (XEN) rdx: fffffa80041bdc41 rsi: fffff900c00c69a0 rdi: 0000000000000001
>>> (XEN) rbp: 0000000000000000 rsp: fffff88002ee9ef0 r8: fffffa80041bdc40
>>> (XEN) r9: fffff80001810e80 r10: fffffa800342aa70 r11: fffff88002ee9e80
>>> (XEN) r12: 0000000000000005 r13: 0000000000000001 r14: fffff900c00c08b0
>>> (XEN) r15: 0000000000000001 cr0: 0000000080050031 cr4: 00000000000406f8
>>> (XEN) cr3: 00000000ef771000 cr2: fffff900c00c8000
>>> (XEN) fsb: 00000000fffde000 gsb: fffff80001810d00 gss: 000007fffffdc000
>>> (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010
>>> i.e. EXIT_REASON_EPT_MISCONFIG - so not out of the woods yet. I am hoping
>>> somebody more familiar with the code can point to a more elegant
>>> solution if one exists.
>> I think I have an idea what's going on, but it's complicated. :-)
>> Basically, the logdirty functionality isn't simple, and needs careful
>> thought on how to integrate it. I'll write some more tomorrow, and see
>> if I can come up with a solution.
> I think I know why this happens for the one guest - the other guests
> start at a certain resolution display-wise and stay that way until shutdown.
> This particular guest starts with a larger screen, then goes to roughly
> 2/3rds of it, then tries to go back to the initial larger one - at which
> point the above happens. I assume this corresponds to some pages being
> removed and/or added. I'll test this theory more tomorrow - if it's
> correct I should be able to reproduce the crash (with the patch) by
> simply resetting the screen resolution (increasing it).
The trick is that p2m_change_type doesn't actually iterate over the
entire p2m range, individually changing entries as it goes. Instead,
it misconfigures the entries at the top level, which causes the kind
of fault shown above. As it gets a fault for each entry, it checks
the current type, the logdirty ranges, and the global logdirty bit to
determine what the new types should be.
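
To make the lazy scheme concrete, here is a toy sketch in C. It is not
Xen code: every name in it (toy_p2m, toy_set_logdirty_range, toy_access,
and so on) is made up for illustration, the "misconfiguration" is
modelled as a stale epoch counter rather than real EPT bits, and only a
single logdirty range is supported. It only shows the shape of the idea:
changing the range is O(1), and each entry is fixed up on its first
fault afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page types; not Xen's real p2m_type_t. */
enum toy_type { TOY_RAM_RW, TOY_RAM_LOGDIRTY, TOY_MMIO };

#define TOY_NR_PAGES 8

struct toy_p2m {
    enum toy_type type[TOY_NR_PAGES];       /* last resolved type per page */
    unsigned int entry_epoch[TOY_NR_PAGES]; /* epoch each entry was resolved at */
    unsigned int epoch;                     /* bumped to "misconfigure" everything */
    bool global_logdirty;                   /* global logdirty bit */
    size_t ld_start, ld_end;                /* one logdirty range, [start, end) */
    unsigned int faults;                    /* number of lazy fix-ups performed */
};

/* Change the logdirty range without visiting any per-page entry. */
static void toy_set_logdirty_range(struct toy_p2m *p, size_t start, size_t end)
{
    assert(start <= end && end <= TOY_NR_PAGES);
    p->ld_start = start;
    p->ld_end = end;
    p->epoch++; /* every entry is now stale, i.e. will fault on next access */
}

/* Work out the new type from the current type, the range and the global bit. */
static enum toy_type toy_recalc(const struct toy_p2m *p, size_t gfn)
{
    bool in_range = gfn >= p->ld_start && gfn < p->ld_end;

    if (p->type[gfn] == TOY_MMIO)
        return TOY_MMIO; /* non-RAM types are left alone in this toy model */
    return (p->global_logdirty || in_range) ? TOY_RAM_LOGDIRTY : TOY_RAM_RW;
}

/* Access a page; a stale entry takes a "fault" and is fixed up first. */
static enum toy_type toy_access(struct toy_p2m *p, size_t gfn)
{
    assert(gfn < TOY_NR_PAGES);
    if (p->entry_epoch[gfn] != p->epoch) {
        p->faults++;
        p->type[gfn] = toy_recalc(p, gfn);
        p->entry_epoch[gfn] = p->epoch;
    }
    return p->type[gfn];
}
```

In this model, a second set of tables that shares the range but keeps
its own epoch would also need its epoch bumped and its faults routed to
the same fix-up path, which is roughly the part that is missing for
altp2m.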
Your patch makes it so that all the altp2ms now get the
misconfiguration when the logdirty range is changed; but clearly
handling the misconfiguration isn't integrated properly with the
altp2m system yet. Doing it right may take some thought.
Xen-devel mailing list