Hi Jan,

> On 26 Jul 2022, at 11:29, Jan Beulich <jbeul...@suse.com> wrote:
> 
> On 26.07.2022 09:51, Bertrand Marquis wrote:
>> 
>> 
>>> On 25 Jul 2022, at 17:16, Jan Beulich <jbeul...@suse.com> wrote:
>>> 
>>> On 25.07.2022 17:51, Bertrand Marquis wrote:
>>>> On our CI we randomly get a crash during guest boot on x86.
>>> 
>>> Afaict of a PV guest.
>>> 
>>>> We are running on qemu x86_64 using Xen staging.
>>> 
>>> Which may introduce unusual timing. An issue never hit on actual hardware
>>> _may_ be (but doesn't have to be) one in qemu itself.
>>> 
>>>> The crash is happening randomly (something like 1 out of 20 times).
>>>> 
>>>> This is always happening on the first guest we start; we never got it
>>>> after the first guest was successfully started.
>>>> 
>>>> Please tell me if you need any other info.
>>>> 
>>>> Here is the guest kernel log:
>>>> [...]
>>>> [ 6.679020] general protection fault, maybe for address 0x8800: 0000 [#1] PREEMPT SMP NOPTI
>>>> [ 6.679020] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.6 #1
>>>> [ 6.679020] RIP: e030:error_entry+0xaf/0xe0
>>>> [ 6.679020] Code: 29 89 c8 48 39 84 24 88 00 00 00 74 15 48 81 bc 24 88 00 00 00 63 10 e0 81 75 03 0f 01 f8 90 90 90 c3 48 89 8c 24 88 00 00 00 <0f> 01 f8 90 90 90 eb 11 0f 20 d8 90 90 90 90 90 48 25 ff e7 ff ff
>>> 
>>> This is SWAPGS, which supposedly a PV guest should never hit. Data further
>>> down suggests the kernel is still in the process of patching alternatives,
>>> which may be the reason for the insn to still be there (being at a point
>>> where exceptions are still unexpected).
>> 
>> So the exception path is using alternative code? That sounds logical
>> given the error output. But it doesn't explain the original error.
> 
> SWAPGS sits pretty early on all kernel entry paths. If any instance of it
> is subject to alternatives patching, then prior to patching such paths
> may not be taken when running as PV guest under Xen.
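
Right - if I read the 5.17 sources correctly, the SWAPGS on those entry
paths is indeed an alternative. Quoting from memory (so the exact header and
spelling may be slightly off), something like the following sits in
arch/x86/include/asm/irqflags.h:

#define SWAPGS	ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV

So until apply_alternatives() has NOPed it out for a PV guest, the raw
swapgs would presumably still be in the exception entry path and fault if an
exception is taken that early. That would at least explain the secondary
fault in error_entry shown above.
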
> 
>>>> [ 6.679020] RSP: e02b:ffffffff82803a90 EFLAGS: 00000046
>>>> [ 6.679020] RAX: 0000000000008800 RBX: 0000000000000000 RCX: ffffffff81e00fa7
>>>> [ 6.679020] RDX: 0000000000000000 RSI: ffffffff81e009f8 RDI: 00000000000000eb
>>>> [ 6.679020] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>>>> [ 6.679020] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>> [ 6.679020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>> [ 6.679020] FS: 0000000000000000(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>>>> [ 6.679020] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 6.679020] CR2: 0000000000000000 CR3: 000000000280c000 CR4: 0000000000050660
>>>> [ 6.679020] Call Trace:
>>>> [ 6.679020] <TASK>
>>>> [ 6.679020] RIP: e030:native_irq_return_iret+0x0/0x2
>>>> [ 6.679020] Code: 41 5d 41 5c 5d 5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0a 0f 1f 00 90 66 0f 1f 44 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff
>>>> [ 6.679020] RSP: e02b:ffffffff82803b48 EFLAGS: 00000046 ORIG_RAX: 000000000000e030
>>>> [ 6.679020] RAX: 0000000000008800 RBX: ffffffff82803be0 RCX: ffffffff81e00f95
>>>> [ 6.679020] RDX: ffffffff81e00f94 RSI: ffffffff81e00f95 RDI: 00000000000000eb
>>>> [ 6.679020] RBP: 00000000000000eb R08: 0000000090001f0f R09: 0000000000000007
>>>> [ 6.679020] R10: ffffffff81e00f94 R11: ffffffff8285a6c0 R12: 0000000000000000
>>>> [ 6.679020] R13: ffffffff81e00f94 R14: 0000000000000006 R15: 0000000000000006
>>>> [ 6.679020] ? asm_exc_general_protection+0x8/0x30
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1b/0x27
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1b/0x27
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1c/0x27
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1b/0x27
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1c/0x27
>>>> [ 6.679020] RIP: e030:insn_get_opcode.part.0+0xab/0x180
>>>> [ 6.679020] Code: 00 00 8b 43 4c a9 c0 07 08 00 0f 84 bf 00 00 00 c6 43 1c 01 31 c0 5b 5d c3 83 e2 03 be 01 00 00 00 eb b7 89 ef e8 65 e4 ff ff <89> 43 4c a8 30 75 21 e9 8e 00 00 00 0f b6 7b 03 40 84 ff 75 73 8b
>>>> [ 6.679020] RSP: e02b:ffffffff82803b70 EFLAGS: 00000246
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1b/0x27
>>>> [ 6.679020] insn_get_modrm+0x6c/0x120
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1b/0x27
>>>> [ 6.679020] insn_get_sib+0x40/0x80
>>>> [ 6.679020] insn_get_displacement+0x82/0x100
>>>> [ 6.679020] insn_decode+0xf8/0x100
>>>> [ 6.679020] optimize_nops+0x60/0x1e0
>>>> [ 6.679020] ? rcu_nmi_exit+0x2b/0x140
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1b/0x27
>>>> [ 6.679020] ? native_iret+0x3/0x7
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1c/0x27
>>>> [ 6.679020] ? restore_regs_and_return_to_kernel+0x1b/0x27
>>>> [ 6.679020] apply_alternatives+0x165/0x2e0
>>> 
>>> I have to admit that I'm a little lost with these "modern" stack traces,
>>> where contexts apparently switch without being clearly annotated. It is
>>> looking a little as if a #GP fault was happening somewhere here (hence
>>> the asm_exc_general_protection further up), but I cannot work out where
>>> (what insn) that would have come from.
>>> 
>>> You may want to add some debugging code to the hypervisor to tell you
>>> where exactly that #GP (if there is one in the first place) originates
>>> from. With that it may then become a little more clear what's actually
>>> going on (and why the behavior is random).
>> 
>> I will check what I can do there, but as the crash is very random and only
>> happens during our CI tests, it is not really easy to reproduce.
>> If you have any example of code to do the debugging, I could run some
>> tests with it.
> 
> Well, you want to show_execution_state() on the guest registers in
> do_general_protection() or perhaps pv_emulate_privileged_op(), but
> only for the first (or first few) #GP for every guest (or else things
> likely get too noisy), and presumably also only when the guest is in
> kernel mode.
> 
> The resulting (guest) stack trace then would need taking apart, with
> the guest kernel binary on the side.
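
Thanks. To make sure I get this right, would something along these lines in
do_general_protection() be roughly what you mean? This is only an untested
sketch: the helper names (guest_kernel_mode(), show_execution_state()) are
written down from memory, and the "only the first few #GP per guest"
bookkeeping is simplified to a single global counter:

if ( guest_mode(regs) && guest_kernel_mode(current, regs) )
{
    /* Only dump the first few guest-kernel #GP faults to limit the noise. */
    static unsigned int dumped;

    if ( dumped < 3 )
    {
        dumped++;
        printk("d%d: #GP in guest kernel mode, error code %04x\n",
               current->domain->domain_id, regs->error_code);
        show_execution_state(regs);
    }
}

I would then take the dumped RIP/stack values apart against the guest
kernel's vmlinux / System.map as you suggest.
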
> 
>>> As a final remark - you've Cc-ed the x86 hypervisor maintainers, but at
>>> least from the data which is available so far this is more likely a
>>> kernel issue. So kernel folks might be of more help ...
>> 
>> I wanted to check if this could be a known issue first. The problem is
>> happening in the kernel, I agree, but only when it is started as a Xen
>> guest, so I assumed it could be related to Xen.
> 
> It is quite likely related to Xen, yes, but then still quite likely
> to the Xen-specific parts in the kernel. In the end it all boils
> down to where the first (suspected) #GP is coming from.

Would it make sense then for me to try a newer Linux kernel first?

Cheers
Bertrand

> 
> Jan

