[Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2019-06-14 Thread Jan Beulich
1: x86/ACPI: re-park previously parked CPUs upon resume from S3
2: x86/ACPI: restore VESA mode upon resume from S3
3: x86: a little bit of 16-bit video mode setting code cleanup

Patch 2 is meant to address an issue I've observed while testing
patch 1, and patch 3 is simply a collection of misc changes noticed
while putting together patch 2 as possibly worthwhile to make.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2018-04-16 Thread Jan Beulich
>>> On 15.04.18 at 22:15,  wrote:
> (XEN) *** DOUBLE FAULT ***
> (XEN) [ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]
> (XEN) CPU:0
> (XEN) RIP:e008:[] search_pre_exception_table+0/0x54
> (XEN) RFLAGS: 00010046   CONTEXT: hypervisor
> (XEN) rax:    rbx:    rcx: 
> (XEN) rdx:    rsi:    rdi: c90040cd4028
> (XEN) rbp: 36ffbf32bfb7   rsp: c90040cd4020   r8:  
> (XEN) r9:     r10:    r11: 
> (XEN) r12:    r13:    r14: c90040cd7fff
> (XEN) r15:    cr0: 8005003b   cr4: 000426e0
> (XEN) cr3: 00022200a000   cr2: c90040cd3ff8
> (XEN) fsb: 7fd74515e740   gsb: 88021e6c   gss: 
> (XEN) ds: 002b   es: 002b   fs:    gs:    ss: e010   cs: e008
> (XEN) Current stack base c90040cd differs from expected 
> 8300cec88000
> (XEN) Valid stack range: c90040cd6000-c90040cd8000, 
> sp=c90040cd4020, tss.rsp0=8300cec8ffa0

The fact that the exact location varies where the #DF triggers is of no big
interest - it all depends on when exactly the stack overflow occurs. What
I note though: c90040cd4020 is a guest (presumably Dom0) kernel
address, far outside the Xen range. I guess we'd need to see all of that
(wrong) stack's contents logged up to the original entry into Xen to
understand how that could have happened.
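
The stack mismatch reported in the quoted log can be reproduced with a little arithmetic. The following C sketch uses the addresses exactly as they are printed above (the archive appears to have dropped leading digits, which does not affect masking of the low bits). The 32 KiB stack size and all helper names are assumptions made for illustration; the size is inferred from the logged "expected" base, not taken from the Xen sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: per-CPU stacks are 32 KiB in size and 32 KiB aligned.
 * Inferred from the log (tss.rsp0 vs. "expected" base), not from Xen. */
#define ASSUMED_STACK_SIZE 0x8000ULL

/* Round an address down to the start of its (assumed) stack block. */
static inline uint64_t stack_base(uint64_t addr)
{
    return addr & ~(ASSUMED_STACK_SIZE - 1);
}

/* True if both addresses fall into the same (assumed) stack block. */
static inline bool same_stack(uint64_t a, uint64_t b)
{
    return stack_base(a) == stack_base(b);
}

/* Number of bytes by which sp lies below the half-open range [lo, hi),
 * or 0 if sp is not below it. */
static inline uint64_t bytes_below(uint64_t sp, uint64_t lo, uint64_t hi)
{
    assert(lo < hi);
    return sp < lo ? lo - sp : 0;
}
```

With the logged values, stack_base() of tss.rsp0 (8300cec8ffa0) gives 8300cec88000, which matches the "expected" base, while sp=c90040cd4020 belongs to a different block and sits 0x1fe0 bytes below the start of the reported valid range.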

Jan




Re: [Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2018-04-16 Thread Juergen Gross
On 13/04/18 14:01, Andrew Cooper wrote:
> On 13/04/18 12:49, Jan Beulich wrote:
>> 1: correct ordering of operations during S3 resume
>> 2: suppress BTI mitigations around S3 suspend/resume
>> 3: check feature flags after resume
>>
>> Signed-off-by: Jan Beulich 
> 
> Acked-by: Andrew Cooper 
> 

Release-acked-by: Juergen Gross 


Juergen


Re: [Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2018-04-15 Thread Simon Gaiser
Andrew Cooper:
> On 15/04/18 16:52, Simon Gaiser wrote:
>> Andrew Cooper:
>>> On 14/04/18 06:49, Simon Gaiser wrote:
>>>> Jan Beulich:
>>>>> 1: correct ordering of operations during S3 resume
>>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>>> 3: check feature flags after resume
>>>>>
>>>>> Signed-off-by: Jan Beulich 
>>>>>
>>>>> Simon, could you give this a try please?
>>>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>>>
>>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>>> panics about a double fault somewhere after it starts to enable the
>>>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>>>> could test the patches anyway. With them it gets again to the point
>>>> where it double faults. So the patches are most likely fine.
>>>>
>>>> I haven't really looked at the cause of the double fault yet.
>>> Do you at least have the crash log from the attempt?
>> Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
>> Debian sid:
> 
> I can't find that object.  I presume this isn't an upstream tree?

That's the head of upstream staging as of Friday/Saturday night. And
AFAICS it still is:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5

>> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, 
>> BCAST, CMCI
>> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
>> (XEN) Finishing wakeup from ACPI S3 state.
>> (XEN) Enabling non-boot CPUs  ...
>> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 001b from 
>> 0xfee00c00 to 0xfee0
>> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 001b from 
>> 0xfee00c00 to 0xfee00800
>> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 001b from 
>> 0xfee00c00 to 0xfee0
>> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 001b from 
>> 0xfee00c00 to 0xfee00800
>> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 001b from 
>> 0xfee00c00 to 0xfee0
>> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 001b from 
>> 0xfee00c00 to 0xfee00800
> 
> Bad dom0.  It shouldn't be playing with APIC_BASE at all, but I guess
> this means I can't fix the hypervisor behaviour to throw #GP back at a
> PV guest.
> 
>> (XEN) *** DOUBLE FAULT ***
>> (XEN) [ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]
>> (XEN) CPU:0
>> (XEN) RIP:e008:[] handle_exception+0x9c/0xf7
> 
> Can you disassemble the binary and find out where this is?  On current
> staging, handle_exception+0x9c is in the middle of
> SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.

Dump of assembler code for function handle_exception:
   0x82d08037a8a8 <+0>:     0f 1f 00          nopl   (%rax)
   0x82d08037a8ab <+3>:     48 83 c4 88       add    $0xff88,%rsp
   0x82d08037a8af <+7>:     fc                cld
   0x82d08037a8b0 <+8>:     48 89 7c 24 70    mov    %rdi,0x70(%rsp)
   0x82d08037a8b5 <+13>:    31 ff             xor    %edi,%edi
   0x82d08037a8b7 <+15>:    48 89 74 24 68    mov    %rsi,0x68(%rsp)
   0x82d08037a8bc <+20>:    31 f6             xor    %esi,%esi
   0x82d08037a8be <+22>:    48 89 54 24 60    mov    %rdx,0x60(%rsp)
   0x82d08037a8c3 <+27>:    31 d2             xor    %edx,%edx
   0x82d08037a8c5 <+29>:    48 89 4c 24 58    mov    %rcx,0x58(%rsp)
   0x82d08037a8ca <+34>:    31 c9             xor    %ecx,%ecx
   0x82d08037a8cc <+36>:    48 89 44 24 50    mov    %rax,0x50(%rsp)
   0x82d08037a8d1 <+41>:    31 c0             xor    %eax,%eax
   0x82d08037a8d3 <+43>:    4c 89 44 24 48    mov    %r8,0x48(%rsp)
   0x82d08037a8d8 <+48>:    4c 89 4c 24 40    mov    %r9,0x40(%rsp)
   0x82d08037a8dd <+53>:    4c 89 54 24 38    mov    %r10,0x38(%rsp)
   0x82d08037a8e2 <+58>:    4c 89 5c 24 30    mov    %r11,0x30(%rsp)
   0x82d08037a8e7 <+63>:    45 31 c0          xor    %r8d,%r8d
   0x82d08037a8ea <+66>:    45 31 c9          xor    %r9d,%r9d
   0x82d08037a8ed <+69>:    45 31 d2          xor    %r10d,%r10d
   0x82d08037a8f0 <+72>:    45 31 db          xor    %r11d,%r11d
   0x82d08037a8f3 <+75>:    48 89 5c 24 28    mov    %rbx,0x28(%rsp)
   0x82d08037a8f8 <+80>:    31 db             xor    %ebx,%ebx
   0x82d08037a8fa <+82>:    48 89 6c 24 20    mov    %rbp,0x20(%rsp)
   0x82d08037a8ff <+87>:    48 8d 6c 24 20    lea    0x20(%rsp),%rbp
   0x82d08037a904 <+92>:    48 f7 d5          not    %rbp
   0x82d08037a907 <+95>:    4c 89 64 24 18    mov    %r12,0x18(%rsp)
   0x82d08037a90c <+100>:   4c 89 6c 24 10    mov    %r13,0x10(%rsp)
   0x82d08037a911 <+105>:   4c 89 74 24 08    mov    %r14,0x8(%rsp)
   0x82d08037a916 <+110>:   4c 89 3c 24       mov    %r15,(%rsp)
   0x82d08037a91a <+114>:   45 31 e4          xor    %r12d,%r12d
   0x82d08037a91d <+117>:   45 31 ed          xor    %r13d,%r13d
   0x82d08037a920 <+120>:   45 31 f6

Re: [Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2018-04-15 Thread Andrew Cooper
On 15/04/18 16:52, Simon Gaiser wrote:
> Andrew Cooper:
>> On 14/04/18 06:49, Simon Gaiser wrote:
>>> Jan Beulich:
>>>> 1: correct ordering of operations during S3 resume
>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>> 3: check feature flags after resume
>>>>
>>>> Signed-off-by: Jan Beulich 
>>>>
>>>> Simon, could you give this a try please?
>>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>>
>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>> panics about a double fault somewhere after it starts to enable the
>>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>>> could test the patches anyway. With them it gets again to the point
>>> where it double faults. So the patches are most likely fine.
>>>
>>> I haven't really looked at the cause of the double fault yet.
>> Do you at least have the crash log from the attempt?
> Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
> Debian sid:

I can't find that object.  I presume this isn't an upstream tree?

>
> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, 
> BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs  ...
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 001b from 
> 0xfee00c00 to 0xfee0
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 001b from 
> 0xfee00c00 to 0xfee00800
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 001b from 
> 0xfee00c00 to 0xfee0
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 001b from 
> 0xfee00c00 to 0xfee00800
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 001b from 
> 0xfee00c00 to 0xfee0
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 001b from 
> 0xfee00c00 to 0xfee00800

Bad dom0.  It shouldn't be playing with APIC_BASE at all, but I guess
this means I can't fix the hypervisor behaviour to throw #GP back at a
PV guest.
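
To make the logged WRMSR values easier to read, here is a minimal C sketch that decodes the two complete values from the log, 0xfee00c00 and 0xfee00800, as IA32_APIC_BASE (MSR 0x1b) contents. The bit positions follow the architectural layout of that MSR; the macro, enum and function names are made up for this illustration and are not Xen's. The base-address mask simply clears the low 12 bits and assumes no reserved high bits are set.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits of IA32_APIC_BASE (MSR 0x1b). Names are illustrative. */
#define APIC_BASE_BSP   (1ULL << 8)   /* bootstrap processor */
#define APIC_BASE_EXTD  (1ULL << 10)  /* x2APIC mode enable */
#define APIC_BASE_EN    (1ULL << 11)  /* APIC global enable */

enum apic_mode { APIC_DISABLED, APIC_XAPIC, APIC_X2APIC, APIC_INVALID };

/* Classify the APIC mode encoded by the EN and EXTD bits. */
static inline enum apic_mode decode_apic_mode(uint64_t msr)
{
    int en = (msr & APIC_BASE_EN) != 0;
    int extd = (msr & APIC_BASE_EXTD) != 0;

    if (en)
        return extd ? APIC_X2APIC : APIC_XAPIC;
    /* EXTD set without EN is not a valid combination. */
    return extd ? APIC_INVALID : APIC_DISABLED;
}

/* Physical base address field (simplified: low 12 bits cleared). */
static inline uint64_t apic_base_addr(uint64_t msr)
{
    return msr & ~0xfffULL;
}

/* Human-readable name for a decoded mode. */
static inline const char *apic_mode_name(enum apic_mode m)
{
    switch (m) {
    case APIC_DISABLED: return "disabled";
    case APIC_XAPIC:    return "xAPIC";
    case APIC_X2APIC:   return "x2APIC";
    case APIC_INVALID:  return "invalid";
    }
    assert(!"unknown apic_mode value");
    return "?";
}
```

Under these definitions 0xfee00c00 decodes as x2APIC mode and 0xfee00800 as xAPIC mode, both with base 0xfee00000, so the attempted writes would amount to leaving x2APIC mode.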

> (XEN) *** DOUBLE FAULT ***
> (XEN) [ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]
> (XEN) CPU:0
> (XEN) RIP:e008:[] handle_exception+0x9c/0xf7

Can you disassemble the binary and find out where this is?  On current
staging, handle_exception+0x9c is in the middle of
SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.

> (XEN) RFLAGS: 00010006   CONTEXT: hypervisor
> (XEN) rax: c90040cd4068   rbx:    rcx: 000a
> (XEN) rdx:    rsi:    rdi: 
> (XEN) rbp: 36ffbf32bf77   rsp: c90040cd4000   r8:  
> (XEN) r9:     r10:    r11: 
> (XEN) r12:    r13:    r14: c90040cd7fff
> (XEN) r15:    cr0: 8005003b   cr4: 000426e0
> (XEN) cr3: 00022200a000   cr2: c90040cd3ff8
> (XEN) fsb:    gsb: 88021e6c   gss: 
> (XEN) ds: 002b   es: 002b   fs: 8a00   gs: 0010   ss: e010   cs: e008
> (XEN) Current stack base c90040cd differs from expected 
> 8300cec88000
> (XEN) Valid stack range: c90040cd6000-c90040cd8000, 
> sp=c90040cd4000, tss.rsp0=8300cec8ffa0

Given the %rsp and %cr2 values, it looks like we have a bad %rsp over a
region which isn't mapped, tried to push a value, got #PF, tried to
invoke the #PF exception handler which faulted again, and escalated to
#DF which followed the TSS and moved back to reality.
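
The relationship between the logged %rsp and %cr2 can be checked numerically. This is only a sketch, using the values as printed in the log above (rsp c90040cd4000, cr2 c90040cd3ff8) and hypothetical helper names; it shows that an 8-byte push from that stack pointer would write to exactly the faulting address, which lies in the page below.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000ULL  /* 4 KiB pages */

/* Address written by a 64-bit push when the stack pointer is rsp. */
static inline uint64_t push_target(uint64_t rsp)
{
    assert(rsp >= 8);
    return rsp - 8;
}

/* Start of the 4 KiB page containing addr. */
static inline uint64_t page_of(uint64_t addr)
{
    return addr & ~(PAGE_SIZE_4K - 1);
}
```

With rsp = c90040cd4000, push_target() gives c90040cd3ff8, equal to the logged cr2, and page_of() places it in the page starting at c90040cd3000 rather than the one %rsp points into. That is consistent with a push onto an unmapped page, though it does not by itself prove which instruction faulted.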

The only way to come in with stack pointers other than TSS.RSP0 is via
syscall and sysenter.  SYSENTER_ESP should be identical to TSS.RSP0

--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
     show_stack_overflow(cpu, regs);
 
+    {
+        uint64_t val;
+
+        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
+        printk("*** SYSENTER_ESP: %p\n", _p(val));
+    }
+
     panic("DOUBLE FAULT -- system shutdown");
 }
 
so this bit of debugging should help track things down.  If not, then
we've probably got an issue (re)writing the syscall trampolines.

~Andrew


Re: [Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2018-04-15 Thread Simon Gaiser
Andrew Cooper:
> On 14/04/18 06:49, Simon Gaiser wrote:
>> Jan Beulich:
>>> 1: correct ordering of operations during S3 resume
>>> 2: suppress BTI mitigations around S3 suspend/resume
>>> 3: check feature flags after resume
>>>
>>> Signed-off-by: Jan Beulich 
>>>
>>> Simon, could you give this a try please?
>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>
>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>> panics about a double fault somewhere after it starts to enable the
>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>> could test the patches anyway. With them it gets again to the point
>> where it double faults. So the patches are most likely fine.
>>
>> I haven't really looked at the cause of the double fault yet.
> 
> Do you at least have the crash log from the attempt?

Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
Debian sid:

(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, 
CMCI
(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 001b from 
0xfee00c00 to 0xfee0
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 001b from 
0xfee00c00 to 0xfee00800
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 001b from 
0xfee00c00 to 0xfee0
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 001b from 
0xfee00c00 to 0xfee00800
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 001b from 
0xfee00c00 to 0xfee0
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 001b from 
0xfee00c00 to 0xfee00800
(XEN) *** DOUBLE FAULT ***
(XEN) [ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]
(XEN) CPU:0
(XEN) RIP:e008:[] handle_exception+0x9c/0xf7
(XEN) RFLAGS: 00010006   CONTEXT: hypervisor
(XEN) rax: c90040cd4068   rbx:    rcx: 000a
(XEN) rdx:    rsi:    rdi: 
(XEN) rbp: 36ffbf32bf77   rsp: c90040cd4000   r8:  
(XEN) r9:     r10:    r11: 
(XEN) r12:    r13:    r14: c90040cd7fff
(XEN) r15:    cr0: 8005003b   cr4: 000426e0
(XEN) cr3: 00022200a000   cr2: c90040cd3ff8
(XEN) fsb:    gsb: 88021e6c   gss: 
(XEN) ds: 002b   es: 002b   fs: 8a00   gs: 0010   ss: e010   cs: e008
(XEN) Current stack base c90040cd differs from expected 8300cec88000
(XEN) Valid stack range: c90040cd6000-c90040cd8000, 
sp=c90040cd4000, tss.rsp0=8300cec8ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN) 
(XEN) 
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) 
(XEN) 
(XEN) Reboot in five seconds...




Re: [Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2018-04-15 Thread Andrew Cooper
On 14/04/18 06:49, Simon Gaiser wrote:
> Jan Beulich:
>> 1: correct ordering of operations during S3 resume
>> 2: suppress BTI mitigations around S3 suspend/resume
>> 3: check feature flags after resume
>>
>> Signed-off-by: Jan Beulich 
>>
>> Simon, could you give this a try please?
> Backported to 4.8 it works fine with the two fixes I sent earlier.
>
> I now also tried staging. Resume is broken even without IBRS/IBPB. It
> panics about a double fault somewhere after it starts to enable the
> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
> could test the patches anyway. With them it gets again to the point
> where it double faults. So the patches are most likely fine.
>
> I haven't really looked at the cause of the double fault yet.

Do you at least have the crash log from the attempt?

~Andrew

>
> Simon
>



Re: [Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2018-04-13 Thread Andrew Cooper
On 13/04/18 12:49, Jan Beulich wrote:
> 1: correct ordering of operations during S3 resume
> 2: suppress BTI mitigations around S3 suspend/resume
> 3: check feature flags after resume
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 


[Xen-devel] [PATCH 0/3] x86: S3 resume adjustments

2018-04-13 Thread Jan Beulich
1: correct ordering of operations during S3 resume
2: suppress BTI mitigations around S3 suspend/resume
3: check feature flags after resume

Signed-off-by: Jan Beulich 

Simon, could you give this a try please?

Thanks, Jan


