Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-23 Thread Kevin O'Connor
On Wed, Dec 23, 2015 at 06:40:12AM +, Gonglei (Arei) wrote:
> > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > On Tue, Dec 22, 2015 at 02:14:12AM +, Gonglei (Arei) wrote:
> > > Sorry, it doesn't work. What's worse is we cannot stop SeaBIOS stuck by
> > > Setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this patch.
> > 
> > Oops, can you try with the patch below instead?
> > 
> 
> It works now. Thanks!
> 
> But do we need to check other possible situations
> that maybe cause *extra stack* broken or overridden?

I believe the issue is that an NMI could occur while SeaBIOS is
already using its extra stack.  The code is not prepared to switch
into the extra stack while already on the extra stack.  SeaBIOS is
careful to always disable IRQs while running C code to prevent this
issue, but disabling normal IRQs does not disable NMIs.  So, I believe
this issue is specific to the nature of NMIs.

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-22 Thread Kevin O'Connor
On Tue, Dec 22, 2015 at 03:15:26AM +, Xulei (Stone) wrote:
> Hi, Kevin,
> Can you tell how to reset/reboot this VM, if it goes to the handle_hwpic1()
> on its booting procedure? I mean, usually, SeaBIOS would not go to 
> handle_hwpic routine. But in my test case, SeaBIOS calls handle_hwpic when
> KVM injects a #UD expcetion (not irq) and  SeaBIOS will loop to handle this
> if KVM persistently injects exception.
>  
> Now, i just wish to reset/reboot this VM if it is fall into handle_hwpic. I
> tried follwing patch and it seems not work. What can i do to force 
> reset/reboot? 

Call the reset() function.

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-22 Thread Gonglei (Arei)
> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Tuesday, December 22, 2015 11:51 PM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Tue, Dec 22, 2015 at 02:14:12AM +, Gonglei (Arei) wrote:
> > > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > > Sent: Tuesday, December 22, 2015 2:47 AM
> > > To: Gonglei (Arei)
> > > Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> > > Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> > > Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> > > problem on qemu-kvm platform
> > >
> > > On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> > > > When the gurb of OS is booting, then the softirq and C function
> > > > send_disk_op() may use extra stack of SeaBIOS. If we inject a NMI,
> > > > romlayout.S: irqentry_extrastack is invoked, and the extra stack will
> > > > be used again. And the stack of first calling will be broken, so that 
> > > > the
> > > SeaBIOS stuck.
> > > >
> > > > You can easily reproduce the problem.
> > > >
> > > > 1. start on guest
> > > > 2. reset the guest
> > > > 3. inject a NMI when the guest show the grub surface 4. then the guest
> > > > stuck
> > >
> > > Does the SeaBIOS patch below help?
> >
> > Sorry, it doesn't work. What's worse is we cannot stop SeaBIOS stuck by
> > Setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this patch.
> 
> Oops, can you try with the patch below instead?
> 

It works now. Thanks!

But do we need to check other possible situations
that maybe cause *extra stack* broken or overridden?


> > > I'm not familiar with how to "inject a
> > > NMI" - can you describe the process in more detail?
> > >
> >
> > 1. Qemu Command line:
> >
> > #: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096
> -smp 8 -name suse -vnc 0.0.0.0:10 \
> > -device virtio-scsi-pci,id=scsi0 -drive
> file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=
> none,aio=native \
> > -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
> > -chardev file,id=seabios,path=/home/seabios.log -device
> isa-debugcon,iobase=0x402,chardev=seabios \
> > -monitor stdio -qmp unix:/tmp/qmp,server,nowait
> >
> > 2. Inject a NMI by QMP:
> >
> > #: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
> > Welcome to the QMP low-level shell!
> > Connected to QEMU 2.5.0
> >
> > (QEMU) system_reset
> > {"return": {}}
> > (QEMU) inject-nmi
> > {"return": {}}
> > (QEMU) inject-nmi
> > {"return": {}}
> >
> 
> I tried a few simple tests but was not able to reproduce.
> 
After reset the guest, then you inject an NMI when you see the grub surface
ASAP. 

Kevin, I sent you a picture in private. :)


Regards,
-Gonglei

> -Kevin
> 
> 
> --- a/src/romlayout.S
> +++ b/src/romlayout.S
> @@ -548,7 +548,10 @@ entry_post:
>  ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
> 
>  ORG 0xe2c3
> -IRQ_ENTRY 02
> +.global entry_02
> +entry_02:
> +ENTRY handle_02  // NMI handler does not switch onto extra stack
> +iretw
> 
>  ORG 0xe3fe
>  .global entry_13_official
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-22 Thread Kevin O'Connor
On Tue, Dec 22, 2015 at 02:14:12AM +, Gonglei (Arei) wrote:
> > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > Sent: Tuesday, December 22, 2015 2:47 AM
> > To: Gonglei (Arei)
> > Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> > Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> > Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> > problem on qemu-kvm platform
> > 
> > On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> > > When the gurb of OS is booting, then the softirq and C function
> > > send_disk_op() may use extra stack of SeaBIOS. If we inject a NMI,
> > > romlayout.S: irqentry_extrastack is invoked, and the extra stack will
> > > be used again. And the stack of first calling will be broken, so that the
> > SeaBIOS stuck.
> > >
> > > You can easily reproduce the problem.
> > >
> > > 1. start on guest
> > > 2. reset the guest
> > > 3. inject a NMI when the guest show the grub surface 4. then the guest
> > > stuck
> > 
> > Does the SeaBIOS patch below help?  
> 
> Sorry, it doesn't work. What's worse is we cannot stop SeaBIOS stuck by
> Setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this patch. 

Oops, can you try with the patch below instead?

> > I'm not familiar with how to "inject a
> > NMI" - can you describe the process in more detail?
> > 
> 
> 1. Qemu Command line:
> 
> #: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -smp 8 
> -name suse -vnc 0.0.0.0:10 \
> -device virtio-scsi-pci,id=scsi0 -drive 
> file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native
>  \
> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
> -chardev file,id=seabios,path=/home/seabios.log -device 
> isa-debugcon,iobase=0x402,chardev=seabios \
> -monitor stdio -qmp unix:/tmp/qmp,server,nowait 
> 
> 2. Inject a NMI by QMP:
> 
> #: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
> Welcome to the QMP low-level shell!
> Connected to QEMU 2.5.0
> 
> (QEMU) system_reset
> {"return": {}}
> (QEMU) inject-nmi  
> {"return": {}}
> (QEMU) inject-nmi
> {"return": {}}
> 

I tried a few simple tests but was not able to reproduce.

-Kevin


--- a/src/romlayout.S
+++ b/src/romlayout.S
@@ -548,7 +548,10 @@ entry_post:
 ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
 
 ORG 0xe2c3
-IRQ_ENTRY 02
+.global entry_02
+entry_02:
+ENTRY handle_02  // NMI handler does not switch onto extra stack
+iretw
 
 ORG 0xe3fe
 .global entry_13_official
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Kevin O'Connor
On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> When the gurb of OS is booting, then the softirq and C function send_disk_op()
> may use extra stack of SeaBIOS. If we inject a NMI, romlayout.S: 
> irqentry_extrastack
> is invoked, and the extra stack will be used again. And the stack of first 
> calling
> will be broken, so that the SeaBIOS stuck. 
> 
> You can easily reproduce the problem.
> 
> 1. start on guest
> 2. reset the guest
> 3. inject a NMI when the guest show the grub surface
> 4. then the guest stuck

Does the SeaBIOS patch below help?  I'm not familiar with how to
"inject a NMI" - can you describe the process in more detail?

-Kevin


--- a/src/romlayout.S
+++ b/src/romlayout.S
@@ -548,7 +548,9 @@ entry_post:
 ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
 
 ORG 0xe2c3
-IRQ_ENTRY 02
+.global entry_02
+entry_02:
+ENTRY handle_02  // NMI handler does not switch onto extra stack
 
 ORG 0xe3fe
 .global entry_13_official
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Gonglei (Arei)
Dear Kevin,

> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Sunday, December 20, 2015 10:33 PM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Sun, Dec 20, 2015 at 09:49:54AM +, Gonglei (Arei) wrote:
> > > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > > Sent: Saturday, December 19, 2015 11:12 PM
> > > On Sat, Dec 19, 2015 at 12:03:15PM +, Gonglei (Arei) wrote:
> > > > Maybe the root cause is not NMI but INTR, so yield() can open hardware
> > > interrupt,
> > > > And then execute interrupt handler, but the interrupt handler make the
> > > SeaBIOS
> > > > stack broken, so that the BSP can't execute the instruction and occur
> > > exception,
> > > > VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs
> except
> > > > the surface phenomenon.
> > >
> > > I can't see any reason why allowing interrupts at this location would
> > > be a problem.
> > >
> > Does it have any relationship with *extra stack* of SeaBIOS?
> 
> None that I can see.  Also, the kvm trace seems to show the code
> trying to execute at rip=0x03 - that will crash long before the extra
> stack is used.
> 
When the gurb of OS is booting, then the softirq and C function send_disk_op()
may use extra stack of SeaBIOS. If we inject a NMI, romlayout.S: 
irqentry_extrastack
is invoked, and the extra stack will be used again. And the stack of first 
calling
will be broken, so that the SeaBIOS stuck. 

You can easily reproduce the problem.

1. start on guest
2. reset the guest
3. inject a NMI when the guest show the grub surface
4. then the guest stuck

If we disabled extra stack by setting

 CONFIG_ENTRY_EXTRASTACK=n

Then the problem is gone.

Besides, I have another thought:

Is it possible when one cpu is using the extra stack, but other cpus (APs)
still be waked up by hardware interrupt after yield() or br->flags = F_IF 
and used the extra stack again?


Regards,
-Gonglei

> > > > Kevin, can we drop yield() in smp_setup() ?
> > >
> > > It's possible to eliminate this instance of yield, but I think it
> > > would just push the crash to the next time interrupts are enabled.
> > >
> > Perhaps. I'm not sure.
> >
> > > > Is it really useful and allowable for SeaBIOS? Maybe for other
> components?
> > > > I'm not sure. Because we found that when SeaBIOS is booting, if we 
> > > > inject
> a
> > > > NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same
> with
> > > > the current problem.
> > >
> > > If you apply the patches you had to prevent that NMI crash problem,
> > > does it also prevent the above crash?
> > >
> > Yes, but we cannot prevent the NMI injection (though I'll submit some
> patches to
> > forbid users' NMI injection after NMI_EN disabled by RTC bit7 of port 0x70).
> >
> 
> -Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Xulei (Stone)
, loop handle PIC irq0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
... always hanle_hwpic1 irq=0, never ends anymore...


>> -Original Message-
>> From: Kevin O'Connor [mailto:ke...@koconnor.net]
>> Sent: Tuesday, December 22, 2015 2:47 AM
>> To: Gonglei (Arei)
>> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
>> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
>> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
>> problem on qemu-kvm platform
>>
>> On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
>> > When the gurb of OS is booting, then the softirq and C function
>> > send_disk_op() may use extra stack of SeaBIOS. If we inject a NMI,
>> > romlayout.S: irqentry_extrastack is invoked, and the extra stack will
>> > be used again. And the stack of first calling will be broken, so that the
>> SeaBIOS stuck.
>> >
>> > You can easily reproduce the problem.
>> >
>> > 1. start on guest
>> > 2. reset the guest
>> > 3. inject a NMI when the guest show the grub surface 4. then the guest
>> > stuck
>>
>> Does the SeaBIOS patch below help? 
>
>Sorry, it doesn't work. What's worse is we cannot stop SeaBIOS stuck by
>Setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this patch.
>
>
>> I'm not familiar with how to "inject a
>> NMI" - can you describe the process in more detail?
>>
>
>1. Qemu Command line:
>
>#: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -smp 8 
>-name suse -vnc 0.0.0.0:10 \
>-device virtio-scsi-pci,id=scsi0 -drive 
>file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native
> \
>-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
>-chardev file,id=seabios,path=/home/seabios.log -device 
>isa-debugcon,iobase=0x402,chardev=seabios \
>-monitor stdio -qmp unix:/tmp/qmp,server,nowait
>
>2. Inject a NMI by QMP:
>
>#: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
>Welcome to the QMP low-level shell!
>Connected to QEMU 2.5.0
>
>(QEMU) system_reset
>{"return": {}}
>(QEMU) inject-nmi 
>{"return": {}}
>(QEMU) inject-nmi
>{"return": {}}
>
>
>Regards,
>-Gonglei
>
>> -Kevin
>>
>>
>> --- a/src/romlayout.S
>> +++ b/src/romlayout.S
>> @@ -548,7 +548,9 @@ entry_post:
>>  ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
>>
>>  ORG 0xe2c3
>> -IRQ_ENTRY 02
>> +.global entry_02
>> +entry_02:
>> +ENTRY handle_02  // NMI handler does not switch onto extra
>> +stack
>>
>>  ORG 0xe3fe
>>  .global 
>> entry_13_officialN�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf

RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Gonglei (Arei)
> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Tuesday, December 22, 2015 2:47 AM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> > When the gurb of OS is booting, then the softirq and C function
> > send_disk_op() may use extra stack of SeaBIOS. If we inject a NMI,
> > romlayout.S: irqentry_extrastack is invoked, and the extra stack will
> > be used again. And the stack of first calling will be broken, so that the
> SeaBIOS stuck.
> >
> > You can easily reproduce the problem.
> >
> > 1. start on guest
> > 2. reset the guest
> > 3. inject a NMI when the guest show the grub surface 4. then the guest
> > stuck
> 
> Does the SeaBIOS patch below help?  

Sorry, it doesn't work. What's worse is we cannot stop SeaBIOS stuck by
Setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this patch. 


> I'm not familiar with how to "inject a
> NMI" - can you describe the process in more detail?
> 

1. Qemu Command line:

#: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -smp 8 
-name suse -vnc 0.0.0.0:10 \
-device virtio-scsi-pci,id=scsi0 -drive 
file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native
 \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
-chardev file,id=seabios,path=/home/seabios.log -device 
isa-debugcon,iobase=0x402,chardev=seabios \
-monitor stdio -qmp unix:/tmp/qmp,server,nowait 

2. Inject a NMI by QMP:

#: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
Welcome to the QMP low-level shell!
Connected to QEMU 2.5.0

(QEMU) system_reset
{"return": {}}
(QEMU) inject-nmi  
{"return": {}}
(QEMU) inject-nmi
{"return": {}}


Regards,
-Gonglei

> -Kevin
> 
> 
> --- a/src/romlayout.S
> +++ b/src/romlayout.S
> @@ -548,7 +548,9 @@ entry_post:
>  ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
> 
>  ORG 0xe2c3
> -IRQ_ENTRY 02
> +.global entry_02
> +entry_02:
> +ENTRY handle_02  // NMI handler does not switch onto extra
> +stack
> 
>  ORG 0xe3fe
>  .global entry_13_official
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-20 Thread Gonglei (Arei)

> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Saturday, December 19, 2015 11:12 PM
> On Sat, Dec 19, 2015 at 12:03:15PM +, Gonglei (Arei) wrote:
> > Maybe the root cause is not NMI but INTR, so yield() can open hardware
> interrupt,
> > And then execute interrupt handler, but the interrupt handler make the
> SeaBIOS
> > stack broken, so that the BSP can't execute the instruction and occur
> exception,
> > VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs 
> > except
> > the surface phenomenon.
> 
> I can't see any reason why allowing interrupts at this location would
> be a problem.
> 
Does it have any relationship with *extra stack* of SeaBIOS?

> > Kevin, can we drop yield() in smp_setup() ?
> 
> It's possible to eliminate this instance of yield, but I think it
> would just push the crash to the next time interrupts are enabled.
> 
Perhaps. I'm not sure.

> > Is it really useful and allowable for SeaBIOS? Maybe for other components?
> > I'm not sure. Because we found that when SeaBIOS is booting, if we inject a
> > NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same with
> > the current problem.
> 
> If you apply the patches you had to prevent that NMI crash problem,
> does it also prevent the above crash?
> 
Yes, but we cannot prevent the NMI injection (though I'll submit some patches to
forbid users' NMI injection after NMI_EN disabled by RTC bit7 of port 0x70).


Regards,
-Gonglei
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-20 Thread Kevin O'Connor
On Sun, Dec 20, 2015 at 09:49:54AM +, Gonglei (Arei) wrote:
> > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > Sent: Saturday, December 19, 2015 11:12 PM
> > On Sat, Dec 19, 2015 at 12:03:15PM +, Gonglei (Arei) wrote:
> > > Maybe the root cause is not NMI but INTR, so yield() can open hardware
> > interrupt,
> > > And then execute interrupt handler, but the interrupt handler make the
> > SeaBIOS
> > > stack broken, so that the BSP can't execute the instruction and occur
> > exception,
> > > VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs 
> > > except
> > > the surface phenomenon.
> > 
> > I can't see any reason why allowing interrupts at this location would
> > be a problem.
> > 
> Does it have any relationship with *extra stack* of SeaBIOS?

None that I can see.  Also, the kvm trace seems to show the code
trying to execute at rip=0x03 - that will crash long before the extra
stack is used.

> > > Kevin, can we drop yield() in smp_setup() ?
> > 
> > It's possible to eliminate this instance of yield, but I think it
> > would just push the crash to the next time interrupts are enabled.
> > 
> Perhaps. I'm not sure.
> 
> > > Is it really useful and allowable for SeaBIOS? Maybe for other components?
> > > I'm not sure. Because we found that when SeaBIOS is booting, if we inject 
> > > a
> > > NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same 
> > > with
> > > the current problem.
> > 
> > If you apply the patches you had to prevent that NMI crash problem,
> > does it also prevent the above crash?
> > 
> Yes, but we cannot prevent the NMI injection (though I'll submit some patches 
> to
> forbid users' NMI injection after NMI_EN disabled by RTC bit7 of port 0x70).
> 

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-19 Thread Gonglei (Arei)
Hi Kevin,


> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> 
> On Fri, Dec 18, 2015 at 03:04:58AM +, Gonglei (Arei) wrote:
> > Hi Kevin & Paolo,
> >
> > Luckily, I reproduced this problem last night. And I got the below log when
> SeaBIOS is stuck.
> [...]
> > [2015-12-18 10:38:10]  gonglei: finish while
> [...]
> > <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3
> info 0 8306
> > <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
> > <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
> > <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0
> 
> This is an odd finding.  It seems to indicate that the code is caught
> in an infinite irq loop once irqs are enabled.  What doesn't make
> sense is that an NMI shouldn't depend on the cpu irq enable flag.

Maybe the root cause is not NMI but INTR, so yield() can open hardware 
interrupt,
And then execute interrupt handler, but the interrupt handler make the SeaBIOS
stack broken, so that the BSP can't execute the instruction and occur exception,
VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs except
the surface phenomenon.

Kevin, can we drop yield() in smp_setup() ?

diff --git a/src/fw/smp.c b/src/fw/smp.c
index 579acdb..dd23eda 100644
--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -136,7 +136,6 @@ smp_setup(void)
 "  jc 1b\n"
 : "+m" (SMPLock), "+m" (SMPStack)
 : : "cc", "memory");
-yield();
 
 // Restore memory.
 *(u64*)BUILD_AP_BOOT_ADDR = old;

Is it really useful and allowable for SeaBIOS? Maybe for other components?
I'm not sure. Because we found that when SeaBIOS is booting, if we inject a
NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same with
the current problem.


Regards,
-Gonglei

> Also, I can't explain why rip would be 0x03, nor why a #UD in an
> exception handler wouldn't result in a triple fault.  Maybe someone
> with more kvm knowledge could help here.
> 
> I did notice that you appear to be running with SeaBIOS v1.8.1 - I
> recommend you upgrade to the latest.  There were two important fixes
> in this area (8b9942fa and 3156b71a).  I don't think either of these
> fixes would explain the log above, but it would be best to eliminate
> the possibility.
> 
> -Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-19 Thread Kevin O'Connor
On Sat, Dec 19, 2015 at 12:03:15PM +, Gonglei (Arei) wrote:
> Maybe the root cause is not NMI but INTR, so yield() can open hardware 
> interrupt,
> And then execute interrupt handler, but the interrupt handler make the SeaBIOS
> stack broken, so that the BSP can't execute the instruction and occur 
> exception,
> VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs except
> the surface phenomenon.

I can't see any reason why allowing interrupts at this location would
be a problem.

> Kevin, can we drop yield() in smp_setup() ?

It's possible to eliminate this instance of yield, but I think it
would just push the crash to the next time interrupts are enabled.

> Is it really useful and allowable for SeaBIOS? Maybe for other components?
> I'm not sure. Because we found that when SeaBIOS is booting, if we inject a
> NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same with
> the current problem.

If you apply the patches you had to prevent that NMI crash problem,
does it also prevent the above crash?

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-18 Thread Gonglei (Arei)
torderCount; i++)
 if (glob_prefix(glob, Bootorder[i]))
diff --git a/roms/seabios/src/fw/smp.c b/roms/seabios/src/fw/smp.c
index a466ea6..46ec607 100644
--- a/roms/seabios/src/fw/smp.c
+++ b/roms/seabios/src/fw/smp.c
@@ -46,12 +46,16 @@ int apic_id_is_present(u8 apic_id)
 return !!(FoundAPICIDs[apic_id/32] & (1ul << (apic_id % 32)));
 }
 
+// Atomic lock for shared stack across processors.
+u32 SMPLock __VISIBLE;
+u32 SMPStack __VISIBLE;
+
 void VISIBLE32FLAT
 handle_smp(void)
 {
 if (!CONFIG_QEMU)
 return;
-
+dprintf(1, ">>> enter handle_smp...\n");
 // Enable CPU caching
 setcr0(getcr0() & ~(CR0_CD|CR0_NW));
 
@@ -70,19 +74,16 @@ handle_smp(void)
 FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32));
 
 CountCPUs++;
+dprintf(1, "===: CountCPUs=%d, SMPStack=0x%x\n", CountCPUs, SMPStack);
 }
 
-// Atomic lock for shared stack across processors.
-u32 SMPLock __VISIBLE;
-u32 SMPStack __VISIBLE;
-
 // find and initialize the CPUs by launching a SIPI to them
 void
 smp_setup(void)
 {
 if (!CONFIG_QEMU)
 return;
-
+dprintf(1, ">>>>>gonglei: enter smp_setup()...\n");
 ASSERT32FLAT();
 u32 eax, ebx, ecx, cpuid_features;
 cpuid(1, , , , _features);
@@ -106,7 +107,7 @@ smp_setup(void)
 u64 new = (0xea | ((u64)SEG_BIOS<<24)
| (((u32)entry_smp - BUILD_BIOS_ADDR) << 8));
 *(u64*)BUILD_AP_BOOT_ADDR = new;
-
+dprintf(1, ">>>>>gonglei: begine to enable local APIC...\n");
 // enable local APIC
 u32 val = readl(APIC_SVR);
 writel(APIC_SVR, val | APIC_ENABLED);
@@ -125,10 +126,21 @@ smp_setup(void)
 writel(APIC_ICR_LOW, 0x000C4500);
 u32 sipi_vector = BUILD_AP_BOOT_ADDR >> 12;
 writel(APIC_ICR_LOW, 0x000C4600 | sipi_vector);
-
+dprintf(1, ">>>>>gonglei: finish enable local APIC...\n");
 // Wait for other CPUs to process the SIPI.
 u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
-while (cmos_smp_count != CountCPUs)
+dprintf(1, ">>>gonglei: cmos_smp_count=%d\n", cmos_smp_count);
+while (cmos_smp_count != CountCPUs) {
 asm volatile(
 // Release lock and allow other processors to use the stack.
 "  movl %%esp, %1\n"
@@ -139,8 +151,11 @@ smp_setup(void)
 "  jc 1b\n"
 : "+m" (SMPLock), "+m" (SMPStack)
 : : "cc", "memory");
+   //yield();
+}
+dprintf(1, " gonglei: finish while\n");
 yield();
-
+dprintf(1, " gonglei: finish yield\n");
 // Restore memory.
 *(u64*)BUILD_AP_BOOT_ADDR = old;
 
diff --git a/roms/seabios/src/misc.c b/roms/seabios/src/misc.c
index 8caaf31..77f6be3 100644
--- a/roms/seabios/src/misc.c
+++ b/roms/seabios/src/misc.c
@@ -64,6 +64,10 @@ void VISIBLE16
 handle_02(void)
 {
 debug_isr(DEBUG_ISR_02);
+dprintf(1, "gonglei hand nmi inject, write rtc \n");
 }
 
 void
diff --git a/roms/seabios/src/stacks.c b/roms/seabios/src/stacks.c
index 1dbdfe9..c1b5203 100644
--- a/roms/seabios/src/stacks.c
+++ b/roms/seabios/src/stacks.c
@@ -174,6 +174,7 @@ call16_smm(u32 eax, u32 edx, void *func)
 static void
 call32_sloppy_prep(void)
 {
+dprintf(1, ">>> enter call32_sloppy_prep...\n");
 // Backup cmos index register and disable nmi
 u8 cmosindex = inb(PORT_CMOS_INDEX);
 outb(cmosindex | NMI_DISABLE_BIT, PORT_CMOS_INDEX);


Regards,
-Gonglei


> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Thursday, November 19, 2015 9:41 PM
> To: Xulei (Stone)
> Cc: Gonglei (Arei); qemu-devel; seab...@seabios.org; Huangweidong (C)
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Thu, Nov 19, 2015 at 12:42:50PM +, Xulei (Stone) wrote:
> > Kevin,
> >
> > After deeply analyzing, i think there may be 3 possible reasons:
> > 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no
> > lock to protect.  So, sometimes, 2 or more vcpu may get the same
> > current value of CountCPUs. Then we'll get a single incrementation
> > instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will
> > loop forever;
> 
> The handle_smp() code is called from romlayout.S:entry_smp() which does take
> a lock.  So, all of handle_smp() should run synchronous.
> 
> > 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number?
> 
> Not sure - the last time there were problems in this area of the code others
> used kvmtrace to try and track this down.  Since you are getting dprintf
> statements, you could also try outputting cmos_smp_count prior to the loop
> (see patch below).
> 

Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-18 Thread Kevin O'Connor
On Fri, Dec 18, 2015 at 03:04:58AM +, Gonglei (Arei) wrote:
> Hi Kevin & Paolo,
> 
> Luckily, I reproduced this problem last night. And I got the below log when 
> SeaBIOS is stuck.
[...]
> [2015-12-18 10:38:10]  gonglei: finish while   
[...]
> <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 
> 0 8306
> <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
> <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
> <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0

This is an odd finding.  It seems to indicate that the code is caught
in an infinite irq loop once irqs are enabled.  What doesn't make
sense is that an NMI shouldn't depend on the cpu irq enable flag.
Also, I can't explain why rip would be 0x03, nor why a #UD in an
exception handler wouldn't result in a triple fault.  Maybe someone
with more kvm knowledge could help here.

I did notice that you appear to be running with SeaBIOS v1.8.1 - I
recommend you upgrade to the latest.  There were two important fixes
in this area (8b9942fa and 3156b71a).  I don't think either of these
fixes would explain the log above, but it would be best to eliminate
the possibility.

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-18 Thread Gonglei (Arei)
>
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Saturday, December 19, 2015 7:13 AM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Fri, Dec 18, 2015 at 03:04:58AM +, Gonglei (Arei) wrote:
> > Hi Kevin & Paolo,
> >
> > Luckily, I reproduced this problem last night. And I got the below log when
> SeaBIOS is stuck.
> [...]
> > [2015-12-18 10:38:10]  gonglei: finish while
> [...]
> > <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3
> info 0 8306
> > <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
> > <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
> > <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0
> 
> This is an odd finding.  It seems to indicate that the code is caught
> in an infinite irq loop once irqs are enabled.  What doesn't make
> sense is that an NMI shouldn't depend on the cpu irq enable flag.
> Also, I can't explain why rip would be 0x03, nor why a #UD in an
> exception handler wouldn't result in a triple fault.  Maybe someone
> with more kvm knowledge could help here.
> 

Ccing Paolo and Radim.

> I did notice that you appear to be running with SeaBIOS v1.8.1 - I
> recommend you upgrade to the latest.  There were two important fixes
> in this area (8b9942fa and 3156b71a).  I don't think either of these
> fixes would explain the log above, but it would be best to eliminate
> the possibility.
> 
We can reproduce the problem using latest SeaBIOS too. :(


Regards,
-Gonglei
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html