Re: consistent VM hang during reboot

2014-06-17 Thread John Nielsen
On Jun 13, 2014, at 4:23 PM, John Nielsen wrote:

> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>> I am trying to solve a problem with amd64 FreeBSD virtual machines running 
>> on a Linux+KVM hypervisor. To be honest I'm not sure if the problem is in 
>> FreeBSD or the hypervisor, but I'm trying to rule out the OS first.
>> 
>> The _second_ time FreeBSD boots in a virtual machine with more than one 
>> core, the boot hangs just before the kernel would normally print e.g. "SMP: 
>> AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full 
>> Speed USB v1.0", but the problem persists even without USB). The VM will 
>> boot fine a first time, but running either "shutdown -r now" OR "reboot" 
>> will lead to a hung second boot. Stopping and starting the host qemu-kvm 
>> process is the only way to continue.

...

> Following up on the off chance anyone else is interested. I installed -HEAD 
> on a host that was having the problem ("v2" Xeon CPU) and ran a FreeBSD 9 VM 
> under bhyve. The problem did _not_ persist. That's not entirely conclusive 
> but it does point the finger at Qemu a bit more strongly. I have filed a bug 
> with them:
>  https://bugs.launchpad.net/qemu/+bug/1329956

With some help from the Qemu and KVM folks I've finally made some headway. The 
salient difference between the working and non-working CPUs above seems to be 
support for APIC virtualization. Loading the kvm_intel module (on the Linux 
host) with "enable_apicv=N" works around the reboot problem I've been having.
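
A minimal sketch of that workaround on the Linux host, assuming the KVM module for Intel CPUs is loaded under the name kvm_intel and that all guests are stopped before it is reloaded. The function names and the file name under /etc/modprobe.d are made up for illustration; only the "enable_apicv=N" option comes from this thread.

```sh
#!/bin/sh
# Hypothetical helper: print the modprobe.d line that disables APICv.
apicv_off_conf() {
    printf 'options kvm_intel enable_apicv=N\n'
}

# Hypothetical helper: show the current setting if the module is loaded.
apicv_current() {
    param=/sys/module/kvm_intel/parameters/enable_apicv
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "kvm_intel not loaded"
    fi
}

# To apply (as root, with no VMs running on the host):
#   apicv_off_conf > /etc/modprobe.d/kvm-intel-apicv.conf   # file name is arbitrary
#   modprobe -r kvm_intel && modprobe kvm_intel
#   apicv_current                                           # should now print N
```

Writing the option to a modprobe.d file rather than passing it once on the modprobe command line is only a convenience so the setting survives a host reboot.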

Since this now looks like a Linux KVM bug I won't follow up here any more, but 
I wanted to wrap up the thread for the archives.

JN

___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"


Re: consistent VM hang during reboot

2014-06-13 Thread John Nielsen
On May 13, 2014, at 9:50 AM, John Nielsen wrote:

> On May 9, 2014, at 12:41 PM, John Nielsen wrote:
> 
>> On May 8, 2014, at 12:42 PM, Andrew Duane wrote:
>> 
>>> From: owner-freebsd-hack...@freebsd.org 
>>> [mailto:owner-freebsd-hack...@freebsd.org] On Behalf Of John Nielsen
>>> 
>>>> On May 8, 2014, at 11:03 AM, John Baldwin wrote:
>>>> 
>>>>> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>>>>>> I am trying to solve a problem with amd64 FreeBSD virtual machines
>>>>>> running on a Linux+KVM hypervisor. To be honest I'm not sure if the
>>>>>> problem is in FreeBSD or the hypervisor, but I'm trying to rule out the
>>>>>> OS first.
>>>>>> 
>>>>>> The _second_ time FreeBSD boots in a virtual machine with more than one
>>>>>> core, the boot hangs just before the kernel would normally print e.g.
>>>>>> "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0:
>>>>>> 12Mbps Full Speed USB v1.0", but the problem persists even without
>>>>>> USB). The VM will boot fine a first time, but running either
>>>>>> "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping
>>>>>> and starting the host qemu-kvm process is the only way to continue.
>>>>>> 
>>>>>> The problem seems to be triggered by something in the SMP portion of
>>>>>> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual
>>>>>> "reset" button the next boot is fine. If I have
>>>>>> 'kern.smp.disabled="1"' set for the initial boot then subsequent boots
>>>>>> are fine (but I can only use one CPU core, of course). However, if I
>>>>>> boot normally the first time then set 'kern.smp.disabled="1"' for the
>>>>>> second (re)boot, the problem is triggered. Apparently something in the
>>>>>> shutdown code is "poisoning the well" for the next boot.
>>>>>> 
>>>>>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of
>>>>>> yesterday.
>>>>>> 
>>>>>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>>>>> 
>>>>>> --- sys/amd64/amd64/vm_machdep.c.orig 2014-05-07 13:19:07.400981580 -0600
>>>>>> +++ sys/amd64/amd64/vm_machdep.c  2014-05-07 17:02:52.416783795 -0600
>>>>>> @@ -593,7 +593,7 @@
>>>>>>  void
>>>>>>  cpu_reset()
>>>>>>  {
>>>>>> -#ifdef SMP
>>>>>> +#if 0
>>>>>>   cpuset_t map;
>>>>>>   u_int cnt;
>>>>>> 
>>>>>> I've tried skipping or disabling smaller chunks of code within the #if
>>>>>> block but haven't found a consistent winner yet.
>>>>>> 
>>>>>> I'm hoping the list will have suggestions on how I can further narrow
>>>>>> down the problem, or theories on what might be going on.
>>>>> 
>>>>> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0
>>>>> reboot') or a non-BSP ('cpuset -l 1 reboot') to see if that has any
>>>>> effect? It might not, but if it does it would help narrow down the code
>>>>> to consider.
>>>> 
>>>> Hello jhb, thanks for responding.
>>>> 
>>>> I tried your suggestion but unfortunately it does not make any difference.
>>>> The reboot hangs regardless of which CPU I assign the command to.
>>>> 
>>>> Any other suggestions?
>>> 
>>> When I was doing some early work on some of the Octeon multi-core chips, I 
>>> encountered something similar. If I remember correctly, there was an issue 
>>> in the shutdown sequence that did not properly halt the cores and set up 
>>> the "start jump" vector. So the first core would start, and when it tried 
>>> to start the next ones it would hang waiting for the ACK that they were 
>>> running (since they didn't have a start vector and hence never started). I 
>>> know MIPS, not AMD, so I can't say what the equivalent would be, but I'm 
>>> sure there is one. Check that part, setting up the early state.
>>> 
>>> If Juli and/or Adrian are reading this: do you remember anything about 
>>> that, something like 2 years ago?
>> 
>> That does sound promising, would love more details if anyone can provide 
>> them.
>> 
>> Here's another wrinkle:
>> 
>> The KVM machine in question is part of a cluster of identical servers 
>> (hardware, OS, software revisions). The problem is present on all servers in 
>> the cluster.
>> 
>> I also have access to a second homogenous cluster. The OS and software 
>> revisions on this cluster are identical to the first. The hardware is 
>> _nearly_ identical--slightly different mainboards from the same manufacturer 
>> and slightly older CPUs. The same VMs (identical disk image and definition, 
>> including CPU flags passed to the guest) that have a problem on the first 
>> cluster work flawlessly on this one.
>> 
>> Not sure if that means the bad behavior only appears on certain CPUs or if 
>> it's timing-related or something else entirely. I'd welcome speculation at 
>> this point.
>> 
>> CPU details below in case it makes a difference.
>> 
>> == Problem Host ==
>> model name  : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
>> cmov pat 

Re: consistent VM hang during reboot

2014-05-13 Thread John Nielsen
On May 9, 2014, at 12:41 PM, John Nielsen wrote:

> On May 8, 2014, at 12:42 PM, Andrew Duane wrote:
> 
>> From: owner-freebsd-hack...@freebsd.org 
>> [mailto:owner-freebsd-hack...@freebsd.org] On Behalf Of John Nielsen
>> 
>>> On May 8, 2014, at 11:03 AM, John Baldwin wrote:
>>> 
>>>> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>>>>> I am trying to solve a problem with amd64 FreeBSD virtual machines
>>>>> running on a Linux+KVM hypervisor. To be honest I'm not sure if the
>>>>> problem is in FreeBSD or the hypervisor, but I'm trying to rule out the
>>>>> OS first.
>>>>> 
>>>>> The _second_ time FreeBSD boots in a virtual machine with more than one
>>>>> core, the boot hangs just before the kernel would normally print e.g.
>>>>> "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0:
>>>>> 12Mbps Full Speed USB v1.0", but the problem persists even without
>>>>> USB). The VM will boot fine a first time, but running either
>>>>> "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping
>>>>> and starting the host qemu-kvm process is the only way to continue.
>>>>> 
>>>>> The problem seems to be triggered by something in the SMP portion of
>>>>> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual
>>>>> "reset" button the next boot is fine. If I have
>>>>> 'kern.smp.disabled="1"' set for the initial boot then subsequent boots
>>>>> are fine (but I can only use one CPU core, of course). However, if I
>>>>> boot normally the first time then set 'kern.smp.disabled="1"' for the
>>>>> second (re)boot, the problem is triggered. Apparently something in the
>>>>> shutdown code is "poisoning the well" for the next boot.
>>>>> 
>>>>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of
>>>>> yesterday.
>>>>> 
>>>>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>>>> 
>>>>> --- sys/amd64/amd64/vm_machdep.c.orig 2014-05-07 13:19:07.400981580 -0600
>>>>> +++ sys/amd64/amd64/vm_machdep.c  2014-05-07 17:02:52.416783795 -0600
>>>>> @@ -593,7 +593,7 @@
>>>>>  void
>>>>>  cpu_reset()
>>>>>  {
>>>>> -#ifdef SMP
>>>>> +#if 0
>>>>>   cpuset_t map;
>>>>>   u_int cnt;
>>>>> 
>>>>> I've tried skipping or disabling smaller chunks of code within the #if
>>>>> block but haven't found a consistent winner yet.
>>>>> 
>>>>> I'm hoping the list will have suggestions on how I can further narrow
>>>>> down the problem, or theories on what might be going on.
>>>> 
>>>> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0
>>>> reboot') or a non-BSP ('cpuset -l 1 reboot') to see if that has any
>>>> effect? It might not, but if it does it would help narrow down the code
>>>> to consider.
>>> 
>>> Hello jhb, thanks for responding.
>>> 
>>> I tried your suggestion but unfortunately it does not make any difference. 
>>> The reboot hangs regardless of which CPU I assign the command to.
>>> 
>>> Any other suggestions?
>> 
>> When I was doing some early work on some of the Octeon multi-core chips, I 
>> encountered something similar. If I remember correctly, there was an issue 
>> in the shutdown sequence that did not properly halt the cores and set up the 
>> "start jump" vector. So the first core would start, and when it tried to 
>> start the next ones it would hang waiting for the ACK that they were running 
>> (since they didn't have a start vector and hence never started). I know 
>> MIPS, not AMD, so I can't say what the equivalent would be, but I'm sure 
>> there is one. Check that part, setting up the early state.
>> 
>> If Juli and/or Adrian are reading this: do you remember anything about that, 
>> something like 2 years ago?
> 
> That does sound promising, would love more details if anyone can provide them.
> 
> Here's another wrinkle:
> 
> The KVM machine in question is part of a cluster of identical servers 
> (hardware, OS, software revisions). The problem is present on all servers in 
> the cluster.
> 
> I also have access to a second homogenous cluster. The OS and software 
> revisions on this cluster are identical to the first. The hardware is 
> _nearly_ identical--slightly different mainboards from the same manufacturer 
> and slightly older CPUs. The same VMs (identical disk image and definition, 
> including CPU flags passed to the guest) that have a problem on the first 
> cluster work flawlessly on this one.
> 
> Not sure if that means the bad behavior only appears on certain CPUs or if 
> it's timing-related or something else entirely. I'd welcome speculation at 
> this point.
> 
> CPU details below in case it makes a difference.
> 
> == Problem Host ==
> model name  : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
> nonstop_tsc ap

Re: consistent VM hang during reboot

2014-05-09 Thread John Nielsen
On May 8, 2014, at 12:42 PM, Andrew Duane wrote:

> From: owner-freebsd-hack...@freebsd.org 
> [mailto:owner-freebsd-hack...@freebsd.org] On Behalf Of John Nielsen
> 
>> On May 8, 2014, at 11:03 AM, John Baldwin wrote:
>> 
>>> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>>>> I am trying to solve a problem with amd64 FreeBSD virtual machines
>>>> running on a Linux+KVM hypervisor. To be honest I'm not sure if the
>>>> problem is in FreeBSD or the hypervisor, but I'm trying to rule out the
>>>> OS first.
>>>> 
>>>> The _second_ time FreeBSD boots in a virtual machine with more than one
>>>> core, the boot hangs just before the kernel would normally print e.g.
>>>> "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0:
>>>> 12Mbps Full Speed USB v1.0", but the problem persists even without
>>>> USB). The VM will boot fine a first time, but running either
>>>> "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping
>>>> and starting the host qemu-kvm process is the only way to continue.
>>>> 
>>>> The problem seems to be triggered by something in the SMP portion of
>>>> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual
>>>> "reset" button the next boot is fine. If I have
>>>> 'kern.smp.disabled="1"' set for the initial boot then subsequent boots
>>>> are fine (but I can only use one CPU core, of course). However, if I
>>>> boot normally the first time then set 'kern.smp.disabled="1"' for the
>>>> second (re)boot, the problem is triggered. Apparently something in the
>>>> shutdown code is "poisoning the well" for the next boot.
>>>> 
>>>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of
>>>> yesterday.
>>>> 
>>>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>>> 
>>>> --- sys/amd64/amd64/vm_machdep.c.orig 2014-05-07 13:19:07.400981580 -0600
>>>> +++ sys/amd64/amd64/vm_machdep.c  2014-05-07 17:02:52.416783795 -0600
>>>> @@ -593,7 +593,7 @@
>>>>  void
>>>>  cpu_reset()
>>>>  {
>>>> -#ifdef SMP
>>>> +#if 0
>>>>   cpuset_t map;
>>>>   u_int cnt;
>>>> 
>>>> I've tried skipping or disabling smaller chunks of code within the #if
>>>> block but haven't found a consistent winner yet.
>>>> 
>>>> I'm hoping the list will have suggestions on how I can further narrow
>>>> down the problem, or theories on what might be going on.
>>> 
>>> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 
>>> reboot')
>>> or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect?  It might
>>> not, but if it does it would help narrow down the code to consider.
>> 
>> Hello jhb, thanks for responding.
>> 
>> I tried your suggestion but unfortunately it does not make any difference. 
>> The reboot hangs regardless of which CPU I assign the command to.
>> 
>> Any other suggestions?
> 
> When I was doing some early work on some of the Octeon multi-core chips, I 
> encountered something similar. If I remember correctly, there was an issue in 
> the shutdown sequence that did not properly halt the cores and set up the 
> "start jump" vector. So the first core would start, and when it tried to 
> start the next ones it would hang waiting for the ACK that they were running 
> (since they didn't have a start vector and hence never started). I know MIPS, 
> not AMD, so I can't say what the equivalent would be, but I'm sure there is 
> one. Check that part, setting up the early state.
> 
> If Juli and/or Adrian are reading this: do you remember anything about that, 
> something like 2 years ago?

That does sound promising, would love more details if anyone can provide them.

Here's another wrinkle:

The KVM machine in question is part of a cluster of identical servers 
(hardware, OS, software revisions). The problem is present on all servers in 
the cluster.

I also have access to a second homogenous cluster. The OS and software 
revisions on this cluster are identical to the first. The hardware is _nearly_ 
identical--slightly different mainboards from the same manufacturer and 
slightly older CPUs. The same VMs (identical disk image and definition, 
including CPU flags passed to the guest) that have a problem on the first 
cluster work flawlessly on this one.

Not sure if that means the bad behavior only appears on certain CPUs or if it's 
timing-related or something else entirely. I'd welcome speculation at this 
point.

CPU details below in case it makes a difference.

== Problem Host ==
model name  : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat
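
For anyone wanting to compare two hosts the same way, here is a rough sketch that lists the flags present on one host but missing on the other. The function names are hypothetical, and it assumes each flags line has been copied out of /proc/cpuinfo as a space-separated string. It only sees what the kernel chooses to print on the flags line, so a feature that is not listed there will not show up in the difference.

```sh
#!/bin/sh
# Hypothetical helper: turn a space-separated flags string into a sorted,
# de-duplicated list with one flag per line.
split_flags() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed '/^$/d' | sort -u
}

# Hypothetical helper: print flags found in the first list but not the second.
flags_only_in() {
    want=$(split_flags "$1")
    have=$(split_flags "$2")
    # -x whole-line match, -F fixed strings, -v keep the non-matching lines
    printf '%s\n' "$want" | grep -vxF -e "$have"
}

# Flags of the local Linux host without the "flags :" label.
host_flags() {
    grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2
}

# Example (other_flags holds the line copied from the second host):
#   flags_only_in "$(host_flags)" "$other_flags"
```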

RE: consistent VM hang during reboot

2014-05-08 Thread Andrew Duane
When I was doing some early work on some of the Octeon multi-core chips, I 
encountered something similar. If I remember correctly, there was an issue in 
the shutdown sequence that did not properly halt the cores and set up the 
"start jump" vector. So the first core would start, and when it tried to start 
the next ones it would hang waiting for the ACK that they were running (since 
they didn't have a start vector and hence never started). I know MIPS, not AMD, 
so I can't say what the equivalent would be, but I'm sure there is one. Check 
that part, setting up the early state.

If Juli and/or Adrian are reading this: do you remember anything about that, 
something like 2 years ago?


Andrew L. Duane
AT&T Technical Lead
JNCIA - JUNOS
m   +1 603.770.7088
o+1 408.933.6944 (2-6944)
skype: andrewlduane
adu...@juniper.net


-----Original Message-----
From: owner-freebsd-hack...@freebsd.org 
[mailto:owner-freebsd-hack...@freebsd.org] On Behalf Of John Nielsen
Sent: Thursday, May 08, 2014 1:56 PM
To: John Baldwin
Cc: freebsd-hack...@freebsd.org; freebsd-virtualization@freebsd.org
Subject: Re: consistent VM hang during reboot

On May 8, 2014, at 11:03 AM, John Baldwin wrote:

> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>> I am trying to solve a problem with amd64 FreeBSD virtual machines
>> running on a Linux+KVM hypervisor. To be honest I'm not sure if the
>> problem is in FreeBSD or the hypervisor, but I'm trying to rule out the
>> OS first.
>> 
>> The _second_ time FreeBSD boots in a virtual machine with more than one
>> core, the boot hangs just before the kernel would normally print e.g.
>> "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0:
>> 12Mbps Full Speed USB v1.0", but the problem persists even without
>> USB). The VM will boot fine a first time, but running either
>> "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping
>> and starting the host qemu-kvm process is the only way to continue.
>> 
>> The problem seems to be triggered by something in the SMP portion of
>> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual
>> "reset" button the next boot is fine. If I have
>> 'kern.smp.disabled="1"' set for the initial boot then subsequent boots
>> are fine (but I can only use one CPU core, of course). However, if I
>> boot normally the first time then set 'kern.smp.disabled="1"' for the
>> second (re)boot, the problem is triggered. Apparently something in the
>> shutdown code is "poisoning the well" for the next boot.
>> 
>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of 
>> yesterday.
>> 
>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>> 
>> --- sys/amd64/amd64/vm_machdep.c.orig 2014-05-07 13:19:07.400981580 -0600
>> +++ sys/amd64/amd64/vm_machdep.c 2014-05-07 17:02:52.416783795 -0600
>> @@ -593,7 +593,7 @@
>> void
>> cpu_reset()
>> {
>> -#ifdef SMP
>> +#if 0
>>  cpuset_t map;
>>  u_int cnt;
>> 
>> I've tried skipping or disabling smaller chunks of code within the #if block 
>> but haven't found a consistent winner yet.
>> 
>> I'm hoping the list will have suggestions on how I can further narrow down 
>> the problem, or theories on what might be going on.
> 
> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot')
> or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect?  It might
> not, but if it does it would help narrow down the code to consider.

Hello jhb, thanks for responding.

I tried your suggestion but unfortunately it does not make any difference. The 
reboot hangs regardless of which CPU I assign the command to.

Any other suggestions?

JN

___
freebsd-hack...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: consistent VM hang during reboot

2014-05-08 Thread John Nielsen
On May 8, 2014, at 11:03 AM, John Baldwin wrote:

> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>> I am trying to solve a problem with amd64 FreeBSD virtual machines
>> running on a Linux+KVM hypervisor. To be honest I'm not sure if the
>> problem is in FreeBSD or the hypervisor, but I'm trying to rule out the
>> OS first.
>> 
>> The _second_ time FreeBSD boots in a virtual machine with more than one
>> core, the boot hangs just before the kernel would normally print e.g.
>> "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0:
>> 12Mbps Full Speed USB v1.0", but the problem persists even without
>> USB). The VM will boot fine a first time, but running either
>> "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping
>> and starting the host qemu-kvm process is the only way to continue.
>> 
>> The problem seems to be triggered by something in the SMP portion of
>> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual
>> "reset" button the next boot is fine. If I have
>> 'kern.smp.disabled="1"' set for the initial boot then subsequent boots
>> are fine (but I can only use one CPU core, of course). However, if I
>> boot normally the first time then set 'kern.smp.disabled="1"' for the
>> second (re)boot, the problem is triggered. Apparently something in the
>> shutdown code is "poisoning the well" for the next boot.
>> 
>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of 
>> yesterday.
>> 
>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>> 
>> --- sys/amd64/amd64/vm_machdep.c.orig 2014-05-07 13:19:07.400981580 -0600
>> +++ sys/amd64/amd64/vm_machdep.c 2014-05-07 17:02:52.416783795 -0600
>> @@ -593,7 +593,7 @@
>> void
>> cpu_reset()
>> {
>> -#ifdef SMP
>> +#if 0
>>  cpuset_t map;
>>  u_int cnt;
>> 
>> I've tried skipping or disabling smaller chunks of code within the #if block 
>> but haven't found a consistent winner yet.
>> 
>> I'm hoping the list will have suggestions on how I can further narrow down 
>> the problem, or theories on what might be going on.
> 
> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot')
> or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect?  It might
> not, but if it does it would help narrow down the code to consider.

Hello jhb, thanks for responding.

I tried your suggestion but unfortunately it does not make any difference. The 
reboot hangs regardless of which CPU I assign the command to.

Any other suggestions?

JN



Re: consistent VM hang during reboot

2014-05-08 Thread John Baldwin
On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
> I am trying to solve a problem with amd64 FreeBSD virtual machines
> running on a Linux+KVM hypervisor. To be honest I'm not sure if the
> problem is in FreeBSD or the hypervisor, but I'm trying to rule out the
> OS first.
> 
> The _second_ time FreeBSD boots in a virtual machine with more than one
> core, the boot hangs just before the kernel would normally print e.g.
> "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0:
> 12Mbps Full Speed USB v1.0", but the problem persists even without
> USB). The VM will boot fine a first time, but running either
> "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping
> and starting the host qemu-kvm process is the only way to continue.
> 
> The problem seems to be triggered by something in the SMP portion of
> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual
> "reset" button the next boot is fine. If I have
> 'kern.smp.disabled="1"' set for the initial boot then subsequent boots
> are fine (but I can only use one CPU core, of course). However, if I
> boot normally the first time then set 'kern.smp.disabled="1"' for the
> second (re)boot, the problem is triggered. Apparently something in the
> shutdown code is "poisoning the well" for the next boot.
> 
> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of 
> yesterday.
> 
> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
> 
> --- sys/amd64/amd64/vm_machdep.c.orig 2014-05-07 13:19:07.400981580 -0600
> +++ sys/amd64/amd64/vm_machdep.c  2014-05-07 17:02:52.416783795 -0600
> @@ -593,7 +593,7 @@
>  void
>  cpu_reset()
>  {
> -#ifdef SMP
> +#if 0
>   cpuset_t map;
>   u_int cnt;
> 
> I've tried skipping or disabling smaller chunks of code within the #if block 
> but haven't found a consistent winner yet.
> 
> I'm hoping the list will have suggestions on how I can further narrow down 
> the problem, or theories on what might be going on.

Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot')
or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect?  It might
not, but if it does it would help narrow down the code to consider.
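
To run that test over every CPU rather than just 0 and 1, something like the following could be used. This is only a sketch: the helper names are made up, and it prints the commands instead of running them, since each reboot has to be tried in a separate boot cycle.

```sh
#!/bin/sh
# Hypothetical helper: print (not run) the reboot command pinned to one CPU.
reboot_on_cpu_cmd() {
    printf 'cpuset -l %s reboot\n' "$1"
}

# Hypothetical helper: print one command per CPU id, from 0 (the BSP)
# through ncpu-1, where ncpu is the first argument.
reboot_cmds() {
    i=0
    while [ "$i" -lt "$1" ]; do
        reboot_on_cpu_cmd "$i"
        i=$((i + 1))
    done
}

# On the FreeBSD guest the CPU count can be taken from sysctl:
#   reboot_cmds "$(sysctl -n kern.smp.cpus)"
```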

-- 
John Baldwin