On May 8, 2014, at 12:42 PM, Andrew Duane <adu...@juniper.net> wrote:

> From: owner-freebsd-hack...@freebsd.org 
> [mailto:owner-freebsd-hack...@freebsd.org] On Behalf Of John Nielsen
> 
>> On May 8, 2014, at 11:03 AM, John Baldwin <j...@freebsd.org> wrote:
>> 
>>> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>>>> I am trying to solve a problem with amd64 FreeBSD virtual machines running 
>>>> on a Linux+KVM hypervisor. To be honest I'm not sure if the problem is in 
>>>> FreeBSD or the hypervisor, but I'm trying to rule out the OS first.
>>>> 
>>>> The _second_ time FreeBSD boots in a virtual machine with more than one 
>>>> core, the boot hangs just before the kernel would normally print e.g. 
>>>> "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps 
>>>> Full Speed USB v1.0", but the problem persists even without USB). The VM 
>>>> will boot fine the first time, but running either "shutdown -r now" or 
>>>> "reboot" leads to a hung second boot. Stopping and starting the host 
>>>> qemu-kvm process is the only way to continue.
>>>> 
>>>> The problem seems to be triggered by something in the SMP portion of 
>>>> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual 
>>>> "reset" button the next boot is fine. If I have 'kern.smp.disabled="1"' set 
>>>> for the initial boot then subsequent boots are fine (but I can only use one 
>>>> CPU core, of course). However, if I boot normally the first time then set 
>>>> 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. 
>>>> Apparently something in the shutdown code is "poisoning the well" for the 
>>>> next boot.
>>>> 
>>>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of 
>>>> yesterday.
>>>> 
>>>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>>> 
>>>> --- sys/amd64/amd64/vm_machdep.c.orig      2014-05-07 13:19:07.400981580 -0600
>>>> +++ sys/amd64/amd64/vm_machdep.c   2014-05-07 17:02:52.416783795 -0600
>>>> @@ -593,7 +593,7 @@
>>>> void
>>>> cpu_reset()
>>>> {
>>>> -#ifdef SMP
>>>> +#if 0
>>>>    cpuset_t map;
>>>>    u_int cnt;
>>>> 
>>>> I've tried skipping or disabling smaller chunks of code within the #if 
>>>> block but haven't found a consistent winner yet.
>>>> 
>>>> I'm hoping the list will have suggestions on how I can further narrow down 
>>>> the problem, or theories on what might be going on.
>>> 
>>> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot') 
>>> or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect?  It might 
>>> not, but if it does it would help narrow down the code to consider.
>> 
>> Hello jhb, thanks for responding.
>> 
>> I tried your suggestion but unfortunately it does not make any difference. 
>> The reboot hangs regardless of which CPU I assign the command to.
>> 
>> Any other suggestions?
> 
> When I was doing some early work on some of the Octeon multi-core chips, I 
> encountered something similar. If I remember correctly, there was an issue in 
> the shutdown sequence that did not properly halt the cores and set up the 
> "start jump" vector. So the first core would start, and when it tried to 
> start the next ones it would hang waiting for the ACK that they were running 
> (since they didn't have a start vector and hence never started). I know MIPS, 
> not AMD, so I can't say what the equivalent would be, but I'm sure there is 
> one. Check that part, setting up the early state.
> 
> If Juli and/or Adrian are reading this: do you remember anything about that, 
> something like 2 years ago?

That does sound promising; I'd love more details if anyone can provide them.
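
For my own understanding (and in case it helps anyone else reading), the handshake you describe would look roughly like the sketch below on the BSP side. This is only an illustration of the pattern, not the actual sys/amd64/amd64/mp_machdep.c code, and all of the names (install_boot_vector, send_init_sipi, delay_us, aps_started, start_ap_sketch) are made up:

/*
 * Sketch only -- invented names, not the real FreeBSD AP-startup code.
 */
extern void install_boot_vector(int apic_id);  /* set the AP's real-mode start ("jump") vector */
extern void send_init_sipi(int apic_id);       /* INIT + STARTUP IPIs to wake the AP */
extern void delay_us(int usec);                /* busy-wait helper */

static volatile int aps_started;               /* each AP bumps this once it is up and running */

static int
start_ap_sketch(int apic_id, int expect)
{
        int ms;

        install_boot_vector(apic_id);          /* 1. point the AP at the boot trampoline */
        send_init_sipi(apic_id);               /* 2. kick the AP */

        /*
         * 3. Wait for the AP to acknowledge by bumping the counter.  If the
         * previous shutdown left the AP halted without a valid start vector,
         * it never runs, the counter never moves, and the boot stalls right
         * about where I see it -- just before "SMP: AP CPU #N Launched!".
         */
        for (ms = 0; ms < 5000; ms++) {
                if (aps_started >= expect)
                        return (1);            /* AP came up */
                delay_us(1000);
        }
        return (0);                            /* AP never acknowledged */
}

If that mental model is right, then whatever the SMP branch of cpu_reset() does on shutdown must be leaving the APs (or their start vector) in a state where step 3 can never complete on the next boot.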

Here's another wrinkle:

The KVM machine in question is part of a cluster of identical servers 
(hardware, OS, software revisions). The problem is present on all servers in 
the cluster.

I also have access to a second homogeneous cluster. The OS and software 
revisions on this cluster are identical to the first. The hardware is _nearly_ 
identical--slightly different mainboards from the same manufacturer and 
slightly older CPUs. The same VMs (identical disk image and definition, 
including the CPU flags passed to the guest) that have the problem on the first 
cluster work flawlessly on this one.

I'm not sure whether that means the bad behavior only appears on certain CPUs, 
or whether it's timing-related, or something else entirely. I'd welcome 
speculation at this point.

CPU details are below in case they make a difference.

== Problem Host ==
model name      : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln 
pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms

== Good Host ==
model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm 
tpr_shadow vnmi flexpriority ept vpid

Thanks,

JN
