Hi guys,

Since this issue seems to be Debian-related, I'm posting it here, too.

I have a problem with KVM. I'll try to describe it in as much detail as possible, so please bear with me.

I have CentOS 6.3 installed on a server with dual Xeon CPUs. CentOS was freshly installed a week ago, no unofficial packages are installed, and the system is up to date. The kernel is the stock kernel: 2.6.32-279.11.1.el6.x86_64 #1 SMP Tue Oct 16 15:57:10 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Motherboard info:
http://www.supermicro.com/products/m...0/X9DRT-HF.cfm

CPU info (we have two of these):
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
http://ark.intel.com/products/64594/...-GTs-Intel-QPI

Generally, kvm-qemu works. But my Debian guest OS (Debian "squeeze" with a 2.6.32 or 3.2.x.x kernel) won't boot up if I assign more than 4 vcpus to the guest. The kernel starts to load, then freezes right where it should load the Linux agpgart module. It just hangs there. I can hit Enter in the console and the cursor moves down, but that's all I can do besides forcing the guest VM off.

If I change the VM's config to have just 4 vcpus then it boots up most of the time. Sometimes it hangs with just 4 vcpus, too. When it hangs, it always hangs at the same point, so it is consistent.

Changing the amount of RAM assigned to the VM also has an effect: with 6GB or less, it most likely boots up fine. With 8GB, however, it boots up about 50% of the time and hangs the other 50%. My Debian guest VM *never* booted up with 6 or more vcpus. Again, with 4 vcpus it may or may not boot up, and with a total of 8GB RAM it is more likely to hang at bootup. I don't change anything else between these tries, just the number of vcpus or the amount of RAM.

I have tried a newer Debian kernel installed from Debian backports (3.2.x.x Linux kernel), but it did not help; exactly the same thing happens. I also have other Debian VM images (some are based on raw image files, some on LVM block devices), and I can reproduce the problem with all of them.

My problem somewhat resembles this one:
http://forum.parallels.com/showthread.php?t=4882
except that I'm not using Parallels at all; I am trying to use kvm-qemu under CentOS 6.3. But this part seems to apply: "I did experience some relief ... by changing the amount of available RAM, but that has no longer prevented the intel-agp kernel panic recently."

I've tried to blacklist the agpgart module inside my guest OS to no avail. Here is what I've added to the end of /etc/modprobe.d/blacklist.conf :
blacklist agpgart
blacklist intel-agp
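
One thing worth checking here: blacklist entries in /etc/modprobe.d only affect loadable modules, so if agpgart is compiled into the guest kernel, the blacklist cannot stop it from initializing. The Python sketch below classifies a kernel option from the text of a kernel build config. The function name kernel_option_mode and the sample config lines are made up for illustration; they are not taken from the guest in question, and which setting the squeeze or backports kernel actually uses has not been verified.

```python
import re

def kernel_option_mode(config_text, option="CONFIG_AGP"):
    """Classify a tristate kernel option as 'built-in', 'module', or 'disabled'."""
    pattern = r"^%s=([ym])$" % re.escape(option)
    match = re.search(pattern, config_text, re.MULTILINE)
    if match is None:
        # Covers both a missing option and a "# CONFIG_AGP is not set" line.
        return "disabled"
    return "built-in" if match.group(1) == "y" else "module"

# Hypothetical config excerpt, for illustration only.
sample = "CONFIG_AGP=y\nCONFIG_AGP_INTEL=m\n# CONFIG_AGP_SIS is not set\n"
print(kernel_option_mode(sample))                      # built-in
print(kernel_option_mode(sample, "CONFIG_AGP_INTEL"))  # module
print(kernel_option_mode(sample, "CONFIG_AGP_SIS"))    # disabled
```

Inside a Debian guest, the text to pass in would normally be the contents of /boot/config-<kernel release>. A result of "built-in" would explain why the agpgart line still shows up despite the blacklist.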

However, when the VM does happen to boot up, I still see a reference to agpgart in dmesg (snippet):

Oct 25 23:27:01 infoglobal kernel: [ 0.864798] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Oct 25 23:27:01 infoglobal kernel: [ 0.865650] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
Oct 25 23:27:01 infoglobal kernel: [ 0.866538] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Oct 25 23:27:01 infoglobal kernel: [ 0.867484] acpiphp: Slot [1] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.868250] acpiphp: Slot [2] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.868995] acpiphp: Slot [3] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.869750] acpiphp: Slot [4] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.870489] acpiphp: Slot [5] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.871233] acpiphp: Slot [6] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.871959] acpiphp: Slot [7] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.872735] acpiphp: Slot [8] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.873472] acpiphp: Slot [9] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.874222] acpiphp: Slot [10] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.874958] acpiphp: Slot [11] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.875705] acpiphp: Slot [12] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.876494] acpiphp: Slot [13] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.877237] acpiphp: Slot [14] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.877987] acpiphp: Slot [15] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.878749] acpiphp: Slot [16] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.879517] acpiphp: Slot [17] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.880299] acpiphp: Slot [18] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.881042] acpiphp: Slot [19] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.881791] acpiphp: Slot [20] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.882546] acpiphp: Slot [21] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.883292] acpiphp: Slot [22] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.884045] acpiphp: Slot [23] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.884796] acpiphp: Slot [24] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.885545] acpiphp: Slot [25] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.886280] acpiphp: Slot [26] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.887005] acpiphp: Slot [27] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.887758] acpiphp: Slot [28] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.888539] acpiphp: Slot [29] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.889300] acpiphp: Slot [30] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.890037] acpiphp: Slot [31] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.891857] ERST: Table is not found!
Oct 25 23:27:01 infoglobal kernel: [ 0.892590] GHES: HEST is not enabled!
Oct 25 23:27:01 infoglobal kernel: [ 0.893438] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
Oct 25 23:27:01 infoglobal kernel: [ 0.916078] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Oct 25 23:27:01 infoglobal kernel: [ 1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Oct 25 23:27:01 infoglobal kernel: [ 1.016342] Linux agpgart interface v0.103

Please notice the last line above: "Linux agpgart interface v0.103"

When the VM hangs during boot, the last line printed to the console is the line just above that:
Oct 25 23:27:01 infoglobal kernel: [ 1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A

This is why I suspect that this problem may have something to do with agpgart...
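
The comparison described above (good boot versus hung boot) can also be done mechanically. The following is a minimal Python sketch, assuming both console logs have been saved as text; the function names messages and first_missing are made up for illustration. It strips the syslog prefix and kernel timestamps, then reports the first message of a good boot that never appears in the hung one. It only shows where the output stops, not why.

```python
import re

# Matches an optional syslog prefix plus the "[ seconds ]" kernel timestamp.
PREFIX = re.compile(r"^(?:\w{3} +\d+ [\d:]+ \S+ kernel: )?\[\s*\d+\.\d+\]\s*")

def messages(log_text):
    """Return kernel messages with host name and timestamps removed."""
    return [PREFIX.sub("", line).strip()
            for line in log_text.splitlines() if line.strip()]

def first_missing(good_log, hung_log):
    """Return the first message of a good boot that a hung boot never printed."""
    seen = set(messages(hung_log))
    for message in messages(good_log):
        if message not in seen:
            return message
    return None

# The last two lines of the good boot quoted above.
good = (
    "Oct 25 23:27:01 infoglobal kernel: [ 1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A\n"
    "Oct 25 23:27:01 infoglobal kernel: [ 1.016342] Linux agpgart interface v0.103\n"
)
# Hung boot as described in the text: output stops after the ttyS0 line.
hung = "[ 1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A\n"

print(first_missing(good, hung))  # Linux agpgart interface v0.103
```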

Anyway, here is some info about the host system:

# virsh capabilities
<capabilities>

<host>
<uuid>00020003-0004-0005-0006-000700080009</uuid>
<cpu>
<arch>x86_64</arch>
<model>SandyBridge</model>
<vendor>Intel</vendor>
<topology sockets='1' cores='6' threads='2'/>
<feature name='pdpe1gb'/>
<feature name='osxsave'/>
<feature name='tsc-deadline'/>
<feature name='dca'/>
<feature name='pdcm'/>
<feature name='xtpr'/>
<feature name='tm2'/>
<feature name='est'/>
<feature name='smx'/>
<feature name='vmx'/>
<feature name='ds_cpl'/>
<feature name='monitor'/>
<feature name='dtes64'/>
<feature name='pbe'/>
<feature name='tm'/>
<feature name='ht'/>
<feature name='ss'/>
<feature name='acpi'/>
<feature name='ds'/>
<feature name='vme'/>
</cpu>
<power_management>
<suspend_disk/>
</power_management>
<migration_features>
<live/>
<uri_transports>
<uri_transport>tcp</uri_transport>
</uri_transports>
</migration_features>
<topology>
<cells num='2'>
<cell id='0'>
<cpus num='12'>
<cpu id='0'/>
<cpu id='1'/>
<cpu id='2'/>
<cpu id='3'/>
<cpu id='4'/>
<cpu id='5'/>
<cpu id='12'/>
<cpu id='13'/>
<cpu id='14'/>
<cpu id='15'/>
<cpu id='16'/>
<cpu id='17'/>
</cpus>
</cell>
<cell id='1'>
<cpus num='12'>
<cpu id='6'/>
<cpu id='7'/>
<cpu id='8'/>
<cpu id='9'/>
<cpu id='10'/>
<cpu id='11'/>
<cpu id='18'/>
<cpu id='19'/>
<cpu id='20'/>
<cpu id='21'/>
<cpu id='22'/>
<cpu id='23'/>
</cpus>
</cell>
</cells>
</topology>
</host>

<guest>
<os_type>hvm</os_type>
<arch name='i686'>
<wordsize>32</wordsize>
<emulator>/usr/libexec/qemu-kvm</emulator>
<machine>rhel6.3.0</machine>
<machine canonical='rhel6.3.0'>pc</machine>
<machine>rhel6.2.0</machine>
<machine>rhel6.1.0</machine>
<machine>rhel6.0.0</machine>
<machine>rhel5.5.0</machine>
<machine>rhel5.4.4</machine>
<machine>rhel5.4.0</machine>
<domain type='qemu'>
</domain>
<domain type='kvm'>
<emulator>/usr/libexec/qemu-kvm</emulator>
</domain>
</arch>
<features>
<cpuselection/>
<deviceboot/>
<pae/>
<nonpae/>
<acpi default='on' toggle='yes'/>
<apic default='on' toggle='no'/>
</features>
</guest>

<guest>
<os_type>hvm</os_type>
<arch name='x86_64'>
<wordsize>64</wordsize>
<emulator>/usr/libexec/qemu-kvm</emulator>
<machine>rhel6.3.0</machine>
<machine canonical='rhel6.3.0'>pc</machine>
<machine>rhel6.2.0</machine>
<machine>rhel6.1.0</machine>
<machine>rhel6.0.0</machine>
<machine>rhel5.5.0</machine>
<machine>rhel5.4.4</machine>
<machine>rhel5.4.0</machine>
<domain type='qemu'>
</domain>
<domain type='kvm'>
<emulator>/usr/libexec/qemu-kvm</emulator>
</domain>
</arch>
<features>
<cpuselection/>
<deviceboot/>
<acpi default='on' toggle='yes'/>
<apic default='on' toggle='no'/>
</features>
</guest>

</capabilities>
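
For readers who want the NUMA layout without scanning the XML by eye, here is a small Python sketch that pulls the cell-to-CPU mapping out of virsh capabilities output using the standard library. The function name numa_cells is made up for illustration, and the sample is a trimmed copy of the output above (two CPUs per cell instead of twelve), not a separate measurement.

```python
import xml.etree.ElementTree as ET

def numa_cells(capabilities_xml):
    """Map each NUMA cell id to the sorted list of host CPU ids it contains."""
    root = ET.fromstring(capabilities_xml)
    cells = {}
    for cell in root.findall("./host/topology/cells/cell"):
        cpus = [int(cpu.get("id")) for cpu in cell.findall("./cpus/cpu")]
        cells[int(cell.get("id"))] = sorted(cpus)
    return cells

# Trimmed excerpt of the capabilities output, shortened for illustration.
SAMPLE = """
<capabilities>
  <host>
    <topology>
      <cells num='2'>
        <cell id='0'><cpus num='2'><cpu id='0'/><cpu id='12'/></cpus></cell>
        <cell id='1'><cpus num='2'><cpu id='6'/><cpu id='18'/></cpus></cell>
      </cells>
    </topology>
  </host>
</capabilities>
""".strip()

print(numa_cells(SAMPLE))  # {0: [0, 12], 1: [6, 18]}
```

Run against the full output above, this gives host CPUs 0-5 and 12-17 for cell 0, and 6-11 and 18-23 for cell 1.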

And the entire XML config file of the Debian VM:

# cat Debian-Hosting.xml
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
virsh edit Debian-Hosting
or other application using the libvirt API.
-->

<domain type='kvm'>
<name>Debian-Hosting</name>
<uuid>b641a879-35c4-f5c5-9b9d-1a76aa843389</uuid>
<memory unit='KiB'>8388608</memory>
<currentMemory unit='KiB'>8388608</currentMemory>
<vcpu placement='static' cpuset='6-7,18-19'>4</vcpu>
<os>
<type arch='x86_64' machine='rhel6.3.0'>hvm</type>
<boot dev='hd'/>
<bootmenu enable='no'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<cpu mode='custom' match='exact'>
<model fallback='allow'>SandyBridge</model>
<vendor>Intel</vendor>
<feature policy='require' name='vme'/>
<feature policy='require' name='tm2'/>
<feature policy='require' name='est'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='osxsave'/>
<feature policy='require' name='smx'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='ds'/>
<feature policy='require' name='tsc-deadline'/>
<feature policy='require' name='dtes64'/>
<feature policy='require' name='ht'/>
<feature policy='require' name='dca'/>
<feature policy='require' name='pbe'/>
<feature policy='require' name='tm'/>
<feature policy='require' name='pdcm'/>
<feature policy='require' name='pdpe1gb'/>
<feature policy='require' name='ds_cpl'/>
<feature policy='require' name='xtpr'/>
<feature policy='require' name='acpi'/>
<feature policy='require' name='monitor'/>
</cpu>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/mapper/vg0_host-lv0_3_debianhost'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
<controller type='ide' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<controller type='usb' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
</controller>
<interface type='bridge'>
<mac address='00:16:36:e2:20:ea'/>
<source bridge='br0'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<serial type='pty'>
<target port='0'/>
</serial>
<console type='pty'>
<target type='serial' port='0'/>
</console>
<input type='mouse' bus='ps2'/>
<graphics type='vnc' port='-1' autoport='yes'/>
<video>
<model type='cirrus' vram='9216' heads='1'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</memballoon>
</devices>
</domain>
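
To make the pinning in the <vcpu> element easier to read, the Python sketch below expands the cpuset string and reports which host NUMA cells it touches. The helper names expand_cpuset and cells_used are made up for illustration, the cell layout is copied from the virsh capabilities output above, and libvirt's "^N" exclusion syntax is deliberately not handled. This only describes the configuration; it is not meant as an explanation of the hang.

```python
# Host NUMA layout copied from the virsh capabilities output above.
HOST_CELLS = {
    0: [0, 1, 2, 3, 4, 5, 12, 13, 14, 15, 16, 17],
    1: [6, 7, 8, 9, 10, 11, 18, 19, 20, 21, 22, 23],
}

def expand_cpuset(cpuset):
    """Expand a cpuset string such as '6-7,18-19' into a sorted list of CPU ids."""
    cpus = set()
    for part in cpuset.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def cells_used(cpuset, cells):
    """Return the ids of the NUMA cells that contain at least one pinned CPU."""
    pinned = set(expand_cpuset(cpuset))
    return sorted(cell for cell, cpus in cells.items() if pinned & set(cpus))

print(expand_cpuset("6-7,18-19"))           # [6, 7, 18, 19]
print(cells_used("6-7,18-19", HOST_CELLS))  # [1]
```

So with the configuration shown, the 4 vcpus are confined to host CPUs that all belong to cell 1.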

As configured above, with 8GB RAM and just 4 vcpus, it just booted up, on the second attempt... If I changed the amount of RAM to 6GB, it would most likely boot up almost all the time. But if I added 2 or 4 more vcpus, it would never boot up, that's for sure. It always gets stuck at the same point, right where the "Linux agpgart interface v0.103" line should be displayed by the kernel. I'm talking about the 3.2.x.x kernel installed inside the guest from Debian backports.

With the stock 2.6.32 kernel of Debian it also consistently hangs at the same point (the console log is slightly different using that kernel, though).

And of course, when the VM boots up, it works just fine. Apart from the above problem, I never experience any crashes inside or outside the VM. The CentOS host system seems to be rock solid.

I have spent several days researching this issue on the net to no avail. I really hope that some KVM guru who knows the Linux kernel inside out could help me with this. Any help would be highly appreciated!

Thank you,

Zoltan

PS: At least one other person is experiencing the exact same problem. He has pinned it down to the Debian kernel which runs as the guest. Please see his post about it:
http://lists.centos.org/pipermail/centos-virt/2012-October/003084.html

