** Description changed:

  [Description]

  - Configured a machine with 32 static VCPUs, 160GB of RAM using 1G hugepages on a NUMA capable machine.

  Domain definition (http://pastebin.ubuntu.com/25121106/)

  - Once started (virsh start).

  Libvirt log.

  LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name reproducer2 -S \
  -machine pc-i440fx-2.5,accel=kvm,usb=off -cpu host -m 124928 \
  -realtime mlock=off -smp 32,sockets=16,cores=1,threads=2 \
  -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=64424509440,host-nodes=0,policy=bind \
  -numa node,nodeid=0,cpus=0-15,memdev=ram-node0 \
  -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=66571993088,host-nodes=1,policy=bind \
  -numa node,nodeid=1,cpus=16-31,memdev=ram-node1 \
  -uuid d7a4af7f-7549-4b44-8ceb-4a6c951388d4 -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-reproducer2/monitor.sock,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
  -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -drive file=/var/lib/uvtool/libvirt/images/test.qcow,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
  -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on

  Then the following error is raised.

  virsh start reproducer2
  error: Failed to start domain reproducer2
  error: monitor socket did not show up: No such file or directory

+ - The fix is done via backports. As a TL;DR, the change does:
+   1. Instead of sleeping for a very short interval (1ms) in a loop for a
+      very long time, start small but increase the sleep exponentially for
+      the few cases that need a long wait.
+      That way fast actions are done fast, but long waits are no CPU hogs.
+   2. Huge guests get ~1s of extra timeout per 1 GiB of memory to come up,
+      which allows them to initialize properly.
+
  [Impact]

- * Cannot start virtual machines with large pools of memory allocated
+  * Cannot start virtual machines with large pools of memory allocated
    on NUMA nodes.

  [Test Case]

- * Configure a Machine with at least 2 NUMA nodes.
-
- root@buneary:/home/ubuntu# virsh freepages 0 1G
- 1048576KiB: 60
-
- root@buneary:/home/ubuntu# virsh freepages 1 1G
- 1048576KiB: 62
-
- * Create a guest that uses the full amount of available huge pages (on
- this case 122). (full guest definition:
- http://paste.ubuntu.com/25125500/)
+  * This is a tradeoff of memory clearing speed vs. guest size.
+    Once clearing the guest memory takes longer than ~30 seconds, the
+    issue triggers.
+  * The guest must be backed by huge pages; otherwise the kernel faults
+    memory in on demand instead of needing the initial clear.
+  * One way to "slow down" the start is to configure a machine with
+    multiple NUMA nodes.
+    root@buneary:/home/ubuntu# virsh freepages 0 1G
+    1048576KiB: 60
+    root@buneary:/home/ubuntu# virsh freepages 1 1G
+    1048576KiB: 62
+  * Another way to slow down the init is to just use a really huge guest.
+    In this example, a 122G guest was enough.
+    (full guest definition:
+    http://paste.ubuntu.com/25125500/)

  <memory unit='GiB'>120</memory>
- <currentMemory unit='GiB'>120</currentMemory>
+   <currentMemory unit='GiB'>120</currentMemory>
- <memoryBacking>
- <hugepages>
- <page size='1' unit='GiB' nodeset='0'/>
- <page size='1' unit='GiB' nodeset='1'/>
- </hugepages>
+   <memoryBacking>
+     <hugepages>
+       <page size='1' unit='GiB' nodeset='0'/>
+       <page size='1' unit='GiB' nodeset='1'/>
+     </hugepages>
- </memoryBacking>
+   </memoryBacking>
- <cpu mode='host-passthrough'>
+   <cpu mode='host-passthrough'>
- <topology sockets='16' cores='1' threads='2'/>
- <numa>
- <cell id='0' cpus='0-15' memory='60' unit='GiB' memAccess='shared'/>
- <cell id='1' cpus='16-31' memory='62' unit='GiB' memAccess='shared'/>
- </numa>
- </cpu>
+     <topology sockets='16' cores='1' threads='2'/>
+     <numa>
+       <cell id='0' cpus='0-15' memory='60' unit='GiB' memAccess='shared'/>
+       <cell id='1' cpus='16-31' memory='62' unit='GiB' memAccess='shared'/>
+     </numa>
+   </cpu>

- * Define the guest, and try to start it.
+  * Define the guest, and try to start it.

- $ virsh define reproducer.xml
- $ virsh start reproducer
+    $ virsh define reproducer.xml
+    $ virsh start reproducer

  * Verify that the following error is raised:

  root@buneary:/home/ubuntu# virsh start reproducer2
  error: Failed to start domain reproducer2
  error: monitor socket did not show up: No such file or directory

  [Expected Behavior]

- * Machine is started without issues as displayed https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1705132/comments/7
-
+  * Machine is started without issues as displayed
+    https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1705132/comments/7

  [Regression Potential]

- * None identified.
+  * The behavior on timeouts around starting a guest changed. We backported
+    the fix along with a fix to that new behavior (where guests seemed to
+    wait forever due to the exponential wait).
+    Still, the "allowed" wait time is increased, while users might expect
+    the guest to start instantly, as they are used to from their laptops.
+    Now if one starts a 1TB guest, the allowed time is base+1000s.
+    For a while a user might think it is broken or hanging, but there is
+    no way to avoid that.
+    OTOH, before the fix such a guest would have failed to start after 30
+    seconds, so this is not really a regression IMHO.
+
  [Other Info]

  https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=85af0b803cd19a03f71bd01ab4e045552410368f;hp=67dcb797ed7f1fbb048aa47006576f424923933b
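The two changes summarized in the TL;DR at the top (exponential backoff instead of a fixed 1ms sleep, plus ~1s of extra timeout per GiB of guest memory) can be illustrated with the minimal Python sketch below. It is not libvirt's code; the real change is the commit linked under [Other Info]. The 30 s base is taken from the ~30 seconds mentioned in the Test Case, while the doubling factor, the 1 s cap on a single sleep, and all names (monitor_timeout_ms, backoff_delays_ms, wait_for_socket) are hypothetical. For simplicity it only checks that the socket path exists instead of connecting to it.

```python
import os
import time
from typing import Iterator

# All values are assumptions for illustration, not libvirt's constants.
BASE_TIMEOUT_MS = 30_000    # ~30 s, the limit mentioned in the Test Case
EXTRA_MS_PER_GIB = 1_000    # "~1s per 1 GiB" of guest memory, from the TL;DR
INITIAL_DELAY_MS = 1        # the old fixed 1ms sleep becomes the first delay
MAX_DELAY_MS = 1_000        # hypothetical cap on a single sleep


def monitor_timeout_ms(guest_mem_gib: int) -> int:
    """Hypothetical helper: base timeout plus ~1 s per GiB of guest memory."""
    return BASE_TIMEOUT_MS + EXTRA_MS_PER_GIB * guest_mem_gib


def backoff_delays_ms(total_ms: int,
                      initial_ms: int = INITIAL_DELAY_MS,
                      cap_ms: int = MAX_DELAY_MS) -> Iterator[int]:
    """Yield sleep intervals that double (up to cap_ms) until total_ms is used."""
    elapsed = 0
    delay = max(1, initial_ms)  # a zero delay would never make progress
    while elapsed < total_ms:
        step = min(delay, total_ms - elapsed)  # never overshoot the deadline
        yield step
        elapsed += step
        delay = min(delay * 2, cap_ms)


def wait_for_socket(path: str, guest_mem_gib: int) -> bool:
    """Hypothetical poll loop: True once `path` exists, False on timeout."""
    for delay_ms in backoff_delays_ms(monitor_timeout_ms(guest_mem_gib)):
        if os.path.exists(path):
            return True
        time.sleep(delay_ms / 1000)
    return os.path.exists(path)


if __name__ == "__main__":
    for gib in (0, 122, 1000):
        delays = list(backoff_delays_ms(monitor_timeout_ms(gib)))
        print(f"{gib} GiB: timeout {monitor_timeout_ms(gib) // 1000} s, "
              f"{len(delays)} wakeups")
```

Under these assumptions the 122 GiB reproducer gets 152 s instead of 30 s, and a ~1000 GiB guest gets base+1000s as described under [Regression Potential]. A full 30 s wait costs 39 wakeups (1, 2, 4, ... 512 ms, then 1 s steps) instead of 30,000 with a fixed 1ms sleep, so fast starts still return within a few milliseconds. The cap and the clamp on the final step keep the total wait bounded by the timeout in this sketch.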
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1705132

Title:
  Large memory guests, "error: monitor socket did not show up: No such file or directory"
