Hi there,

I hope I'm in the right place with my question, and if so, it's going to 
be quite a headache of one :)
I have a server (HP DL380 G8p) with 2 x 12-core CPUs (Intel E5-2695) + 768 
GB RAM.
On top of it I run ESXi 6.5, and then my lab experiments.
So: ESXi --> Ubuntu/Debian --> vagrant-libvirt --> Cumulus VX topology VMs, 
or just a simple Ubuntu vagrant box.
Recently I wanted to give Vagrant a try, so:
- I installed Ubuntu 16.04.3 LTS, then 17.04, and also Debian 9.1 (to see 
whether the kernel version or other distro packages have an influence on 
this)
- From my notes so far:
   - libvirt 3.0
   - vagrant 2.0
   - kernel 4.9.0-3-amd64
- The virtual machine running Linux/Vagrant/qemu has 64 GB RAM reserved 
for it (no memory ballooning, etc.)
- I cloned this repository: 
https://github.com/CumulusNetworks/topology_converter.git
- The topologies start plain Linux Vagrant VMs; I also tried a simple 
Vagrant Ubuntu box (no special settings)
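As a baseline, a quick sanity check along these lines (a sketch, assuming an 
Intel host with the kvm_intel module loaded; on AMD the flag is svm and the 
module is kvm_amd) should confirm nested KVM is actually usable inside the 
Ubuntu/Debian VM:

```shell
# Run inside the Ubuntu/Debian VM that hosts libvirt/qemu.
# Paths assume Intel + kvm_intel; adjust for AMD (svm / kvm_amd).
vmx_count=$(grep -c vmx /proc/cpuinfo 2>/dev/null)
vmx_count=${vmx_count:-0}                       # 0 if vmx is not exported
nested=$(cat /sys/module/kvm_intel/parameters/nested 2>/dev/null || echo unknown)
echo "vmx flags seen:   ${vmx_count}"           # should be > 0
echo "kvm_intel nested: ${nested}"              # should be Y (or 1)
```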
- Every time, after "vagrant up", if I do a "virsh console <id>", I get:
Vagrant:
==> leaf-r01n01: Creating shared folders metadata...
==> leaf-r01n01: Starting domain.
==> leaf-r01n01: Waiting for domain to get an IP address...

Virsh:
Connected to domain topology_converter_leaf-r01n01
Escape character is ^]
[   74.380034] INFO: rcu_sched detected stalls on CPUs/tasks:
[   74.380034] (detected by 0, t=60002 jiffies, g=3030, c=3029, q=23)
[   74.380034] All QSes seen, last rcu_sched kthread activity 60002 (4294741676-4294681674), jiffies_till_next_fqs=3, root ->qsmask 0x0
[   74.380034] rcu_sched kthread starved for 60002 jiffies!
[  100.132032] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [dhclient:2179]
[  128.132031] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [dhclient:2179]
[  156.132037] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [dhclient:2179]

I suspected this to be just another nested-qemu setup problem, BUT I have 
two other VMs where I play with qemu and everything works fine there.
That suggests it is related to the parameters that vagrant -> libvirt 
uses to start the qemu/kvm machines.
If it is, then the question is: what exactly is causing this behavior?
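If the generated parameters really are the culprit, one low-effort experiment 
is to override the emulated CPU model from the Vagrantfile; a sketch, assuming 
the vagrant-libvirt provider's documented cpu_mode option:

```ruby
# Vagrantfile fragment: ask libvirt to pass the host CPU straight through
# instead of the "IvyBridge,+vmx,..." model it otherwise generates; this
# often changes behavior in nested-virtualization setups.
Vagrant.configure("2") do |config|
  config.vm.provider :libvirt do |libvirt|
    libvirt.cpu_mode = "host-passthrough"
  end
end
```

After editing, "vagrant destroy -f && vagrant up" so the domain XML is 
regenerated with the new CPU element.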

On top of this, I also tried:
- disabling ACPI on boot
- reserving CPU resources in VMware as well
- checking that VMware exports the hardware virtualization flags (for 
nested virtualization):

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm epb tpr_shadow vnmi ept vpid fsgsbase tsc_adjust smep dtherm ida arat pln pts

Vagrant starts the qemu VM with these parameters:
/usr/bin/qemu-system-x86_64 \
  -name guest=topology_converter_leaf-r01n01,debug-threads=on -S \
  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-topology_converter_l/master-key.aes \
  -machine pc-i440fx-zesty,accel=kvm,usb=off,dump-guest-core=off \
  -cpu IvyBridge,+ds,+ss,+ht,+vmx,+pcid,+osxsave,+hypervisor,+arat,+tsc_adjust \
  -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 \
  -uuid 7e4de018-2e63-4686-a110-4cec6d587758 -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-topology_converter_l/monitor.sock,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
  -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device ahci,id=sata0,bus=pci.0,addr=0x3 \
  -drive file=/var/lib/libvirt/images/topology_converter_leaf-r01n01.img,format=qcow2,if=none,id=drive-sata0-0-0 \
  -device ide-hd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 \
  -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:86:cd:fe,bus=pci.0,addr=0x5 \
  -netdev socket,udp=127.0.0.1:9008,localaddr=127.0.0.1:8008,id=hostnet1 \
  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=44:38:39:00:00:0d,bus=pci.0,addr=0x6 \
  -netdev socket,udp=127.0.0.1:8002,localaddr=127.0.0.1:9002,id=hostnet2 \
  -device virtio-net-pci,netdev=hostnet2,id=net2,mac=44:38:39:00:00:04,bus=pci.0,addr=0x7 \
  -netdev socket,udp=127.0.0.1:8005,localaddr=127.0.0.1:9005,id=hostnet3 \
  -device virtio-net-pci,netdev=hostnet3,id=net3,mac=44:38:39:00:00:0a,bus=pci.0,addr=0x8 \
  -netdev socket,udp=127.0.0.1:8004,localaddr=127.0.0.1:9004,id=hostnet4 \
  -device virtio-net-pci,netdev=hostnet4,id=net4,mac=44:38:39:00:00:08,bus=pci.0,addr=0x9 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
  -vnc 127.0.0.1:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on

Now I've run out of ideas, and this is already driving me crazy.
Does anyone have a clue what else I should try?
I'm afraid I've reached a dead end.

Thanks in advance, and hoping for a life-saving solution to get me unstuck :)


