Hi, all
We have a newly installed OpenStack cluster. Every VM in the cluster randomly
freezes (for about 2-3 seconds) every few minutes. When a VM freezes, pings from
outside (from a physical machine or another VM) show latencies of a few seconds
or time out (versus <1 ms normally). Processes on the VM that do not use the
network (e.g. one that writes the current date/time to a file every second) also
stop working during those freezes, so the file has no lines for those
seconds.
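For reference, a heartbeat logger of the kind described above, together with a check that lists the gaps, could look like the sketch below. It is a minimal sketch, not the script we actually run: it assumes Python 3 in the guest, a one-second interval, and that the guest's wall clock catches up after a stall (which depends on the clocksource). The function names, the log path and the 0.5 s slack are hypothetical choices for illustration.

```python
import sys
import time
from typing import List, Tuple


def heartbeat(path: str, beats: int, interval: float = 1.0) -> None:
    """Append one wall-clock timestamp to `path` every `interval` seconds."""
    with open(path, "a") as f:
        for _ in range(beats):
            f.write(f"{time.time():.3f}\n")
            f.flush()  # push the line to the file right away
            time.sleep(interval)


def read_timestamps(path: str) -> List[float]:
    """Load the timestamps written by heartbeat(), skipping blank lines."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def find_gaps(timestamps: List[float], interval: float = 1.0,
              slack: float = 0.5) -> List[Tuple[float, float]]:
    """Return (before, after) pairs whose spacing exceeds interval + slack."""
    return [(prev, cur) for prev, cur in zip(timestamps, timestamps[1:])
            if cur - prev > interval + slack]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 heartbeat.py /tmp/heartbeat.log  (~10 minutes)
    log = sys.argv[1]
    heartbeat(log, beats=600)
    for before, after in find_gaps(read_timestamps(log)):
        print(f"stall of {after - before:.2f}s after t={before:.3f}")
```

On an unaffected VM `find_gaps()` should return an empty list; on an affected one each freeze should appear as one pair, whose times can then be compared with the CPU spikes of the qemu process on the host.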
On the compute node that runs the VM, we see high CPU usage (often near or
over 100%) by the qemu process for that VM while it is frozen. Inside the VM,
however, CPU utilization stays low the whole time. This suggests that the CPU
time goes to qemu doing some other busy work rather than to its vCPU threads,
or that it goes to the vCPU threads but they do not enter guest mode during
that time.
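One way to tell these two possibilities apart on the host would be to sample CPU time per thread of the qemu process instead of for the process as a whole. The sketch below assumes Linux procfs (`/proc/<pid>/task/<tid>/stat`, where utime and stime are fields 14 and 15); the function names are hypothetical. It does not know by itself which threads are the vCPU threads; that mapping would have to come from elsewhere, for example the QEMU monitor's `info cpus` command, assuming it reports thread IDs in this QEMU version.

```python
import os
import sys
import time
from typing import Dict, List, Tuple


def thread_cpu_ticks(pid: int) -> Dict[int, Tuple[str, int]]:
    """Map tid -> (thread name, utime + stime in clock ticks) for each thread of pid."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except FileNotFoundError:
            continue  # thread exited between listdir() and open()
        # The name is in parentheses and may contain spaces, so split after the last ')'.
        name = stat[stat.index("(") + 1:stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2:].split()
        # fields[0] is stat field 3 (state); utime is field 14, stime is field 15.
        utime, stime = int(fields[11]), int(fields[12])
        result[int(tid)] = (name, utime + stime)
    return result


def sample(pid: int, interval: float = 1.0) -> List[Tuple[float, int, str]]:
    """Return (cpu_percent, tid, name) per thread over one interval, busiest first."""
    hz = os.sysconf("SC_CLK_TCK")
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    rows = []
    for tid, (name, ticks) in after.items():
        prev = before.get(tid, (name, ticks))[1]  # new thread: count from now
        rows.append((100.0 * (ticks - prev) / hz / interval, tid, name))
    return sorted(rows, reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 qemu_threads.py <qemu-pid>   (Ctrl-C to stop)
    qemu_pid = int(sys.argv[1])
    while True:
        for pct, tid, name in sample(qemu_pid)[:5]:
            print(f"{tid:>8}  {name:<16} {pct:6.1f}%")
        print("-" * 34)
```

If a non-vCPU thread dominates during a freeze, that would point to the first possibility; if the vCPU threads are the busy ones while the guest reports itself idle, that would fit the second.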
We have a very similar OpenStack setup using the same versions of
OpenStack/QEMU/KVM/host OS/guest OS that does not have these freezes. The
only obvious difference is that the "randomly freezing" compute nodes are Huawei
RH 2288H V2 servers, while the "good" ones are Dell servers (I can get the exact
model if it is important). Their CPUs are a Xeon E5-2560 and a Xeon 5405
respectively, the former having more advanced virtualization support (VT-d and
EPT).
The host OS is Ubuntu 14.04 LTS (kernel 3.13.0-32-generic) and the QEMU
version is 2.0. The guest OS does not seem to matter (it happens on several
different guest OSes we have tried).
Our only rough idea is that it is related to some scheduling problem on the
host that starves the vCPU threads. Other freezing problems reported online
were solved by disabling kvm-clock, but we tried that and it did not help.
We lack a diagnostic method to identify the root cause. Could anyone suggest
where we should start? Any "suspected fixes" are also welcome.