[Bug 1297218] Re: guest hangs after live migration due to tsc jump

2015-03-11 Thread Mohammed Gamal
Ping.

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to qemu in Ubuntu.
https://bugs.launchpad.net/bugs/1297218

Title:
  guest hangs after live migration due to tsc jump

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1297218] Re: guest hangs after live migration due to tsc jump

2015-01-28 Thread Mohammed Gamal
Hi Serge,
Yes, that's the case. Let me also make it clear that this is a backport on top 
of qemu 1.2 stable.



[Bug 1297218] Re: guest hangs after live migration due to tsc jump

2015-01-19 Thread Mohammed Gamal
Hi,

I've seen some strange time behavior in some of our VMs, usually
triggered by live migration. In some VMs we have seen significant
time drift after a live migration, which NTP was not able to correct.

So far I have not been able to reproduce that case. However, I did
notice that live migration introduces an increase in clock jitter
values, and I am not sure whether that has anything to do with the
significant time drift.

Here is an example of a CentOS 6 guest running under qemu 1.2 before
doing a live migration:

[root@centos ~]# ntpq -pcrv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+helium.constant 18.26.4.105      2 u   65   64  377   60.539   -0.011   0.554
-209.118.204.201 128.9.176.30     2 u   47   64  377   15.750   -1.835   0.388
*time3.chpc.utah 198.60.22.240    2 u   46   64  377   30.585    3.934   0.253
+dns2.untangle.c 216.218.254.202  2 u   21   64  377   22.196    2.345   0.740
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version=ntpd 4.2.6p5@1.2349-o Sat Dec 20 02:53:39 UTC 2014 (1),
processor=x86_64, system=Linux/2.6.32-504.3.3.el6.x86_64, leap=00,
stratum=3, precision=-21, rootdelay=32.355, rootdisp=53.173,
refid=155.101.3.115,
reftime=d86264f3.444c75e7  Thu, Jan 15 2015 16:10:27.266,
clock=d86265ed.10a34c1c  Thu, Jan 15 2015 16:14:37.064, peer=3418, tc=6,
mintc=3, offset=0.000, frequency=2.863, sys_jitter=2.024,
clk_jitter=2.283, clk_wander=0.000

[root@centos ~]# ntpdc -c kerninfo
pll offset:           0 s
pll frequency:        2.863 ppm
maximum error:        0.19838 s
estimated error:      0.002282 s
status:               2001  pll nano
pll time constant:    6
precision:            1e-09 s
frequency tolerance:  500 ppm

Immediately after live migration, the jitter values increase:
[root@centos ~]# ntpq -pcrv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-helium.constant 18.26.4.105      2 u   59   64  377   60.556   -0.916  31.921
+209.118.204.201 128.9.176.30     2 u   38   64  377   15.717   28.879  12.220
+time3.chpc.utah 132.163.4.103    2 u   45   64  353   30.639    3.240  26.975
*dns2.untangle.c 216.218.254.202  2 u   17   64  377   22.248   33.039  11.791
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version=ntpd 4.2.6p5@1.2349-o Sat Dec 20 02:53:39 UTC 2014 (1),
processor=x86_64, system=Linux/2.6.32-504.3.3.el6.x86_64, leap=00,
stratum=3, precision=-21, rootdelay=25.086, rootdisp=83.736,
refid=74.123.29.4,
reftime=d8626838.47529689  Thu, Jan 15 2015 16:24:24.278,
clock=d8626849.4920018a  Thu, Jan 15 2015 16:24:41.285, peer=3419, tc=6,
mintc=3, offset=24.118, frequency=11.560, sys_jitter=15.145,
clk_jitter=8.056, clk_wander=2.757

[root@centos ~]# ntpdc -c kerninfo
pll offset:           0.0211957 s
pll frequency:        11.560 ppm
maximum error:        0.112523 s
estimated error:      0.008055 s
status:               2001  pll nano
pll time constant:    6
precision:            1e-09 s
frequency tolerance:  500 ppm
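
To put a number on the change, here is a minimal Python sketch that averages the per-peer offset and jitter columns from the two ntpq outputs above. The values are copied by hand from those outputs; the names BEFORE, AFTER and summarize are only illustrative and not part of any NTP tooling.

```python
from statistics import mean

# Peer rows copied by hand from the two ntpq outputs above:
# (remote, offset in ms, jitter in ms).
BEFORE = [
    ("helium.constant", -0.011, 0.554),
    ("209.118.204.201", -1.835, 0.388),
    ("time3.chpc.utah", 3.934, 0.253),
    ("dns2.untangle.c", 2.345, 0.740),
]
AFTER = [
    ("helium.constant", -0.916, 31.921),
    ("209.118.204.201", 28.879, 12.220),
    ("time3.chpc.utah", 3.240, 26.975),
    ("dns2.untangle.c", 33.039, 11.791),
]


def summarize(rows):
    """Return (mean absolute offset, mean jitter), both in milliseconds."""
    mean_abs_offset = mean(abs(offset) for _, offset, _ in rows)
    mean_jitter = mean(jitter for _, _, jitter in rows)
    return mean_abs_offset, mean_jitter


before_offset, before_jitter = summarize(BEFORE)
after_offset, after_jitter = summarize(AFTER)

print(f"mean |offset|: {before_offset:.3f} ms -> {after_offset:.3f} ms")
print(f"mean jitter:   {before_jitter:.3f} ms -> {after_jitter:.3f} ms")
```

On these four peers the mean jitter goes from roughly 0.5 ms to roughly 20.7 ms across the migration.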


The increase in jitter and offset here is well within what NTP can absorb
under the 500 ppm frequency tolerance limit, so it is easily corrected by
subsequent NTP clock sync events. However, some live migrations cause much
larger jitter and offset jumps, which NTP cannot correct, and the time ends
up far off. Any idea why this is the case?
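
As a rough illustration of why a small offset is harmless and a large one is not, the sketch below works out a lower bound on how long it takes to slew away an offset at the 500 ppm limit, and how ntpd would treat an offset of a given size. It assumes ntpd's documented defaults (step threshold 0.128 s, panic threshold 1000 s); the function names are hypothetical, and the real clock discipline loop converges more slowly than this bound.

```python
MAX_SLEW_PPM = 500.0      # frequency tolerance reported by "ntpdc -c kerninfo"
STEP_THRESHOLD_S = 0.128  # assumed ntpd default step threshold
PANIC_THRESHOLD_S = 1000  # assumed ntpd default panic threshold


def min_slew_seconds(offset_ms, max_slew_ppm=MAX_SLEW_PPM):
    """Lower bound on the time needed to slew away offset_ms at max_slew_ppm."""
    # 1 ppm is one microsecond of correction per second of elapsed time.
    return abs(offset_ms) * 1e-3 / (max_slew_ppm * 1e-6)


def classify_offset(offset_s):
    """Rough classification of how ntpd treats an offset, using the defaults above."""
    magnitude = abs(offset_s)
    if magnitude <= STEP_THRESHOLD_S:
        return "slew"   # corrected gradually by adjusting the clock frequency
    if magnitude <= PANIC_THRESHOLD_S:
        return "step"   # too large to slew; the clock is eventually stepped
    return "panic"      # ntpd refuses to correct and exits


# The 24.118 ms system offset seen right after the migration above:
print(f"{min_slew_seconds(24.118):.1f} s minimum to slew 24.118 ms")
print(classify_offset(0.024118))
```

For the 24.118 ms offset above this gives a minimum of about 48 seconds, which is consistent with the jump being corrected within a few poll intervals. An offset of several hundred milliseconds or more would fall outside the slew range entirely.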

I've tried backporting the patches
(9a48bcd1b82494671c09b0eefdb882581499 and
317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c) on top of upstream qemu 1.2,
but that actually caused even higher jitter, on the order of 100+ ppm.

Any idea what I might be missing?

** Patch added: backport.patch
   https://bugs.launchpad.net/qemu/+bug/1297218/+attachment/4301780/+files/backport.patch
