** Description changed:

+ [Impact]
+ 
+  * When running a 32-bit kernel (rare these days, but still in use by some 
+    upgraders until we fully drop it), /proc/vmstat has 32-bit values.
+ 
+  * These values can wrap at 32 bits, and open-vm-tools does not "realize" 
+    that because it assumes 64-bit values only.
+ 
+  * That causes "just" a spike in the reported stats, but higher-level 
+    consumers of those numbers (e.g. VM placement algorithms) can react 
+    to the spike with a mass migration off that node, which in turn can 
+    make everything worse.
+ 
+  * Include the upstream fix to that problem to ensure people are not 
+    affected by it.
+ 
+ [Test Case]
+ 
+  * This is a lot of effort to verify explicitly, but since the change is 
+    small, code review will in most cases be enough once the test is 
+    understood.
+    - To trigger the error you need a VMware guest with a 32-bit kernel.
+      Since i386 is no longer mainstream, the easiest way to get there is 
+      to install from
+      http://releases.ubuntu.com/16.04/ubuntu-16.04.5-server-i386.iso
+      and then upgrade to Bionic.
+    - Next you need to check the stat values; to do so you can use the 
+      script attached to the bug [1].
+      Run it on the host via:
+      $ python query_vmgueststats.py --vmname <name of the vm> --host 
+        localhost --user root --password <root password>
+      These numbers should never "go crazy" due to the wraparound.
+    - Once all this is set up you need to ramp up counters such as 
+      pgfaults to cause a wraparound; to do so, essentially run a lot of 
+      read I/O.
+      This could be done with:
+       $ sudo mkdir /data1
+       $ sudo fio /tmp/seq-read.fio
+      While the config is:
+       $ cat /tmp/seq-read.fio
+ ; Read 4 files with aio at different depths
+ [global]
+ ioengine=libaio
+ buffered=0
+ rw=read
+ bs=128k
+ size=128m
+ directory=/data1
+ iodepth=32
+ direct=1
+ time_based
+ runtime=60s
+ 
+ [file1]
+ 
+ [file2]
+ 
+ [file3]
+ 
+ [file4]
+ 
+      Obviously 60 seconds is not enough; increase the runtime and tune the 
+      path and disk backing to your needs to run as fast as possible.
+ 
+    - At the same time run on the guest:
+       $ cat /proc/vmstat | grep pgpgin
+ 
+    - At some point the latter numbers will wrap; without the fix this 
+      makes the stats observed by VMware spike to huge values.
+ 
+ 
+ [Regression Potential] 
+ 
+  * Worst case, the numbers we try to fix would get worse (due to the new 
+    calculation being wrong). But that would only be "as bad as it is now".
+    Furthermore the code change is rather small.
+    Also, 64-bit wraparounds are not touched (I wonder why, but let's stick 
+    to the upstream code), which means that on 64-bit systems (= most 
+    systems) this is a no-op, further reducing the risk of a regression.
+ 
+ [Other Info]
+  
+  * Taking the change was suggested by VMware, who own the tools as well as 
+    most solutions consuming the stats, so we'd like to follow that 
+    request.
+ 
+ [1]: https://bugs.launchpad.net/ubuntu/+source/open-vm-tools/+bug/1793219/+attachment/5193417/+files/query_vmgueststats.py
+ 
+ 
+ ---
+ 
  Reported at Debian as well, see
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=909146 :
  
  There is an unhandled overflow issue in open-vm-tools in the code for
  guest stats reporting. This causes artifacts (spikes) in rate stats, for
  example "Guest|Page In Rate per second". This issue only affects 32 bit
  builds of open-vm-tools.
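
  To illustrate why a wrapped 32-bit counter shows up as a spike, here is a
  minimal sketch of the arithmetic. It is not the upstream patch; the
  function names and the 2**32 modulus are assumptions made for illustration.

```python
WRAP_32 = 1 << 32  # assumed modulus of a 32-bit unsigned counter
WRAP_64 = 1 << 64  # modulus a consumer would use if it assumes 64-bit counters


def naive_delta(prev, curr):
    """Difference as seen by a consumer that assumes 64-bit counters."""
    return (curr - prev) % WRAP_64


def wrap_aware_delta(prev, curr, modulus=WRAP_32):
    """Difference that accounts for one wraparound at `modulus`."""
    return (curr - prev) % modulus


# The counter was 100 below the 32-bit limit and has since wrapped to 50.
prev, curr = WRAP_32 - 100, 50
spike = naive_delta(prev, curr)       # a huge value, reported as a spike
real = wrap_aware_delta(prev, curr)   # the 150 events that actually happened
```

  The wrap-aware variant only recovers the true delta if the counter wrapped
  at most once between two samples.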
  
  We have a fix for 10.3.x at
  
https://github.com/vmware/open-vm-tools/commit/c7a186e204cdff46b5e02bcb5208ef8979eaf261
  
  The fix has also been backported to 10.2.5 in a special branch:
  https://github.com/vmware/open-vm-tools/tree/stable-10.2.5-stat-overflow-fix
  
  Thanks,
  Oliver

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1793219

Title:
  open-vm-tools guest stats overflow
