On 29.11.2016 [21:30:21 -0000], bugproxy wrote:
> ------- Comment From wil...@us.ibm.com 2016-11-29 16:21 EDT-------
> (In reply to comment #16)
> > Thanks David.  Asking Nish to take a look at this for you.
> 
> Thanks for your attention to this issue. This has become an urgent
> issue for our customer. Please provide an ETA for when a fix will
> be available.

It will first need to get through the Zesty queue (which should only
take a few hours), and then the SRU team will need to consider it:
https://wiki.ubuntu.com/StableReleaseUpdates. Once they accept it into
the appropriate -proposed pockets, it can take a week, after
verification, to reach -updates.

Thanks,
Nish


** Description changed:

  [Impact]
  
   * keepalived on ppc64el (due to its large page size) logs
  "Netlink: error: message truncated" errors.
  
   * These Netlink truncations cause keepalived to conclude that the
  underlying device does not exist, even though it does.
  
  [Test Case]
  
   * Create 100 veth interfaces on ppc64el and check whether "Netlink:
  error: message truncated" errors are emitted. If they are, the bug is
  present; if not, the bug is fixed.
  
  [Regression Potential]
  
   * This is a code issue in keepalived, fixed upstream, that occurs when
  the system page size exceeds 4096. The upstream fix was backported to
  all releases, and its only effect should be to properly limit the size
  of the buffer used for netlink to at most 8192 on systems with a page
  size greater than 8192. I believe the risk of regression is very low.
+ 
+  * I ran the tests provided by David Wilder in both x86_64 and ppc64el
+ LXD containers. Without the backported changes, I saw no issues on
+ x86_64 but did see the reported issue on ppc64el (as expected, since a
+ page size greater than 4K is required to trigger the buffer size
+ issue). With the backported changes, neither architecture shows the
+ issue with the provided test case.
  
  ---
  
  == Comment: #0 - Andrew Thorstensen - 2016-11-17 09:50:25 ==
  
  ---Problem Description---
  Using Ubuntu 16.04 on ppc64le, we are building a 'neutron network node'
  using the VRRP configuration (built on keepalived).
  
  Information on this OpenStack configuration can be found here:
  https://wiki.openstack.org/wiki/Neutron/L3_High_Availability_VRRP
  
  When we run it, the configuration fails to apply via keepalived.
  
  The logs post the following:
  Nov 17 02:58:31 p8test-lp1 Keepalived_vrrp[54542]: VRRP is trying to assign VIP to unknown qr-a5f5ba96-52 interface !!! go out and fix your conf !!!
  
  However, the device DOES exist; keepalived just doesn't always apply
  the configuration to it.
  
  ii  keepalived                         1:1.2.19-1     ppc64el      Failover and monitoring daemon for LVS clusters
  
  This configuration sometimes works, but sometimes fails, on Ubuntu
  16.04.1.
  
  ---uname output---
  Linux p8test-lp1 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:38:24 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
  
  ---Additional Hardware Info---
  This is a Power8 system with Ubuntu 16.04.1 installed, though we see no
  indication that this is specific to Power.
  
  Machine Type = S822L
  
  Machine Type = 8286-42A
  
  ---Steps to Reproduce---
  Install OpenStack. Run the network node in a VRRP HA configuration.
  Create a router and assign a global IP.
  
  == Comment: #5 - David J. Wilder - 2016-11-17 15:58:04 ==
  The problem is fixed in this upstream commit:

  https://github.com/acassen/keepalived/commit/9f327bbf3e86def1055a106eda0633638bda0345
  
  On systems with a page size larger than 4096, keepalived may report:

  "Netlink: error: message truncated" messages

  This error was reported on ppc64le in an OpenStack/Neutron environment.
  ppc64le uses a 64K page size. I found that keepalived's netlink recvmsg
  buffer was too small, causing messages to be truncated. The size of the
  read buffer for the netlink socket should be based on the page size;
  however, it should not exceed 8192. See the comment in the patch.

  I tested the fix by creating 100 veth interfaces and verifying that the
  errors did not return.
  
  Signed-off-by: David Wilder <dwil...@us.ibm.com>
  Signed-off-by: Quentin Armitage <quen...@armitage.org.uk>
  ...

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1642763

Title:
  keepalived raising VIP apply error

Status in keepalived package in Ubuntu:
  Fix Committed
Status in keepalived source package in Xenial:
  In Progress
Status in keepalived source package in Yakkety:
  In Progress


To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1642763/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
