------- Comment From [email protected] 2016-12-09 14:49 EDT-------
(In reply to comment #27)
> Hi Dave,
>
> I just checked and it seems like there was an automated test regression in
> neutron on amd64. Oddly, it passed on all other architectures, so I'm seeing
> if it's reproducible and if so, will debug.
Hi Nish,
If the amd64 regression turns out to be related to the patch, you might
want to try dropping the patch and simply changing the buffer size to
8192. This accomplishes the same result and is far simpler.
 /* Our netlink parser */
 static int
 netlink_parse_info(int (*filter) (struct sockaddr_nl *, struct nlmsghdr *),
@@ -254,7 +273,7 @@ netlink_parse_info(int (*filter) (struct sockaddr_nl *, struct nlmsghdr *),
     int error;
     while (1) {
-        char buf[4096];
+        char buf[8192];
Thanks for your help,
Dave.
--
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1642763
Title:
keepalived raising VIP apply error
Status in keepalived package in Ubuntu:
Fix Released
Status in keepalived source package in Xenial:
Fix Committed
Status in keepalived source package in Yakkety:
Fix Committed
Bug description:
[Impact]
* keepalived on ppc64el (due to a large page size) experiences
"Netlink: error: message truncated" messages.
* These Netlink truncations result in keepalived thinking that the
underlying device does not exist, even though it does.
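To illustrate the truncation mechanism described above, here is a minimal sketch in C. It is not keepalived code: it uses a local AF_UNIX datagram socket pair as a stand-in for the netlink socket, and the function name recv_was_truncated is hypothetical. It only shows how a receive buffer that is smaller than the incoming message causes the kernel to set MSG_TRUNC in msg_flags.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one datagram of msg_len bytes over a local socket pair and receive
 * it into a buffer of buf_len bytes.
 * Returns 1 if the kernel flagged the message as truncated (MSG_TRUNC),
 * 0 if the whole message fit, and -1 on a socket error.
 * Assumption: an AF_UNIX datagram socket stands in for the netlink socket.
 */
int recv_was_truncated(size_t msg_len, size_t buf_len)
{
    static char out[8192];
    char in[8192];
    int sv[2];
    int result = -1;

    /* Precondition: both sizes must fit the fixed scratch buffers. */
    assert(msg_len <= sizeof(out) && buf_len <= sizeof(in));

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    memset(out, 'x', msg_len);
    if (send(sv[0], out, msg_len, 0) == (ssize_t)msg_len) {
        struct iovec iov = { .iov_base = in, .iov_len = buf_len };
        struct msghdr msg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        /* The excess bytes of a datagram are discarded, not queued. */
        if (recvmsg(sv[1], &msg, 0) >= 0)
            result = (msg.msg_flags & MSG_TRUNC) ? 1 : 0;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On Linux, recv_was_truncated(6000, 4096) should report truncation while recv_was_truncated(6000, 8192) should not, which mirrors the 4096 versus 8192 buffer sizes discussed in this bug.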
[Test Case]
* Create 100 veth interfaces on ppc64el and check whether "Netlink: error:
message truncated" errors are emitted. If so, the bug is present. If
not, the bug is fixed.
[Regression Potential]
* This is a code issue in keepalived, fixed upstream, that appears when the
system page size exceeds 4096. The upstream fix was backported to all
releases and should only limit the size of the buffer used
for netlink to at most 8192 on systems with a page size greater than
8192. I believe the risk of regression is very low.
* Using the tests provided by David Wilder, I ran on both x86_64 and
ppc64el LXD containers. Without the backported changes, I saw no
issues on x86_64, and the reported issue on ppc64el (as expected, as a
page size greater than 4K is required to see the buffer size issue).
With the backported changes, both architectures show no issue with the
provided testcase.
---
== Comment: #0 - Andrew Thorstensen - 2016-11-17 09:50:25 ==
---Problem Description---
Using Ubuntu 16.04 on ppc64le, we are building a 'neutron network node' using
the VRRP configuration (built on keepalived).
Information on this OpenStack configuration can be found here:
https://wiki.openstack.org/wiki/Neutron/L3_High_Availability_VRRP
When we run, the configuration is failing to apply via keepalived.
The logs post the following:
Nov 17 02:58:31 p8test-lp1 Keepalived_vrrp[54542]: VRRP is trying to assign
VIP to unknown qr-a5f5ba96-52 interface !!! go out and fix your conf !!!
However, the device DOES exist. But the keepalived config just
doesn't always deploy it.
ii keepalived 1:1.2.19-1
ppc64el Failover and monitoring daemon for LVS clusters
This configuration sometimes works, but sometimes fails, on Ubuntu
16.04.1.
---uname output---
Linux p8test-lp1 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:38:24 UTC 2016
ppc64le ppc64le ppc64le GNU/Linux
---Additional Hardware Info---
This is a Power8 system with Ubuntu 16.04.1 installed, though we see no
indication that this is specific to Power.
Machine Type = S822L
Machine Type = 8286-42A
---Steps to Reproduce---
Install openstack. Run the network node in a VRRP HA configuration. Create
a router and assign a global IP.
== Comment: #5 - David J. Wilder - 2016-11-17 15:58:04 ==
The problem is fixed in this upstream commit:
https://github.com/acassen/keepalived/commit/9f327bbf3e86def1055a106eda0633638bda0345
On systems with a page size larger than 4096, keepalived may report
"Netlink: error: message truncated" messages.
This error was reported on ppc64le in an OpenStack/Neutron environment.
ppc64le uses a 64K page size. I found that keepalived's netlink recvmsg
buffer was too small, causing messages to be truncated. The size of the read
buffer for the netlink socket should be based on the page size; however, it
should not exceed 8192. See the comment in the patch.
I tested the fix by creating 100 veth interfaces and verifying the errors
did not return.
Signed-off-by: David Wilder <[email protected]>
Signed-off-by: Quentin Armitage <[email protected]>
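The sizing rule in the commit message above (base the buffer on the page size, but never exceed 8192) can be sketched in C as follows. This is an illustration under stated assumptions, not the upstream patch: the names NL_BUF_CAP, nl_buf_size_for and nl_buf_size are hypothetical, and the 4096 fallback is an assumption for the case where the page size cannot be determined.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical name for the 8192 cap mentioned in the commit message. */
#define NL_BUF_CAP 8192L

/* Clamp a given page size to the cap: min(page_size, 8192). */
size_t nl_buf_size_for(long page_size)
{
    assert(page_size > 0);
    return (size_t)(page_size < NL_BUF_CAP ? page_size : NL_BUF_CAP);
}

/* Query the running system's page size and apply the same clamp. */
size_t nl_buf_size(void)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0)
        page_size = 4096; /* assumption: fall back to 4096 if unknown */
    return nl_buf_size_for(page_size);
}
```

Under this rule a system with 4096-byte pages keeps a 4096-byte buffer, while a ppc64le system with 64K pages gets 8192 bytes.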
...