Public bug reported:
Hello,
I ran into an issue where floating IPs stopped working for a particular project
in OpenStack because the router's HA interfaces are in the wrong state.
I have an OpenStack-Ansible setup with 3 Neutron containers. At the same time,
floating IPs and router interface creation work fine in other OpenStack projects.
Debugging showed that all HA interfaces of this project's router are in the
"standby" state. Neutron CLI command output:
neutron l3-agent-list-hosting-router c71008d3-5685-4e11-b650-7f7f49408643
+--------------------------------------+------------------------------------------+----------------+-------+----------+
| id                                   | host                                     | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------------+----------------+-------+----------+
| 44738018-e88c-4358-829a-167502be4f3b | infra2-neutron-agents-container-6ed576b6 | True           | :-)   | standby  |
| c1d95367-9193-42e0-aa1f-9919d4ad79a3 | infra1-neutron-agents-container-946dca38 | True           | :-)   | standby  |
| c7023dd7-ce87-4e85-a82f-f725caf649f0 | infra3-neutron-agents-container-fb07941c | True           | :-)   | standby  |
+--------------------------------------+------------------------------------------+----------------+-------+----------+
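To check this condition across several routers, the table above can be parsed mechanically. The Python sketch below is a minimal illustration, assuming the table is pasted unwrapped (one row per line) and that a healthy HA router has exactly one agent in the "active" state, since VRRP elects a single master. The function names parse_ascii_table and ha_state_summary are hypothetical helpers, not part of any OpenStack tool.

```python
def parse_ascii_table(text):
    """Parse a one-row-per-line ASCII table (as printed by the neutron CLI)
    into a list of dicts keyed by the header cells."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+----+" borders, prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def ha_state_summary(rows):
    """Count agents per ha_state; healthy means exactly one 'active' agent."""
    counts = {}
    for row in rows:
        state = row.get("ha_state", "unknown")
        counts[state] = counts.get(state, 0) + 1
    return counts, counts.get("active", 0) == 1
```

For the router in this report, all three rows are "standby", so the summary is {"standby": 3} and the healthy flag is False.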
root@infra1-utility-container-59becf74:/# openstack port list --router c71008d3-5685-4e11-b650-7f7f49408643
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+
| ID                                   | Name                                            | MAC Address       | Fixed IP Addresses                                                           | Status |
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+
| 10227cbb-d5d0-4563-aa34-f501caedc501 |                                                 | fa:16:3e:7d:f8:da | ip_address='x.x.x.8', subnet_id='4ba80f19-4ef8-431f-8f4f-4fe844f4b673'       | ACTIVE |
| 49e60bd7-3661-4c8f-bbbf-ba6942dc960a | HA port tenant 6d439dafcdca4e06bd30935f83a24bb0 | fa:16:3e:32:59:5f | ip_address='169.254.192.6', subnet_id='8ae9245b-eb38-4069-98c8-ccf3e41f3516' | DOWN   |
| 81b8f9ab-3026-44f4-aa1c-59b0a2754ae5 | HA port tenant 6d439dafcdca4e06bd30935f83a24bb0 | fa:16:3e:e9:ba:50 | ip_address='169.254.192.8', subnet_id='8ae9245b-eb38-4069-98c8-ccf3e41f3516' | DOWN   |
| bbe4833f-f67a-40b1-b72d-12a1a825cbd1 | HA port tenant 6d439dafcdca4e06bd30935f83a24bb0 | fa:16:3e:71:66:85 | ip_address='169.254.192.5', subnet_id='8ae9245b-eb38-4069-98c8-ccf3e41f3516' | DOWN   |
| cab8cacb-1abf-4d94-8133-83fbaa67d048 |                                                 | fa:16:3e:f2:f4:db | ip_address='172.16.0.1', subnet_id='dc422b33-1e45-43c4-8510-8b0f14baf181'    | ACTIVE |
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+
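The port listing shows all three HA ports DOWN while the gateway and internal ports are ACTIVE. A minimal sketch for filtering this automatically, assuming output from `openstack port list --router <router-id> -f json` with "ID", "Name" and "Status" keys (names taken from the table headers above; please verify against your client version) and the "HA port tenant" naming seen in this report. ha_ports_not_active is a hypothetical helper.

```python
import json


def ha_ports_not_active(port_list_json):
    """Return (ID, Status) for every HA port that is not ACTIVE.

    HA ports are recognised by the 'HA port tenant <project>' name prefix
    seen in this report (an assumption, not a documented contract)."""
    ports = json.loads(port_list_json)
    return [
        (port["ID"], port["Status"])
        for port in ports
        if port.get("Name", "").startswith("HA port tenant")
        and port.get("Status") != "ACTIVE"
    ]
```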
The router namespace inside the containers has no floating IPs, router IP
addresses, or internal network gateway (172.16.0.1) assigned:
root@infra1-neutron-agents-container-946dca38:/# ip netns exec qrouter-c71008d3-5685-4e11-b650-7f7f49408643 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ha-bbe4833f-f6@if137: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:71:66:85 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.192.5/18 brd 169.254.255.255 scope global ha-bbe4833f-f6
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe71:6685/64 scope link
       valid_lft forever preferred_lft forever
3: qr-cab8cacb-1a@if138: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:f2:f4:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
4: qg-10227cbb-d5@if139: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:7d:f8:da brd ff:ff:ff:ff:ff:ff link-netnsid 0
No keepalived process is running for router
"c71008d3-5685-4e11-b650-7f7f49408643":
root@infra1-neutron-agents-container-946dca38:/# ps auxw | grep c71008d3-5685-4e11-b650-7f7f49408643
neutron 90394 0.0 0.1 166660 72136 ? S Oct05 0:00 /openstack/venvs/neutron-16.0.1/bin/python /openstack/venvs/neutron-16.0.1/bin/neutron-keepalived-state-change --router_id=c71008d3-5685-4e11-b650-7f7f49408643 --namespace=qrouter-c71008d3-5685-4e11-b650-7f7f49408643 --conf_dir=/var/lib/neutron/ha_confs/c71008d3-5685-4e11-b650-7f7f49408643 --monitor_interface=ha-bbe4833f-f6 --monitor_cidr=169.254.0.1/24 --pid_file=/var/lib/neutron/external/pids/c71008d3-5685-4e11-b650-7f7f49408643.monitor.pid --state_path=/var/lib/neutron --user=999 --group=999
root 103561 0.0 0.0 11284 928 ? S+ 14:14 0:00 grep c71008d3-5685-4e11-b650-7f7f49408643
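A minimal sketch for checking `ps auxw` output for a keepalived process that belongs to a given router. It assumes, hypothetically, that Neutron starts keepalived with the router's keepalived.conf path somewhere on its command line (for example after -f). The neutron-keepalived-state-change monitor shown above does not count: its executable is the Python interpreter and it only receives --conf_dir. keepalived_running_for is a hypothetical helper.

```python
import os


def keepalived_running_for(router_id, ps_auxw_output):
    """True if some `ps auxw` line is a keepalived process whose command line
    references this router's keepalived.conf (assumed invocation style)."""
    conf_path = f"/var/lib/neutron/ha_confs/{router_id}/keepalived.conf"
    for line in ps_auxw_output.splitlines():
        command = line.split()[10:]  # COMMAND is the 11th column of `ps auxw`
        if not command:
            continue
        if os.path.basename(command[0]) == "keepalived" and any(
            conf_path in arg for arg in command
        ):
            return True
    return False
```

Fed the ps output from this report, the function returns False, matching the observation that only the state-change monitor is running.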
Neutron launches keepalived from the configuration directory
"/var/lib/neutron/ha_confs/c71008d3-5685-4e11-b650-7f7f49408643/".
The directory contains 2 files: "keepalived.conf" and "state". Neutron normally
updates "keepalived.conf" when a router is created, but it does not do so for
router "c71008d3-5685-4e11-b650-7f7f49408643"; it only updates the "state"
file.
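The two files can be summarised with a short script. This sketch assumes, based on my reading of the Neutron L3 agent rather than on anything in this report, that the "state" file holds a keepalived state such as "master", "backup" or "fault", and that the agent reports "master" as active and the other two as standby. Please verify that mapping against the Neutron source for your release. inspect_ha_conf_dir and STATE_TO_HA_STATE are hypothetical names.

```python
from pathlib import Path

# Assumption: keepalived state names as written to the "state" file, mapped to
# the ha_state values shown by `neutron l3-agent-list-hosting-router`.
STATE_TO_HA_STATE = {"master": "active", "backup": "standby", "fault": "standby"}


def inspect_ha_conf_dir(conf_dir):
    """Summarise the two files Neutron keeps per HA router."""
    conf_dir = Path(conf_dir)
    conf_file, state_file = conf_dir / "keepalived.conf", conf_dir / "state"
    conf_text = conf_file.read_text() if conf_file.is_file() else ""
    raw_state = state_file.read_text().strip() if state_file.is_file() else None
    return {
        "conf_exists": conf_file.is_file(),
        # A usable config should define at least one VRRP instance.
        "conf_has_vrrp_instance": "vrrp_instance" in conf_text,
        "raw_state": raw_state,
        "ha_state": STATE_TO_HA_STATE.get(raw_state, "unknown") if raw_state else None,
    }
```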
I can't provide step-by-step reproduction steps because the trigger of this
problem is unclear to me. According to my research, this error can be fixed by
recreating the router, but I would rather not do this because it would not
address the root cause. Neutron log output is attached.
I suspect the problem may lie in incorrect Neutron database records, but I
wasn't able to find which script generates "keepalived.conf". Please let me know
which script/task does this so that I can continue debugging.
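To compare against the file on disk, it may help to know what a minimal keepalived VRRP instance looks like. As far as I know, Neutron builds keepalived.conf in its L3 agent code (the keepalived helpers in neutron/agent/linux/keepalived.py); the sketch below is not that code, only an illustration using standard keepalived keywords. The instance name, virtual_router_id and priority are hypothetical values, and the VIP reuses the monitor_cidr 169.254.0.1/24 and the ha-bbe4833f-f6 interface from the ps output above. A Neutron-generated file likely contains more settings (for example track_interface and virtual_ipaddress_excluded), so treat this as a lower bound.

```python
def render_vrrp_instance(name, interface, vrid, priority, vips, advert_int=2):
    """Render a minimal keepalived vrrp_instance block.

    vips: iterable of (cidr, device) pairs for virtual_ipaddress."""
    lines = [
        f"vrrp_instance {name} {{",
        "    state BACKUP",  # every node starts as BACKUP; VRRP elects a master
        f"    interface {interface}",
        f"    virtual_router_id {vrid}",
        f"    priority {priority}",
        "    nopreempt",  # keepalived only honours nopreempt with state BACKUP
        f"    advert_int {advert_int}",
        "    virtual_ipaddress {",
    ]
    lines += [f"        {cidr} dev {dev}" for cidr, dev in vips]
    lines += ["    }", "}"]
    return "\n".join(lines) + "\n"


print(render_vrrp_instance(
    "VR_1", "ha-bbe4833f-f6", vrid=1, priority=50,  # hypothetical values
    vips=[("169.254.0.1/24", "ha-bbe4833f-f6")],
))
```

If the keepalived.conf in the router's ha_confs directory lacks even a vrrp_instance block like this, that would be consistent with keepalived never being started for the router.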
Thanks for your attention.
Software description:
OpenStack was deployed with the OpenStack-Ansible playbooks, Pike 16.0.1, commit
ebe2bc8734845b44c17819c04f2322a2ca7152db.
OpenStack services run inside LXC containers. The Neutron server, API, agents
and scheduler are placed in one container.
Linux OS - Ubuntu 16.04.4 LTS, kernel - 4.4.0-134-generic
neutron-keepalived-state-change - 11.0.2.dev2
neutron-l3-agent 11.0.2.dev2
neutron-server 11.0.2.dev2
neutron CLI - 6.5.0
** Affects: neutron
Importance: Undecided
Status: New
** Patch added: "file contains logs from neutron-l3-agent.log, neutron.log ,
neutron-server.log files"
https://bugs.launchpad.net/bugs/1796703/+attachment/5198714/+files/neutron-full.log
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1796703
Title:
HA router interfaces in standby state
Status in neutron:
New