[Yahoo-eng-team] [Bug 1580648] Re: Two HA routers in master state during functional test

2017-01-31 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/420693
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8d3f216e2421a01b54a4049c639bdb803df72510
Submitter: Jenkins
Branch:    master

commit 8d3f216e2421a01b54a4049c639bdb803df72510
Author: Artur Korzeniewski 
Date:   Fri Jan 27 11:19:16 2017 +0100

Addressing L3 HA keepalived failures in functional tests

The existing Keepalived functional tests did not configure
connectivity between the two agent namespaces.
Added setup of the veth pair.

Also, the external (qg-) and internal (qr-) bridges were removed
from the agent1 namespace and moved to the agent2 namespace, because
they had the same name.
Added patching of the qg and qr bridge name creation so that the names
differ in functional tests.


Change-Id: I82b3218091da4feb39a9e820d0e54639ae27c97d
Closes-Bug: #1580648
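
For readers unfamiliar with the technique, connecting two network namespaces with a veth pair can be sketched as a sequence of standard iproute2 commands. This is only an illustration and not the code from the commit above: the function name `veth_setup_commands` and the namespace and device names in the usage note are hypothetical, and the sketch only builds the command lists without running them.

```python
from typing import List


def veth_setup_commands(ns_a: str, ns_b: str,
                        dev_a: str, dev_b: str) -> List[List[str]]:
    """Build iproute2 commands that link two namespaces with a veth pair.

    Hypothetical helper for illustration; it returns the commands
    instead of executing them, so it needs no root privileges.
    """
    return [
        # Create both ends of the veth pair in the current namespace.
        ["ip", "link", "add", dev_a, "type", "veth", "peer", "name", dev_b],
        # Move each end into its own namespace.
        ["ip", "link", "set", dev_a, "netns", ns_a],
        ["ip", "link", "set", dev_b, "netns", ns_b],
        # Bring both ends up inside their namespaces.
        ["ip", "netns", "exec", ns_a, "ip", "link", "set", dev_a, "up"],
        ["ip", "netns", "exec", ns_b, "ip", "link", "set", dev_b, "up"],
    ]
```

Calling `veth_setup_commands("ns-agent1", "ns-agent2", "veth-a", "veth-b")` yields five commands that could be passed to a command runner with sufficient privileges.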


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1580648

Title:
  Two HA routers in master state during functional test

Status in neutron:
  Fix Released

Bug description:
  Scheduling HA routers ends with two routers in the master state.
  The issue was discovered while preparing a new functional test for this
  bug fix: https://review.openstack.org/#/c/273546

  In ha_router.py, the method _get_state_change_monitor_callback()
  starts a neutron-keepalived-state-change process with the parameter
  --monitor-interface set to the ha_device (ha-xxx), along with its IP
  address.

  That application monitors all address changes in the namespace using
  "ip netns exec xxx ip -o monitor address"
  Each addition of the ha-xxx device produces a call to the
  neutron-server API reporting that this router has become "master".
  This produces false results, because that device says nothing about
  whether the router is actually master.
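
The logs in this report suggest the decision rule in play: an event counts only when both the interface and the CIDR match the monitored pair, and an added address is reported as "master". A minimal Python sketch of that rule follows. It is not the neutron implementation. The names `AddressEvent`, `parse_line` and `state_for`, the regular expression, and the sample `ip -o monitor address` line in the usage note are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class AddressEvent(NamedTuple):
    interface: str
    cidr: str
    added: bool


# Assumed shape of one "ip -o monitor address" line, for example:
#   "2: ha-xxx    inet 169.254.0.1/24 scope global ha-xxx"
# Removals are assumed to carry a leading "Deleted ".
_LINE_RE = re.compile(
    r"^(?P<deleted>Deleted\s+)?\d+:\s+(?P<ifname>\S+)\s+inet6?\s+(?P<cidr>\S+)"
)


def parse_line(line: str) -> Optional[AddressEvent]:
    """Turn one monitor line into an event, or None if it does not match."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        return None
    return AddressEvent(
        interface=match.group("ifname"),
        cidr=match.group("cidr"),
        added=match.group("deleted") is None,
    )


def state_for(event: AddressEvent, monitor_interface: str,
              monitor_cidr: str) -> Optional[str]:
    """Return 'master' or 'backup' for the monitored address, else None."""
    if event.interface != monitor_interface or event.cidr != monitor_cidr:
        return None  # unrelated address, e.g. a link-local one
    return "master" if event.added else "backup"
```

With this rule, parsing `"2: ha-8aedf0c6-2a    inet 169.254.0.1/24 scope global ha-8aedf0c6-2a"` and checking it against the monitored pair `("ha-8aedf0c6-2a", "169.254.0.1/24")` gives "master". Both agents in the logs see 169.254.0.1/24 added on their own ha device, so each one independently reports "master".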

  Logs from
  test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection

  Agent2:
  2016-05-10 16:23:20.653 16067 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109
  2016-05-10 16:23:20.654 16067 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82
  2016-05-10 16:23:20.661 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-8aedf0c6-2a, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59
  2016-05-10 16:23:20.661 16067 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000
  2016-05-10 16:23:20.767 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:20.901 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:21.324 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, fe80::2022:22ff:fe22:/64, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:29.807 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, 169.254.0.1/24, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Wrote router 962f19e6-f592-49f7-8bc4-add116c0b7a3 state master write_state_change /neutron/neutron/agent/l3/keepalived_state_change.py:87
  2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] State: master notify_agent /neutron/neutron/agent/l3/keepalived_state_change.py:93

  Agent1:
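  2016-05-10 16:23:19.417 15906 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109
  2016-05-10 16:23:19.418 15906 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82
  2016-05-10 16:23:19.425 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-22a4d1e0-ad, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59
  2016-05-10 16:23:19.426 15906 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000
  2016-05-10 16:23:19.525 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:19.645 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:19.927 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-22a4d1e0-ad, fe80::1034:56ff:fe78:2b5d/64, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:28.543 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-22a4d1e0-ad, 169.254.0.1/24, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:28.544 15906 DEBUG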

[Yahoo-eng-team] [Bug 1580648] Re: Two HA routers in master state during functional test

2016-09-25 Thread John Schwarz
This seems like a bug to me. I understand that it is a known limitation
that keepalived always selects the node with the higher IP as master,
but I would then expect the nodes with lower IPs to revert to backup.
If that isn't the case (as it seems from what Ann and Gustavo write),
then this is a bug.

Reopening.
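
The expectation above matches the tie-break rule in the VRRP specification (RFC 5798): a master that receives an advertisement with a higher priority, or with an equal priority and a higher sender IP address, should step down to backup. A simplified Python model of that rule is sketched here. It is not keepalived's code, and the names `Advert` and `should_yield` are hypothetical.

```python
import ipaddress
from typing import NamedTuple


class Advert(NamedTuple):
    priority: int
    primary_ip: str


def should_yield(local: Advert, received: Advert) -> bool:
    """Return True if the local master should fall back to backup.

    Simplified model of the VRRP comparison: higher priority wins,
    and on a tie the higher primary IP address wins.
    """
    if received.priority != local.priority:
        return received.priority > local.priority
    return (ipaddress.ip_address(received.primary_ip)
            > ipaddress.ip_address(local.primary_ip))
```

The rule only takes effect if the advertisements actually reach the other node. Two masters with no connectivity between them never compare themselves and both stay master.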

** Changed in: neutron
   Status: Opinion => Confirmed

** Changed in: neutron
   Importance: Undecided => High


[Yahoo-eng-team] [Bug 1580648] Re: Two HA routers in master state during functional test

2016-09-14 Thread Hirofumi Ichihara
This seems to be a keepalived limitation, as Ann said.

** Changed in: neutron
   Status: Confirmed => Opinion


[Yahoo-eng-team] [Bug 1580648] Re: Two HA routers in master state during functional test

2016-09-12 Thread Gauvain Pocentek
I've faced this problem on a production cluster twice in a few weeks, so
I'm setting the bug status back to 'confirmed'.

Two L3 agents were 'active' for the routers and one was inactive
(three-node setup).

** Changed in: neutron
   Status: Expired => Confirmed


[Yahoo-eng-team] [Bug 1580648] Re: Two HA routers in master state during functional test

2016-08-12 Thread Launchpad Bug Tracker
[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
   Status: Incomplete => Expired
