Public bug reported:

The external process monitor subscription for the keepalived state change
monitor should be removed when the external gateway is removed from a DVR HA
router.
Under certain conditions, when the SNAT namespace is missing, the external
process monitor tries to respawn the keepalived state change monitor process
within that namespace.
The external process monitor does not check whether the SNAT namespace exists;
that check is left to the code that calls it.

The 'delete' ha-router takes care of cleaning the external process
monitor subscription for the keepalived state change, but the external
gateway remove function is not calling this function.
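
To illustrate the gap described above, here is a minimal toy model in Python.
It is only a sketch: ToyProcessMonitor, ToyDvrHaRouter and the "unsubscribe"
flag are hypothetical names made up for this example and are not Neutron's
classes or the actual fix. It assumes the monitor blindly respawns whatever is
still registered, which is the behaviour reported here.

```python
# Toy model only: all names are hypothetical, not Neutron's real API.
class ToyProcessMonitor:
    """Respawns every registered process without checking its namespace."""

    def __init__(self):
        self._processes = {}  # (router_id, service) -> respawn callable

    def register(self, router_id, service, respawn):
        self._processes[(router_id, service)] = respawn

    def unregister(self, router_id, service):
        self._processes.pop((router_id, service), None)

    def check_child_processes(self):
        # Every remaining subscription is respawned, namespace or not.
        for respawn in list(self._processes.values()):
            respawn()


class ToyDvrHaRouter:
    SERVICE = "ip_monitor"

    def __init__(self, router_id, monitor):
        self.router_id = router_id
        self.monitor = monitor
        self.snat_namespace_exists = False
        self.respawn_failures = 0

    def _respawn_state_change_monitor(self):
        # Stands in for the "Cannot open network namespace" failure.
        if not self.snat_namespace_exists:
            self.respawn_failures += 1

    def external_gateway_added(self):
        self.snat_namespace_exists = True
        self.monitor.register(
            self.router_id, self.SERVICE, self._respawn_state_change_monitor)

    def external_gateway_removed(self, unsubscribe):
        self.snat_namespace_exists = False
        if unsubscribe:
            # The cleanup this report asks for on gateway removal.
            self.monitor.unregister(self.router_id, self.SERVICE)
```

With unsubscribe=False the next check_child_processes() call hits the missing
namespace; with unsubscribe=True nothing is left to respawn.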

This is how I was able to reproduce the problem:

Create HA/DVR routers.
Delete the SNAT namespace of the routers.
Also delete the ip_monitor PID files under
/opt/stack/data/neutron/external/pids/
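
For anyone repeating these steps, the names involved can be derived from the
router UUID. The helper below is a small sketch, not part of Neutron: the
function name is hypothetical, and the "snat-<uuid>" namespace and
"<uuid>.monitor.pid" file patterns are taken from the log output in this
report rather than from the Neutron source.

```python
import os


def snat_monitor_artifacts(router_id,
                           pids_dir="/opt/stack/data/neutron/external/pids"):
    """Return the namespace and PID file names seen in the log for a router.

    Hypothetical helper; the naming patterns are copied from the log lines
    in this report and may differ in other deployments.
    """
    return {
        "namespace": "snat-" + router_id,
        "pid_file": os.path.join(pids_dir, router_id + ".monitor.pid"),
    }
```

Calling it with the UUID from the log gives the namespace passed to
"ip netns exec" and the PID file the agent reports as inaccessible.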

Once they were deleted, the following messages appeared in the
neutron-l3.service logs.

Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]: ERROR neutron.agent.linux.external_process [-] ip_monitor for router with uuid 04fabe76-9316-4270-a99f-4f0ccffb8feb not found. The process should not have died
Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]: WARNING neutron.agent.linux.external_process [-] Respawning ip_monitor for uuid 04fabe76-9316-4270-a99f-4f0ccffb8feb
Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]: DEBUG neutron.agent.linux.utils [-] Unable to access /opt/stack/data/neutron/external/pids/04fabe76-9316-4270-a99f-4f0ccffb8feb.monitor.pid {{(pid=12153) get_value_from_file /opt/stack/neutron/neutron/agent/linux/utils.py:250}}
Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]: DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'snat-04fabe76-9316-4270-a99f-4f0ccffb8feb', 'neutron-keepalived-state-change', '--router_id=04fabe76-9316-4270-a99f-4f0ccffb8feb', '--namespace=snat-04fabe76-9316-4270-a99f-4f0ccffb8feb', '--conf_dir=/opt/stack/data/neutron/ha_confs/04fabe76-9316-4270-a99f-4f0ccffb8feb', '--monitor_interface=ha-4af17105-bd', '--monitor_cidr=169.254.0.1/24', '--pid_file=/opt/stack/data/neutron/external/pids/04fabe76-9316-4270-a99f-4f0ccffb8feb.monitor.pid', '--state_path=/opt/stack/data/neutron', '--user=1000', '--group=1004'] {{(pid=12153) execute_rootwrap_daemon /opt/stack/neutron/neutron/agent/linux/utils.py:103}}
Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]: ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "snat-04fabe76-9316-4270-a99f-4f0ccffb8feb": No such file or directory
Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]:
Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]: DEBUG oslo_concurrency.lockutils [-] Lock "_check_child_processes" released by "neutron.agent.linux.external_process._check_child_processes" :: held 0.007s {{(pid=12153) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:285}}
Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]: Traceback (most recent call last):
Oct 04 23:43:39 ubuntu-18-ctlr-rocky neutron-l3-agent[12153]: File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 460, in fire_timers

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1815676

Title:
  DVR: External process monitor for keepalived should be removed when
  external gateway is removed for DVR HA routers

Status in neutron:
  New


