Re: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.
Bug submitted: https://bugs.launchpad.net/neutron/+bug/1482521

Thanks,
Artur

From: Oleg Bondarev [mailto:obonda...@mirantis.com]
Sent: Thursday, August 6, 2015 5:18 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.

On Thu, Aug 6, 2015 at 5:23 PM, Korzeniewski, Artur artur.korzeniew...@intel.com wrote:

If the interface inside the fip namespace is not deleted, the VM has full internet access without any downtime. Can we consider it a bug? I guess it is something in the startup/full-sync logic, since the log is saying:

/opt/openstack/data/neutron/external/pids/8223e12e-837b-49d4-9793-63603fccbc9f.pid

I think yes, we can consider it a bug. Can you please file one? I can take and probably fix it.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.
On Fri, Aug 7, 2015 at 10:24 AM, Korzeniewski, Artur artur.korzeniew...@intel.com wrote:

Bug submitted: https://bugs.launchpad.net/neutron/+bug/1482521

Ok, here is the fix: https://review.openstack.org/210539

Thanks!
Oleg
Re: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.
Thanks Kevin for that hint. But it does not resolve the connectivity problem; it just does not remove the namespace when it is asked to. The real question is, why do we invoke the FipNamespace.delete() method in /neutron/neutron/agent/l3/dvr_fip_ns.py in the first place?

I’ve captured the traceback for this situation:

2015-08-06 06:35:28.469 DEBUG neutron.agent.linux.utils [-] Unable to access /opt/openstack/data/neutron/external/pids/8223e12e-837b-49d4-9793-63603fccbc9f.pid from (pid=70216) get_value_from_file /opt/openstack/neutron/neutron/agent/linux/utils.py:222
2015-08-06 06:35:28.469 DEBUG neutron.agent.linux.utils [-] Unable to access /opt/openstack/data/neutron/external/pids/8223e12e-837b-49d4-9793-63603fccbc9f.pid from (pid=70216) get_value_from_file /opt/openstack/neutron/neutron/agent/linux/utils.py:222
2015-08-06 06:35:28.469 DEBUG neutron.agent.linux.external_process [-] No process started for 8223e12e-837b-49d4-9793-63603fccbc9f from (pid=70216) disable /opt/openstack/neutron/neutron/agent/linux/external_process.py:113
Traceback (most recent call last):
  File /usr/local/lib/python2.7/dist-packages/eventlet/queue.py, line 117, in switch
    self.greenlet.switch(value)
  File /usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py, line 214, in main
    result = function(*args, **kwargs)
  File /usr/local/lib/python2.7/dist-packages/oslo_service/service.py, line 612, in run_service
    service.start()
  File /opt/openstack/neutron/neutron/service.py, line 233, in start
    self.manager.after_start()
  File /opt/openstack/neutron/neutron/agent/l3/agent.py, line 641, in after_start
    self.periodic_sync_routers_task(self.context)
  File /opt/openstack/neutron/neutron/agent/l3/agent.py, line 519, in periodic_sync_routers_task
    self.fetch_and_sync_all_routers(context, ns_manager)
  File /opt/openstack/neutron/neutron/agent/l3/namespace_manager.py, line 91, in __exit__
    self._cleanup(_ns_prefix, ns_id)
  File /opt/openstack/neutron/neutron/agent/l3/namespace_manager.py, line 140, in _cleanup
    ns.delete()
  File /opt/openstack/neutron/neutron/agent/l3/dvr_fip_ns.py, line 147, in delete
    raise TypeError(ss)
TypeError: ss

It seems that the fip namespace is not processed at startup of the L3 agent, and the cleanup is removing the namespace… It is also removing the interface that connects to the local DVR router, so the VM has no internet access with its floating IP:

Command: ['ip', 'netns', 'exec', 'fip-8223e12e-837b-49d4-9793-63603fccbc9f', 'ip', 'link', 'del', u'fpr-fe517b4b-d']

If the interface inside the fip namespace is not deleted, the VM has full internet access without any downtime. Can we consider it a bug?

I guess it is something in the startup/full-sync logic, since the log is saying:

/opt/openstack/data/neutron/external/pids/8223e12e-837b-49d4-9793-63603fccbc9f.pid

And after finishing the sync loop, the fip namespace is deleted…

Regards,
Artur

From: Kevin Benton [mailto:blak...@gmail.com]
Sent: Thursday, August 6, 2015 7:40 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.

Can you try setting the following to False:
https://github.com/openstack/neutron/blob/dc0944f2d4e347922054bba679ba7f5d1ae6ffe2/etc/l3_agent.ini#L97
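The cleanup path in the traceback above (the sync runs inside a context manager, and its __exit__ calls _cleanup, which calls ns.delete()) can be pictured with a toy model. The sketch below is not the Neutron code: the class name, the keep() method and the sync() helper are hypothetical, and it only assumes that namespaces are named with a prefix such as fip- followed by an ID. It shows how a "mark during sync, sweep on exit" pattern deletes any namespace whose ID was never marked, which would match the observation that the fip namespace is not processed at startup. Whether the actual fix works by marking the fip namespace is not stated in the thread.

```python
class StaleNamespaceSweeper:
    """Toy 'mark during sync, sweep on exit' manager (hypothetical names)."""

    # Assumed naming scheme: <prefix><id>; 'fip-' is the prefix seen in the thread.
    PREFIXES = ("qrouter-", "snat-", "fip-")

    def __init__(self, existing_namespaces, delete):
        self._existing = set(existing_namespaces)
        self._delete = delete          # callback that removes one namespace
        self._ids_to_keep = set()

    def keep(self, resource_id):
        """Mark an ID as still in use so its namespace survives the sweep."""
        self._ids_to_keep.add(resource_id)

    def __enter__(self):
        self._ids_to_keep.clear()
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False               # a failed sync must not trigger a sweep
        for name in sorted(self._existing):
            for prefix in self.PREFIXES:
                if name.startswith(prefix):
                    if name[len(prefix):] not in self._ids_to_keep:
                        self._delete(name)
                    break
        return False


def sync(existing, router_ids, ext_net_ids=()):
    """Run one full sync and return the namespaces that were swept."""
    deleted = []
    with StaleNamespaceSweeper(existing, deleted.append) as sweeper:
        for router_id in router_ids:
            sweeper.keep(router_id)
        for net_id in ext_net_ids:
            sweeper.keep(net_id)
    return deleted


existing = ["qrouter-r1", "fip-net1"]
print(sync(existing, ["r1"]))            # ['fip-net1']  (fip namespace never marked)
print(sync(existing, ["r1"], ["net1"]))  # []            (marked, so it survives)
```

In this model the router namespace survives because the sync loop marks it, while the fip namespace is swept only because nothing marked its ID before the context manager exited.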
Re: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.
On Thu, Aug 6, 2015 at 5:23 PM, Korzeniewski, Artur artur.korzeniew...@intel.com wrote:

If the interface inside the fip namespace is not deleted, the VM has full internet access without any downtime. Can we consider it a bug? I guess it is something in the startup/full-sync logic, since the log is saying:

/opt/openstack/data/neutron/external/pids/8223e12e-837b-49d4-9793-63603fccbc9f.pid

I think yes, we can consider it a bug. Can you please file one? I can take and probably fix it.
[openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.
Hi all,

During testing of Neutron upgrades, I have found that restarting the L3 agent in DVR mode causes network downtime for VMs with a configured floating IP. The outage is visible when pinging the VM from the external network: 2-3 pings are lost.

The responsible place in the code is:

DVR: destroy fip ns: fip-8223e12e-837b-49d4-9793-63603fccbc9f from (pid=156888) delete /opt/openstack/neutron/neutron/agent/l3/dvr_fip_ns.py:164

Can someone explain why the fip namespace is deleted? Can we work out a solution where there is no downtime of VM access?

Artur Korzeniewski
Intel Technology Poland sp. z o.o.
KRS 101882
ul. Slowackiego 173, 80-298 Gdansk
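To put a number on the lost pings described above, one option is to record which ICMP sequence numbers were answered while the agent restarts and look for the longest gap. The sketch below is only an illustration: the function names are hypothetical, and the one-second ping interval is an assumption (it is a common default, but the thread does not say which interval was used).

```python
def longest_gap(received_seqs, first_seq, last_seq):
    """Return (lost_count, longest_run) for sequence numbers first_seq..last_seq."""
    received = set(received_seqs)
    lost = 0
    longest = 0
    run = 0
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            run = 0                    # a reply ends the current outage
        else:
            lost += 1
            run += 1
            longest = max(longest, run)
    return lost, longest


def estimated_downtime(longest_run, interval_s=1.0):
    """Rough outage length, assuming one ping is sent every interval_s seconds."""
    return longest_run * interval_s


# Made-up sample: replies 4, 5 and 6 are missing around the restart.
lost, run = longest_gap([1, 2, 3, 7, 8, 9, 10], first_seq=1, last_seq=10)
print(lost, run, estimated_downtime(run))  # 3 3 3.0
```

The sample data is invented for the example; it simply mirrors the "2-3 pings are lost" observation.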
Re: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.
That's troubling... We are considering using DVR soon, and we have to restart neutron-openvswitch-agent and openstack-nova-compute periodically to get them to talk to rabbit again.

Thanks,
Kevin

From: Korzeniewski, Artur [artur.korzeniew...@intel.com]
Sent: Wednesday, August 05, 2015 12:36 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.

During testing of Neutron upgrades, I have found that restarting the L3 agent in DVR mode is causing the VM network downtime for configured floating IP.
Re: [openstack-dev] [neutron][dvr] Removing fip namespace when restarting L3 agent.
Can you try setting the following to False:
https://github.com/openstack/neutron/blob/dc0944f2d4e347922054bba679ba7f5d1ae6ffe2/etc/l3_agent.ini#L97

On Wed, Aug 5, 2015 at 3:36 PM, Korzeniewski, Artur artur.korzeniew...@intel.com wrote:

Can someone explain why the fip namespace is deleted? Can we work out a solution where there is no downtime of VM access?

--
Kevin Benton