Public bug reported: When L3 agents are deployed on compute nodes in dvr_snat agent mode (as is done e.g. in CI jobs) and DVR HA routers are used, it may happen that the metadata service is not reachable from instances.
For example, in the neutron-tempest-dvr-ha-multinode-full job we have:
- controller (all in one) with L3 agent in dvr mode,
- compute-1 with L3 agent in dvr_snat mode,
- compute-2 with L3 agent in dvr_snat mode.

Now, if a VM is scheduled e.g. on host compute-2 and is connected to a dvr+ha router which is scheduled to be active on compute-1 and standby on compute-2, then the metadata haproxy will not be spawned on compute-2 and the VM will not be able to reach the metadata IP.

I found this when I tried to migrate the existing legacy neutron-tempest-dvr-ha-multinode-full job to zuulv3. It turned out that the legacy job is in fact a "nonHA" job, because the "l3_ha" option is set to False there and routers are therefore created as nonHA dvr routers. When I switched the job to dvr+ha in https://review.openstack.org/#/c/633979/ I spotted the error described above. Example of failed tests: http://logs.openstack.org/79/633979/16/check/neutron-tempest-dvr-ha-multinode-full/710fb3d/job-output.txt.gz - all VMs to which SSH wasn't possible couldn't reach the metadata IP.

** Affects: neutron
   Importance: Medium
   Assignee: Slawek Kaplonski (slaweq)
   Status: Confirmed

** Tags: gate-failure l3-dvr-backlog

https://bugs.launchpad.net/bugs/1817956

Title: Metadata not reachable when dvr_snat L3 agent is used on compute node
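For reference, the setup described above hinges on two options: the neutron-server "l3_ha" option (which makes new routers HA by default) and the L3 agent's "agent_mode". A minimal sketch of the relevant configuration, assuming the standard neutron.conf / l3_agent.ini layout; the max_l3_agents_per_router value is illustrative:

```ini
# neutron.conf on the controller (neutron-server):
# create new routers as HA by default
[DEFAULT]
l3_ha = True
max_l3_agents_per_router = 3

# l3_agent.ini on compute-1 and compute-2, as in the CI job
[DEFAULT]
agent_mode = dvr_snat

# l3_agent.ini on the controller (all in one)
[DEFAULT]
agent_mode = dvr
```

With this configuration a router created as distributed+HA gets an active instance on one dvr_snat node and a standby on the other, which is the situation in which the metadata haproxy is missing on the standby node.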

