Reviewed:  https://review.openstack.org/181674
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b672c26cb42ad3d9a17ed049b506b5622601e891
Submitter: Jenkins
Branch:    master

commit b672c26cb42ad3d9a17ed049b506b5622601e891
Author: Kevin Benton <[email protected]>
Date:   Fri Apr 15 06:05:56 2016 -0700

    Add provisioning blocks to status ACTIVE transition

    Sometimes an object requires multiple disjoint actors to complete
    a set of tasks before the status of the object should be
    transitioned to ACTIVE. The main example of this is when a port is
    being created. The L2 agent has to do its business to wire up the
    VIF, but at the same time the DHCP agent has to set up the DHCP
    reservation. This led to Nova booting the VM when the L2 agent was
    done even though the DHCP agent may have been nowhere near ready.

    This patch introduces a provisioning blocks mechanism that tracks
    the entities that need to be involved to make a transition to
    ACTIVE happen. See the devref in the dependent patch for a
    high-level view of how this works.

    The ML2 code is updated to use this new mechanism to prevent
    updating the port status to ACTIVE without both the DHCP agent and
    L2 agent reporting that the port is ready.

    The DHCP RPC API required a version bump to allow the port ready
    notification.

    This also adds a devref doc for the provisioning_blocks module
    with a high-level overview of how it works in addition to a
    detailed description of how it is used specifically with ML2, the
    L2 agents, and the DHCP agents.

    Closes-Bug: #1453350
    Change-Id: Id85ff6de1a14a550ab50baf4f79d3130af3680c8

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1453350

Title:
  race between neutron port create and nova boot

Status in neutron:
  Fix Released

Bug description:
  I am doing load testing with tempest scenario tests and see what I
  think is a race condition between neutron dhcp standup and nova boot.
  I believe the scenario I am seeing to be a more general case of
  https://bugs.launchpad.net/neutron/+bug/1334447.

  test environment: 5 compute nodes, 1 controller node running all api
  and neutron services. ubuntu juno hand patched 1382064 and 1385257
  and my workaround in 1451492. standard neutron setup otherwise.

  If I run tempest scenario test test_server_basic_ops 30 times in
  parallel things consistently work fine. If I increase to 60 in
  parallel I get lots of failures (see below). Upon investigation, it
  looks to me that neutron standup of the netns and its dnsmasq process
  is too slow and loses the race with nova boot, so the VM comes up
  without a (dhcp provided) IP address (causing ssh to time out and
  fail).

  Traceback (most recent call last):
    File "/home/aqua/tempest/tempest/test.py", line 125, in wrapper
      return f(self, *func_args, **func_kwargs)
    File "/home/aqua/tempest/tempest/scenario/test_server_basic_ops_38.py", line 105, in test_server_basicops
      self.verify_ssh()
    File "/home/aqua/tempest/tempest/scenario/test_server_basic_ops_38.py", line 95, in verify_ssh
      private_key=self.keypair['private_key'])
    File "/home/aqua/tempest/tempest/scenario/manager.py", line 310, in get_remote_client
      linux_client.validate_authentication()
    File "/home/aqua/tempest/tempest/common/utils/linux/remote_client.py", line 55, in validate_authentication
      self.ssh_client.test_connection_auth()
    File "/home/aqua/tempest/tempest/common/ssh.py", line 150, in test_connection_auth
      connection = self._get_ssh_connection()
    File "/home/aqua/tempest/tempest/common/ssh.py", line 87, in _get_ssh_connection
      password=self.password)
  tempest.exceptions.SSHTimeout: Connection to the 172.17.205.21 via SSH timed out.
  User: cirros, Password: None

  Ran 60 tests in 742.931s

  FAILED (failures=47)

  To reproduce test environment:

  1) checkout tempest and remove all tempest scenario tests except
     test_server_basic_ops

  2) run this command to make 59 copies of the test:

     for i in {1..59}; do
       cp -p test_server_basic_ops.py test_server_basic_ops_$i.py
       sed --in-place \
         -e "s/class TestServerBasicOps(manager.ScenarioTest):/class TestServerBasicOps$i(manager.ScenarioTest):/" \
         -e "s/ super(TestServerBasicOps, self).setUp()/ super(TestServerBasicOps$i, self).setUp()/" \
         -e "s/ @test.idempotent_id('7fff3fb3-91d8-4fd0-bd7d-0204f1f180ba')/ @test.idempotent_id(\'$(uuidgen)\')/" \
         test_server_basic_ops_$i.py
     done

  3) run 30 tests and observe successful run:

     OS_TEST_TIMEOUT=1200 ./run_tempest.sh tempest.scenario -- --concurrency=30

  4) run 60 tests and observe failures:

     OS_TEST_TIMEOUT=1200 ./run_tempest.sh tempest.scenario -- --concurrency=60

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1453350/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
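
For readers skimming this thread, the provisioning-blocks idea from the commit message can be sketched in a few lines of Python. This is only an illustration of the concept, assuming a simple in-memory tracker. It is not the code from the patch and not Neutron's provisioning_blocks module; the class name, method names, and the "L2"/"DHCP" entity labels are all hypothetical.

```python
# Illustrative sketch only. NOT Neutron's provisioning_blocks API;
# every name below is a hypothetical stand-in.

class ProvisioningTracker:
    """Holds an object in DOWN until every registered entity reports ready."""

    def __init__(self):
        self._pending = {}  # object_id -> set of entities still blocking
        self.status = {}    # object_id -> "DOWN" or "ACTIVE"

    def add_block(self, object_id, entity):
        # Register an entity that must finish before the object goes ACTIVE.
        self._pending.setdefault(object_id, set()).add(entity)
        self.status[object_id] = "DOWN"

    def provisioning_complete(self, object_id, entity):
        # Clear one entity's block; flip to ACTIVE only when none remain.
        if object_id not in self._pending:
            raise KeyError(f"unknown object: {object_id}")
        self._pending[object_id].discard(entity)
        if not self._pending[object_id]:
            self.status[object_id] = "ACTIVE"
        return self.status[object_id]


tracker = ProvisioningTracker()
tracker.add_block("port-1", "L2")
tracker.add_block("port-1", "DHCP")

after_l2 = tracker.provisioning_complete("port-1", "L2")      # DHCP still pending
after_dhcp = tracker.provisioning_complete("port-1", "DHCP")  # last block cleared
print(after_l2, after_dhcp)
```

In this sketch the port stays DOWN after the L2 entity reports and only becomes ACTIVE once the DHCP entity has also reported, which is the ordering the fix is meant to guarantee before Nova is told the port is ready.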

