Public bug reported:
ENV:
Neutron stable/queens (12.0.1)
CentOS 7 (3.10.0-514.26.2.el7.x86_64)
Ceph v10.2.9 Jewel
How to reproduce:
Concurrently create 256 VMs in a single network which has 2 dhcp agents.
Exception:
nova-compute side:
2018-06-25 17:56:09.394 43886 DEBUG nova.compute.manager
[req-22395cd8-2461-411c-9da4-7da1be23e480 78ae27a6ab794fe39e64f57310c15b0e
70f17debbc324d81bbf76aaa2e3c1bd0 - default default] [instance:
c6c1d69a-5ee6-4097-8294-8adb02b49a12] Preparing to wait for external event
network-vif-plugged-6f6962b8-b1b0-48a4-99dd-ae19ec1a0f87
prepare_for_instance_event
/usr/lib/python2.7/site-packages/nova/compute/manager.py:325
2018-06-25 18:01:10.670 43886 WARNING nova.virt.libvirt.driver
[req-679f1393-777d-4b9f-9faa-410e9d7e79b2 78ae27a6ab794fe39e64f57310c15b0e
70f17debbc324d81bbf76aaa2e3c1bd0 - default default] [instance:
c6c1d69a-5ee6-4097-8294-8adb02b49a12] Timeout waiting for
[('network-vif-plugged', u'6f6962b8-b1b0-48a4-99dd-ae19ec1a0f87')] for instance
with vm_state building and task_state spawning.: Timeout: 300 seconds
neutron server log:
A failed nova boot, the neutron port only have one log:
2018-06-25 17:56:06.270 221045 DEBUG neutron.db.provisioning_blocks
[req-4bddb839-e9d2-4188-a95a-9a67990c05c0 fc06c70220e74dfd90ebb516ff0da51d
9ff72ac0624c48c390535d58e8f0b3a1 - default default] Transition to ACTIVE for
port object 6f6962b8-b1b0-48a4-99dd-ae19ec1a0f87 will not be triggered until
provisioned by entity DHCP. add_provisioning_component
/usr/lib/python2.7/site-packages/neutron/db/provisioning_blocks.py:73
It will never get the `Provisioning complete` notification.
Furthermore, a successfully booted nova instance, for the provisioning_blocks,
it also takes 40s for the DHCP:
2018-06-25 18:00:17.180 266883 DEBUG neutron.db.provisioning_blocks
[req-37107d50-4777-4f56-b9a0-eaf9b69c17d9 fc06c70220e74dfd90ebb516ff0da51d
9ff72ac0624c48c390535d58e8f0b3a1 - default default] Transition to ACTIVE for
port object bcf1767c-1b9d-48ca-a5dc-c74587fa35e5 will not be triggered until
provisioned by entity DHCP. add_provisioning_component
/usr/lib/python2.7/site-packages/neutron/db/provisioning_blocks.py:73
2018-06-25 18:00:57.165 266884 DEBUG neutron.db.provisioning_blocks
[req-720231ac-5996-432d-b033-37c340532127 - - - - -] Provisioning for port
bcf1767c-1b9d-48ca-a5dc-c74587fa35e5 completed by entity DHCP.
provisioning_complete
/usr/lib/python2.7/site-packages/neutron/db/provisioning_blocks.py:132
2018-06-25 18:00:57.167 266884 DEBUG neutron.db.provisioning_blocks
[req-720231ac-5996-432d-b033-37c340532127 - - - - -] Provisioning complete for
port bcf1767c-1b9d-48ca-a5dc-c74587fa35e5 triggered by entity DHCP.
provisioning_complete
/usr/lib/python2.7/site-packages/neutron/db/provisioning_blocks.py:138
Code:
For [1], seems a sync action with lock will block all port_update rpc. But for
our test, there is no dhcp sync during the instance boot.
For [2], this lock is the essential issue, all the port update are coming from
same network, some of them will not get the lock for more than 300 seconds.
[1]
https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L424
[2]
https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L428
Potential solution:
Add more dhcp agents for the network, for our test settings, add
dhcp_agents_per_network = 10, then it works.
But, such config is really not so good for the production envrionment.
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1778616
Title:
DHCP agent block network-vif-plugged cause Instance failed to build
Status in neutron:
New
Bug description:
ENV:
Neutron stable/queens (12.0.1)
CentOS 7 (3.10.0-514.26.2.el7.x86_64)
Ceph v10.2.9 Jewel
How to reproduce:
Concurrently create 256 VMs in a single network which has 2 dhcp agents.
Exception:
nova-compute side:
2018-06-25 17:56:09.394 43886 DEBUG nova.compute.manager
[req-22395cd8-2461-411c-9da4-7da1be23e480 78ae27a6ab794fe39e64f57310c15b0e
70f17debbc324d81bbf76aaa2e3c1bd0 - default default] [instance:
c6c1d69a-5ee6-4097-8294-8adb02b49a12] Preparing to wait for external event
network-vif-plugged-6f6962b8-b1b0-48a4-99dd-ae19ec1a0f87
prepare_for_instance_event
/usr/lib/python2.7/site-packages/nova/compute/manager.py:325
2018-06-25 18:01:10.670 43886 WARNING nova.virt.libvirt.driver
[req-679f1393-777d-4b9f-9faa-410e9d7e79b2 78ae27a6ab794fe39e64f57310c15b0e
70f17debbc324d81bbf76aaa2e3c1bd0 - default default] [instance:
c6c1d69a-5ee6-4097-8294-8adb02b49a12] Timeout waiting for
[('network-vif-plugged', u'6f6962b8-b1b0-48a4-99dd-ae19ec1a0f87')] for instance
with vm_state building and task_state spawning.: Timeout: 300 seconds
neutron server log:
A failed nova boot, the neutron port only have one log:
2018-06-25 17:56:06.270 221045 DEBUG neutron.db.provisioning_blocks
[req-4bddb839-e9d2-4188-a95a-9a67990c05c0 fc06c70220e74dfd90ebb516ff0da51d
9ff72ac0624c48c390535d58e8f0b3a1 - default default] Transition to ACTIVE for
port object 6f6962b8-b1b0-48a4-99dd-ae19ec1a0f87 will not be triggered until
provisioned by entity DHCP. add_provisioning_component
/usr/lib/python2.7/site-packages/neutron/db/provisioning_blocks.py:73
It will never get the `Provisioning complete` notification.
Furthermore, a successfully booted nova instance, for the
provisioning_blocks, it also takes 40s for the DHCP:
2018-06-25 18:00:17.180 266883 DEBUG neutron.db.provisioning_blocks
[req-37107d50-4777-4f56-b9a0-eaf9b69c17d9 fc06c70220e74dfd90ebb516ff0da51d
9ff72ac0624c48c390535d58e8f0b3a1 - default default] Transition to ACTIVE for
port object bcf1767c-1b9d-48ca-a5dc-c74587fa35e5 will not be triggered until
provisioned by entity DHCP. add_provisioning_component
/usr/lib/python2.7/site-packages/neutron/db/provisioning_blocks.py:73
2018-06-25 18:00:57.165 266884 DEBUG neutron.db.provisioning_blocks
[req-720231ac-5996-432d-b033-37c340532127 - - - - -] Provisioning for port
bcf1767c-1b9d-48ca-a5dc-c74587fa35e5 completed by entity DHCP.
provisioning_complete
/usr/lib/python2.7/site-packages/neutron/db/provisioning_blocks.py:132
2018-06-25 18:00:57.167 266884 DEBUG neutron.db.provisioning_blocks
[req-720231ac-5996-432d-b033-37c340532127 - - - - -] Provisioning complete for
port bcf1767c-1b9d-48ca-a5dc-c74587fa35e5 triggered by entity DHCP.
provisioning_complete
/usr/lib/python2.7/site-packages/neutron/db/provisioning_blocks.py:138
Code:
For [1], seems a sync action with lock will block all port_update rpc. But
for our test, there is no dhcp sync during the instance boot.
For [2], this lock is the essential issue, all the port update are coming
from same network, some of them will not get the lock for more than 300 seconds.
[1]
https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L424
[2]
https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L428
Potential solution:
Add more dhcp agents for the network, for our test settings, add
dhcp_agents_per_network = 10, then it works.
But, such config is really not so good for the production envrionment.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1778616/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp