Hi!

We are observing strange timeouts while creating new instances in our
long-running OpenStack installation (Victoria release, deployed with
kolla-ansible). It seems that the feedback from the ovn-controller(?) does
not reach the nova-compute/libvirt process within 300 seconds.

This does not happen all the time, but roughly one in three to one in five
launches is affected.
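For what it's worth, the 300 seconds match Nova's default
vif_plugging_timeout, i.e. how long nova-compute waits for the
network-vif-plugged event from Neutron before giving up. A minimal sketch of
the relevant nova.conf options (values shown are the upstream defaults, if I
remember correctly; raising the timeout would of course only paper over the
underlying delay):

```ini
[DEFAULT]
# Seconds nova-compute waits for the network-vif-plugged
# event from Neutron before timing out.
vif_plugging_timeout = 300
# Whether a timeout aborts the boot. With False the instance
# continues spawning anyway, as in the log below.
vif_plugging_is_fatal = True
```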

This is what we see in the logs:

---  nova-compute ---
2021-12-29 07:27:06.972 7 INFO nova.compute.claims
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Claim successful on node compute07
2021-12-29 07:27:07.757 7 INFO nova.virt.libvirt.driver
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Creating image
2021-12-29 07:27:12.514 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Started (Lifecycle Event)
2021-12-29 07:27:12.559 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Paused (Lifecycle Event)
2021-12-29 07:27:12.649 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] During sync_power_state the instance
has a pending task (spawning). Skip.
2021-12-29 07:32:12.518 7 WARNING nova.virt.libvirt.driver
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Timeout waiting for
[('network-vif-plugged', 'c641bae9-edcc-4e64-93ec-2dcbf5010009')] for
instance with vm_state building and task_state spawning:
eventlet.timeout.Timeout: 300 seconds
2021-12-29 07:32:12.522 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Resumed (Lifecycle Event)
2021-12-29 07:32:12.526 7 INFO nova.virt.libvirt.driver [-] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Instance spawned successfully.
2021-12-29 07:32:12.527 7 INFO nova.compute.manager
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Took 304.77 seconds to spawn the
instance on the hypervisor.
2021-12-29 07:32:12.613 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] During sync_power_state the instance
has a pending task (spawning). Skip.
2021-12-29 07:32:12.632 7 INFO nova.compute.manager
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Took 305.70 seconds to build instance.
---  nova-compute ---

--- neutron-server ---
2021-12-29 07:27:09.115 23 INFO neutron.db.ovn_revision_numbers_db
[req-cb418ddf-a068-4511-9fed-f0914945c76d 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] Successfully bumped
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type:
ports) to 1
2021-12-29 07:27:10.054 23 INFO neutron.db.ovn_revision_numbers_db
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840
8775aa7d41654c84b8b17b3a570adf4d - default default] Successfully bumped
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type:
ports) to 2
2021-12-29 07:27:10.393 23 INFO neutron.db.ovn_revision_numbers_db
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840
8775aa7d41654c84b8b17b3a570adf4d - default default] Successfully bumped
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type:
ports) to 2
2021-12-29 07:27:10.415 23 INFO neutron.wsgi
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840
8775aa7d41654c84b8b17b3a570adf4d - default default] 172.16.4.75,172.16.4.74
"PUT /v2.0/ports/c641bae9-edcc-4e64-93ec-2dcbf5010009 HTTP/1.1" status:
200  len: 1258 time: 1.2485251
2021-12-29 07:27:10.556 23 INFO neutron.notifiers.nova [-] Nova event
response: {'name': 'network-changed', 'server_uuid':
'aef0a1b3-99d6-43e2-89db-b7bd4e0516d9', 'tag':
'c641bae9-edcc-4e64-93ec-2dcbf5010009', 'status': 'completed', 'code': 200}
--- neutron-server ---

--- ovn-controller ---
2021-12-29T07:27:11.448Z|00363|binding|INFO|Claiming lport
c641bae9-edcc-4e64-93ec-2dcbf5010009 for this chassis.
2021-12-29T07:27:11.448Z|00364|binding|INFO|c641bae9-edcc-4e64-93ec-2dcbf5010009:
Claiming fa:16:3e:62:2d:8a 192.168.0.237
--- ovn-controller ---

In one case we found an "OVN reports status up for port" message in the log
more than five hours after the initial creation request.
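In case it helps: when this happens we can inspect the port state by hand.
A rough sketch of the checks (run on a controller node against a live
deployment; the port UUID is taken from the logs above, and exact output
depends on the OVN version):

```shell
# Does Neutron itself consider the port ACTIVE?
openstack port show c641bae9-edcc-4e64-93ec-2dcbf5010009 -c id -c status

# Southbound DB: is the logical port actually bound to a chassis?
ovn-sbctl find Port_Binding \
    logical_port=c641bae9-edcc-4e64-93ec-2dcbf5010009

# Northbound DB: the 'up' flag that ovn-northd sets once the binding
# is complete -- this is what Neutron watches before it sends the
# network-vif-plugged event to Nova.
ovn-nbctl --columns=name,up list Logical_Switch_Port \
    c641bae9-edcc-4e64-93ec-2dcbf5010009
```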

Does anyone have an idea what could be causing this? We are not sure
whether OVN is the culprit or something in between. I didn't find any
useful hints in the all-knowing trash heap, but maybe I just used the wrong
keywords.

Versions used:

ovn-controller 20.03.2
Open vSwitch Library 2.13.3
OpenFlow versions 0x4:0x4

OpenStack Nova 22.2.1

Thanks in advance!

Kind regards,

Christian Stelter
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss