Public bug reported:

Environment:
OpenStack Zed
OpenvSwitch 2.17
OVN 22.06
ML2/OVN driver


Using following command create instances:
# openstack server create test_vm --flavor XXXXXX --network XXXXXX --image 
XXXXXX --min 10 --max 10 --availability-zone nova:XXXXXX

There maybe at least 2 or 3 instances created failed because of vif
plugging timeout error.

VM port UUID: 6d5eccfa-069e-4058-a1c8-87bec9c1c280

Here is some logs info:

1. ovsdb server log:
record 554: 2023-05-19 10:48:42.399
  table Interface insert row "tap6d5eccfa-06" (8d7bdf97):
  table Open_vSwitch row 1a4db534 (1a4db534) diff:
  table Bridge row "br-int" (ee9ab5d4) diff:
  table Port insert row "tap6d5eccfa-06" (a4be79ec):

record 555: 2023-05-19 10:48:42.411
  table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
  table Open_vSwitch row 1a4db534 (1a4db534) diff:

record 556: 2023-05-19 10:48:42.909
  table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:

record 557: 2023-05-19 10:48:42.941 "ovs-vsctl (invoked by init (pid 0)): 
ovs-vsctl del-port tap6d5eccfa-06"
  table Open_vSwitch row 1a4db534 (1a4db534) diff:
  table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
    delete row
  table Bridge row "br-int" (ee9ab5d4) diff:
  table Port row "tap6d5eccfa-06" (a4be79ec) diff:
    delete row

2. ovs-vswitchd log:
2023-05-19T10:48:42.400Z|13247|jsonrpc|DBG|unix:/run/openvswitch/db.sock: 
received notification, method="update3", 
params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Open_vSwitch":{"1a4db534-3133-43e0-922e-dd7d5d76a802":{"modify":{"next_cfg":135}}},"Interface":{"8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2":{"insert":{"name":"tap6d5eccfa-06"}}},"Port":{"a4be79ec-df95-495b-87e1-fc373582f647":{"insert":{"name":"tap6d5eccfa-06","interfaces":["uuid","8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2"]}}},"Bridge":{"ee9ab5d4-b72b-479b-87cd-29a19d5540a8":{"modify":{"ports":["uuid","a4be79ec-df95-495b-87e1-fc373582f647"]}}}}]
2023-05-19T10:48:42.410Z|13248|bridge|WARN|could not open network device 
tap6d5eccfa-06 (No such device)
2023-05-19T10:48:42.411Z|13249|jsonrpc|DBG|unix:/run/openvswitch/db.sock: send 
request, method="transact", 
params=["Open_vSwitch",{"row":{"cur_cfg":135},"where":[["_uuid","==",["uuid","1a4db534-3133-43e0-922e-dd7d5d76a802"]]],"table":"Open_vSwitch","op":"update"},{"row":{"error":"could
 not open network device tap6d5eccfa-06 (No such 
device)","ofport":-1},"where":[["_uuid","==",["uuid","8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2"]]],"table":"Interface","op":"update"},{"op":"assert","lock":"ovs_vswitchd"}],
 id=4787
2023-05-19T10:48:42.412Z|13250|poll_loop|DBG|wakeup due to [POLLIN] on fd 13 
(<->/run/openvswitch/db.sock) at ../lib/stream-fd.c:157 (4% CPU usage)
2023-05-19T10:48:42.412Z|13251|jsonrpc|DBG|unix:/run/openvswitch/db.sock: 
received notification, method="update3", 
params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Interface":{"8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2":{"modify":{"error":"could
 not open network device tap6d5eccfa-06 (No such 
device)","ofport":-1}}},"Open_vSwitch":{"1a4db534-3133-43e0-922e-dd7d5d76a802":{"modify":{"cur_cfg":135}}}}]

2023-05-19T10:48:42.907Z|13260|vconn|DBG|unix#50: sent (Success): 
OFPT_PORT_STATUS (OF1.5) (xid=0x0): ADD: 57(tap6d5eccfa-06): 
addr:fe:16:3e:dc:bc:b8
     config: 0
     state: 0
     current: 10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
2023-05-19T10:48:42.907Z|13261|bridge|INFO|bridge br-int: added interface 
tap6d5eccfa-06 on port 57
2023-05-19T10:48:42.908Z|13262|hmap|DBG|Dropped 74 log messages in last 6 
seconds (most recently, 1 seconds ago) due to excessive rate
2023-05-19T10:48:42.908Z|13263|hmap|DBG|../ofproto/ofproto-dpif-xlate.c:884: 1 
bucket with 6+ nodes, including 1 bucket with 7 nodes (16 nodes total across 16 
buckets)
2023-05-19T10:48:42.908Z|13264|dpif_netlink|DBG|port_changed: 
dpif:system@ovs-system vport:tap6d5eccfa-06 cmd:1
2023-05-19T10:48:42.908Z|13265|vconn|DBG|unix#50: sent (Success): 
OFPT_PORT_STATUS (OF1.5) (xid=0x0): MOD: 57(tap6d5eccfa-06): 
addr:fe:16:3e:dc:bc:b8
     config: 0
     state: LIVE
     current: 10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max

2023-05-19T10:48:42.971Z|13289|bridge|INFO|bridge br-int: deleted interface 
tap6d5eccfa-06 on port 57
2023-05-19T10:48:42.972Z|04671|poll_loop(revalidator62)|DBG|wakeup due to 
[POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.972Z|04651|poll_loop(revalidator63)|DBG|wakeup due to 
[POLLIN] on fd 45 (FIFO pipe:[154648]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.975Z|04672|poll_loop(revalidator62)|DBG|wakeup due to 
[POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.975Z|13290|dpif|DBG|system@ovs-system: failed to query port 
tap6d5eccfa-06: No such device
2023-05-19T10:48:42.982Z|07648|poll_loop(revalidator61)|DBG|wakeup due to 
[POLLIN] on fd 41 (FIFO pipe:[155716]) at ../lib/ovs-thread.c:378 (10% CPU 
usage)
2023-05-19T10:48:42.984Z|04719|poll_loop(revalidator65)|DBG|wakeup due to 
[POLLIN] on fd 49 (FIFO pipe:[149813]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.984Z|04589|poll_loop(revalidator64)|DBG|wakeup due to 
[POLLIN] on fd 47 (FIFO pipe:[154649]) at ../lib/ovs-thread.c:378 (1% CPU usage)
2023-05-19T10:48:42.985Z|13291|poll_loop|DBG|wakeup due to [POLLIN] on fd 56 
(FIFO pipe:[136293]) at ../lib/ovs-rcu.c:236 (2% CPU usage)
2023-05-19T10:48:42.999Z|13292|poll_loop|DBG|wakeup due to [POLLIN] on fd 56 
(FIFO pipe:[136293]) at ../lib/ovs-rcu.c:236 (2% CPU usage)
2023-05-19T10:48:42.999Z|13293|dpif_netlink|DBG|port_changed: 
dpif:system@ovs-system vport:tap6d5eccfa-06 cmd:2
2023-05-19T10:48:42.999Z|13294|dpif|DBG|system@ovs-system: failed to query port 
tap6d5eccfa-06: No such device

3. nova-compute log:
2023-05-19 18:48:42.416 7 INFO os_vif [None 
req-6f1ad586-b66d-4fb6-8603-ba96af2af311 bb05576c972b49b4ac598cb33cf151f2 
af89a2f86b204b54af86df7e5bb6a841 - - default default] Successfully plugged vif 
VIFOpenVSwitch(active=False,address=fa:16:3e:dc:bc:b8,bridge_name='br-int',has_traffic_filtering=True,id=6d5eccfa-069e-4058-a1c8-87bec9c1c280,network=Network(19771f61-1003-46a3-8850-001b32947f66),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap6d5eccfa-06')

2023-05-19 18:58:45.827 7 WARNING nova.compute.manager [None
req-6f1ad586-b66d-4fb6-8603-ba96af2af311
bb05576c972b49b4ac598cb33cf151f2 af89a2f86b204b54af86df7e5bb6a841 - -
default default] [instance: 1cfbb245-c807-48e9-90eb-2c4f77e7a7b3]
Timeout waiting for ['network-vif-
plugged-6d5eccfa-069e-4058-a1c8-87bec9c1c280'] for instance with
vm_state building and task_state spawning. Event states are: network-
vif-plugged-6d5eccfa-069e-4058-a1c8-87bec9c1c280: timed out after 600.00
seconds: eventlet.timeout.Timeout: 600 seconds

It seems that ovs successfully add tap device(tap6d5eccfa-06) in br-int,
then quickly delete the device.

In Nova, it wait for network-vif-plugged event, but tap device not
present in br-int bridge, so create instance failed after 600 secs(we
modify the default timeout 300s).

Another Launchpad Bug: https://bugs.launchpad.net/neutron/+bug/2005111
have the same problem.


After change ML2/OVN driver to ML2/OVS driver, concurrent create VM also fails 
with vif plug timeout.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2020328

Title:
  Concurrent create VM failed because of vif plug timeout

Status in neutron:
  New

Bug description:
  Environment:
  OpenStack Zed
  OpenvSwitch 2.17
  OVN 22.06
  ML2/OVN driver

  
  Using following command create instances:
  # openstack server create test_vm --flavor XXXXXX --network XXXXXX --image 
XXXXXX --min 10 --max 10 --availability-zone nova:XXXXXX

  There maybe at least 2 or 3 instances created failed because of vif
  plugging timeout error.

  VM port UUID: 6d5eccfa-069e-4058-a1c8-87bec9c1c280

  Here is some logs info:

  1. ovsdb server log:
  record 554: 2023-05-19 10:48:42.399
    table Interface insert row "tap6d5eccfa-06" (8d7bdf97):
    table Open_vSwitch row 1a4db534 (1a4db534) diff:
    table Bridge row "br-int" (ee9ab5d4) diff:
    table Port insert row "tap6d5eccfa-06" (a4be79ec):

  record 555: 2023-05-19 10:48:42.411
    table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
    table Open_vSwitch row 1a4db534 (1a4db534) diff:

  record 556: 2023-05-19 10:48:42.909
    table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:

  record 557: 2023-05-19 10:48:42.941 "ovs-vsctl (invoked by init (pid 0)): 
ovs-vsctl del-port tap6d5eccfa-06"
    table Open_vSwitch row 1a4db534 (1a4db534) diff:
    table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
      delete row
    table Bridge row "br-int" (ee9ab5d4) diff:
    table Port row "tap6d5eccfa-06" (a4be79ec) diff:
      delete row

  2. ovs-vswitchd log:
  2023-05-19T10:48:42.400Z|13247|jsonrpc|DBG|unix:/run/openvswitch/db.sock: 
received notification, method="update3", 
params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Open_vSwitch":{"1a4db534-3133-43e0-922e-dd7d5d76a802":{"modify":{"next_cfg":135}}},"Interface":{"8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2":{"insert":{"name":"tap6d5eccfa-06"}}},"Port":{"a4be79ec-df95-495b-87e1-fc373582f647":{"insert":{"name":"tap6d5eccfa-06","interfaces":["uuid","8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2"]}}},"Bridge":{"ee9ab5d4-b72b-479b-87cd-29a19d5540a8":{"modify":{"ports":["uuid","a4be79ec-df95-495b-87e1-fc373582f647"]}}}}]
  2023-05-19T10:48:42.410Z|13248|bridge|WARN|could not open network device 
tap6d5eccfa-06 (No such device)
  2023-05-19T10:48:42.411Z|13249|jsonrpc|DBG|unix:/run/openvswitch/db.sock: 
send request, method="transact", 
params=["Open_vSwitch",{"row":{"cur_cfg":135},"where":[["_uuid","==",["uuid","1a4db534-3133-43e0-922e-dd7d5d76a802"]]],"table":"Open_vSwitch","op":"update"},{"row":{"error":"could
 not open network device tap6d5eccfa-06 (No such 
device)","ofport":-1},"where":[["_uuid","==",["uuid","8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2"]]],"table":"Interface","op":"update"},{"op":"assert","lock":"ovs_vswitchd"}],
 id=4787
  2023-05-19T10:48:42.412Z|13250|poll_loop|DBG|wakeup due to [POLLIN] on fd 13 
(<->/run/openvswitch/db.sock) at ../lib/stream-fd.c:157 (4% CPU usage)
  2023-05-19T10:48:42.412Z|13251|jsonrpc|DBG|unix:/run/openvswitch/db.sock: 
received notification, method="update3", 
params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Interface":{"8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2":{"modify":{"error":"could
 not open network device tap6d5eccfa-06 (No such 
device)","ofport":-1}}},"Open_vSwitch":{"1a4db534-3133-43e0-922e-dd7d5d76a802":{"modify":{"cur_cfg":135}}}}]

  2023-05-19T10:48:42.907Z|13260|vconn|DBG|unix#50: sent (Success): 
OFPT_PORT_STATUS (OF1.5) (xid=0x0): ADD: 57(tap6d5eccfa-06): 
addr:fe:16:3e:dc:bc:b8
       config: 0
       state: 0
       current: 10MB-FD COPPER
       speed: 10 Mbps now, 0 Mbps max
  2023-05-19T10:48:42.907Z|13261|bridge|INFO|bridge br-int: added interface 
tap6d5eccfa-06 on port 57
  2023-05-19T10:48:42.908Z|13262|hmap|DBG|Dropped 74 log messages in last 6 
seconds (most recently, 1 seconds ago) due to excessive rate
  2023-05-19T10:48:42.908Z|13263|hmap|DBG|../ofproto/ofproto-dpif-xlate.c:884: 
1 bucket with 6+ nodes, including 1 bucket with 7 nodes (16 nodes total across 
16 buckets)
  2023-05-19T10:48:42.908Z|13264|dpif_netlink|DBG|port_changed: 
dpif:system@ovs-system vport:tap6d5eccfa-06 cmd:1
  2023-05-19T10:48:42.908Z|13265|vconn|DBG|unix#50: sent (Success): 
OFPT_PORT_STATUS (OF1.5) (xid=0x0): MOD: 57(tap6d5eccfa-06): 
addr:fe:16:3e:dc:bc:b8
       config: 0
       state: LIVE
       current: 10MB-FD COPPER
       speed: 10 Mbps now, 0 Mbps max

  2023-05-19T10:48:42.971Z|13289|bridge|INFO|bridge br-int: deleted interface 
tap6d5eccfa-06 on port 57
  2023-05-19T10:48:42.972Z|04671|poll_loop(revalidator62)|DBG|wakeup due to 
[POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ovs-thread.c:378 (0% CPU usage)
  2023-05-19T10:48:42.972Z|04651|poll_loop(revalidator63)|DBG|wakeup due to 
[POLLIN] on fd 45 (FIFO pipe:[154648]) at ../lib/ovs-thread.c:378 (0% CPU usage)
  2023-05-19T10:48:42.975Z|04672|poll_loop(revalidator62)|DBG|wakeup due to 
[POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ovs-thread.c:378 (0% CPU usage)
  2023-05-19T10:48:42.975Z|13290|dpif|DBG|system@ovs-system: failed to query 
port tap6d5eccfa-06: No such device
  2023-05-19T10:48:42.982Z|07648|poll_loop(revalidator61)|DBG|wakeup due to 
[POLLIN] on fd 41 (FIFO pipe:[155716]) at ../lib/ovs-thread.c:378 (10% CPU 
usage)
  2023-05-19T10:48:42.984Z|04719|poll_loop(revalidator65)|DBG|wakeup due to 
[POLLIN] on fd 49 (FIFO pipe:[149813]) at ../lib/ovs-thread.c:378 (0% CPU usage)
  2023-05-19T10:48:42.984Z|04589|poll_loop(revalidator64)|DBG|wakeup due to 
[POLLIN] on fd 47 (FIFO pipe:[154649]) at ../lib/ovs-thread.c:378 (1% CPU usage)
  2023-05-19T10:48:42.985Z|13291|poll_loop|DBG|wakeup due to [POLLIN] on fd 56 
(FIFO pipe:[136293]) at ../lib/ovs-rcu.c:236 (2% CPU usage)
  2023-05-19T10:48:42.999Z|13292|poll_loop|DBG|wakeup due to [POLLIN] on fd 56 
(FIFO pipe:[136293]) at ../lib/ovs-rcu.c:236 (2% CPU usage)
  2023-05-19T10:48:42.999Z|13293|dpif_netlink|DBG|port_changed: 
dpif:system@ovs-system vport:tap6d5eccfa-06 cmd:2
  2023-05-19T10:48:42.999Z|13294|dpif|DBG|system@ovs-system: failed to query 
port tap6d5eccfa-06: No such device

  3. nova-compute log:
  2023-05-19 18:48:42.416 7 INFO os_vif [None 
req-6f1ad586-b66d-4fb6-8603-ba96af2af311 bb05576c972b49b4ac598cb33cf151f2 
af89a2f86b204b54af86df7e5bb6a841 - - default default] Successfully plugged vif 
VIFOpenVSwitch(active=False,address=fa:16:3e:dc:bc:b8,bridge_name='br-int',has_traffic_filtering=True,id=6d5eccfa-069e-4058-a1c8-87bec9c1c280,network=Network(19771f61-1003-46a3-8850-001b32947f66),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap6d5eccfa-06')

  2023-05-19 18:58:45.827 7 WARNING nova.compute.manager [None
  req-6f1ad586-b66d-4fb6-8603-ba96af2af311
  bb05576c972b49b4ac598cb33cf151f2 af89a2f86b204b54af86df7e5bb6a841 - -
  default default] [instance: 1cfbb245-c807-48e9-90eb-2c4f77e7a7b3]
  Timeout waiting for ['network-vif-
  plugged-6d5eccfa-069e-4058-a1c8-87bec9c1c280'] for instance with
  vm_state building and task_state spawning. Event states are: network-
  vif-plugged-6d5eccfa-069e-4058-a1c8-87bec9c1c280: timed out after
  600.00 seconds: eventlet.timeout.Timeout: 600 seconds

  It seems that ovs successfully add tap device(tap6d5eccfa-06) in br-
  int, then quickly delete the device.

  In Nova, it wait for network-vif-plugged event, but tap device not
  present in br-int bridge, so create instance failed after 600 secs(we
  modify the default timeout 300s).

  Another Launchpad Bug: https://bugs.launchpad.net/neutron/+bug/2005111
  have the same problem.

  
  After change ML2/OVN driver to ML2/OVS driver, concurrent create VM also 
fails with vif plug timeout.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2020328/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

Reply via email to