Public bug reported:
I am using Queens release and VM's tap interfaces are plugged into ovs br-int.
I'm watching a case when openflow entries are not totally removed when I stop
my VM (name='my-vm').
It is only reproducable when there is some another activity on a node for
different VMs: in my case I attach new network to another VM (name='vm-other')
ovs-agent logs in attach.
I managed to simulate the issue using next steps:
1) grep by mac current openflow entries for my-vm:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=60, n_packets=146,
n_bytes=21001, idle_age=4, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=2,
n_bytes=84, idle_age=4,
priority=95,arp,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,arp_spa=10.94.152.212
actions=NORMAL
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=28,
n_bytes=2448, idle_age=9,
priority=65,ip,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,nw_src=10.94.152.212
actions=ct(table=72,zone=NXM_NX_REG6[0..15])
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=0, n_bytes=0,
idle_age=93,
priority=65,ipv6,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,ipv6_src=fe80::f816:3eff:feec:d345
actions=ct(table=72,zone=NXM_NX_REG6[0..15])
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=73, n_packets=0, n_bytes=0,
idle_age=3401, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
2) # ps ax | grep libvirt
4887 pts/6 S+ 0:00 grep --color=auto libvirt
3934 ? Sl 0:18 /usr/libexec/qemu-kvm -name
guest=instance-00000012,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-131-instance-00000012/master-key.aes
-machine pc-i440fx-vz7.8.0,accel=kvm,usb=off,dump-guest-core=off -cpu
Westmere-IBRS,vme=on,ss=on,pcid=on,x2apic=on,tsc-deadline=on,hypervisor=on,arat=on,tsc_adjust=on,ssbd=on,stibp=on,pdpe1gb=on,rdtscp=on,aes=off,+kvmclock
-m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -numa
node,nodeid=0,cpus=0,mem=512 -uuid 89fccc31-96a6-47ce-abd1-e40fba7274e6 -smbios
type=1,manufacturer=Virtuozzo Infrastructure Platform,product=OpenStack
Compute,version=17.0.6-1.vl7,serial=71f55add-ef93-4ec2-a4dd-ab8098b6312d,uuid=89fccc31-96a6-47ce-abd1-e40fba7274e6,family=Virtual
Machine -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-131-instance-00000012/monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -global
kvm-pit.lost_tick_policy=discard -no-shutdown -boot strict=on -device
nec-usb-x,id=usb,bus=pci.0,addr=0x4 -device
virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive
file=/mnt/vstorage/vols/datastores/cinder/volume-e30d1874-d68e-4578-bf89-aa599e8383c7/volume-e30d1874-d68e-4578-bf89-aa599e8383c7,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,l2-cache-size=128M,discard=unmap,aio=native
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,logical_block_size=512,physical_block_size=4096,serial=e30d1874-d68e-4578-bf89-aa599e8383c7
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device
virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:ec:d3:45,bus=pci.0,addr=0x3
-chardev
pty,id=charserial0,logfile=/mnt/vstorage/vols/datastores/nova/instances/89fccc31-96a6-47ce-abd1-e40fba7274e6/console.log,logappend=on
-device isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-00000012.sock,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.1.instance-00000012.sock,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.1
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.10.1.237:0 -device
VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device vmcoreinfo -device
pvpanic,ioport=1285 -msg timestamp=on
24553 ? Ssl 11:44 /usr/sbin/libvirtd --listen
Note: "-netdev tap,fd=28", so, net device is passed to qemu as handle
and AFAIU tap interface is auto removed (by kernel) when qemu process
exits.
3) SIGSTOP libvirtd to emulate port deletion delay that is performed by
libvirtd when guest is stopped
(I believe libvirtd removes port when guest is stopped in the way like
'ovs-vsctl --timeout=5 -- --if-exists del-port taped0487c9-23')
# kill -SIGSTOP 24553
#
4) Kill the guest:
# kill -9 3934
#
Ovs agent logs right after killing:
2018-12-14 17:18:26.600 7 DEBUG neutron.agent.linux.async_process [-]
Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface
name,ofport,external_ids --format=json]: {"data":[["568b058e-382a-
4c50-a185-45807107190b","old",null,163,null],["","new","taped0487c9-23",-1,["map
",[["attached-mac","fa:16:3e:ec:d3:45"],["iface-id","ed0487c9-23b2-4bea-
9a17-567b7f52d7fc"],["iface-status","active"],["vm-id","89fccc31-96a6
-47ce-
abd1-e40fba7274e6"]]]]],"headings":["row","action","name","ofport","external_ids"]}
_read_stdout /usr/lib/python2.7/site-
packages/neutron/agent/linux/async_process.py:239
Really:
# ovs-vsctl --columns name,ofport,error find Interface name='taped0487c9-23'
name : "taped0487c9-23"
ofport : -1
error : "could not open network device taped0487c9-23 (No such
device)"
# ip a s taped0487c9-23
Device "taped0487c9-23" does not exist.
#
5) emulate parallel guest device adding (i.e. force 'rpc_loop' to process the
ports):
# ip tuntap add dev my-tap mode tap
# ovs-vsctl add-port br-int my-tap
#
Right after adding rpc_loop starts to process ports and there is warning in
logs:
2018-12-14 17:21:12.520 7 WARNING neutron.agent.common.ovs_lib
[req-338406ef-81c5-4268-9463-f2e9435892ee - - - - -] ofport: -1 for VIF:
ed0487c9-23b2-4bea-9a17-567b7f52d7fc is not a positive integer
Part of openflow entries are deleted but not all:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
cookie=0xf4d7d970f5382f3d, duration=905.467s, table=60, n_packets=176,
n_bytes=23641, idle_age=367, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
cookie=0xf4d7d970f5382f3d, duration=905.467s, table=73, n_packets=0,
n_bytes=0, idle_age=4213, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
Continue libvirtd:
# kill -SIGCONT 24553
#
2018-12-14 17:24:58.557 7 DEBUG neutron.agent.linux.async_process [-] Output
received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface
name,ofport,external_ids --format=json]: {"data":[["568b058e-382a
-4c50-a185-45807107190b","delete","taped0487c9-23",-1,["map",[["attached-mac","fa:16:3e:ec:d3:45"],["iface-id","ed0487c9-23b2-4bea-9a17-567b7f52d7fc"],["iface-status","active"],["vm-id","89fccc31-96a6-47ce-a
bd1-e40fba7274e6"]]]]],"headings":["row","action","name","ofport","external_ids"]}
_read_stdout
/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239
But flows are still here:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=60, n_packets=176,
n_bytes=23641, idle_age=492, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=73, n_packets=0,
n_bytes=0, idle_age=4338, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
Trace result for stopped VM:
# ovs-appctl ofproto/trace br-int
in_port=12,tcp,dl_src=fa:16:3e:05:95:a1,dl_dst=fa:16:3e:ec:d3:45,nw_src=10.94.152.196,nw_dst=10.94.152.212
<cut>
bridge("br-int")
----------------
thaw
Resuming from table 72
72. ct_state=+new-est,ip,reg5=0xc, priority 74, cookie 0xf4d7d970f5382f3d
resubmit(,73)
73. reg6=0x9,dl_dst=fa:16:3e:ec:d3:45, priority 100, cookie 0xf4d7d970f5382f3d
set_field:0xa3->reg5
resubmit(,81)
81. ct_state=+trk,reg5=0xa3, priority 80, cookie 0xf4d7d970f5382f3d
resubmit(,82)
82. ct_state=+new-est,ip,reg5=0xa3, priority 74, cookie 0xf4d7d970f5382f3d
ct(commit,zone=NXM_NX_REG6[0..15])
drop
output:163
>> Nonexistent output port
resubmit(,92)
92. priority 0, cookie 0xf4d7d970f5382f3d
drop
Final flow:
recirc_id=0x9b0,eth,tcp,reg5=0xa3,reg6=0x9,in_port=12,vlan_tci=0x0000,dl_src=fa:16:3e:05:95:a1,dl_dst=fa:16:3e:ec:d3:45,nw_src=10.94.152.196,nw_dst=10.94.152.212,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
Megaflow:
recirc_id=0x9b0,ct_state=+new-est-rel-rpl+trk,eth,tcp,in_port=12,dl_dst=fa:16:3e:ec:d3:45,nw_frag=no
Datapath actions: ct(commit,zone=9)
#
If then migrate my-vm (in stopped state) to another node, then other-vm
from current node lost connectivity to my-vm due to egress traffic from
vm-other is going to 'Nonexistent output port'.
The issue looks like also actual for master.
** Affects: neutron
Importance: Undecided
Status: New
** Attachment added: "neutron-openvswitch-agent.log"
https://bugs.launchpad.net/bugs/1808541/+attachment/5222284/+files/neutron-openvswitch-agent.log
** Description changed:
I am using Queens release and VM's tap interfaces are plugged into ovs br-int.
I'm watching a case when openflow entries are not totally removed when I stop
my VM (name='my-vm').
It is only reproducable when there is some another activity on a node for
different VMs: in my case I attach new network to another VM (name='vm-other')
ovs-agent logs in attach.
-
I managed to simulate the issue using next steps:
-
1) grep by mac current openflow entries for my-vm:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
- cookie=0xf4d7d970f5382f3d, duration=93.162s, table=60, n_packets=146,
n_bytes=21001, idle_age=4, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
- cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=2,
n_bytes=84, idle_age=4,
priority=95,arp,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,arp_spa=10.94.152.212
actions=NORMAL
- cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=28,
n_bytes=2448, idle_age=9,
priority=65,ip,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,nw_src=10.94.152.212
actions=ct(table=72,zone=NXM_NX_REG6[0..15])
- cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=0,
n_bytes=0, idle_age=93,
priority=65,ipv6,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,ipv6_src=fe80::f816:3eff:feec:d345
actions=ct(table=72,zone=NXM_NX_REG6[0..15])
- cookie=0xf4d7d970f5382f3d, duration=93.162s, table=73, n_packets=0,
n_bytes=0, idle_age=3401, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
+ cookie=0xf4d7d970f5382f3d, duration=93.162s, table=60, n_packets=146,
n_bytes=21001, idle_age=4, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
+ cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=2,
n_bytes=84, idle_age=4,
priority=95,arp,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,arp_spa=10.94.152.212
actions=NORMAL
+ cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=28,
n_bytes=2448, idle_age=9,
priority=65,ip,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,nw_src=10.94.152.212
actions=ct(table=72,zone=NXM_NX_REG6[0..15])
+ cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=0,
n_bytes=0, idle_age=93,
priority=65,ipv6,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,ipv6_src=fe80::f816:3eff:feec:d345
actions=ct(table=72,zone=NXM_NX_REG6[0..15])
+ cookie=0xf4d7d970f5382f3d, duration=93.162s, table=73, n_packets=0,
n_bytes=0, idle_age=3401, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
-
2) # ps ax | grep libvirt
- 4887 pts/6 S+ 0:00 grep --color=auto libvirt
- 3934 ? Sl 0:18 /usr/libexec/qemu-kvm -name
guest=instance-00000012,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-131-instance-00000012/master-key.aes
-machine pc-i440fx-vz7.8.0,accel=kvm,usb=off,dump-guest-core=off -cpu
Westmere-IBRS,vme=on,ss=on,pcid=on,x2apic=on,tsc-deadline=on,hypervisor=on,arat=on,tsc_adjust=on,ssbd=on,stibp=on,pdpe1gb=on,rdtscp=on,aes=off,+kvmclock
-m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -numa
node,nodeid=0,cpus=0,mem=512 -uuid 89fccc31-96a6-47ce-abd1-e40fba7274e6 -smbios
type=1,manufacturer=Virtuozzo Infrastructure Platform,product=OpenStack
Compute,version=17.0.6-1.vl7,serial=71f55add-ef93-4ec2-a4dd-ab8098b6312d,uuid=89fccc31-96a6-47ce-abd1-e40fba7274e6,family=Virtual
Machine -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-131-instance-00000012/monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -global
kvm-pit.lost_tick_policy=discard -no-shutdown -boot strict=on -device
nec-usb-x,id=usb,bus=pci.0,addr=0x4 -device
virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive
file=/mnt/vstorage/vols/datastores/cinder/volume-e30d1874-d68e-4578-bf89-aa599e8383c7/volume-e30d1874-d68e-4578-bf89-aa599e8383c7,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,l2-cache-size=128M,discard=unmap,aio=native
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,logical_block_size=512,physical_block_size=4096,serial=e30d1874-d68e-4578-bf89-aa599e8383c7
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device
virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:ec:d3:45,bus=pci.0,addr=0x3
-chardev
pty,id=charserial0,logfile=/mnt/vstorage/vols/datastores/nova/instances/89fccc31-96a6-47ce-abd1-e40fba7274e6/console.log,logappend=on
-device isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-00000012.sock,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.1.instance-00000012.sock,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.1
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.10.1.237:0 -device
VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device vmcoreinfo -device
pvpanic,ioport=1285 -msg timestamp=on
+ 4887 pts/6 S+ 0:00 grep --color=auto libvirt
+ 3934 ? Sl 0:18 /usr/libexec/qemu-kvm -name
guest=instance-00000012,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-131-instance-00000012/master-key.aes
-machine pc-i440fx-vz7.8.0,accel=kvm,usb=off,dump-guest-core=off -cpu
Westmere-IBRS,vme=on,ss=on,pcid=on,x2apic=on,tsc-deadline=on,hypervisor=on,arat=on,tsc_adjust=on,ssbd=on,stibp=on,pdpe1gb=on,rdtscp=on,aes=off,+kvmclock
-m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -numa
node,nodeid=0,cpus=0,mem=512 -uuid 89fccc31-96a6-47ce-abd1-e40fba7274e6 -smbios
type=1,manufacturer=Virtuozzo Infrastructure Platform,product=OpenStack
Compute,version=17.0.6-1.vl7,serial=71f55add-ef93-4ec2-a4dd-ab8098b6312d,uuid=89fccc31-96a6-47ce-abd1-e40fba7274e6,family=Virtual
Machine -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-131-instance-00000012/monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -global
kvm-pit.lost_tick_policy=discard -no-shutdown -boot strict=on -device
nec-usb-x,id=usb,bus=pci.0,addr=0x4 -device
virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive
file=/mnt/vstorage/vols/datastores/cinder/volume-e30d1874-d68e-4578-bf89-aa599e8383c7/volume-e30d1874-d68e-4578-bf89-aa599e8383c7,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,l2-cache-size=128M,discard=unmap,aio=native
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,logical_block_size=512,physical_block_size=4096,serial=e30d1874-d68e-4578-bf89-aa599e8383c7
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device
virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:ec:d3:45,bus=pci.0,addr=0x3
-chardev
pty,id=charserial0,logfile=/mnt/vstorage/vols/datastores/nova/instances/89fccc31-96a6-47ce-abd1-e40fba7274e6/console.log,logappend=on
-device isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-00000012.sock,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.1.instance-00000012.sock,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.1
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.10.1.237:0 -device
VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device vmcoreinfo -device
pvpanic,ioport=1285 -msg timestamp=on
24553 ? Ssl 11:44 /usr/sbin/libvirtd --listen
Note: "-netdev tap,fd=28", so, net device is passed to qemu as handle
and AFAIU tap interface is auto removed (by kernel) when qemu process
exits.
-
3) SIGSTOP libvirtd to emulate port deletion delay that is performed by
libvirtd when guest is stopped
(I believe libvirtd removes port when quest is stopped in the way like
'ovs-vsctl --timeout=5 -- --if-exists del-port taped0487c9-23')
# kill -SIGSTOP 24553
#
-
4) Kill the quest:
# kill -9 3934
#
Ovs agent logs right after killing:
2018-12-14 17:18:26.600 7 DEBUG neutron.agent.linux.async_process [-]
Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface
name,ofport,external_ids --format=json]: {"data":[["568b058e-382a-
4c50-a185-45807107190b","old",null,163,null],["","new","taped0487c9-23",-1,["map
",[["attached-mac","fa:16:3e:ec:d3:45"],["iface-id","ed0487c9-23b2-4bea-
9a17-567b7f52d7fc"],["iface-status","active"],["vm-id","89fccc31-96a6
-47ce-
abd1-e40fba7274e6"]]]]],"headings":["row","action","name","ofport","external_ids"]}
_read_stdout /usr/lib/python2.7/site-
packages/neutron/agent/linux/async_process.py:239
Really:
# ovs-vsctl --columns name,ofport,error find Interface name='taped0487c9-23'
name : "taped0487c9-23"
ofport : -1
error : "could not open network device taped0487c9-23 (No such
device)"
# ip a s taped0487c9-23
Device "taped0487c9-23" does not exist.
#
-
- 5) emulate parallel guest device adding (i.e. inforce rpc_loop to process
ports):
+ 5) emulate parallel guest device adding (i.e. force 'rpc_loop' to process the
ports):
# ip tuntap add dev my-tap mode tap
# ovs-vsctl add-port br-int my-tap
#
Right after adding rpc_loop starts to process ports and there is warning in
logs:
2018-12-14 17:21:12.520 7 WARNING neutron.agent.common.ovs_lib
[req-338406ef-81c5-4268-9463-f2e9435892ee - - - - -] ofport: -1 for VIF:
ed0487c9-23b2-4bea-9a17-567b7f52d7fc is not a positive integer
-
Part of openflow entries are deleted but not all:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
- cookie=0xf4d7d970f5382f3d, duration=905.467s, table=60, n_packets=176,
n_bytes=23641, idle_age=367, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
- cookie=0xf4d7d970f5382f3d, duration=905.467s, table=73, n_packets=0,
n_bytes=0, idle_age=4213, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
+ cookie=0xf4d7d970f5382f3d, duration=905.467s, table=60, n_packets=176,
n_bytes=23641, idle_age=367, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
+ cookie=0xf4d7d970f5382f3d, duration=905.467s, table=73, n_packets=0,
n_bytes=0, idle_age=4213, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
-
Continue libvirtd:
# kill -SIGCONT 24553
#
2018-12-14 17:24:58.557 7 DEBUG neutron.agent.linux.async_process [-] Output
received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface
name,ofport,external_ids --format=json]: {"data":[["568b058e-382a
-4c50-a185-45807107190b","delete","taped0487c9-23",-1,["map",[["attached-mac","fa:16:3e:ec:d3:45"],["iface-id","ed0487c9-23b2-4bea-9a17-567b7f52d7fc"],["iface-status","active"],["vm-id","89fccc31-96a6-47ce-a
bd1-e40fba7274e6"]]]]],"headings":["row","action","name","ofport","external_ids"]}
_read_stdout
/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239
-
But flows are still here:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
- cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=60, n_packets=176,
n_bytes=23641, idle_age=492, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
- cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=73, n_packets=0,
n_bytes=0, idle_age=4338, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
+ cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=60, n_packets=176,
n_bytes=23641, idle_age=492, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
+ cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=73, n_packets=0,
n_bytes=0, idle_age=4338, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
-
Trace result for stopped VM:
# ovs-appctl ofproto/trace br-int
in_port=12,tcp,dl_src=fa:16:3e:05:95:a1,dl_dst=fa:16:3e:ec:d3:45,nw_src=10.94.152.196,nw_dst=10.94.152.212
<cut>
bridge("br-int")
----------------
- thaw
- Resuming from table 72
+ thaw
+ Resuming from table 72
72. ct_state=+new-est,ip,reg5=0xc, priority 74, cookie 0xf4d7d970f5382f3d
- resubmit(,73)
+ resubmit(,73)
73. reg6=0x9,dl_dst=fa:16:3e:ec:d3:45, priority 100, cookie 0xf4d7d970f5382f3d
- set_field:0xa3->reg5
- resubmit(,81)
+ set_field:0xa3->reg5
+ resubmit(,81)
81. ct_state=+trk,reg5=0xa3, priority 80, cookie 0xf4d7d970f5382f3d
- resubmit(,82)
+ resubmit(,82)
82. ct_state=+new-est,ip,reg5=0xa3, priority 74, cookie 0xf4d7d970f5382f3d
- ct(commit,zone=NXM_NX_REG6[0..15])
- drop
- output:163
- >> Nonexistent output port
- resubmit(,92)
+ ct(commit,zone=NXM_NX_REG6[0..15])
+ drop
+ output:163
+ >> Nonexistent output port
+ resubmit(,92)
92. priority 0, cookie 0xf4d7d970f5382f3d
- drop
+ drop
Final flow:
recirc_id=0x9b0,eth,tcp,reg5=0xa3,reg6=0x9,in_port=12,vlan_tci=0x0000,dl_src=fa:16:3e:05:95:a1,dl_dst=fa:16:3e:ec:d3:45,nw_src=10.94.152.196,nw_dst=10.94.152.212,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
Megaflow:
recirc_id=0x9b0,ct_state=+new-est-rel-rpl+trk,eth,tcp,in_port=12,dl_dst=fa:16:3e:ec:d3:45,nw_frag=no
Datapath actions: ct(commit,zone=9)
#
-
- If then migrate my-vm (in stopped state) to another node, then other-vm from
current node lost connectivity to my-vm due to egress traffic from vm-other is
going to 'Nonexistent output port'.
+ If then migrate my-vm (in stopped state) to another node, then other-vm
+ from current node lost connectivity to my-vm due to egress traffic from
+ vm-other is going to 'Nonexistent output port'.
The issue looks like also actual for master.
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1808541
Title:
Openflow entries are not totally removed for stopped VM
Status in neutron:
New
Bug description:
I am using Queens release and VM's tap interfaces are plugged into ovs br-int.
I'm watching a case when openflow entries are not totally removed when I stop
my VM (name='my-vm').
It is only reproducable when there is some another activity on a node for
different VMs: in my case I attach new network to another VM (name='vm-other')
ovs-agent logs in attach.
I managed to simulate the issue using next steps:
1) grep by mac current openflow entries for my-vm:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=60, n_packets=146,
n_bytes=21001, idle_age=4, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=2,
n_bytes=84, idle_age=4,
priority=95,arp,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,arp_spa=10.94.152.212
actions=NORMAL
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=28,
n_bytes=2448, idle_age=9,
priority=65,ip,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,nw_src=10.94.152.212
actions=ct(table=72,zone=NXM_NX_REG6[0..15])
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=0,
n_bytes=0, idle_age=93,
priority=65,ipv6,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,ipv6_src=fe80::f816:3eff:feec:d345
actions=ct(table=72,zone=NXM_NX_REG6[0..15])
cookie=0xf4d7d970f5382f3d, duration=93.162s, table=73, n_packets=0,
n_bytes=0, idle_age=3401, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
2) # ps ax | grep libvirt
4887 pts/6 S+ 0:00 grep --color=auto libvirt
3934 ? Sl 0:18 /usr/libexec/qemu-kvm -name
guest=instance-00000012,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-131-instance-00000012/master-key.aes
-machine pc-i440fx-vz7.8.0,accel=kvm,usb=off,dump-guest-core=off -cpu
Westmere-IBRS,vme=on,ss=on,pcid=on,x2apic=on,tsc-deadline=on,hypervisor=on,arat=on,tsc_adjust=on,ssbd=on,stibp=on,pdpe1gb=on,rdtscp=on,aes=off,+kvmclock
-m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -numa
node,nodeid=0,cpus=0,mem=512 -uuid 89fccc31-96a6-47ce-abd1-e40fba7274e6 -smbios
type=1,manufacturer=Virtuozzo Infrastructure Platform,product=OpenStack
Compute,version=17.0.6-1.vl7,serial=71f55add-ef93-4ec2-a4dd-ab8098b6312d,uuid=89fccc31-96a6-47ce-abd1-e40fba7274e6,family=Virtual
Machine -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-131-instance-00000012/monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -global
kvm-pit.lost_tick_policy=discard -no-shutdown -boot strict=on -device
nec-usb-x,id=usb,bus=pci.0,addr=0x4 -device
virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive
file=/mnt/vstorage/vols/datastores/cinder/volume-e30d1874-d68e-4578-bf89-aa599e8383c7/volume-e30d1874-d68e-4578-bf89-aa599e8383c7,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,l2-cache-size=128M,discard=unmap,aio=native
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,logical_block_size=512,physical_block_size=4096,serial=e30d1874-d68e-4578-bf89-aa599e8383c7
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device
virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:ec:d3:45,bus=pci.0,addr=0x3
-chardev
pty,id=charserial0,logfile=/mnt/vstorage/vols/datastores/nova/instances/89fccc31-96a6-47ce-abd1-e40fba7274e6/console.log,logappend=on
-device isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-00000012.sock,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.1.instance-00000012.sock,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.1
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.10.1.237:0 -device
VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device vmcoreinfo -device
pvpanic,ioport=1285 -msg timestamp=on
24553 ? Ssl 11:44 /usr/sbin/libvirtd --listen
Note: "-netdev tap,fd=28", so, net device is passed to qemu as handle
and AFAIU tap interface is auto removed (by kernel) when qemu process
exits.
3) SIGSTOP libvirtd to emulate port deletion delay that is performed by
libvirtd when guest is stopped
(I believe libvirtd removes port when guest is stopped in the way like
'ovs-vsctl --timeout=5 -- --if-exists del-port taped0487c9-23')
# kill -SIGSTOP 24553
#
4) Kill the guest:
# kill -9 3934
#
Ovs agent logs right after killing:
2018-12-14 17:18:26.600 7 DEBUG neutron.agent.linux.async_process [-]
Output received from [ovsdb-client monitor tcp:127.0.0.1:6640
Interface name,ofport,external_ids --format=json]: {"data
":[["568b058e-382a-
4c50-a185-45807107190b","old",null,163,null],["","new","taped0487c9-23",-1,["map
",[["attached-mac","fa:16:3e:ec:d3:45"],["iface-id","ed0487c9-23b2
-4bea-9a17-567b7f52d7fc"],["iface-status","active"],["vm-
id","89fccc31-96a6-47ce-
abd1-e40fba7274e6"]]]]],"headings":["row","action","name","ofport","external_ids"]}
_read_stdout /usr/lib/python2.7/site-
packages/neutron/agent/linux/async_process.py:239
Really:
# ovs-vsctl --columns name,ofport,error find Interface name='taped0487c9-23'
name : "taped0487c9-23"
ofport : -1
error : "could not open network device taped0487c9-23 (No such
device)"
# ip a s taped0487c9-23
Device "taped0487c9-23" does not exist.
#
5) emulate parallel guest device adding (i.e. force 'rpc_loop' to process the
ports):
# ip tuntap add dev my-tap mode tap
# ovs-vsctl add-port br-int my-tap
#
Right after adding rpc_loop starts to process ports and there is warning in
logs:
2018-12-14 17:21:12.520 7 WARNING neutron.agent.common.ovs_lib
[req-338406ef-81c5-4268-9463-f2e9435892ee - - - - -] ofport: -1 for VIF:
ed0487c9-23b2-4bea-9a17-567b7f52d7fc is not a positive integer
Part of openflow entries are deleted but not all:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
cookie=0xf4d7d970f5382f3d, duration=905.467s, table=60, n_packets=176,
n_bytes=23641, idle_age=367, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
cookie=0xf4d7d970f5382f3d, duration=905.467s, table=73, n_packets=0,
n_bytes=0, idle_age=4213, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
Continue libvirtd:
# kill -SIGCONT 24553
#
2018-12-14 17:24:58.557 7 DEBUG neutron.agent.linux.async_process [-] Output
received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface
name,ofport,external_ids --format=json]: {"data":[["568b058e-382a
-4c50-a185-45807107190b","delete","taped0487c9-23",-1,["map",[["attached-mac","fa:16:3e:ec:d3:45"],["iface-id","ed0487c9-23b2-4bea-9a17-567b7f52d7fc"],["iface-status","active"],["vm-id","89fccc31-96a6-47ce-a
bd1-e40fba7274e6"]]]]],"headings":["row","action","name","ofport","external_ids"]}
_read_stdout
/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239
But flows are still here:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=60, n_packets=176,
n_bytes=23641, idle_age=492, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=73, n_packets=0,
n_bytes=0, idle_age=4338, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45
actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
Trace result for stopped VM:
# ovs-appctl ofproto/trace br-int
in_port=12,tcp,dl_src=fa:16:3e:05:95:a1,dl_dst=fa:16:3e:ec:d3:45,nw_src=10.94.152.196,nw_dst=10.94.152.212
<cut>
bridge("br-int")
----------------
thaw
Resuming from table 72
72. ct_state=+new-est,ip,reg5=0xc, priority 74, cookie 0xf4d7d970f5382f3d
resubmit(,73)
73. reg6=0x9,dl_dst=fa:16:3e:ec:d3:45, priority 100, cookie 0xf4d7d970f5382f3d
set_field:0xa3->reg5
resubmit(,81)
81. ct_state=+trk,reg5=0xa3, priority 80, cookie 0xf4d7d970f5382f3d
resubmit(,82)
82. ct_state=+new-est,ip,reg5=0xa3, priority 74, cookie 0xf4d7d970f5382f3d
ct(commit,zone=NXM_NX_REG6[0..15])
drop
output:163
>> Nonexistent output port
resubmit(,92)
92. priority 0, cookie 0xf4d7d970f5382f3d
drop
Final flow:
recirc_id=0x9b0,eth,tcp,reg5=0xa3,reg6=0x9,in_port=12,vlan_tci=0x0000,dl_src=fa:16:3e:05:95:a1,dl_dst=fa:16:3e:ec:d3:45,nw_src=10.94.152.196,nw_dst=10.94.152.212,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
Megaflow:
recirc_id=0x9b0,ct_state=+new-est-rel-rpl+trk,eth,tcp,in_port=12,dl_dst=fa:16:3e:ec:d3:45,nw_frag=no
Datapath actions: ct(commit,zone=9)
#
If then migrate my-vm (in stopped state) to another node, then other-
vm from current node lost connectivity to my-vm due to egress traffic
from vm-other is going to 'Nonexistent output port'.
The issue looks like also actual for master.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1808541/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp