Public bug reported:

When a VM port's security group rules are updated frequently while network
traffic is heavy, the OvS security group flows can wrongly set the conntrack
mark to 1, which blocks the VM's network connectivity.

Take two VMs, VM A (192.168.111.234) and VM B (192.168.111.233), where B
allows ping from A.
Ping B from A continuously.
There will be one conntrack entry on VM B's compute host:
icmp     1 29 src=192.168.111.234 dst=192.168.111.233 type=8 code=0 id=29697 src=192.168.111.233 dst=192.168.111.234 type=0 code=0 id=29697 mark=0 zone=1 use=2

The issue is hard to reproduce in the normal way, so I simulated it.
There is one precondition to note:
When the SG rules on a port change, the SG flows for that port are recreated.
Although all SG flows for the port are passed to OvS in a single
'ovs-ofctl add-flows' command, OvS actually installs them one by one.
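
The precondition above can be pictured with a small toy model. This is only a sketch under stated assumptions: ToyFirewall, ALLOW_ICMP and simulate are hypothetical names, not neutron or OvS code, and the "no matching allow flow means mark and drop" behaviour is a simplification of what this report describes.

```python
# Toy model only: every name here is hypothetical, not neutron or OvS code.

ALLOW_ICMP = "allow-icmp-from-A"


class ToyFirewall:
    def __init__(self):
        self.flows = []          # SG flows currently installed for the port
        self.conntrack_mark = 0  # mark on the single tracked ICMP connection

    def recreate_flows(self, new_flows):
        """Reinstall flows one at a time, yielding after each insertion."""
        self.flows = []
        for flow in new_flows:
            self.flows.append(flow)
            yield flow

    def handle_packet(self):
        """Return True if the packet is delivered to the VM."""
        if self.conntrack_mark == 1:
            return False         # a marked connection stays blocked
        if ALLOW_ICMP in self.flows:
            return True
        self.conntrack_mark = 1  # no allow flow installed yet: mark and drop
        return False


def simulate(packet_arrives_mid_update):
    """Return (conntrack mark, whether a later ping is delivered)."""
    fw = ToyFirewall()
    fw.flows = ["base", ALLOW_ICMP]
    fw.handle_packet()           # ping works before the SG update
    for flow in fw.recreate_flows(["base", ALLOW_ICMP]):
        # Heavy traffic: a packet may land before the allow flow is back.
        if packet_arrives_mid_update and flow == "base":
            fw.handle_packet()
    return fw.conntrack_mark, fw.handle_packet()
```

In this model, simulate(False) leaves the mark at 0 and the ping keeps working, while simulate(True) ends with mark 1 and the ping blocked even though the allow flow has been reinstalled.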

The issue is hard to reproduce without modifying the code, so I disabled the
security group defer in the code to simulate it (change made here:
https://github.com/openstack/neutron/blob/master/neutron/agent/securitygroups_rpc.py#L132).

Then I started neutron-openvswitch-agent with a breakpoint at
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/openvswitch_firewall/firewall.py#L1004

This produces a conntrack entry with mark=1 on VM B's compute host:
icmp     1 29 src=192.168.111.234 dst=192.168.111.233 type=8 code=0 id=29697 src=192.168.111.233 dst=192.168.111.234 type=0 code=0 id=29697 mark=1 zone=1 use=1
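
To spot such entries on a compute host, the conntrack output can be scanned for mark=1. The following is a minimal sketch, assuming each entry is a single line of key=value fields as in the output quoted in this report; parse_conntrack_fields and is_marked are hypothetical helper names.

```python
import re

# Entries copied from this report, each joined onto one line.
UNMARKED = ("icmp     1 29 src=192.168.111.234 dst=192.168.111.233 type=8 "
            "code=0 id=29697 src=192.168.111.233 dst=192.168.111.234 type=0 "
            "code=0 id=29697 mark=0 zone=1 use=2")
MARKED = UNMARKED.replace("mark=0", "mark=1").replace("use=2", "use=1")


def parse_conntrack_fields(line):
    """Collect key=value fields; repeated keys keep every value in order."""
    fields = {}
    for key, value in re.findall(r"(\w+)=(\S+)", line):
        fields.setdefault(key, []).append(value)
    return fields


def is_marked(line):
    """True if the entry carries mark=1."""
    return parse_conntrack_fields(line).get("mark") == ["1"]
```

Calling is_marked on each line of the host's conntrack listing would pick out the blocked connections; how that listing is obtained is left out here.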

Even after the port's security group flows are added later, this mark=1
conntrack entry is not deleted; it goes away only when it times out.

We hit this issue in our OpenStack production environment, and the network of
a vital system was disconnected.
The cause is that the VM port's security group rules change frequently while
VM network traffic is heavy.

** Affects: neutron
     Importance: Undecided
     Assignee: Jesse (jesse-5)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1719769

Title:
  Occasional network interruption with mark=1 in conntrack

Status in neutron:
  In Progress
