On Mon, Mar 15, 2021 at 5:18 PM Krzysztof Klimonda < [email protected]> wrote:
> Hi,
>
> Sorry for what is most likely an unconnected reply to a thread - I can't
> seem to figure out how to reply to a thread from before I was subscribed
> to the ML.
>
> We've been testing OVN scaling for our OpenStack cloud, and found what
> seems to be an OF flow explosion that is basically a mirror of the issue
> reported by Girish a week or so ago.
>
> In OpenStack, neutron creates a "default" security group that has 4 rules
> (2 each for IPv4 and IPv6):
>
> - allow all egress traffic from the port
> - allow all ingress traffic from other ports belonging to the same default
>   group
>
> What we have discovered in our testing is that this second rule
> translates into the following ACL in OVN:
>
> ```
> outport == @pg_304cc336_8db3_4efd_a558_408e648e6259 && ip4 && ip4.src == $pg_304cc336_8db3_4efd_a558_408e648e6259_ip4
> ```
> where port_group `pg_304cc336_8db3_4efd_a558_408e648e6259` is defined
> in nbdb and contains all ports attached to the SG, and address_set
> `pg_304cc336_8db3_4efd_a558_408e648e6259_ip4` is defined in sbdb and seems
> to have a list of addresses that are assigned to ports from that
> port_group[1].
>
> As Girish has explained in his email, such ACLs are translated into a
> bunch of duplicated flows that only seem to differ in metadata:
>
> ```
> # ovs-ofctl dump-flows br-int |egrep "(12474|12475)"
> [...]
> cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, idle_age=47132, priority=2002,ip,reg0=0x100/0x100,reg15=0x3,metadata=0x20e actions=conjunction(12475,2/2)
> cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, idle_age=47132, priority=2002,ip,reg0=0x100/0x100,metadata=0x20e,nw_src=1.0.0.67 actions=conjunction(12475,1/2)
> cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, idle_age=47132, priority=2002,ip,reg0=0x100/0x100,metadata=0x20e,nw_src=2.0.0.52 actions=conjunction(12475,1/2)
> cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, idle_age=47132, priority=2002,ip,reg0=0x100/0x100,metadata=0x20d,nw_src=1.0.0.67 actions=conjunction(12475,1/2)
> cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, idle_age=47132, priority=2002,ip,reg0=0x100/0x100,metadata=0x20d,nw_src=2.0.0.52 actions=conjunction(12475,1/2)
> cookie=0xb25108c3, duration=47132.116s, table=45, n_packets=0, n_bytes=0, idle_age=47132, priority=2002,conj_id=12475,ip,reg0=0x100/0x100,metadata=0x20e actions=resubmit(,46)
> cookie=0xb25108c3, duration=47132.116s, table=45, n_packets=0, n_bytes=0, idle_age=47132, priority=2002,conj_id=12475,ip,reg0=0x100/0x100,metadata=0x20d actions=resubmit(,46)
> [...]
> #
> ```
> (See http://paste.openstack.org/show/803598/ for the full output of grep)
>
> His idea of changing this conjunction into one that additionally matches
> on metadata seems to make sense in this particular instance, given that
> all ports from all datapaths need to evaluate the same set of rules, and
> possibly it makes sense for all ACLs too?
>
> Anyway, to understand how OF flows are generated by ovn-controller, I took
> a quick look at the source code, and it seems that right now all flows are
> forcefully matched to their datapath (by unconditionally matching on the
> metadata field).
>
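
To make the duplication concrete, here is a rough counting sketch in Python. It is only a model inferred from the excerpt above, not from the ovn-controller source: it assumes each address in the address set yields one `conjunction(id,1/2)` flow per datapath, each port yields one `conjunction(id,2/2)` flow on its own datapath, and each datapath gets one `conj_id` flow. The function names and the "shared" variant are hypothetical and only illustrate the "datapath unbound" idea; they do not describe what OVN or any patch actually does.

```python
def per_datapath_flows(num_datapaths, num_addresses, num_ports):
    """Flow count when the nw_src clause is repeated for every datapath.

    Assumption (taken from the dump excerpt only): one nw_src flow per
    (datapath, address), one reg15 flow per port, one conj_id flow per
    datapath.
    """
    address_flows = num_datapaths * num_addresses  # conjunction(id,1/2)
    port_flows = num_ports                         # conjunction(id,2/2)
    conj_flows = num_datapaths                     # conj_id=id match
    return address_flows + port_flows + conj_flows


def shared_address_flows(num_datapaths, num_addresses, num_ports):
    """Hypothetical count if the nw_src clause did not match on metadata.

    The address flows would then be installed once and shared by all
    datapaths; port and conj_id flows are counted as before.
    """
    return num_addresses + num_ports + num_datapaths


if __name__ == "__main__":
    # The excerpt: 2 datapaths (0x20d, 0x20e), 2 addresses, 1 local port.
    print(per_datapath_flows(2, 2, 1))    # 7, the number of flows shown
    print(shared_address_flows(2, 2, 1))  # 5

    # Illustrative numbers only, not measurements from any deployment.
    print(per_datapath_flows(100, 1000, 1000))
    print(shared_address_flows(100, 1000, 1000))
```

Under these assumptions the address clause dominates: it grows with the product of datapaths and addresses, while the other terms grow only linearly.
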
> Would it make sense to introduce a notion of a "datapath unbound flow"
> when the conjunction is already matching on metadata?
> Are there some other parts of OVN code that heavily depend on flows being
> installed per-dp?
> How would that affect OVS performance when matching packets in userspace?
> In our testing we've ended up with over 1M flows installed in table 45,
> which seems to be dwarfing any potential performance loss from having
> flows that don't match on the metadata field, but perhaps I'm wrong?
> Still, that's a lot of flows, and it puts a hard scaling limit on some
> OpenStack deployments given it's an SG that is by default attached to all
> ports on all VMs.
>
>
> [1] (although apparently not additional IP addresses allowed on port via
> allowed-address-pair - I think I've seen this issue before while testing
> magnum.)
>
>
> --
>   Krzysztof Klimonda
>   [email protected]
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Hi Krzysztof,

Sorry for the late response, but here is a patch series for the problem:
https://patchwork.ozlabs.org/project/ovn/list/?series=240419

Would you give it a try?

Thanks,
Han
