Re: [ovs-dev] [RFC] Question about ovn-controller performance
It seems one of my replies didn't go through to the mailing list because of a "format" issue.

On Thu, Sep 14, 2017 at 1:08 PM, Miguel Angel Ajo Pelayo <majop...@redhat.com> wrote:

>> b. Disable ovsdb probe. Any input would trigger a full recomputing of
>> flows. OVN-SB probe is configurable by
>> external_ids:ovn-remote-probe-interval
>
> Just for the probe? Is there any way to avoid that or at least filter a
> percentage of them?
>
> I don't know what the probes are for exactly? health check of the
> connection?

Yes, it is for health checking of the ovsdb connections.

>> c. Packet-in from local OVS, such as ECHO, will also trigger recomputing.
>
> Can we filter the ECHOs for recomputing? is that necessary?

Since the OpenFlow connection is local, echo seems unnecessary. But the point here is that any input to ovn-controller wakes up the main loop, which can't tell what needs to be processed and what does not, so it just recomputes everything. There have been many discussions about this before, and the conclusion was that multi-threading is the right approach to solve the problem: when some input arrives, it only wakes up the thread that is responsible for that kind of input.

The related patch is here:
https://patchwork.ozlabs.org/patch/806360/

An earlier discussion is here:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/331813.html

Thanks,
Han

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC] Question about ovn-controller performance
On Thu, Sep 14, 2017 at 03:10:42PM -0600, Miguel Angel Ajo Pelayo wrote:

> On Thu, Sep 14, 2017 at 2:58 PM, Ben Pfaff wrote:
>
> > On Thu, Sep 14, 2017 at 02:52:48PM -0600, Miguel Angel Ajo Pelayo wrote:
> > > Although I see we have code for somehow packing stuff into conjunctions:
> > >
> > > https://github.com/openvswitch/ovs/blob/1ea2184501d43352ec40764f5eaa3cbd07e3fee3/ovn/controller/lflow.c#L298
> > >
> > > I don't really understand (yet) what it's doing. Is it maybe supposed to
> > > cover this case, but we ran into a bug?
> >
> > It's a naive, ad hoc algorithm that I implemented knowing at the time
> > that I didn't know what was actually important yet. Now that we have an
> > example of a case where it's important to get it right, it's time to
> > take another look.
>
> Oh, sounds great Ben, thank you for handling this.
>
> I'm spending some time reading the lflow.c code to understand what we have
> now.
>
> I was wondering if another improvement we could make in the future is
> having ACL_Match sets, or something like that, to reduce the number of ACL
> entries and lflow entries that we generate, and also make it easier for
> ovn-controller to group them. They would resemble the idea of security
> groups (for rules, not for members) in Neutron, but I'm not sure if that's
> too specific.

If there are higher-level concepts that often get used in practice, then it makes sense to me to figure out whether there's a clean way to integrate them in a general fashion.
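For readers following the conjunction discussion, here is a minimal Python sketch of why packing matches into conjunctions reduces the flow count. It is not the algorithm in lflow.c. The function names, addresses, ports, and the "allow" action are illustrative assumptions; only the conjunction(id, k/n) action and the conj_id match field are actual OVS features, and the flow strings are simplified (real flows carry additional prerequisites and priorities).

```python
from itertools import product


def cross_product_flows(src_ips, dst_ports):
    """Without conjunctions: one flow per (source IP, destination port) pair."""
    return [
        (f"tcp,nw_src={ip},tp_dst={port}", "allow")  # "allow" is a placeholder action
        for ip, port in product(src_ips, dst_ports)
    ]


def conjunctive_flows(src_ips, dst_ports, conj_id=1):
    """With a 2-clause conjunction: one flow per set member, plus one conj_id flow."""
    # Clause 1 of 2: any of the source IPs.
    flows = [(f"ip,nw_src={ip}", f"conjunction({conj_id},1/2)") for ip in src_ips]
    # Clause 2 of 2: any of the destination ports.
    flows += [(f"tcp,tp_dst={port}", f"conjunction({conj_id},2/2)") for port in dst_ports]
    # The flow that fires once both clauses have matched.
    flows.append((f"conj_id={conj_id}", "allow"))
    return flows


src_ips = [f"10.0.0.{i}" for i in range(1, 4)]   # 3 hypothetical addresses
dst_ports = [22, 80, 443, 8080]                  # 4 hypothetical ports

print(len(cross_product_flows(src_ips, dst_ports)))  # 3 * 4 = 12
print(len(conjunctive_flows(src_ips, dst_ports)))    # 3 + 4 + 1 = 8
```

The saving grows with set size: for N addresses and M ports the cross product needs N x M flows, while the conjunctive form needs N + M + 1.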
Re: [ovs-dev] [RFC] Question about ovn-controller performance
On Thu, Sep 14, 2017 at 2:58 PM, Ben Pfaff wrote:

> On Thu, Sep 14, 2017 at 02:52:48PM -0600, Miguel Angel Ajo Pelayo wrote:
> > Although I see we have code for somehow packing stuff into conjunctions:
> >
> > https://github.com/openvswitch/ovs/blob/1ea2184501d43352ec40764f5eaa3cbd07e3fee3/ovn/controller/lflow.c#L298
> >
> > I don't really understand (yet) what it's doing. Is it maybe supposed to
> > cover this case, but we ran into a bug?
>
> It's a naive, ad hoc algorithm that I implemented knowing at the time
> that I didn't know what was actually important yet. Now that we have an
> example of a case where it's important to get it right, it's time to
> take another look.

Oh, sounds great Ben, thank you for handling this.

I'm spending some time reading the lflow.c code to understand what we have now.

I was wondering if another improvement we could make in the future is having ACL_Match sets, or something like that, to reduce the number of ACL entries and lflow entries that we generate, and also make it easier for ovn-controller to group them. They would resemble the idea of security groups (for rules, not for members) in Neutron, but I'm not sure if that's too specific.
Re: [ovs-dev] [RFC] Question about ovn-controller performance
Hi folks,

I'd like to discuss my thoughts, too. To simplify the discussion, let's just use the Neutron sec-group use case as the context.

If there are P ports in the same group and R rules in the sec-group (by default there are 2 such ingress (OVN to-lport) rules in the Neutron default sec-group), then the number of flows (F) related to those rules on each HV will be (assuming all ports are under the same lswitch, or under different lswitches connected by lrouter(s)):

    F = P x P x R x 3    (3 flows are required to implement each ACL)

Given the huge number of flows, it is expected that the CPU will be running hot. But there are still 2 parts of the problem:

1. Reduce number of flows
2. Reduce CPU cost

1. Reduce number of flows

a. Consider locally bound ports only (not implemented)

As discussed in today's OVN meeting, in theory, of the flows related to ACLs, only those whose match condition has a local port as inport/outport are required on the HV. If there are L ports bound locally, it should be:

    F = L x P x R x 3

This is a huge difference considering that L is usually much smaller than P.

b. Conjunction

This was mentioned by Miguel with the link to an earlier discussion. If I understand correctly, there are several cases that can potentially be optimized.

i). When multiple ports on the same HV belong to the same group (and thus have the same security rules/ACLs), conjunction can help if the rules can be combined like: inport/outport in {port1, port2, port3 ...} ... . Assume the L ports in the above example belong to G groups (L > G). F becomes:

    F = G x P x R x 3 + L

ii). When there are multiple sec-group rules with the same "remote-group-id" (by default it is the self group) but with different TCP/UDP ports, conjunction can help if we can combine the rules like: ... tcp.src in {p1, p2, p3 ...}. If R rules can be combined into S rules:

    F = G x P x S x 3 + L + R

There can be other cases that utilize conjunction, but I think in OpenStack Neutron these are the typical ones.
However, it may be tricky to decide where to do the combining. For i), if Neutron has port-binding information it may do some of the work, but it is not quite straightforward to me. For ii), the Neutron sec-group API doesn't support multiple TCP/UDP ports in a single rule, except as a port range. So it seems ovn-controller is the best place to do the combining.

2. Reduce CPU cost

There are some points for tuning.

a. Make sure the version used includes this commit:
https://github.com/openvswitch/ovs/commit/74c760c8fe99d554b94423d49d13d5ca3dea0d9e
It saves CPU cycles dramatically. It is in 2.8 but not 2.7.

b. Disable the ovsdb probe. Any input would trigger a full recomputing of flows. The OVN-SB probe is configurable by external_ids:ovn-remote-probe-interval.

c. Packet-in from the local OVS, such as ECHO, will also trigger recomputing. Multi-threading ovn-controller will solve the problem, but it is not merged yet. So there is no way to tune this, but if you are sure you don't need the features that require packet-in, you can disable it by hardcoding (not recommended though), maybe just for performance testing.

And if you use 2.8, make sure "log" is not enabled in ACLs (using Neutron won't have the problem, since this is not integrated into Neutron yet).

Thanks,
Han

On Thu, Sep 14, 2017 at 11:51 AM, Miguel Angel Ajo Pelayo <majop...@redhat.com> wrote:

> I will prepare a summary with what I found and post it on this thread.
>
> On Thu, Sep 14, 2017 at 12:28 PM, Ben Pfaff wrote:
>
> > On Thu, Sep 14, 2017 at 10:39:28AM +0800, wang.qia...@zte.com.cn wrote:
> > > I configured 5 networks, each with about 80 ports, for a total of
> > > 400 ports, all in the same security group.
> > >
> > > When I bind some ports on HVs, ovn-controller is always running at
> > > 100% CPU, and the total number of OpenFlow table entries in OVS is
> > > more than 300,000. Most of the entries are in table 52, which works
> > > as a source IP filter.
> >
> > It sounds like some of the OVN folks at Red Hat have recently found the
> > same issue during testing. I'm expecting a more detailed report from
> > them soon.
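The flow-count estimates in Han's message above can be tried out with a short Python sketch. It only transcribes the four formulas as written; the function names are made up for illustration. P = 400 comes from the original report and R = 2 from the default sec-group mentioned above, while L = 20, G = 1, and S = 1 are hypothetical values chosen purely to show the scale of the difference.

```python
FLOWS_PER_ACL = 3  # "for each ACL there are 3 flows required to implement it"


def flows_all_ports(P, R):
    """F = P x P x R x 3: every port's ACL flows installed on every HV."""
    return P * P * R * FLOWS_PER_ACL


def flows_local_ports(L, P, R):
    """F = L x P x R x 3: only flows whose inport/outport is bound locally."""
    return L * P * R * FLOWS_PER_ACL


def flows_port_conjunction(G, P, R, L):
    """F = G x P x R x 3 + L: local ports of the same group combined."""
    return G * P * R * FLOWS_PER_ACL + L


def flows_port_and_rule_conjunction(G, P, S, L, R):
    """F = G x P x S x 3 + L + R: additionally, R rules combined into S rules."""
    return G * P * S * FLOWS_PER_ACL + L + R


P, R = 400, 2        # from the report and the default sec-group
L, G, S = 20, 1, 1   # hypothetical: 20 local ports, 1 group, rules merged into 1

print(flows_all_ports(P, R))                           # 960000
print(flows_local_ports(L, P, R))                      # 48000
print(flows_port_conjunction(G, P, R, L))              # 2420
print(flows_port_and_rule_conjunction(G, P, S, L, R))  # 1222
```

These are estimates from the formulas only, not measurements of what ovn-controller actually installs.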
Re: [ovs-dev] [RFC] Question about ovn-controller performance
I will prepare a summary with what I found and post it on this thread.

On Thu, Sep 14, 2017 at 12:28 PM, Ben Pfaff wrote:

> On Thu, Sep 14, 2017 at 10:39:28AM +0800, wang.qia...@zte.com.cn wrote:
> > I configured 5 networks, each with about 80 ports, for a total of
> > 400 ports, all in the same security group.
> >
> > When I bind some ports on HVs, ovn-controller is always running at
> > 100% CPU, and the total number of OpenFlow table entries in OVS is
> > more than 300,000. Most of the entries are in table 52, which works
> > as a source IP filter.
>
> It sounds like some of the OVN folks at Red Hat have recently found the
> same issue during testing. I'm expecting a more detailed report from
> them soon.
Re: [ovs-dev] [RFC] Question about ovn-controller performance
On Thu, Sep 14, 2017 at 10:39:28AM +0800, wang.qia...@zte.com.cn wrote:

> I configured 5 networks, each with about 80 ports, for a total of
> 400 ports, all in the same security group.
>
> When I bind some ports on HVs, ovn-controller is always running at
> 100% CPU, and the total number of OpenFlow table entries in OVS is
> more than 300,000. Most of the entries are in table 52, which works
> as a source IP filter.

It sounds like some of the OVN folks at Red Hat have recently found the same issue during testing. I'm expecting a more detailed report from them soon.
[ovs-dev] [RFC] Question about ovn-controller performance
I configured 5 networks, each with about 80 ports, for a total of 400 ports, all in the same security group.

When I bind some ports on HVs, ovn-controller is always running at 100% CPU, and the total number of OpenFlow table entries in OVS is more than 300,000. Most of the entries are in table 52, which works as a source IP filter.

Could someone tell me how to reduce the number of flows, how to make ovn-controller work more efficiently, or whether there are options to reduce the number of ACL table entries?

The attachment is a dump of ovn-sb.

Thanks.