Re: [ovs-dev] [RFC] Question about ovn-controller performance

2017-09-14 Thread Han Zhou
It seems one of my replies didn't go through to the mailing list because of
a "format" issue.

On Thu, Sep 14, 2017 at 1:08 PM, Miguel Angel Ajo Pelayo <
majop...@redhat.com> wrote:

>
>> b. Disable ovsdb probe. Any input would trigger a full recomputing of
>> flows. OVN-SB probe is configurable by
>> external_ids:ovn-remote-probe-interval
>>
>>
> Just for the probe? Is there any way to avoid that or at least filter a
> percentage of them?
>
> I don't know what the probes are for exactly? health check of the
> connection?
>
Yes, it is a health check for the ovsdb connections.

>
>
>> c. Packet-in from local OVS, such as ECHO, will also trigger recomputing.
>>
>
> Can we filter the ECHOs for recomputing? is that necessary?
>
Since the OpenFlow connection is local, the echo seems unnecessary. But the
point here is that any input to ovn-controller wakes up the main loop,
and it can't tell what needs to be processed and what does not, so it just
recomputes everything. There have been a lot of discussions about this
before, and the conclusion was that multi-threading is the right approach to
solve the problem: when some input arrives, it only wakes up the thread that
is responsible for that kind of input. The related patch is here:
https://patchwork.ozlabs.org/patch/806360/. An earlier discussion is here:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/331813.html
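
To make the idea concrete, here is a minimal sketch of per-input-kind wake-up.
It is not the design in the linked patch; the input kinds and function names
are hypothetical, and it only shows that each thread blocks on its own inbox
and is woken solely by its own kind of input.

```python
import queue
import threading

# Hypothetical input kinds, chosen only for illustration.
KINDS = ("sb_update", "ovsdb_probe", "of_echo")


def worker(inbox, handled):
    """Block on one inbox; wake only when this kind of input arrives."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: shut the thread down
            return
        handled.append(item)


def dispatch(events):
    """Route each event to the thread owning its kind; count wake-ups."""
    inboxes = {kind: queue.Queue() for kind in KINDS}
    handled = {kind: [] for kind in KINDS}
    threads = [
        threading.Thread(target=worker, args=(inboxes[kind], handled[kind]))
        for kind in KINDS
    ]
    for thread in threads:
        thread.start()
    for kind in events:
        inboxes[kind].put(kind)
    for kind in KINDS:
        inboxes[kind].put(None)
    for thread in threads:
        thread.join()
    return {kind: len(items) for kind, items in handled.items()}


if __name__ == "__main__":
    events = ["sb_update", "of_echo", "ovsdb_probe", "of_echo", "sb_update"]
    print(dispatch(events))
```

In such a layout, flow recomputation would be attached only to the thread that
handles southbound updates, so probes and echoes would no longer trigger it.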

Thanks,
Han
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] Question about ovn-controller performance

2017-09-14 Thread Ben Pfaff
On Thu, Sep 14, 2017 at 03:10:42PM -0600, Miguel Angel Ajo Pelayo wrote:
> On Thu, Sep 14, 2017 at 2:58 PM, Ben Pfaff  wrote:
> 
> > On Thu, Sep 14, 2017 at 02:52:48PM -0600, Miguel Angel Ajo Pelayo wrote:
> > > Although I see we have code for somehow packing stuff into conjunctions:
> > >
> > > https://github.com/openvswitch/ovs/blob/1ea2184501d43352ec40764f5eaa3cbd07e3fee3/ovn/controller/lflow.c#L298
> > >
> > > I don't really understand (yet) what it's doing. Is it maybe supposed to
> > > cover this case, but we hit a bug?
> >
> > It's a naive, ad hoc algorithm that I implemented knowing at the time
> > that I didn't know what was actually important yet.  Now that we have an
> > example of a case where it's important to get it right, it's time to
> > take another look.
> >
> 
> Oh, sounds great Ben, thank you for handling this.
> 
> I'm spending some time reading the lflow.c code to understand what we have
> now.
> 
> I was wondering if another improvement we could make in the future is
> having ACL_Match sets, or something like that, to reduce the number of ACL
> entries and lflow entries that we generate, and also make it easier for
> ovn-controller to group them. They would resemble the idea of security
> groups (for rules, not for members) in Neutron, but I'm not sure if that's
> too specific.

If there are higher-level concepts that often get used in practice, then
it makes sense to me to figure out whether there's a clean way to
integrate them in a general fashion.


Re: [ovs-dev] [RFC] Question about ovn-controller performance

2017-09-14 Thread Miguel Angel Ajo Pelayo
On Thu, Sep 14, 2017 at 2:58 PM, Ben Pfaff  wrote:

> On Thu, Sep 14, 2017 at 02:52:48PM -0600, Miguel Angel Ajo Pelayo wrote:
> > Although I see we have code for somehow packing stuff into conjunctions:
> >
> > https://github.com/openvswitch/ovs/blob/1ea2184501d43352ec40764f5eaa3cbd07e3fee3/ovn/controller/lflow.c#L298
> >
> > I don't really understand (yet) what it's doing. Is it maybe supposed to
> > cover this case, but we hit a bug?
>
> It's a naive, ad hoc algorithm that I implemented knowing at the time
> that I didn't know what was actually important yet.  Now that we have an
> example of a case where it's important to get it right, it's time to
> take another look.
>

Oh, sounds great Ben, thank you for handling this.

I'm spending some time reading the lflow.c code to understand what we have
now.

I was wondering if another improvement we could make in the future is
having ACL_Match sets, or something like that, to reduce the number of ACL
entries and lflow entries that we generate, and also make it easier for
ovn-controller to group them. They would resemble the idea of security
groups (for rules, not for members) in Neutron, but I'm not sure if that's
too specific.


Re: [ovs-dev] [RFC] Question about ovn-controller performance

2017-09-14 Thread Han Zhou
Hi folks,

I'd like to share my thoughts, too. To simplify the discussion, let's
just use the Neutron sec-group use case as the context. If there are P
ports in the same group and there are R rules in the sec-group (by default
there are 2 such ingress (OVN to-lport) rules in the Neutron default
sec-group), then the number of flows (F) related to those rules on each HV
will be (assuming all ports are under the same lswitch, or under different
lswitches connected by lrouter(s)):

F = P x P x R x 3 (for each ACL there are 3 flows required to implement it)
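
As a worked example of the formula above, the short sketch below plugs in
P = 400 (the port count from the original report) and R = 2 (the default
number of ingress rules mentioned above). The function name is made up for
illustration, and the reporter's actual R is not stated in the thread, so the
result is only an estimate.

```python
def acl_flows_all_ports(ports, rules, flows_per_acl=3):
    """Estimate F = P x P x R x 3 for one security group on one HV."""
    return ports * ports * rules * flows_per_acl


if __name__ == "__main__":
    # P = 400 ports in one group, R = 2 rules (assumed default).
    print(acl_flows_all_ports(ports=400, rules=2))  # 960000
```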

Given the huge number of flows, it is expected that the CPU will be running
hot. There are 2 parts to the problem:
1. Reduce number of flows
2. Reduce CPU cost


1. Reduce number of flows

a. Consider locally bound ports only (not implemented)
As discussed in today's OVN meeting, in theory, of the flows related to
ACLs, only the ones whose match condition has a local port as
inport/outport are required on the HV. If there are L ports bound locally,
it should be:

F = L x P x R x 3

This is a huge difference considering that L is usually much smaller than P.

b. Conjunction
This was mentioned by Miguel with the link to an earlier discussion. If I
understand correctly, there are several cases that can potentially be
optimized.

i). When there are multiple ports on the same HV that belong to the same
group (and thus have the same security rules/ACLs), conjunction can help if
the rules can be combined like:
inport/outport in {port1, port2, port3 ...} ... .
Assume the L ports in the above example belong to G groups (L > G). F
becomes:

F = G x P x R x 3 + L

ii). When there are multiple sec-group rules with the same "remote-group-id"
(by default it is the self group) but with different TCP/UDP ports,
conjunction can help if we can combine the rules like: ... tcp.src in {p1,
p2, p3 ...}. If R rules can be combined into S rules:

F = G x P x S x 3 + L + R

There can be other cases where conjunction is useful, but I think in
OpenStack Neutron these are the typical ones. However, it may be tricky to
decide where to do the combining. For i), if Neutron has port-binding
information it may do some of the work, but it is not quite straightforward
to me. For ii), the Neutron sec-group API doesn't support multiple TCP/UDP
ports in a single rule, except for a port range. So it seems ovn-controller
is the best place to do the combining.
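
To compare the estimates in this section side by side, here is a small
sketch that evaluates each formula. P = 400 comes from the original report
and R = 2 is the default mentioned above; L = 10, G = 2 and S = 1 are
made-up values for illustration only, and the function names are
hypothetical.

```python
def flows_all_ports(p, r, k=3):
    """Baseline: F = P x P x R x 3."""
    return p * p * r * k


def flows_local_only(l, p, r, k=3):
    """1.a: only locally bound ports, F = L x P x R x 3."""
    return l * p * r * k


def flows_grouped_ports(g, p, r, l, k=3):
    """1.b.i: local ports combined per group, F = G x P x R x 3 + L."""
    return g * p * r * k + l


def flows_grouped_rules(g, p, s, l, r, k=3):
    """1.b.ii: R rules combined into S, F = G x P x S x 3 + L + R."""
    return g * p * s * k + l + r


if __name__ == "__main__":
    P, R = 400, 2        # from the thread
    L, G, S = 10, 2, 1   # assumed values, not from the thread
    print(flows_all_ports(P, R))               # 960000
    print(flows_local_only(L, P, R))           # 24000
    print(flows_grouped_ports(G, P, R, L))     # 4810
    print(flows_grouped_rules(G, P, S, L, R))  # 2412
```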

2. Reduce CPU cost
There are some points for tuning.

a. Make sure the version used includes this commit:
https://github.com/openvswitch/ovs/commit/74c760c8fe99d554b94423d49d13d5ca3dea0d9e
It saves CPU cycles dramatically. It is in 2.8 but not 2.7.

b. Disable ovsdb probe. Any input would trigger a full recomputing of
flows. OVN-SB probe is configurable by
external_ids:ovn-remote-probe-interval

c. Packet-in from the local OVS, such as ECHO, will also trigger recomputing.
Multi-threading ovn-controller will solve the problem, but it is not merged
yet, so there is no way to tune this. If you are sure there is no need for
the features that require packet-in, you can disable it by hardcoding (not
recommended though), maybe just for performance testing. And if you use
2.8, make sure "log" is not enabled in ACLs (using Neutron won't have the
problem, since this is not integrated into Neutron yet).
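
For point b above, the sketch below only builds the ovs-vsctl command line
that would set external_ids:ovn-remote-probe-interval on the local
Open_vSwitch record; it does not run anything. It assumes the value is in
milliseconds and that 0 disables the probe, which should be checked against
the ovn-controller documentation for the version in use. The helper name is
hypothetical.

```python
def probe_interval_cmd(interval_ms):
    """Build (but do not execute) the ovs-vsctl argv for the probe setting."""
    if interval_ms < 0:
        raise ValueError("interval must be >= 0")
    return [
        "ovs-vsctl",
        "set",
        "Open_vSwitch",
        ".",
        # Assumption: milliseconds, with 0 meaning "probe disabled".
        "external_ids:ovn-remote-probe-interval=%d" % interval_ms,
    ]


if __name__ == "__main__":
    print(" ".join(probe_interval_cmd(0)))
```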

Thanks,
Han

On Thu, Sep 14, 2017 at 11:51 AM, Miguel Angel Ajo Pelayo <
majop...@redhat.com> wrote:

> I will prepare a summary with what I found and post it on this thread.
>
> On Thu, Sep 14, 2017 at 12:28 PM, Ben Pfaff  wrote:
>
> > On Thu, Sep 14, 2017 at 10:39:28AM +0800, wang.qia...@zte.com.cn wrote:
> > > I configure 5 networks, every network have about 80 ports, the total
> > > ports is 400, all in same security group.
> > >
> > > When I bind some port on HVs, the ovn-controller is always running with
> > > 100% cpu, and the total openflow table entities in ovs is more than
> > > 300,000. Most of the entities is table 52, worked as src ip filter.
> >
> > It sounds like some of the OVN folks at Red Hat have recently found the
> > same issue during testing.  I'm expecting a more detailed report from
> > them soon.


Re: [ovs-dev] [RFC] Question about ovn-controller performance

2017-09-14 Thread Miguel Angel Ajo Pelayo
I will prepare a summary with what I found and post it on this thread.

On Thu, Sep 14, 2017 at 12:28 PM, Ben Pfaff  wrote:

> On Thu, Sep 14, 2017 at 10:39:28AM +0800, wang.qia...@zte.com.cn wrote:
> > I configure 5 networks, every network have about 80 ports, the total
> > ports is 400, all in same security group.
> >
> > When I bind some port on HVs, the ovn-controller is always running with
> > 100% cpu, and the total openflow table entities in ovs is more than
> > 300,000. Most of the entities is table 52, worked as src ip filter.
>
> It sounds like some of the OVN folks at Red Hat have recently found the
> same issue during testing.  I'm expecting a more detailed report from
> them soon.


Re: [ovs-dev] [RFC] Question about ovn-controller performance

2017-09-14 Thread Ben Pfaff
On Thu, Sep 14, 2017 at 10:39:28AM +0800, wang.qia...@zte.com.cn wrote:
> I configure 5 networks, every network have about 80 ports, the total ports 
> is 400, all in same security group.
> 
> When I bind some port on HVs, the ovn-controller is always running with 
> 100% cpu, and the total openflow table entities in ovs is more than 
> 300,000. Most of the entities is table 52, worked as src ip filter.

It sounds like some of the OVN folks at Red Hat have recently found the
same issue during testing.  I'm expecting a more detailed report from
them soon.


[ovs-dev] [RFC] Question about ovn-controller performance

2017-09-13 Thread wang . qianyu
I configured 5 networks; every network has about 80 ports, so the total is
400 ports, all in the same security group.

When I bind some ports on HVs, ovn-controller always runs at 100% CPU, and
the total number of OpenFlow table entries in OVS is more than 300,000. Most
of the entries are in table 52, which works as a source IP filter.

Could someone tell me how to reduce the flows, how to make ovn-controller
work more efficiently, or whether there are options to reduce the number of
ACL table entries?

The attachment is the dump of ovn-sb.

Thanks.
