I've found out more about what is running slow in this scenario.
I've profiled the processing of the update2 messages and here you can
see the sequence of calls to __process_update2 (idl.py) when I'm
creating a new port via OpenStack on a system loaded with 800 ports
on the same Logical Switch:
1. Logical_Switch_Port 'insert'
2. Address_Set 'modify' (to add the new address); takes ~0.2 seconds
3. Logical_Switch_Port 'modify' (to add the 8 new ACLs) <- this takes > 2 seconds
4. ACL 'insert' x8 (one per ACL)
5. Logical_Switch_Port 'modify' ('up' = False)
6. Logical_Switch_Port 'insert' (this is exactly the same as 1, so it'll be
skipped)
7. Address_Set 'modify' (exactly the same as 2, so it'll be skipped);
   still takes ~0.01-0.05 s
8. Logical_Switch_Port 'modify' (to add the 8 new ACLs, same as 3);
   still takes ~0.5 seconds
9. ACL 'insert' x8 (one per ACL, same as 4)
10. Logical_Switch_Port 'modify' ('up' = False) (same as 5)
11. Port_Binding (SB) 'insert'
12. Port_Binding (SB) 'insert' (same as 11)
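For reference, this is more or less how I collected the per-call timings
(a quick hack of mine, not anything in the tree; it pokes at the
name-mangled private method, so it may break if idl.py changes):

import time

import ovs.db.idl

_orig = ovs.db.idl.Idl._Idl__process_update2  # name-mangled private method

def _timed_process_update2(self, *args, **kwargs):
    # Log the wall-clock time of every __process_update2 call.
    start = time.time()
    try:
        return _orig(self, *args, **kwargs)
    finally:
        print("__process_update2 took %.3fs" % (time.time() - start))

ovs.db.idl.Idl._Idl__process_update2 = _timed_process_update2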
Half of those are dups, and even though they are no-ops, they consume time.
The most expensive operation is adding the 8 new ACLs to the acls set in
the LS table (800 ports with 8 ACLs each makes that set contain 6400
elements).
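Just to get an intuition for why that hurts, here is a standalone toy
(not the actual IDL code; FakeAtom below is just a stand-in for
ovs.db.data.Atom) that rebuilds and sorts a 6400-element column once per
ACL insert, i.e. 8 times:

import timeit
import uuid

uuids = [str(uuid.uuid4()) for _ in range(6400)]

class FakeAtom(object):
    """Stand-in for ovs.db.data.Atom: parse a value, compare by it."""
    def __init__(self, value):
        self.value = uuid.UUID(value)
    def __lt__(self, other):
        return self.value < other.value

t = timeit.timeit(lambda: sorted(FakeAtom(u) for u in uuids), number=8)
print("8 rebuilds of a 6400-element column: %.2f s" % t)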
NOTE: As you can see, we're trying to insert the address again into
Address_Sets so we should
bear this in mind if we go ahead with Lucas' suggestion about allowing dups
here.
It's obvious that we'll gain a lot by using Port Sets for ACLs, but we'll
also need to invest some time in finding out why we're getting dups, and
in optimizing the process_update2 method and its callees to make it
faster. With those last two things I guess we can improve the performance
a lot.
Creating a Python C binding for this module could also help, but that
seems like a lot of work, and we would still need to convert between C
structures and Python objects. However, inserting into, or identifying
dups in, large sets would be way faster.
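By the way, if anyone wants to check whether they are already getting
native code for JSON parsing, my understanding is that the C extension is
built as ovs._json (I haven't verified every version), so something like
this should tell you:

try:
    import ovs._json  # noqa: F401  (C extension, optional at build time)
    print("native C JSON parser available")
except ImportError:
    print("falling back to pure-Python JSON parsing")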
I'm going to try out Ben's suggestions for optimizing the
process_update* methods, and will also try to dig further into the dups.
Since process_update* seems a bit expensive, calling it 26 times for a
single port looks like a lot to me.
26 calls = 2 * (1 (LSP insert) + 1 (AS modify) + 1 (LSP modify) +
8 (ACL insert) + 1 (LSP modify) + 1 (PB insert))
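One cheap experiment I may try while digging into the dups is to simply
drop an update2 notification when it's identical to the previous one.
This is just a hack to measure the potential gain - it assumes dups
always arrive back-to-back (as in the trace above), and it assumes the
current (table, uuid, row_update) signature of the name-mangled method:

import ovs.db.idl

_orig = ovs.db.idl.Idl._Idl__process_update2
_last = [None]

def _dedup_process_update2(self, table, uuid, row_update):
    key = (table.name, uuid, str(row_update))
    if key == _last[0]:
        return False  # identical to the previous notification; skip it
    _last[0] = key
    return _orig(self, table, uuid, row_update)

ovs.db.idl.Idl._Idl__process_update2 = _dedup_process_update2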
Thoughts?
Thanks!
Daniel
On Thu, Feb 15, 2018 at 10:56 PM, Daniel Alvarez Sanchez <
[email protected]> wrote:
>
>
> On Wed, Feb 14, 2018 at 9:34 PM, Han Zhou <[email protected]> wrote:
>
>>
>>
>> On Wed, Feb 14, 2018 at 9:45 AM, Ben Pfaff <[email protected]> wrote:
>> >
>> > On Wed, Feb 14, 2018 at 11:27:11AM +0100, Daniel Alvarez Sanchez wrote:
>> > > Thanks for your inputs. I need to look more carefully into the patch
>> you
>> > > submitted but it looks like, at least, we'll be reducing the number of
>> > > calls to Datum.__cmp__ which should be good.
>> >
>> > Thanks. Please do take a look. It's a micro-optimization but maybe
>> > it'll help?
>> >
>> > > I probably didn't explain it very well. Right now we have N processes
>> > > for Neutron server (in every node). Each of those opens a connection
>> > > to NB db and they subscribe to updates from certain tables. Each time
>> > > a change happens, ovsdb-server will send N update2 messages that have
>> > > to be processed in this "expensive" way by each of those N
>> > > processes. My proposal (yet to be refined) would be to now open N+1
>> > > connections to ovsdb-server and only subscribe to notifications from 1
>> > > of those. So every time a new change happens, ovsdb-server will send 1
>> > > update2 message. This message will be processed (using Py IDL as we do
>> > > now) and once processed, send it (mcast maybe?) to the other N
>> > > processes. This msg could simply be a serialized Python object, and
>> > > we'd be saving all this Datum, Atom, etc. processing by doing it just
>> > > once.
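To make the idea I described above more concrete, here is a very rough
sketch of the fan-out process I have in mind. All names are made up, and
reconnection, ordering and serializing the actual row diffs are
hand-waved:

import pickle

import ovs.poller

def fanout_loop(idl, worker_socks):
    # A single process owns the IDL connection, so update2 parsing
    # happens only here; workers are just told that something changed.
    last_seqno = idl.change_seqno
    while True:
        poller = ovs.poller.Poller()
        idl.run()
        if idl.change_seqno != last_seqno:
            last_seqno = idl.change_seqno
            # Placeholder payload: a real version would serialize the
            # tracked row changes, not just the sequence number.
            msg = pickle.dumps(("db-changed", idl.change_seqno))
            for sock in worker_socks:
                sock.sendall(msg)
        idl.wait(poller)
        poller.block()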
>> >
>> Daniel, I understand that sending the update2 messages would consume NB
>> ovsdb-server CPU and that processing those updates would consume neutron
>> server process CPU. However, are we sure it is the bottleneck for port
>> creation?
>>
>> From ovsdb-server's point of view, sending updates to tens of clients
>> should not be the bottleneck, considering that we have a lot more clients
>> on HVs for SB ovsdb-server.
>>
>> From the clients' point of view, I think it is more of a memory overhead
>> than CPU, and it also depends on how many neutron processes are running on
>> the same node. I didn't find neutron process CPU in your charts. I am
>> hesitant to make such a big change before we are clear about the
>> bottleneck. The chart of port creation time is very nice, but do we know
>> which part of the code contributed to the linear growth? Do we have
>> profiling for the time spent in ovn_client.add_acls()?
>>
>
> Here we are [0]. We see some spikes which get larger as the number of
> ports increases, but it looks like the actual bottleneck is going to be
> when we're actually committing the transaction [1]. I'll dig further
> though.
>
> [0] https://imgur.com/a/TmwbC
> [1] https://github.com/openvswitch/ovs/blob/master/python/ovs/db/idl.py#L1158
>
>>
>> > OK. It's an optimization that does the work in one place rather than N
>> > places, so definitely a win from a CPU cost point of view, but it buys
>> > performance at the cost of increased complexity. It sounds like performance is
>> > really important so maybe the increased complexity is a fair trade.
>> >
>> > We might also be able to improve performance by using native code for
>> > some of the work. Were these tests done with the native code JSON
>> > parser that comes with OVS? It is dramatically faster than the Python
>> > code.
>> >
>> > > On Tue, Feb 13, 2018 at 8:32 PM, Ben Pfaff <[email protected]> wrote:
>> > >
>> > > > Can you sketch the rows that are being inserted or modified when a
>> port
>> > > > is added? I would expect something like this as a minimum:
>> > > >
>> > > > * Insert one Logical_Switch_Port row.
>> > > >
>> > > > * Add pointer to Logical_Switch_Port to ports column in one
>> row
>> > > > in Logical_Switch.
>> > > >
>> > > > In addition it sounds like currently we're seeing:
>> > > >
>> > > > * Add one ACL row per security group rule.
>> > > >
>> > > > * Add pointers to ACL rows to acls column in one row in
>> > > > Logical_Switch.
>> > > >
>> > > This is what happens when we create a port in OpenStack (without
>> > > binding it) that belongs to an SG which allows ICMP and SSH traffic
>> > > and drops the rest [0].
>> > >
>> > > Basically, you were right and the only thing missing was adding the new
>> > > address to the Address_Set table.
>> >
>> > OK.
>> >
>> > It sounds like the real scaling problem here is that for R security
>> > group rules and P ports, we have R*P rows in the ACL table. Is that
>> > correct? Should we aim to solve that problem?
>>
>> I think this might be the most valuable point to optimize for the
>> create_port scenario from Neutron.
>> I remember there was a patch for ACL groups in OVN, so that instead of R*P
>> rows we would have only R + P rows, but I didn't see it go through.
>> Is this also a good use case for conjunction?
>>
>>
>>
>
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss