On Wed, Feb 14, 2018 at 9:34 PM, Han Zhou <zhou...@gmail.com> wrote:
>
> On Wed, Feb 14, 2018 at 9:45 AM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > On Wed, Feb 14, 2018 at 11:27:11AM +0100, Daniel Alvarez Sanchez wrote:
> > > Thanks for your inputs. I need to look more carefully into the patch you
> > > submitted but it looks like, at least, we'll be reducing the number of
> > > calls to Datum.__cmp__ which should be good.
> >
> > Thanks. Please do take a look. It's a micro-optimization but maybe
> > it'll help?
> >
> > > I probably didn't explain it very well. Right now we have N processes
> > > for Neutron server (in every node). Each of those opens a connection
> > > to NB db and they subscribe to updates from certain tables. Each time
> > > a change happens, ovsdb-server will send N update2 messages that have
> > > to be processed in this "expensive" way by each of those N
> > > processes. My proposal (yet to be refined) would be to now open N+1
> > > connections to ovsdb-server and only subscribe to notifications from 1
> > > of those. So every time a new change happens, ovsdb-server will send 1
> > > update2 message. This message will be processed (using Py IDL as we do
> > > now) and once processed, send it (mcast maybe?) to the rest N
> > > processes. This msg could be simply a Python object serialized and
> > > we'd be saving all this Datum, Atom, etc. processing by doing it just
> > > once.
>
> Daniel, I understand that sending the update2 messages would consume NB
> ovsdb-server CPU and processing those updates would consume neutron server
> process CPU. However, are we sure it is the bottleneck for port creation?
>
> From the ovsdb-server point of view, sending updates to tens of clients
> should not be the bottleneck, considering that we have a lot more clients
> on HVs for SB ovsdb-server.
>
> From the clients' point of view, I think it is more of a memory overhead
> than CPU, and it also depends on how many neutron processes are running on
> the same node.
>
> I didn't find neutron process CPU in your charts. I am hesitant about
> such a big change before we are clear about the bottleneck. The chart of
> port creation time is very nice, but do we know which part of the code
> contributed to the linear growth? Do we have profiling for the time spent
> in ovn_client.add_acls()?
>
Here we are [0]. We see some spikes which get larger as the number of ports
increases, but it looks like the actual bottleneck is going to be when we're
actually committing the transaction [1]. I'll dig further though.

[0] https://imgur.com/a/TmwbC
[1] https://github.com/openvswitch/ovs/blob/master/python/ovs/db/idl.py#L1158

> > OK. It's an optimization that does the work in one place rather than N
> > places, so definitely a win from a CPU cost point of view, but it trades
> > performance for increased complexity. It sounds like performance is
> > really important so maybe the increased complexity is a fair trade.
> >
> > We might also be able to improve performance by using native code for
> > some of the work. Were these tests done with the native code JSON
> > parser that comes with OVS? It is dramatically faster than the Python
> > code.
> >
> > > On Tue, Feb 13, 2018 at 8:32 PM, Ben Pfaff <b...@ovn.org> wrote:
> > >
> > > > Can you sketch the rows that are being inserted or modified when a port
> > > > is added? I would expect something like this as a minimum:
> > > >
> > > >         * Insert one Logical_Switch_Port row.
> > > >
> > > >         * Add pointer to Logical_Switch_Port to ports column in one row
> > > >           in Logical_Switch.
> > > >
> > > > In addition it sounds like currently we're seeing:
> > > >
> > > >         * Add one ACL row per security group rule.
> > > >
> > > >         * Add pointers to ACL rows to acls column in one row in
> > > >           Logical_Switch.
> > > >
> > > This is what happens when we create a port in OpenStack (without
> > > binding it) which belongs to a SG which allows ICMP and SSH traffic
> > > and drops the rest [0]
> > >
> > > Basically, you were right and the only thing missing was adding the new
> > > address to the Address_Set table.
> >
> > OK.
> >
> > It sounds like the real scaling problem here is that for R security
> > group rules and P ports, we have R*P rows in the ACL table. Is that
> > correct? Should we aim to solve that problem?
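
For a rough sense of scale on the R*P point above, here is a
back-of-the-envelope sketch in Python. The function names and the sample
values (4 rules, up to 5000 ports) are made up for illustration and are not
taken from the test runs. The second function assumes a grouped model in
which rules are stored once per security group and each port only adds one
membership entry, which is not how the ACL table behaves today.

```python
def per_port_acl_rows(num_rules, num_ports):
    """ACL rows when every port gets its own copy of every rule (R*P)."""
    return num_rules * num_ports


def grouped_acl_rows(num_rules, num_ports):
    """Rows under an assumed grouped model: R shared rules plus P memberships."""
    return num_rules + num_ports


if __name__ == "__main__":
    rules = 4  # hypothetical rule count for one security group
    for ports in (10, 100, 1000, 5000):
        print(ports, per_port_acl_rows(rules, ports), grouped_acl_rows(rules, ports))
```

The per-port count grows with every port added to the group, while the
grouped count grows by one per port regardless of how many rules exist.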
>
> I think this might be the most valuable point to optimize for the
> create_port scenario from Neutron.
> I remember there was a patch for ACL group in OVN, so that instead of R*P
> rows we will have only R + P rows, but I didn't see it go through.
> Is this also a good use case for conjunction?
>
> > _______________________________________________
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
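
To make the N+1 connection idea quoted earlier in the thread a bit more
concrete, here is a minimal sketch of the fan-out step only. It does not use
the OVS Python IDL at all: process_update() is a hypothetical stand-in for
the expensive Datum/Atom processing, and the table and row names are
invented. It uses pickle and multiprocessing.Pipe from the standard library
rather than multicast, purely to keep the example self-contained.

```python
import pickle
from multiprocessing import Pipe


def process_update(raw_update):
    """Stand-in for the expensive Py IDL work (Datum/Atom parsing).

    Hypothetical: simply maps each table name to its sorted row UUIDs.
    """
    return {table: sorted(rows) for table, rows in raw_update.items()}


def fan_out(processed, send_conns):
    """Serialize the processed update once and send it to every worker."""
    payload = pickle.dumps(processed, protocol=pickle.HIGHEST_PROTOCOL)
    for conn in send_conns:
        conn.send_bytes(payload)
    return len(payload)


def receive(recv_conn):
    """Worker side: a plain unpickle, no per-process IDL processing."""
    return pickle.loads(recv_conn.recv_bytes())


def demo(n_workers=3):
    # One pipe per worker; Pipe(duplex=False) returns (receive end, send end).
    pipes = [Pipe(duplex=False) for _ in range(n_workers)]
    raw = {"Logical_Switch_Port": {"uuid-b": {}, "uuid-a": {}}}
    processed = process_update(raw)  # done once, in the listener
    fan_out(processed, [send for _, send in pipes])
    # In a real deployment each receive() would run in its own process.
    return processed, [receive(recv) for recv, _ in pipes]
```

Here everything runs in a single process for brevity. Whether unpickling is
actually cheaper than the IDL processing it replaces would need to be
measured, which ties back to the profiling question above.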