Hi Han, I had some time to get back to this. See my comments below.
On 10/21/19 4:01 PM, Han Zhou wrote:
Hi Mark,
Thanks for the patch. We had a brief discussion during last OVN meeting.
Let me put my points inlined.
On Fri, Oct 18, 2019 at 1:43 PM Mark Michelson <[email protected]
<mailto:[email protected]>> wrote:
>
> This proposes a set of patches to move pinctrl operations out of the
> ovn-controller process and into its own.
>
> The main reasons for doing this are the following:
> 1) Separating pinctrl makes it so that receiving a packet-in can't wake
> up ovn-controller.
To avoid waking up ovn-controller, it doesn't have to be in a separate
process. A thread with its own OVSDB IDL to SB DB can achieve the same,
as what this old patch did:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/332887.html
However, the problem of a separate SB connection introduced the concern
for scalability. There were discussions and thoughts for a separate
thread without introducing new SB connection, but once the two threads
share same SB connection, there has to be some synchronization between
the threads that ends up waking up or blocking each other whenever there
is a pinctrl processing that requires read/write SB data. The current
multi-thread implementation from Numan is a trade off that avoids new SB
connection but syncing with the main thread when SB data is needed. It
is perfect for pinctrl handling that doesn't require SB data, and then
wakes up ovn-controller for updating SB data.
I followed the discussions that resulted from the patch you sent. It
looks like the concerns are that you either have to
1) Have two separate connections to the SB database, resulting in double
the connections (this is what my patch does)
2) Have one connection to the SB database but synchronise the efforts of
the different concerns of ovn-controller (namely logical flow processing
and pinctrl processing).
1 is easy but resource intensive, and 2 is difficult but has the
potential for not having the same bottlenecks.
But this has me thinking. The current IDL code assumes that one IDL
client == one database connection. I suppose it may be possible to alter
the OVSDB IDL code so that a single connection could be shared by
multiple IDL clients. In other words, you could create the SB
connection, then create a separate thread. Each thread would then create
an IDL that makes use of the same connection. The OVSDB client code
would need to be altered to be able to notify multiple IDLs about
changes. The client would also need to be modified to be thread safe, in
the case that multiple threads want to write to the database at once.
This would allow for multithreading, and the controller code wouldn't
need to worry about synchronization. Each thread would have its own data
it manipulates, and the synchronization would be handled by the IDL itself.
Having said all this, though, it would not be a trivial task to
implement this. And so the question becomes, is it worth it?
Today (2.12) there were improvements on both ovn-controller and OVSDB
server, that alleviated the scale problems on both side.
- For ovn-controller, with incremental processing, when there is no
input change, it doesn't trigger flow recomputing, even when pinctrl
wakes up the main thread. The major concern may be when main thread does
need a recompute, it could block pinctrl processing for messages that
requires SB data accessing, such as ARP handling.
- For SB DB
- Active-active cluster alleviates the burden of a single server and
spread to 3 or 5. However, RAFT is not designed for scale. Write always
happen on the leader node, and the cost of cluster sync between leader
and follower becomes higher when number of nodes increases.
- The fast-resync feature (requiring active-active clustered mode)
avoids the slowness of data resync to all clients after DB
restart/failover. However, it doesn't help if ovsdb-server is overloaded
for regular updates and notifications during normal operations, given
that it is single threaded. Also, there are corner cases that
fast-resync doesn't help, e.g. when DB restart/failover happened just
after a compress, when all the transaction history is lost.
I'd suggest to reconsider these scalability concerns, the pros and cons
for a dedicated SB connection for pinctrl, before moving forward to this
approach.
> 2) Separating pinctrl allows for manipulating the southbound database
> directly while handling packets in, thus minimizing the need for storing
> local copies of data
This is true, but similar as point 1), it doesn't necessarily need a
separate process. The point is whether pinctrl (thread or process)
should use a dedicated SB connection.
Yep, you're definitely correct here. I guess my thought here was that if
the two threads have no need to share any memory or state, then it makes
more sense for them to exist as two processes instead.
However, what I suggested above about multiple IDL clients sharing a
connection would require that the code be multithreaded rather than
multiprocess.
> 3) This lays the groundwork for an easier eventual conversion of
> ovn-controller to DDlog, since the DDlog code would need to only handle
> flow creation, not packet in handling.
>
Agree with this point. This is probably the most important benefit of
separating pinctrl as a process. Although it is still possible to have
pinctrl as a thread sharing SB connection while converting the flow
processing part with DDlog, a separate process does make the conversion
cleaner.
In addition, a separate process introduces some operational costs,
although not a big concern. The tooling like ovn-ctl and packaging also
needs to be updated.
If you look at the ovn-northd implementation of DDLog, you'll see that
there is still a C process that controls everything. So it definitely
could still work that for ovn-controller, the C program would create a
separate thread for pinctrl and then perform the DDLog flow generation
in the main thread.
If we were to attempt to limit the SB DB connections but also allow for
multiple processes, then you'd likely need to create a third process
that actually performs database communications, and use IPC to
communicate between this third process and the pinctrl and controller
processes. I don't think this is any simpler than just having two
threads in a single process.
Thanks,
Han
I'm going to back out with this change for the time being. I'm going to
take a closer look at the IDL client code to see how feasible it would
be to make it thread-safe and allow for multiple clients to share a
single connection. No guarantee that I actually come forward with such a
patch any time soon though :)
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev