Hello David,

Thanks for your great advice.

Some answers inline.

Regards

Olivier

On 02/03/2015 08:18, David Lamparter wrote:
On Wed, Feb 18, 2015 at 12:58:22PM +0100, Olivier Dugeon wrote:
In complement to our TE work already submitted, we would like to
implement the BGP Link State extension (see
https://datatracker.ietf.org/doc/draft-ietf-idr-ls-distribution). For
that purpose, we need inter-process communication with the OSPFd and
ISISd processes.
The same needs also arise when implementing a Path Computation
Element (PCE - RFC 4655). The primary goal is to exchange database
contents, in particular OSPF LSAs and IS-IS LSPs including TE information.
*sigh*  this has been coming for a long time - the IPC protocol between
zebra and the daemons needed to be extended (or even overhauled?) for a
long time.

Let me try to pull together a list of things that can't be done with the
current ZAPI socket protocol:
- LS distribution & PCE
- BFD peer status signalling & automatic session creation
- exchanging MPLS labels
- exchanging VPN route information (both intra- and inter-VRF)
- matching on route properties from another daemon when redistributing
- ... probably even more stuff I forgot

Some of these can probably be added into the existing protocol, but in
general what we have now can be described as anything but extensible.

I'm not saying you need to support all of these - I'm saying we need to
address extensibility.
Even if I am only trying to solve my current problem, I totally agree with you.
Any new communication between the various Quagga processes must be flexible and generic so that it can
accommodate further development and other ways of using it.

It is exactly the spirit of my proposal and why I am asking for advice. I would like to design (before developing) a generic communication system that takes most of the requirements into account and stays flexible for further development.

In parallel, while digging into all the possibilities for IPC, in particular the pthread mechanisms, I discovered that Quagga uses its own thread implementation instead of pthread. I don't know the complete history of Zebra/Quagga, but I suppose that when the first code was written, pthread was not supported by the majority of systems. So, if we move to a different system for communication between Quagga processes, perhaps it is also time to rethink the thread mechanism,
unless there is a valid reason (which I am not aware of, my apologies) to keep it.

1/ Extend the Zebra protocol. Vincent Jardin already pointed out to me
that it is not a good option, as the Zebra protocol and the Zebra daemon
are heavily solicited for VPN and adding more traffic will have a bad
effect on performance. But, as it will be used in a particular case,
perhaps it is not an issue.

2/ Move the OSPF and ISIS databases from user space to shared memory.
Such an architecture lets other processes / threads access the database
in read-only mode, but what will be the impact in terms of performance,
especially with a large database? In addition, it does not give the
possibility to send commands to other processes like the OSPF_API does.

3/ Implement a dedicated bus/protocol similar to the Zebra one using
sockets. Part of the code could be reused (coming from Zebra and
OSPF_API), but, like the Zebra protocol, it copies data in memory
intensively (at least 4 copies to transfer a message to one process).
Again, with a large database, there could be some performance issues.

4/ Implement a dedicated bus using Shared Memory and Semaphore/Mutex to
access the bus, managing read/write mode. This option reduces the number
of times we copy data in memory (copy once, read multiple) but introduces
more complexity as we need to synchronise threads and processes, which
could be hard to debug. The objective is to add a dedicated thread per
daemon to manage the bus, which will not disturb other threads in case of
a lock. If it is powerful and provides good performance, it could be a
candidate to replace the socket-based Zebra communication to improve
performance.
There are 2 independent questions here:
- should this be a separate communication channel or should it be
   integrated with zebra communications?
- what transport medium should this use, shm or socket?

Your options match up mostly (though not exactly):
1) = "integrated, socket"
2) = "separate,   shm"
3) = "separate,   socket"
4) = "integrated, shm"

I don't have a well-founded opinion on what to do (yet), though I'd like
to make the following arguments:
- shm is not necessarily *noticeably* faster than sockets.  Sure it
   saves some copying and kernel calls, but if the overhead goes from 2%
   to 1.5% you haven't won much.
- shm should still use a well-isolated API/wrappers.  In fact I'd argue
   the API should be the same between sockets or shm.  Accessing shm
   directly without such wrappers is a recipe for crashes.
It is exactly my intention. I would write a ZBus library (zbus.c and zbus.h) that offers a common API for all communication and reuses as much existing
code as possible (e.g. the stream API).
- shm doesn't imply locking.  Particularly, RCU might help.
After looking at the IPC literature, and in particular the readers/writers problem, I think that our problem is quite different. The kind of communication we need is more 'write once' then 'read multiple', transmitting information as we do with a socket or a message queue. Using SHM means that we need to lock the shm from the moment the process starts to write a new message until we are sure that all readers have consumed the message. Of course, we could use a 'write once' / 'read once' system, but then we lose the benefit that a message can be addressed to several processes (e.g. zebra notifying all processes of a modification of interface parameters) without writing the message multiple times. Drawing a parallel with network protocols, we need a multicast communication
system. So, perhaps a multicast socket is a good approach.
- socket protocols should probably use some "standard" external encoding
   library, simply to be more usable from other programming languages.
- I don't see much gain from forcing all communication through a single
   point, but I do think we should use some uniform encoding & mechanism.
   If you use shm-based messaging, we should probably use that
   everywhere.  Same if you use protobuf over sockets, it should be
   protobuf over sockets everywhere.
Yes, of course. Currently the ZEBRA API and the OSPF API don't use the same semantics. I'm in favour of a simple encoding scheme based on TLVs, like the ones routing protocols use.
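
As a minimal sketch of what such a TLV encoding could look like in C: the 2-byte type / 2-byte length header in network byte order, the ztlv_ names and the buffer layout are all assumptions made for illustration, not part of the existing ZEBRA or OSPF API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header: 2-byte type, 2-byte length, network byte order. */
#define ZTLV_HDR_LEN 4

/* Append one TLV to buf. Returns bytes written, or 0 if it does not fit. */
size_t ztlv_put(uint8_t *buf, size_t cap, uint16_t type,
                const void *val, uint16_t len)
{
    if (cap < (size_t)ZTLV_HDR_LEN + len)
        return 0;
    buf[0] = (uint8_t)(type >> 8);
    buf[1] = (uint8_t)(type & 0xff);
    buf[2] = (uint8_t)(len >> 8);
    buf[3] = (uint8_t)(len & 0xff);
    if (len > 0)
        memcpy(buf + ZTLV_HDR_LEN, val, len);
    return (size_t)ZTLV_HDR_LEN + len;
}

/* Parse one TLV at buf. Returns bytes consumed, or 0 if truncated.
 * *val points into buf; nothing is copied. */
size_t ztlv_get(const uint8_t *buf, size_t avail, uint16_t *type,
                const uint8_t **val, uint16_t *len)
{
    if (avail < ZTLV_HDR_LEN)
        return 0;
    *type = (uint16_t)((buf[0] << 8) | buf[1]);
    *len = (uint16_t)((buf[2] << 8) | buf[3]);
    if (avail - ZTLV_HDR_LEN < *len)
        return 0;
    *val = buf + ZTLV_HDR_LEN;
    return (size_t)ZTLV_HDR_LEN + *len;
}
```

A receiver that does not recognise a type can simply advance by the value returned from ztlv_get, which is what would let new message types be added without breaking older daemons.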

NB: I'm not against SHM, but I do think SHM is more difficult to get
right, and it's not an automatic performance win.  I did some thinking
about a shared memory RCU-based replacement for ZAPI, but never had the
time to try that.  It probably *does* help moving Quagga towards
supporting multiple threads in the individual daemons.
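
To make the 'write once' / 'read multiple' constraint discussed above concrete, here is a rough sketch of a single message slot in C11. Everything in it is hypothetical: the zslot names, the fixed slot size, the limit of 32 readers identified by a bit index, and the assumption of exactly one writer. In a real design the struct would live in a shared memory segment; here it is only an ordinary object, and neither RCU nor blocking is attempted.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZSLOT_MAX 256

/* Hypothetical one-message slot: a single writer, up to 32 readers. */
struct zslot {
    _Atomic uint32_t unread;   /* bit i set: reader i has not consumed yet */
    uint16_t len;
    uint8_t data[ZSLOT_MAX];
};

/* Writer: publish msg to the readers in the bitmask `readers`.
 * Returns 0 on success, -1 if the previous message is still unread
 * by some reader or if the arguments are invalid. */
int zslot_publish(struct zslot *s, const void *msg, size_t len,
                  uint32_t readers)
{
    if (len > ZSLOT_MAX || readers == 0)
        return -1;
    if (atomic_load_explicit(&s->unread, memory_order_acquire) != 0)
        return -1;
    memcpy(s->data, msg, len);
    s->len = (uint16_t)len;
    /* Release: the payload is visible before any reader sees its bit. */
    atomic_store_explicit(&s->unread, readers, memory_order_release);
    return 0;
}

/* Reader `id`: copy the message out and clear this reader's bit.
 * Returns the message length, or -1 if there is nothing new for this
 * reader or the output buffer is too small. */
int zslot_consume(struct zslot *s, unsigned id, void *out, size_t cap)
{
    uint32_t bit;
    size_t len;

    if (id >= 32)
        return -1;
    bit = UINT32_C(1) << id;
    if (!(atomic_load_explicit(&s->unread, memory_order_acquire) & bit))
        return -1;
    len = s->len;
    if (len > cap)
        return -1;
    memcpy(out, s->data, len);
    atomic_fetch_and_explicit(&s->unread, ~bit, memory_order_release);
    return (int)len;
}
```

Even this small case shows the cost: the writer cannot reuse the slot until the slowest reader has cleared its bit, so one stalled daemon holds up everybody else, which a socket per peer would not do.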

   [quote moved]
But such an exchange could be useful for other purposes like hot restart,
monitoring ... OSPF already provides such a facility through the OSPF_API,
but it is dedicated to OSPFd only and we need to generalize it to the other
Quagga daemons. From this API, we would take the capability to send
commands to a given process and get back some information, synchronously
(answer to the command) or asynchronously (LSA/LSP update).

We studied several options for the implementation and would like to get
some advice from the community before really starting to code. Up to now,
we have identified 4 options:

Options 1 and 2 are not our favourites, but we are open to discussion. We
hesitate between options 3 and 4 and would greatly appreciate some advice
to help us make a decision.
To be honest, I think this will need to be "evaluated" instead of
"decided".  Pick one, prototype implement it with the least effort
possible and show it.  You will have gained some insights from
implementing it, and we'll know how it performs...
Yes for sure. I'll try to design and test some ideas and submit them to the mailing list.

... ultimately, this may be something that needs doing by trial & error,
I'm afraid.


-David



