On Wed, May 29, 2019 at 04:05:07PM +0200, Lorenzo Bianconi wrote:
> > On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > > Add Controller_Event table to OVN SBDB in order to
> > > report CMS related event.
> > > Introduce event_table hashmap array and controller_event related
> > > structures to ovn-controller in order to track pending events
> > > forwarded by ovs-vswitchd. Moreover integrate event_table hashmap
> > > array with event_table ovn-sbdb table
> > >
> > > Signed-off-by: Mark Michelson <[email protected]>
> > > Co-authored-by: Mark Michelson <[email protected]>
> > > Signed-off-by: Lorenzo Bianconi <[email protected]>
...
> > 4. What is the tolerance for events that are never delivered or that are
> >    delivered more than once? What can actually be guaranteed, given
> >    that the database can die and that ovn-controller can die? (Also,
> >    OVSDB transactions cannot guarantee exactly-once semantics in corner
> >    cases unless the transactions are idempotent.)
>
> If the ovn-controller dies I think there is no too much we can do, events will
> be lost until the controller restarts properly.
> If ovn-northd or the connection to the db dies, controller_event_run() will not
> manage the Controller_Event table and pinctrl_handle_event() will queue the
> pending events in the event_table hash until the upper limit is reached.
> We can probably add a garbage collector for the pending events in the table.
> What do you think?

What's the consequence if an event is missed? What's the consequence if
an event is pushed two or more times?

It's easiest to design a distributed system so that it's OK if an event
is delivered zero times or multiple times. It's a little harder to
design so that an event is delivered one or more times. It's hardest to
design so that an event is delivered exactly one time.

There are the following obvious points of failure from these points of
view:

1. ovn-controller. If it dies, it might not push an event that it
   should. When it comes back up, will it know to push the event that
   it missed? What about if it dies while it is pushing an event; is
   it possible that it will push it again when it comes up?

2. The OVSDB protocol. If the OVSDB connection dies after
   ovn-controller's transaction is committed but before ovn-controller
   receives the acknowledgment, then when it reconnects ovn-controller
   might retry it, which could lead to an event being pushed two or
   more times.

3. ovsdb-server. Clients don't typically use the OVSDB protocol
   feature that ensures that a transaction is committed to stable
   storage before it is acknowledged, so an event could get lost if
   ovsdb-server dies after acknowledging a transaction but before it
   gets written to disk. (Clustered OVSDB always does sync to stable
   storage though.)

4. ovn-northd. There is a race between ovn-northd acting on an event
   and marking it handled (or deleting it). There are also the same
   OVSDB protocol and ovsdb-server races in the reverse direction.

We may be able to work around some or all of these issues if necessary.
Have you considered them? How important are they?
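
To make the "one or more times" case concrete, here is a minimal sketch
of how a lost acknowledgment (point 2) and the act-then-delete race
(point 4) become harmless when every event carries a unique ID and both
the insert and the handling are idempotent. This is plain Python, not
OVN code, and it does not use any OVSDB API: EventTable, Consumer,
push_with_retry and the handled_ids set are all hypothetical names, and
the sketch simply assumes that the consumer's record of handled IDs
survives a restart.

```python
import uuid


class EventTable:
    """Hypothetical stand-in for a database table keyed by event ID."""

    def __init__(self):
        self.rows = {}

    def insert(self, event_id, payload):
        # Idempotent insert: retrying with the same ID changes nothing.
        if event_id in self.rows:
            return False
        self.rows[event_id] = payload
        return True


class Consumer:
    """Hypothetical reader that acts on each event, then deletes its row."""

    def __init__(self):
        # Assumption: this set is durable across consumer restarts.
        self.handled_ids = set()
        self.actions = []

    def run(self, table, crash_before_delete=False):
        for event_id, payload in list(table.rows.items()):
            if event_id not in self.handled_ids:
                self.actions.append(payload)      # act on the event
                self.handled_ids.add(event_id)    # remember it was handled
            if crash_before_delete:
                return                            # simulated crash: row stays
            del table.rows[event_id]


def push_with_retry(table, event_id, payload, lose_first_ack=True):
    """Sender keeps retrying the same event ID until it sees an ack."""
    attempts = 0
    while True:
        attempts += 1
        table.insert(event_id, payload)           # the transaction commits
        if lose_first_ack and attempts == 1:
            continue                              # ack lost, so retry
        return attempts


table = EventTable()
consumer = Consumer()
event_id = uuid.uuid4()

attempts = push_with_retry(table, event_id, "example-event")
consumer.run(table, crash_before_delete=True)     # acts, then "crashes"
consumer.run(table)                               # restart: no second action
```

The sender pushes twice and the consumer scans the row twice, yet the
event is acted on once. What this does not cover is point 3: if the
server loses a committed row, no amount of deduplication brings the
event back, so the sender would still need some way to notice and
re-push it.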
