Re: [ovs-dev] [RFC 1/3] OVN: introduce Controller_Event table

Ben Pfaff Wed, 05 Jun 2019 18:41:22 -0700

On Wed, May 29, 2019 at 04:05:07PM +0200, Lorenzo Bianconi wrote:
> > On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > > Add Controller_Event table to OVN SBDB in order to
> > > report CMS-related events.
> > > Introduce event_table hashmap array and controller_event related
> > > structures to ovn-controller in order to track pending events
> > > forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
> > > array with the event_table ovn-sbdb table.
> > > 
> > > Signed-off-by: Mark Michelson <[email protected]>
> > > Co-authored-by: Mark Michelson <[email protected]>
> > > Signed-off-by: Lorenzo Bianconi <[email protected]>


...

> > 4. What is the tolerance for events that are never delivered or that are
> >    delivered more than once?  What can actually be guaranteed, given
> >    that the database can die and that ovn-controller can die?  (Also,
> >    OVSDB transactions cannot guarantee exactly-once semantics in corner
> >    cases unless the transactions are idempotent.)
> 
> If ovn-controller dies, I think there is not much we can do; events will
> be lost until the controller restarts properly.
> If ovn-northd or the connection to the db dies, controller_event_run() will
> not manage the Controller_Event table and pinctrl_handle_event() will queue
> the pending events in the event_table hash until the upper limit is reached.
> We can probably add a garbage collector for the pending events in the table.
> What do you think?

What's the consequence if an event is missed?  What's the consequence if
an event is pushed two or more times?  It's easiest to design a
distributed system so that it's OK if an event is delivered zero times
or multiple times.  It's a little harder to design so that an event is
delivered one or more times.  It's hardest to design so that an event is
delivered exactly one time.
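
To make the three cases concrete, here is a minimal sketch of a sender that
retries until it sees an acknowledgment (at-least-once delivery), paired with
two receivers: one that applies every copy it gets and one that drops copies
by event ID. Everything in it (FlakyChannel, NaiveReceiver, DedupReceiver,
send_at_least_once, the "ev-1" ID) is hypothetical and made up for
illustration; none of it is OVN or OVSDB code.

```python
import itertools


class FlakyChannel:
    """Toy channel: always delivers the message, but may lose the ack."""

    def __init__(self, receiver, ack_lost_pattern):
        self.receiver = receiver
        # Fixed pattern keeps the run deterministic.
        self.ack_lost = itertools.cycle(ack_lost_pattern)

    def send(self, event_id, payload):
        self.receiver.handle(event_id, payload)
        return not next(self.ack_lost)  # True: the ack reached the sender


class NaiveReceiver:
    """Applies every copy it receives, so retries show up as duplicates."""

    def __init__(self):
        self.applied = []

    def handle(self, event_id, payload):
        self.applied.append(payload)


class DedupReceiver(NaiveReceiver):
    """Remembers event IDs, which makes handling a retry a no-op."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle(self, event_id, payload):
        if event_id in self.seen:
            return
        self.seen.add(event_id)
        self.applied.append(payload)


def send_at_least_once(channel, event_id, payload, max_tries=5):
    """Retry until acknowledged; returns the number of attempts used."""
    for attempt in range(1, max_tries + 1):
        if channel.send(event_id, payload):
            return attempt
    raise RuntimeError("no acknowledgment after %d tries" % max_tries)


# The first ack is lost and the second arrives, so the sender sends twice.
naive = NaiveReceiver()
tries = send_at_least_once(FlakyChannel(naive, [True, False]), "ev-1", "event")

dedup = DedupReceiver()
send_at_least_once(FlakyChannel(dedup, [True, False]), "ev-1", "event")

print(tries, naive.applied, dedup.applied)
```

In this toy the deduplicating receiver only looks exactly-once because its
"seen" set survives; if that set lived only in the memory of a process that
can die, the guarantee would go with it.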

With respect to these guarantees, the obvious points of failure are the
following:

1. ovn-controller.  If it dies, it might not push an event that it
   should.  When it comes back up, will it know to push the event that
   it missed?  What about if it dies while it is pushing an event; is it
   possible that it will push it again when it comes up?

2. The OVSDB protocol.  If the OVSDB connection dies after
   ovn-controller's transaction is committed but before ovn-controller
   receives the acknowledgment, then when it reconnects ovn-controller
   might retry it, which could lead to an event being pushed two or more
   times.

3. ovsdb-server.  Clients don't typically use the OVSDB protocol feature
   that ensures that a transaction is committed to stable storage before
   it is acknowledged, so an event could get lost if ovsdb-server dies
   after acknowledging a transaction but before it gets written to disk.
   (Clustered OVSDB always does sync to stable storage though.)

4. ovn-northd.  There is a race between ovn-northd acting on an event
   and marking it handled (or deleting it).  There are also the same
   OVSDB protocol and ovsdb-server races in the reverse direction.
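
The lost-acknowledgment case in point 2 can be sketched with a toy in-memory
table. This is only an illustration under stated assumptions: ToyDB,
insert_if_absent, commit_with_lost_ack, and the event contents are
hypothetical names, not the OVSDB protocol or IDL API. The point is that a
blind insert is duplicated by the retry, while an insert keyed on an ID the
client chose before its first attempt is idempotent.

```python
import uuid


class ToyDB:
    """In-memory stand-in for a database table (not the OVSDB API)."""

    def __init__(self):
        self.rows = []    # plain inserts: duplicates are allowed
        self.by_key = {}  # keyed inserts: at most one row per key

    def insert(self, row):
        self.rows.append(row)

    def insert_if_absent(self, key, row):
        # A second attempt with the same key leaves the table unchanged.
        self.by_key.setdefault(key, row)


def commit_with_lost_ack(txn):
    """Model a commit whose reply is lost, followed by a client retry."""
    txn()  # committed on the server, but the ack never arrives
    txn()  # after reconnecting, the client retries the same transaction


db = ToyDB()
event = {"event_type": "example"}  # hypothetical event contents

# Non-idempotent transaction: the retry pushes the event a second time.
commit_with_lost_ack(lambda: db.insert(event))

# Idempotent transaction: the key is generated once, before the first try.
event_key = str(uuid.uuid4())
commit_with_lost_ack(lambda: db.insert_if_absent(event_key, event))

print(len(db.rows), len(db.by_key))
```

The same keying idea would also narrow the ovn-northd race in point 4, since
a consumer that sees the same key twice can tell it is a repeat. How such a
key would be expressed in the real schema is left open here.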

We may be able to work around some or all of these issues if necessary.
Have you considered them?  How important are they?
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
