> On Tue, Jun 11, 2019 at 11:54:31AM +0200, Lorenzo Bianconi wrote:
> > > On Wed, May 29, 2019 at 04:05:07PM +0200, Lorenzo Bianconi wrote:
> > > > > On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > > > > > Add Controller_Event table to OVN SBDB in order to
> > > > > > report CMS-related events.
> > > > > > Introduce event_table hashmap array and controller_event related
> > > > > > structures to ovn-controller in order to track pending events
> > > > > > forwarded by ovs-vswitchd. Moreover, integrate event_table hashmap
> > > > > > array with event_table ovn-sbdb table.
> > > > > > 
> > > > > > Signed-off-by: Mark Michelson <[email protected]>
> > > > > > Co-authored-by: Mark Michelson <[email protected]>
> > > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > 
> > > ...
> > > 
> > > > > 4. What is the tolerance for events that are never delivered or that
> > > > >    are delivered more than once?  What can actually be guaranteed,
> > > > >    given that the database can die and that ovn-controller can die?
> > > > >    (Also, OVSDB transactions cannot guarantee exactly-once semantics
> > > > >    in corner cases unless the transactions are idempotent.)
> > > > 
> > > > If ovn-controller dies, I think there is not much we can do: events
> > > > will be lost until the controller restarts properly.
> > > > If ovn-northd or the connection to the db dies, controller_event_run()
> > > > will not manage the Controller_Event table and pinctrl_handle_event()
> > > > will queue the pending events in the event_table hash until the upper
> > > > limit is reached.
> > > > We can probably add a garbage collector for the pending events in the
> > > > table.
> > > > What do you think?
> > > 
> > > What's the consequence if an event is missed?  What's the consequence if
> > > an event is pushed two or more times?  It's easiest to design a
> > > distributed system so that it's OK if an event is delivered zero times
> > > or multiple times.  It's a little harder to design so that an event is
> > > delivered one or more times.  It's hardest to design so that an event is
> > > delivered exactly one time.
> > 
> > Hi Ben,
> > 
> > thx a lot for your comments,
> > 
> > > 
> > > There are the following obvious points of failure from these points of
> > > view:
> > > 
> > > 1. ovn-controller.  If it dies, it might not push an event that it
> > >    should.  When it comes back up, will it know to push the event that
> > >    it missed?  What about if it dies while it is pushing an event; is it
> > >    possible that it will push it again when it comes up?
> > 
> > This is probably not an issue: if the event is lost because the
> > controller is dead, we will receive a new one when the controller comes
> > back.
> > If the controller dies after sending the event to the db, it will not
> > receive new events when it comes back.
> > 
> > > 
> > > 2. The OVSDB protocol.  If the OVSDB connection dies after
> > >    ovn-controller's transaction is committed but before ovn-controller
> > >    receives the acknowledgment, then when it reconnects ovn-controller
> > >    might retry it, which could lead to an event being pushed two or more
> > >    times.
> > > 
> > > 3. ovsdb-server.  Clients don't typically use the OVSDB protocol feature
> > >    that ensures that a transaction is committed to stable storage before
> > >    it is acknowledged, so an event could get lost if ovsdb-server dies
> > >    after acknowledging a transaction but before it gets written to disk.
> > >    (Clustered OVSDB always does sync to stable storage though.)
> > > 
> > > 4. ovn-northd.  There is a race between ovn-northd acting on an event
> > >    and marking it handled (or deleting it).  There are also the same
> > >    OVSDB protocol and ovsdb-server races in the reverse direction.
> > 
> > I agree with you. Even if these kinds of events (duplicated events or
> > duplicated rows in the db) are quite unlikely, since controller_event
> > processing is done holding pinctrl_mutex, they can happen. However, I
> > think they can be managed by the CMS, since the controller does not
> > have the 'history' of already handled events.
> 
> OK.
> 
> Will you please document what is (not) guaranteed in the documentation
> somewhere?  It's important to write these things down or people are
> likely to make bad assumptions later.

Ack, will do when posting a formal series.
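
As a rough illustration of the "delivered one or more times" case discussed
above, here is a minimal sketch of how a CMS could tolerate duplicated events
by keying on an event identity. The ControllerEvent fields (chassis,
event_type, seq_num) and the IdempotentHandler class are hypothetical names
used only for this example; they are not the Controller_Event schema or any
OVN API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ControllerEvent:
    # Hypothetical fields; the real Controller_Event columns may differ.
    chassis: str
    event_type: str
    seq_num: int


@dataclass
class IdempotentHandler:
    """Applies each distinct event once, even if it is delivered repeatedly."""

    seen: set = field(default_factory=set)
    applied: list = field(default_factory=list)

    def handle(self, event: ControllerEvent) -> bool:
        # The identity key is an assumption: it only works if the sender
        # attaches something that distinguishes a retry from a new event.
        key = (event.chassis, event.event_type, event.seq_num)
        if key in self.seen:
            return False  # duplicate delivery, ignore it
        self.seen.add(key)
        self.applied.append(event)
        return True
```

Note that this only covers duplicates. It does nothing for events that are
never delivered, and the in-memory "seen" set would be lost if the CMS
itself restarts.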

Regards,
Lorenzo
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev