> On Tue, Jun 11, 2019 at 11:54:31AM +0200, Lorenzo Bianconi wrote:
> > > On Wed, May 29, 2019 at 04:05:07PM +0200, Lorenzo Bianconi wrote:
> > > > > On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > > > > > Add Controller_Event table to OVN SBDB in order to
> > > > > > report CMS related event.
> > > > > > Introduce event_table hashmap array and controller_event related
> > > > > > structures to ovn-controller in order to track pending events
> > > > > > forwarded by ovs-vswitchd. Moreover integrate event_table hashmap
> > > > > > array with event_table ovn-sbdb table
> > > > > >
> > > > > > Signed-off-by: Mark Michelson <[email protected]>
> > > > > > Co-authored-by: Mark Michelson <[email protected]>
> > > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > > > ...
> > > > >
> > > > > 4. What is the tolerance for events that are never delivered or that
> > > > >    are delivered more than once? What can actually be guaranteed,
> > > > >    given that the database can die and that ovn-controller can die?
> > > > >    (Also, OVSDB transactions cannot guarantee exactly-once semantics
> > > > >    in corner cases unless the transactions are idempotent.)
> > > >
> > > > If the ovn-controller dies I think there is no too much we can do,
> > > > events will be lost until the controller restarts properly.
> > > > If ovn-northd or the connection to the db dies, controller_event_run()
> > > > will not manage the Controller_Event table and pinctrl_handle_event()
> > > > will queue the pending events in the event_table hash until the upper
> > > > limit is reached.
> > > > We can probably add a garbage collector for the pending events in the
> > > > table.
> > > > What do you think?
> > >
> > > What's the consequence if an event is missed? What's the consequence if
> > > an event is pushed two or more times?
> > > It's easiest to design a distributed system so that it's OK if an
> > > event is delivered zero times or multiple times. It's a little harder
> > > to design so that an event is delivered one or more times. It's
> > > hardest to design so that an event is delivered exactly one time.
> >
> > Hi Ben,
> >
> > thx a lot for your comments,
> >
> > >
> > > There are the following obvious points of failure from these points of
> > > view:
> > >
> > > 1. ovn-controller. If it dies, it might not push an event that it
> > >    should. When it comes back up, will it know to push the event that
> > >    it missed? What about if it dies while it is pushing an event; is it
> > >    possible that it will push it again when it comes up?
> >
> > This is probably not an issue since if the event is lost because the
> > controller is dead we will receive a new one when the controller comes
> > back.
> > If the controller dies after sending the event to the db it will not
> > receive new events when it comes back
> >
> > >
> > > 2. The OVSDB protocol. If the OVSDB connection dies after
> > >    ovn-controller's transaction is committed but before ovn-controller
> > >    receives the acknowledgment, then when it reconnects ovn-controller
> > >    might retry it, which could lead to an event being pushed two or more
> > >    times.
> > >
> > > 3. ovsdb-server. Clients don't typically use the OVSDB protocol feature
> > >    that ensures that a transaction is committed to stable storage before
> > >    it is acknowledged, so an event could get lost if ovsdb-server dies
> > >    after acknowledging a transaction but before it gets written to disk.
> > >    (Clustered OVSDB always does sync to stable storage though.)
> > >
> > > 4. ovn-northd. There is a race between ovn-northd acting on an event
> > >    and marking it handled (or deleting it). There are also the same
> > >    OVSDB protocol and ovsdb-server races in the reverse direction.
> >
> > I agree with you, even if these kind of events (duplicated events or
> > duplicated rows in the db) are quite unlikely since controller_event
> > processing is done holding pinctrl_mutex, they can happen. However I
> > think these kind of events can be managed by the CMS since the
> > controller does not have the 'history' of already handled events.
>
> OK.
>
> Will you please document what is (not) guaranteed in the documentation
> somewhere? It's important to write these things down or people are
> likely to make bad assumptions later.
Ack, will do when posting a formal series.

Regards,
Lorenzo
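As an illustration of the point discussed above (duplicated events being
managed on the CMS side), here is a minimal sketch of a consumer that
tolerates at-least-once delivery by remembering recently handled events.
Everything in it is an assumption made for the example: the row layout
(an "event_type" string plus an "event_info" dict), the class and
function names, and the bounded window size. It is not taken from the
patch. Note also that it treats two events with identical content inside
the window as duplicates, which is only acceptable if the handler is
effectively idempotent for such events.

```python
from collections import OrderedDict


class EventDeduplicator:
    """Remember recently handled event keys so duplicates can be skipped.

    Hypothetical helper: the names and the row layout are assumptions
    made for this sketch, not part of OVN.
    """

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._seen = OrderedDict()  # key -> None, least recently seen first

    @staticmethod
    def key_for(event):
        # Assumed layout: a type string plus a flat dict of details.
        return (event["event_type"],
                tuple(sorted(event["event_info"].items())))

    def should_handle(self, event):
        key = self.key_for(event)
        if key in self._seen:
            # Duplicate inside the window: refresh its position and skip it.
            self._seen.move_to_end(key)
            return False
        self._seen[key] = None
        if len(self._seen) > self.max_entries:
            # Bound memory by forgetting the least recently seen key.
            self._seen.popitem(last=False)
        return True


def process(events, dedup, handler):
    """Call handler once per distinct event; return how many were handled."""
    handled = 0
    for event in events:
        if dedup.should_handle(event):
            handler(event)
            handled += 1
    return handled
```

With something like this in the CMS, a row that is pushed twice after an
OVSDB reconnect would be acted on once, as long as both copies fall
inside the window. Events that are never delivered are not recovered by
this; that case still relies on the event being generated again.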
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
