On Wed, Jul 27, 2016 at 5:58 AM, Kevin Benton <ke...@benton.pub> wrote:
> > I'd like to see if we can solve the problems more generally.
>
> We've tried before but we very quickly run into competing requirements
> with regards to eventual consistency. For example, asynchronous background
> sync doesn't work if someone wants their backend to confirm that port
> details are acceptable (e.g. mac isn't in use by some other system outside
> of openstack). Then each backend has different methods for detecting what
> is out of sync (e.g. config numbers, hashes, or just full syncs on startup)
> that each come with their own requirements for how much data needs to be
> resent when an inconsistency is detected.
>
> If we can come to some common ground of what is required by all of them,
> then I would love to get some of this built into the ML2 framework.
> However, we've discussed this at meetups/mid-cycles/summits and it
> inevitably ends up with two people drawing furiously on a whiteboard,
> someone crying in the corner, and everyone else arguing about the lack of
> parametric polymorphism in Go.
>

Ha, yes, makes sense that this is really hard to solve in a way that works
for everyone ...

> Even between OVN and ODL in this thread, it sounds like the only thing in
> common is a background worker that consumes from a queue of tasks in the
> db. Maybe realistically the only common thing we can come up with is a
> taskflow queue stored in the DB to solve the multiple workers issue...
>

To clarify, ODL has this background worker and the discussion was whether
OVN should try to follow a similar approach. So far, my gut feeling is
that it's far too complicated for the problems it would solve. There's one
identified multiple-worker related race condition on updates, but I think
we can solve that another way.

> On Tue, Jul 26, 2016 at 11:31 AM, Russell Bryant <rbry...@redhat.com>
> wrote:
>
>>
>> On Fri, Jul 22, 2016 at 7:51 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>> Thanks for the comments Amitabha.
>>> Please see comments inline
>>>
>>> On Fri, Jul 22, 2016 at 5:50 AM, Amitabha Biswas <azbis...@gmail.com>
>>> wrote:
>>>
>>>> Hi Numan,
>>>>
>>>> Thanks for the proposal. We have also been thinking about this use-case.
>>>>
>>>> If I’m reading this accurately (and I may not be), it seems that the
>>>> proposal is to not have any OVN NB (CUD) operations (R operations outside
>>>> the scope) done by the api_worker threads but rather by a new journal
>>>> thread.
>>>>
>>> Correct.
>>>
>>>> If this is indeed the case, I’d like to consider the scenario where
>>>> there are N neutron nodes, each node with M worker threads. The journal
>>>> thread at each node contains a list of pending operations. Could there be
>>>> (sequence) dependencies in the pending operations amongst the journal
>>>> threads in the nodes that prevent them from getting applied (e.g.
>>>> Logical_Router_Port and Logical_Switch_Port inter-dependency), because we
>>>> are returning success on neutron operations that have still not been
>>>> committed to the NB DB?
>>>>
>>> It's a valid scenario and should be designed properly to handle such
>>> scenarios in case we take this approach.
>>>
>>
>> I believe a new table in the Neutron DB is used to synchronize all of
>> the journal threads.
>>
>> Also note that OVN currently has no custom tables in the Neutron database
>> and it would be *very* good to keep it that way if we can.
>>
>>>
>>>> Couple of clarifications and thoughts below.
>>>>
>>>> Thanks
>>>> Amitabha <abis...@us.ibm.com>
>>>>
>>>> On Jul 13, 2016, at 1:20 AM, Numan Siddique <nusid...@redhat.com>
>>>> wrote:
>>>>
>>>> Adding the proper tags in subject
>>>>
>>>> On Wed, Jul 13, 2016 at 1:22 PM, Numan Siddique <nusid...@redhat.com>
>>>> wrote:
>>>>
>>>>> Hi Neutrinos,
>>>>>
>>>>> Presently, in the OVN ML2 driver we have 2 ways to sync the neutron DB
>>>>> and the OVN DB:
>>>>> - At neutron-server startup, the OVN ML2 driver syncs the neutron DB and
>>>>> OVN DB if sync mode is set to repair.
>>>>> - An admin can run "neutron-ovn-db-sync-util" to sync the DBs.
>>>>>
>>>>> Recently, in v2 of the networking-odl ML2 driver (please see (1) below
>>>>> for more details; ODL folks, please correct me if I am wrong here):
>>>>>
>>>>> - a journal thread is created which does the CRUD operations of
>>>>> neutron resources asynchronously (i.e. it sends the REST API calls to
>>>>> the ODL controller).
>>>>>
>>>>
>>>> Would this be the equivalent of making OVSDB transactions to the OVN NB
>>>> DB?
>>>>
>>>
>>> Correct.
>>>
>>>>
>>>>> - a maintenance thread is created which does some cleanup
>>>>> periodically and at startup does a full sync if it detects an ODL
>>>>> controller cold reboot.
>>>>>
>>>>> A few questions I have:
>>>>> - Can the OVN ML2 driver take the same or a similar approach? Are there
>>>>> any advantages in taking this approach? One advantage is that neutron
>>>>> resources can be created/updated/deleted even if the OVN ML2 driver has
>>>>> lost its connection to the ovsdb-server. The journal thread would
>>>>> eventually sync these resources to the OVN DB. I would like to know the
>>>>> community's thoughts on this.
>>>>>
>>>>
>> I question whether making operations appear to be successful even when
>> ovsdb-server is unreachable is a useful thing. API calls fail today if the
>> Neutron db is unreachable. Why would we bend over backwards for the OVN
>> database?
>>
>> If this was easy to do, sure, but this solution seems *incredibly*
>> complex to me, so I see it as an absolute last resort.
>>
>>>> If we can make it work, it would indeed be a huge plus for system-wide
>>>> upgrades and some corner cases in the code (ACL specifically), where the
>>>> post_commit relies on all transactions to be successful and doesn’t revert
>>>> the neutron db if something fails.
>>>>
>>>
>>
>> Can we just improve the ML2 framework to make this problem easier to deal
>> with? This problem would affect several drivers. Driver-specific partial
>> solutions just keep getting replicated. I'd like to see if we can solve
>> the problems more generally.
>>
>>>
>>>>
>>>>> - Are there other ML2 drivers which might have to handle the DB
>>>>> syncs (cases where the other controllers also maintain their own DBs),
>>>>> and how are they handling it?
>>>>>
>>>>> - Can a common approach be taken to sync the neutron DB and
>>>>> controller DBs?
>>>>>
>>>>> -----------------------------------------------------------------------------------------------------------
>>>>>
>>>>> (1)
>>>>> Sync threads created by networking-odl ML2 driver
>>>>> --------------------------------------------------
>>>>> The ODL ML2 driver creates 2 threads (threading.Thread module) at init:
>>>>> - Journal thread
>>>>> - Maintenance thread
>>>>>
>>>>> Journal thread
>>>>> ----------------
>>>>> The journal module creates a new journal table named
>>>>> “opendaylightjournal” -
>>>>> https://github.com/openstack/networking-odl/blob/master/networking_odl/db/models.py#L23
>>>>>
>>>>> The journal thread runs in a loop, waiting for the sync event from the
>>>>> ODL ML2 driver.
>>>>>
>>>>> - The ODL ML2 driver's resource (network, subnet, port) precommit
>>>>> functions, when called by the ML2 plugin, add an entry in the
>>>>> “opendaylightjournal” table with the resource data and set the journal
>>>>> operation state for this entry to “PENDING”.
>>>>> - The corresponding resource postcommit function of the ODL ML2
>>>>> plugin, when called, sets the sync event flag.
>>>>> - A timer is also created which sets the sync event flag when it
>>>>> expires (the default value is 10 seconds).
>>>>> - The journal thread wakes up, looks in the “opendaylightjournal” table
>>>>> for entries in state “pending”, and runs the CRUD operation on those
>>>>> resources in the ODL DB. Once done, it sets the state to “completed”.
>>>>>
>>>>> Maintenance thread
>>>>> ------------------
>>>>> The maintenance thread does 3 operations:
>>>>> - JournalCleanup - Delete completed rows from the journal table
>>>>> “opendaylightjournal”.
>>>>> - CleanupProcessing - Mark orphaned processing rows as pending.
>>>>> - Full sync - Re-sync when detecting an ODL “cold reboot”.
>>>>>
>>>>> Thanks
>>>>> Numan
>>>>>
>>>>
>>>> __________________________________________________________________________
>>>> OpenStack Development Mailing List (not for usage questions)
>>>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>
>>>
>>
>> --
>> Russell Bryant
>>
>

--
Russell Bryant
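For readers unfamiliar with the networking-odl design, the journal-thread flow Numan describes in (1) (precommit inserts a “pending” row, postcommit or a 10-second timer sets a sync event, and the thread applies pending rows and marks them “completed”) can be sketched in Python 3.8+ roughly as below. This is a hedged illustration, not networking-odl's code: the in-memory SQLite table stands in for “opendaylightjournal”, and `Journal`, `record_precommit`, `postcommit`, `sync_pending` and the `apply_to_backend` callback are hypothetical names. Rows pass through a “processing” state (the state the maintenance thread's CleanupProcessing step refers to), and the pending-to-processing step is a conditional UPDATE, one possible way to keep multiple journal threads from claiming the same row.

```python
import sqlite3
import threading

# Hypothetical stand-in for the "opendaylightjournal" table; the real
# networking-odl schema is different.
SCHEMA = """
CREATE TABLE journal (
    seqnum INTEGER PRIMARY KEY AUTOINCREMENT,
    object_type TEXT NOT NULL,
    object_uuid TEXT NOT NULL,
    operation TEXT NOT NULL,
    data TEXT,
    state TEXT NOT NULL DEFAULT 'pending'
)
"""


class Journal:
    """DB-backed queue of pending backend operations drained by a thread."""

    def __init__(self, apply_to_backend, sync_interval=10.0):
        # One shared connection guarded by a lock; check_same_thread=False
        # allows the journal thread to use it.
        self._db = sqlite3.connect(":memory:", check_same_thread=False)
        self._db.execute(SCHEMA)
        self._lock = threading.Lock()
        self._apply = apply_to_backend   # hypothetical backend call
        self._interval = sync_interval   # timer period (10 s default in ODL)
        self._sync_event = threading.Event()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def record_precommit(self, object_type, object_uuid, operation, data):
        """Called from a *_precommit hook: queue the change as 'pending'."""
        with self._lock, self._db:
            self._db.execute(
                "INSERT INTO journal (object_type, object_uuid, operation, data)"
                " VALUES (?, ?, ?, ?)",
                (object_type, object_uuid, operation, data))

    def postcommit(self):
        """Called from a *_postcommit hook: wake the journal thread."""
        self._sync_event.set()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()
        self._sync_event.set()
        self._thread.join()

    def states(self):
        with self._lock:
            return [r[0] for r in self._db.execute(
                "SELECT state FROM journal ORDER BY seqnum")]

    def _claim_next(self):
        """Move the oldest 'pending' row to 'processing' and return it."""
        with self._lock, self._db:
            row = self._db.execute(
                "SELECT seqnum, object_type, object_uuid, operation, data"
                " FROM journal WHERE state = 'pending'"
                " ORDER BY seqnum LIMIT 1").fetchone()
            if row is None:
                return None
            # Conditional UPDATE: if several workers shared one database,
            # only the one whose UPDATE matches the row would own it.
            cur = self._db.execute(
                "UPDATE journal SET state = 'processing'"
                " WHERE seqnum = ? AND state = 'pending'", (row[0],))
            return row if cur.rowcount == 1 else None

    def _mark(self, seqnum, state):
        with self._lock, self._db:
            self._db.execute(
                "UPDATE journal SET state = ? WHERE seqnum = ?", (state, seqnum))

    def sync_pending(self):
        """Apply pending rows in order and mark each one 'completed'."""
        while (row := self._claim_next()) is not None:
            seqnum, object_type, object_uuid, operation, data = row
            try:
                self._apply(object_type, object_uuid, operation, data)
            except Exception:
                # Put the row back and stop, so later rows that may depend
                # on it are not applied ahead of it; retry on next wake-up.
                self._mark(seqnum, "pending")
                break
            self._mark(seqnum, "completed")

    def _run(self):
        while not self._stopped.is_set():
            # Wake on a postcommit signal or when the timer expires.
            self._sync_event.wait(timeout=self._interval)
            self._sync_event.clear()
            self.sync_pending()
```

Stopping at the first failed row keeps operations in order, which bears on Amitabha's Logical_Router_Port/Logical_Switch_Port dependency question, but it also means one persistently failing row blocks every row queued behind it.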
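The three maintenance-thread operations listed in (1) could similarly be sketched as plain functions over a journal table. Again, this is an assumption-laden illustration rather than the networking-odl implementation: the schema and its `last_update` column, treating a row as orphaned once it has sat in “processing” longer than a timeout, and the `backend_restarted` and `full_sync` callbacks (standing in for ODL cold-reboot detection and a full resync) are all hypothetical.

```python
import sqlite3
import time

# Hypothetical journal schema using the states from the thread:
# 'pending' -> 'processing' -> 'completed'.
SCHEMA = """
CREATE TABLE journal (
    seqnum INTEGER PRIMARY KEY AUTOINCREMENT,
    object_uuid TEXT NOT NULL,
    state TEXT NOT NULL,
    last_update REAL NOT NULL
)
"""


def journal_cleanup(db):
    """JournalCleanup: delete rows that have already been synced."""
    with db:
        return db.execute(
            "DELETE FROM journal WHERE state = 'completed'").rowcount


def cleanup_processing(db, timeout, now=None):
    """CleanupProcessing: reset rows stuck in 'processing' to 'pending'.

    Assumption: a row is orphaned if it has been 'processing' for longer
    than `timeout` seconds (e.g. its worker died mid-sync).
    """
    now = time.time() if now is None else now
    with db:
        return db.execute(
            "UPDATE journal SET state = 'pending', last_update = ?"
            " WHERE state = 'processing' AND last_update < ?",
            (now, now - timeout)).rowcount


def maintenance_pass(db, timeout, backend_restarted, full_sync, now=None):
    """One periodic maintenance run; returns (deleted, reset, resynced)."""
    deleted = journal_cleanup(db)
    reset = cleanup_processing(db, timeout, now)
    resynced = False
    # Hypothetical hooks for cold-reboot detection and full resync.
    if backend_restarted():
        full_sync()
        resynced = True
    return deleted, reset, resynced
```

A maintenance thread would call `maintenance_pass` periodically; resetting orphaned rows to “pending” lets the journal thread pick them up again after a worker dies mid-sync.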