Hi Isaku and Edgar,

As part of the effort to implement the L3 router service type framework, I
have reworked the L3 plugin to introduce a two-step process, precommit and
postcommit, similar to ML2. If you plan to work on the L3 code, we can
collaborate.
https://blueprints.launchpad.net/neutron/+spec/l3-router-service-type-framework

Also, for advanced services such as FWaaS and LBaaS, there is already state
transition logic in the plugin. For example, a firewall instance can be in a
PENDING_CREATE, PENDING_UPDATE or PENDING_DELETE state.

Thanks,
Gary

On Wed, Nov 20, 2013 at 8:55 AM, Edgar Magana <emag...@plumgrid.com> wrote:
> Let me take a look and circle back to you in a bit. This is a very
> sensitive part of the code, so we need to handle any change properly.
>
> Thanks,
>
> Edgar
>
> On 11/20/13 5:46 AM, "Isaku Yamahata" <isaku.yamah...@gmail.com> wrote:
>
> >On Tue, Nov 19, 2013 at 08:59:38AM -0800,
> >Edgar Magana <emag...@plumgrid.com> wrote:
> >
> >> Do you have in mind any implementation, any BP?
> >> We could actually work on this together; all plugins will get the
> >> benefits of a better implementation.
> >
> >Yes, let's work together. Here is my blueprint (it's somewhat old,
> >so it needs to be updated):
> >
> >https://blueprints.launchpad.net/neutron/+spec/fix-races-of-db-based-plugin
> >https://docs.google.com/file/d/0B4LNMvjOzyDuU2xNd0piS3JBMHM/edit
> >
> >So far I have thought of status changes (adding more states) and a
> >locking protocol, but TaskFlow seems worth looking at before starting,
> >and another possible approach is decoupling the backend process from the
> >API call, as Salvatore suggested for the NVP plugin.
> >Even with the TaskFlow or decoupling approach, some kind of enhanced
> >status-change/locking protocol will be necessary for the performance of
> >creating many ports at once.
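(A toy illustration of the "more states plus locking protocol" idea above:
claim a resource by atomically flipping its status to a pending state, so a
concurrent request on the same resource sees it is busy. This is a sketch,
not Neutron code; sqlite3 stands in for the real DB layer, and the table and
column names are purely illustrative.)

```python
# Hypothetical sketch: compare-and-swap on a status column. Only the
# request that wins the UPDATE (rowcount == 1) may proceed to the backend.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO ports VALUES ('port-1', 'ACTIVE')")
conn.commit()

def try_begin_update(conn, port_id):
    """Atomically move ACTIVE -> PENDING_UPDATE; rowcount says who won."""
    cur = conn.execute(
        "UPDATE ports SET status = 'PENDING_UPDATE' "
        "WHERE id = ? AND status = 'ACTIVE'", (port_id,))
    conn.commit()
    return cur.rowcount == 1

first = try_begin_update(conn, "port-1")   # wins: row was ACTIVE
second = try_begin_update(conn, "port-1")  # loses: row is already pending
print(first, second)  # True False
```

The same pattern works against any SQL backend, since the UPDATE's WHERE
clause makes the status check and the status change a single atomic step.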
>
> >thanks,
> >
> >>
> >> Thanks,
> >>
> >> Edgar
> >>
> >> On 11/19/13 3:57 AM, "Isaku Yamahata" <isaku.yamah...@gmail.com> wrote:
> >>
> >> >On Mon, Nov 18, 2013 at 03:55:49PM -0500,
> >> >Robert Kukura <rkuk...@redhat.com> wrote:
> >> >
> >> >> On 11/18/2013 03:25 PM, Edgar Magana wrote:
> >> >> > Developers,
> >> >> >
> >> >> > This topic has been discussed before, but I do not remember
> >> >> > whether we reached a good solution.
> >> >>
> >> >> The ML2 plugin addresses this by calling each MechanismDriver twice.
> >> >> The create_network_precommit() method is called as part of the DB
> >> >> transaction, and the create_network_postcommit() method is called
> >> >> after the transaction has been committed. Interactions with devices
> >> >> or controllers are done in the postcommit methods. If the postcommit
> >> >> method raises an exception, the plugin deletes the partially-created
> >> >> resource and returns the exception to the client. You might consider
> >> >> a similar approach in your plugin.
> >> >
> >> >Splitting the work into two phases, pre/post, is a good approach,
> >> >but a race window still remains.
> >> >Once the transaction is committed, the result is visible to the
> >> >outside, so concurrent requests to the same resource can race:
> >> >there is a window after pre_xxx_yyy() and before post_xxx_yyy() where
> >> >other requests can be handled.
> >> >
> >> >The state machine needs to be enhanced, I think (plugins need
> >> >modification), for example by adding more states like
> >> >pending_{create,delete,update}.
> >> >We should also consider serializing between operations on ports and
> >> >subnets, or between operations on subnets and networks, depending on
> >> >performance requirements.
> >> >(Or carefully audit complex status changes, i.e. changing a port
> >> >during subnet/network update/deletion.)
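(The two-step ML2 calling pattern Bob describes can be sketched in a few
lines. MechanismDriver, the transaction object, and the rollback below are
all stand-ins so the example runs on its own; they are not the real Neutron
classes.)

```python
# Minimal, self-contained sketch of the ML2-style precommit/postcommit
# split: precommit runs inside the DB transaction, postcommit after it,
# and a postcommit failure triggers deletion of the new resource.
class FakeTransaction:
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False  # in real life the commit happens here

class NoisyDriver:
    def create_network_precommit(self, ctx):
        ctx["log"].append("pre")
    def create_network_postcommit(self, ctx):
        ctx["log"].append("post")

class FailingDriver(NoisyDriver):
    def create_network_postcommit(self, ctx):
        raise RuntimeError("backend unreachable")

def create_network(drivers, ctx):
    with FakeTransaction():            # DB work and precommit share the txn
        ctx["db"] = "network row written"
        for d in drivers:
            d.create_network_precommit(ctx)
    try:                               # postcommit runs outside the txn
        for d in drivers:
            d.create_network_postcommit(ctx)
    except Exception:
        ctx["db"] = None               # delete the partially-created resource
        raise

ctx = {"log": [], "db": None}
create_network([NoisyDriver()], ctx)
print(ctx["log"], ctx["db"])
```

As Isaku notes above, the committed row is already visible to other API
workers before postcommit finishes, which is exactly the race window.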
> >> >
> >> >I think it would be useful to establish a reference locking policy
> >> >for the ML2 plugin for SDN controllers.
> >> >Thoughts or comments? If this is considered useful and acceptable,
> >> >I'm willing to help.
> >> >
> >> >thanks,
> >> >Isaku Yamahata
> >> >
> >> >> -Bob
> >> >>
> >> >> > Basically, if concurrent API calls are sent to Neutron, all of
> >> >> > them are sent to the plug-in level, where two actions have to be
> >> >> > made:
> >> >> >
> >> >> > 1. DB transaction: not just for data persistence but also to
> >> >> > collect the information needed for the next action
> >> >> > 2. Plug-in back-end implementation: in our case a call to the
> >> >> > Python library that consequently calls the PLUMgrid REST GW
> >> >> > (soon SAL)
> >> >> >
> >> >> > For instance:
> >> >> >
> >> >> > def create_port(self, context, port):
> >> >> >     with context.session.begin(subtransactions=True):
> >> >> >         # Plugin DB - Port Create and Return port
> >> >> >         port_db = super(NeutronPluginPLUMgridV2,
> >> >> >                         self).create_port(context, port)
> >> >> >         device_id = port_db["device_id"]
> >> >> >         if port_db["device_owner"] == "network:router_gateway":
> >> >> >             router_db = self._get_router(context, device_id)
> >> >> >         else:
> >> >> >             router_db = None
> >> >> >         try:
> >> >> >             LOG.debug(_("PLUMgrid Library: create_port() called"))
> >> >> >             # Back-end implementation
> >> >> >             self._plumlib.create_port(port_db, router_db)
> >> >> >         except Exception:
> >> >> >             ...
> >> >> >
> >> >> > The way we have implemented this at the plugin level in Havana
> >> >> > (and even in Grizzly) is that both actions are wrapped in the same
> >> >> > "transaction", which automatically rolls back any operation done
> >> >> > to its original state, protecting mostly the DB from being left in
> >> >> > an inconsistent state or with leftover data if the back-end part
> >> >> > fails.
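(Applying Bob's suggestion to Edgar's create_port would mean moving the
backend call out of the transaction and compensating on failure. The
sketch below is only an illustration of that restructuring: the session,
plugin base class, and PLUMgrid library are tiny fakes, not the real APIs.)

```python
# Hedged rework of the create_port shape above: DB-only work inside the
# transaction, backend call after commit, compensating delete on failure.
class _Txn:
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

class FakeSession:
    def begin(self, subtransactions=True):
        return _Txn()

class FakeContext:
    def __init__(self):
        self.session = FakeSession()

class FakePlumLib:
    def __init__(self, fail=False):
        self.fail = fail
    def create_port(self, port_db, router_db):
        if self.fail:
            raise RuntimeError("REST GW error")

class Plugin:
    def __init__(self, plumlib):
        self._plumlib = plumlib
        self.db = {}
    def _db_create_port(self, port):      # stands in for super().create_port
        self.db[port["id"]] = port
        return port
    def _db_delete_port(self, port_id):   # compensating action
        self.db.pop(port_id, None)
    def create_port(self, context, port):
        with context.session.begin(subtransactions=True):
            port_db = self._db_create_port(port)   # DB work only in the txn
        try:
            self._plumlib.create_port(port_db, None)  # backend, post-commit
        except Exception:
            self._db_delete_port(port_db["id"])    # undo the committed row
            raise
        return port_db

ok = Plugin(FakePlumLib())
ok.create_port(FakeContext(), {"id": "p1"})
bad = Plugin(FakePlumLib(fail=True))
try:
    bad.create_port(FakeContext(), {"id": "p2"})
except RuntimeError:
    pass
print(sorted(ok.db), sorted(bad.db))
```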
> >> >> > The problem that we are experiencing is that when concurrent
> >> >> > calls to the same API are sent, the number of operations at the
> >> >> > plug-in back-end is large enough to make the next concurrent API
> >> >> > call get stuck at the DB transaction level, which creates a hung
> >> >> > state for the Neutron server to the point that all concurrent API
> >> >> > calls will fail.
> >> >> >
> >> >> > This can be fixed if we include some "locking" system, such as
> >> >> > calling:
> >> >> >
> >> >> > from neutron.common import utils
> >> >> > ...
> >> >> >
> >> >> > @utils.synchronized('any-name', external=True)
> >> >> > def create_port(self, context, port):
> >> >> >     ...
> >> >> >
> >> >> > Obviously, this will serialize all concurrent calls, which will
> >> >> > end up in really bad performance. Does anyone have a better
> >> >> > solution?
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Edgar
> >> >> >
> >> >> > _______________________________________________
> >> >> > OpenStack-dev mailing list
> >> >> > OpenStack-dev@lists.openstack.org
> >> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >> >
> >> >--
> >> >Isaku Yamahata <isaku.yamah...@gmail.com>
> >
> >--
> >Isaku Yamahata <isaku.yamah...@gmail.com>
>