Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
On Thu, Nov 21, 2013 at 12:34:33PM -0800, Gary Duan gd...@varmour.com wrote:

Advanced service plugins don't have a two-step transition today. IMO, if vendor plugins/drivers don't maintain their own databases for these services, it might not be urgent to add these steps in the plugin.

I agree. The OVS/Linux bridge plugin (or mechanism driver) is fine in theory. Unfortunately, most controller-based plugins aren't: they update the Neutron DB and then delegate the request to the controller, with no polling or confirmation. This is a common pattern, and it is very fragile. For example, restarting neutron-server while it is processing requests easily causes inconsistency between the Neutron DB and the controller side, and errors during delegation of requests tend to be ignored.

Do we want to address this to some extent in a framework (i.e. the ML2 plugin), or just leave it and declare it the problem of each plugin/mechanism driver?

How to make sure the database and the back-end implementation stay in sync needs more thought. As configuring a backend device can be an async process, rolling back database tables can be cumbersome.

--
Isaku Yamahata isaku.yamah...@gmail.com

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
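[Editor's note: a common way to tolerate the lost-delegation problem described above is a periodic resync that repairs the controller from the authoritative Neutron DB. The sketch below is illustrative only, not Neutron code; FakeController and resync are hypothetical names standing in for a real controller client.]

```python
class FakeController:
    """Stand-in for a backend controller client (illustrative only)."""
    def __init__(self, ports):
        self.ports = set(ports)

    def create_port(self, port_id):
        self.ports.add(port_id)

    def delete_port(self, port_id):
        self.ports.discard(port_id)


def resync(db_port_ids, controller):
    """One reconciliation pass: the Neutron DB is treated as authoritative."""
    db_ids = set(db_port_ids)
    for port_id in db_ids - controller.ports:
        controller.create_port(port_id)   # update lost, e.g. across a restart
    for port_id in controller.ports - db_ids:
        controller.delete_port(port_id)   # stale entry left on the backend
```

Running such a pass at startup (and periodically) bounds how long the two sides can stay inconsistent after a crash mid-request.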
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
On Wed, Nov 20, 2013 at 10:16:46PM -0800, Gary Duan gd...@varmour.com wrote:

Hi, Isaku and Edgar,

Hi.

As part of the effort to implement the L3 router service type framework, I have reworked the L3 plugin to introduce a 2-step process, precommit and postcommit, similar to ML2. If you plan to work on L3 code, we can collaborate.

Sure, let's collaborate. This is still at the discussion stage. I envisage the plan will be:

- 1st step: introduce the 2-step transition to the ML2 plugin (and hope other simple plugins will follow)
- 2nd step: introduce a locking protocol or some other mechanism, like async update similar to NVP, or TaskFlow... (design and implementation)
- ...
- Nth step: introduce a debugging/test framework, e.g. insert hooks that trigger an artificial sleep or context switch in debug mode, in order to make races more likely to happen

https://blueprints.launchpad.net/neutron/+spec/l3-router-service-type-framework

Is there any code publicly available?

Also, for advanced services such as FW and LBaaS, there already is state transition logic in the plugin. For example, a firewall instance can have CREATE, UPDATE and DELETE_PENDING states.

Oh great! Advanced services have more complex state than the core plugin, I suppose. Are you aware of further issues? Do they require further synchronization in addition to the 2-step transition, like locking, serialization, or async update? Probably we can learn from other projects (nova, cinder...).

thanks,
--
Isaku Yamahata isaku.yamah...@gmail.com
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
See inline,

On Thu, Nov 21, 2013 at 2:19 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Wed, Nov 20, 2013 at 10:16:46PM -0800, Gary Duan gd...@varmour.com wrote:

Hi, Isaku and Edgar,

Hi.

As part of the effort to implement the L3 router service type framework, I have reworked the L3 plugin to introduce a 2-step process, precommit and postcommit, similar to ML2. If you plan to work on L3 code, we can collaborate.

Sure, let's collaborate. This is still at the discussion stage. I envisage the plan will be:

- 1st step: introduce the 2-step transition to the ML2 plugin (and hope other simple plugins will follow)
- 2nd step: introduce a locking protocol or some other mechanism, like async update similar to NVP, or TaskFlow... (design and implementation)
- ...
- Nth step: introduce a debugging/test framework, e.g. insert hooks that trigger an artificial sleep or context switch in debug mode, in order to make races more likely to happen

https://blueprints.launchpad.net/neutron/+spec/l3-router-service-type-framework

Is there any code publicly available?

I will do some cleanup and post the patch for discussion.

Also, for advanced services such as FW and LBaaS, there already is state transition logic in the plugin. For example, a firewall instance can have CREATE, UPDATE and DELETE_PENDING states.

Oh great! Advanced services have more complex state than the core plugin, I suppose. Are you aware of further issues? Do they require further synchronization in addition to the 2-step transition, like locking, serialization, or async update? Probably we can learn from other projects (nova, cinder...).

Advanced service plugins don't have a two-step transition today. IMO, if vendor plugins/drivers don't maintain their own databases for these services, it might not be urgent to add these steps in the plugin.

How to make sure the database and the back-end implementation stay in sync needs more thought. As configuring a backend device can be an async process, rolling back database tables can be cumbersome.
Thanks,
Gary

thanks,
--
Isaku Yamahata isaku.yamah...@gmail.com
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
On Tue, Nov 19, 2013 at 08:59:38AM -0800, Edgar Magana emag...@plumgrid.com wrote:

Do you have in mind any implementation, any BP? We could actually work on this together; all plugins will get the benefits of a better implementation.

Yes, let's work together. Here is my blueprint (it's somewhat old, so it needs to be updated):

https://blueprints.launchpad.net/neutron/+spec/fix-races-of-db-based-plugin
https://docs.google.com/file/d/0B4LNMvjOzyDuU2xNd0piS3JBMHM/edit

Although I've thought of status changes (adding more statuses) and a locking protocol so far, TaskFlow seems worth looking at before starting, and another possible approach is decoupling the backend process from the API call, as Salvatore suggested, like the NVP plugin. Even with the TaskFlow or decoupling approach, some kind of enhanced status-change/locking protocol will be necessary for the performance of creating many ports at once.

thanks,

Thanks,
Edgar

On 11/19/13 3:57 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Mon, Nov 18, 2013 at 03:55:49PM -0500, Robert Kukura rkuk...@redhat.com wrote:

On 11/18/2013 03:25 PM, Edgar Magana wrote:

Developers,

This topic has been discussed before, but I do not remember if we have a good solution or not.

The ML2 plugin addresses this by calling each MechanismDriver twice. The create_network_precommit() method is called as part of the DB transaction, and the create_network_postcommit() method is called after the transaction has been committed. Interactions with devices or controllers are done in the postcommit methods. If the postcommit method raises an exception, the plugin deletes that partially-created resource and returns the exception to the client. You might consider a similar approach in your plugin.

Splitting the work into two phases, pre/post, is a good approach. But there still remains a race window: once the transaction is committed, the result is visible to the outside, so concurrent requests to the same resource will be racy.
There is a window after pre_xxx_yyy() and before post_xxx_yyy() where other requests can be handled. The state machine needs to be enhanced, I think (plugins need modification), for example by adding more states like pending_{create, delete, update}. Also, we would like to consider serializing between operations on ports and subnets, or between operations on subnets and networks, depending on performance requirements. (Or carefully audit complex status changes, i.e. changing a port during subnet/network update/deletion.) I think it would be useful to establish a reference locking policy for the ML2 plugin for SDN controllers. Thoughts or comments? If this is considered useful and acceptable, I'm willing to help.

thanks,
Isaku Yamahata

-Bob

Basically, if concurrent API calls are sent to Neutron, all of them are sent to the plug-in level, where two actions have to be made:

1. DB transaction - not just for data persistence, but also to collect the information needed for the next action
2. Plug-in back-end implementation - in our case a call to the Python library that consequently calls the PLUMgrid REST GW (soon SAL)

For instance:

    def create_port(self, context, port):
        with context.session.begin(subtransactions=True):
            # Plugin DB - Port Create and Return port
            port_db = super(NeutronPluginPLUMgridV2,
                            self).create_port(context, port)
            device_id = port_db["device_id"]
            if port_db["device_owner"] == "network:router_gateway":
                router_db = self._get_router(context, device_id)
            else:
                router_db = None
            try:
                LOG.debug(_("PLUMgrid Library: create_port() called"))
                # Back-end implementation
                self._plumlib.create_port(port_db, router_db)
            except Exception:
                ...

The way we have implemented it at the plugin level in Havana (even in Grizzly) is that both actions are wrapped in the same transaction, which automatically rolls back any operation done to its original state, protecting mostly the DB from having any inconsistent state or leftover data if the back-end part fails.
The problem that we are experiencing is that when concurrent calls to the same API are sent, the operations at the plug-in back-end take long enough to make the next concurrent API call get stuck at the DB transaction level, which creates a hung state for the Neutron server, to the point that all concurrent API calls will fail. This can be fixed if we include some locking system, such as calling:

    from neutron.common import utils

    @utils.synchronized('any-name', external=True)
    def create_port(self, context, port):

Obviously, this will create a serialization of all concurrent calls, which ends up in really bad performance. Does anyone have a better solution?
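[Editor's note: to make the pending_{create, delete, update} idea above concrete, here is a small illustrative sketch. It is not Neutron code; the transition table and the Resource class are hypothetical. A status field guards each resource, and a conflicting concurrent operation is rejected instead of racing.]

```python
import threading

# Allowed transitions: (current status, operation) -> pending status.
_TRANSITIONS = {
    (None, "create"): "PENDING_CREATE",
    ("ACTIVE", "update"): "PENDING_UPDATE",
    ("ACTIVE", "delete"): "PENDING_DELETE",
}


class Resource:
    def __init__(self):
        self._lock = threading.Lock()
        self.status = None

    def begin(self, op):
        """Atomically enter a pending state; reject conflicting operations."""
        with self._lock:
            pending = _TRANSITIONS.get((self.status, op))
            if pending is None:
                raise RuntimeError(
                    "cannot %s while status is %s" % (op, self.status))
            self.status = pending

    def finish(self, ok=True):
        """Leave the pending state once the backend call completes."""
        with self._lock:
            if not ok:
                self.status = "ERROR"
            elif self.status == "PENDING_DELETE":
                self.status = "DELETED"
            else:
                self.status = "ACTIVE"
```

In a real plugin the status change would be a compare-and-swap UPDATE inside the DB transaction (the precommit step), not an in-process lock, so it also guards concurrent API workers.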
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Hi guys,

New to the discussion, so please correct me if I'm off base. If I understand correctly, the controllers handle the state/race condition in a controller-specific way, so what we are talking about is a new state flow inside the OpenStack DB to describe the state of the network as known to OpenStack. Is this correct?

Thanks,
John

-----Original Message-----
From: Isaku Yamahata [mailto:isaku.yamah...@gmail.com]
Sent: Wednesday, November 20, 2013 5:46 AM
To: OpenStack Development Mailing List (not for usage questions)
Cc: Robert Kukura; Isaku Yamahata
Subject: Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

On Tue, Nov 19, 2013 at 08:59:38AM -0800, Edgar Magana emag...@plumgrid.com wrote:

Do you have in mind any implementation, any BP? We could actually work on this together; all plugins will get the benefits of a better implementation.

Yes, let's work together. Here is my blueprint (it's somewhat old, so it needs to be updated):

https://blueprints.launchpad.net/neutron/+spec/fix-races-of-db-based-plugin
https://docs.google.com/file/d/0B4LNMvjOzyDuU2xNd0piS3JBMHM/edit

Although I've thought of status changes (adding more statuses) and a locking protocol so far, TaskFlow seems worth looking at before starting, and another possible approach is decoupling the backend process from the API call, as Salvatore suggested, like the NVP plugin. Even with the TaskFlow or decoupling approach, some kind of enhanced status-change/locking protocol will be necessary for the performance of creating many ports at once.

thanks,

Thanks,
Edgar

On 11/19/13 3:57 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Mon, Nov 18, 2013 at 03:55:49PM -0500, Robert Kukura rkuk...@redhat.com wrote:

On 11/18/2013 03:25 PM, Edgar Magana wrote:

Developers,

This topic has been discussed before, but I do not remember if we have a good solution or not.

The ML2 plugin addresses this by calling each MechanismDriver twice.
The create_network_precommit() method is called as part of the DB transaction, and the create_network_postcommit() method is called after the transaction has been committed. Interactions with devices or controllers are done in the postcommit methods. If the postcommit method raises an exception, the plugin deletes that partially-created resource and returns the exception to the client. You might consider a similar approach in your plugin.

Splitting the work into two phases, pre/post, is a good approach. But there still remains a race window: once the transaction is committed, the result is visible to the outside, so concurrent requests to the same resource will be racy. There is a window after pre_xxx_yyy() and before post_xxx_yyy() where other requests can be handled. The state machine needs to be enhanced, I think (plugins need modification), for example by adding more states like pending_{create, delete, update}. Also, we would like to consider serializing between operations on ports and subnets, or between operations on subnets and networks, depending on performance requirements. (Or carefully audit complex status changes, i.e. changing a port during subnet/network update/deletion.) I think it would be useful to establish a reference locking policy for the ML2 plugin for SDN controllers. Thoughts or comments? If this is considered useful and acceptable, I'm willing to help.

thanks,
Isaku Yamahata

-Bob

Basically, if concurrent API calls are sent to Neutron, all of them are sent to the plug-in level, where two actions have to be made:

1. DB transaction - not just for data persistence, but also to collect the information needed for the next action
2. Plug-in back-end implementation
In our case it is a call to the Python library that consequently calls the PLUMgrid REST GW (soon SAL). For instance:

    def create_port(self, context, port):
        with context.session.begin(subtransactions=True):
            # Plugin DB - Port Create and Return port
            port_db = super(NeutronPluginPLUMgridV2,
                            self).create_port(context, port)
            device_id = port_db["device_id"]
            if port_db["device_owner"] == "network:router_gateway":
                router_db = self._get_router(context, device_id)
            else:
                router_db = None
            try:
                LOG.debug(_("PLUMgrid Library: create_port() called"))
                # Back-end implementation
                self._plumlib.create_port(port_db, router_db)
            except Exception:
                ...

The way we have implemented it at the plugin level in Havana (even in Grizzly) is that both actions are wrapped in the same transaction, which automatically rolls back any operation done to its original state, protecting mostly the DB from having any inconsistent state or leftover data if the back-end part fails.
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
On Tue, Nov 19, 2013 at 11:22:46PM +0100, Salvatore Orlando sorla...@nicira.com wrote:

For what it's worth, we have considered this aspect from the perspective of the Neutron plugin my team maintains (NVP) during the past release cycle. The synchronous model that most plugins with a controller on the backend currently implement is simple and convenient, but has some flaws:

- reliability: the current approach, where the plugin orchestrates the backend, is not really optimal when it comes to ensuring your running configuration (backend/control plane) is in sync with your desired configuration (neutron/mgmt plane); moreover, in some cases, due to Neutron internals, API calls to the backend are wrapped in a transaction too, leading to very long SQL transactions, which are quite dangerous indeed. It is not easy to recover from a failure due to an eventlet thread deadlocking with a MySQL transaction, where by 'recover' I mean ensuring Neutron and backend state are in sync.

- maintainability: since handling rollback in case of failures on the backend and/or the DB is cumbersome, this often leads to spaghetti code which is very hard to maintain regardless of the effort (ok, I agree here that this also depends on how good the devs are - most of the guys in my team are very good, but unfortunately they have me too...).

- performance and scalability:
  - roundtrips to the backend take a non-negligible toll on the duration of an API call, whereas most Neutron API calls should probably just terminate at the DB, just like a nova boot call does not wait for the VM to be ACTIVE to return.
  - we need to keep some operations serialized in order to avoid the mentioned race issues.

For this reason we're progressively moving toward a change in the NVP plugin with a series of patches under this umbrella blueprint [1].

Interesting. A question out of curiosity: a successful return from POST/PUT doesn't necessarily mean that the creation/update was completed.
So polling by the client side is needed to wait for its completion, right? Or some kind of callback? Especially the vif creation case would matter.

For answering the issues mentioned by Isaku, we've been looking at a task management library with an efficient and reliable set of abstractions for ensuring operations are properly ordered, thus avoiding those races (I agree with the observation on the pre/post commit solution).

This discussion was started with the core plugin, but other resources like services (LBaaS, FW, VPN...) have similar race conditions, I think.

Thanks,
Isaku Yamahata

We are currently looking at using celery [2] rather than TaskFlow; mostly because we already have expertise on how to use it in our applications, and it has very easy abstractions for workflow design, as well as for handling task failures. That said, I think we're still open to switching to TaskFlow should we become aware of some very good reason for using it.

Regards,
Salvatore

[1] https://blueprints.launchpad.net/neutron/+spec/nvp-async-backend-communication
[2] http://docs.celeryproject.org/en/master/index.html

On 19 November 2013 19:42, Joshua Harlow harlo...@yahoo-inc.com wrote:

And also of course, nearly forgot, a similar situation/review in heat: https://review.openstack.org/#/c/49440/ - except theirs was/is dealing with stack locking (a heat concept).

On 11/19/13 10:33 AM, Joshua Harlow harlo...@yahoo-inc.com wrote:

If you start adding these states you might really want to consider the following work that is going on in other projects. It surely appears that everyone is starting to hit the same problem (and joining efforts would produce a more beneficial result).

Relevant icehouse etherpads:

- https://etherpad.openstack.org/p/CinderTaskFlowFSM
- https://etherpad.openstack.org/p/icehouse-oslo-service-synchronization

And of course my obvious plug for TaskFlow (which is designed to be a useful library to help in all these usages).
- https://wiki.openstack.org/wiki/TaskFlow

The states you just mentioned start to line up with https://wiki.openstack.org/wiki/TaskFlow/States_of_Task_and_Flow

If this sounds like a useful way to go (joining efforts) then let's see how we can make it possible. IRC: #openstack-state-management is where I am usually at.

On 11/19/13 3:57 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Mon, Nov 18, 2013 at 03:55:49PM -0500, Robert Kukura rkuk...@redhat.com wrote:

On 11/18/2013 03:25 PM, Edgar Magana wrote:

Developers,

This topic has been discussed before, but I do not remember if we have a good solution or not.

The ML2 plugin addresses this by calling each MechanismDriver twice. The create_network_precommit() method is called as part of the DB transaction, and the create_network_postcommit() method is called after the transaction has been committed. Interactions with devices or controllers are done in the
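[Editor's note: the NVP-style decoupling Salvatore describes can be sketched with stdlib pieces; here a queue and a thread stand in for a celery or TaskFlow worker, and all names are illustrative. The API call commits the DB row in a pending state and returns immediately, while a worker performs the backend call outside any transaction.]

```python
import queue
import threading

tasks = queue.Queue()
backend_log = []           # records what the "backend" actually did


def api_create_port(port_id):
    # A DB transaction would commit here with status=PENDING_CREATE;
    # the backend work is merely enqueued, not performed inline.
    tasks.put(port_id)
    return {"id": port_id, "status": "PENDING_CREATE"}


def worker():
    while True:
        port_id = tasks.get()
        if port_id is None:    # shutdown sentinel
            break
        # The backend roundtrip happens here, outside any DB transaction;
        # on success the row would be flipped to ACTIVE.
        backend_log.append((port_id, "ACTIVE"))


t = threading.Thread(target=worker)
t.start()
resp = api_create_port("p1")
tasks.put(None)
t.join()
```

As Isaku points out, the client then needs to poll the resource status (or receive a callback) to learn when it becomes ACTIVE.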
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Let me take a look and circle back to you in a bit. This is a very sensitive part of the code, so we need to handle any change properly.

Thanks,
Edgar

On 11/20/13 5:46 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Tue, Nov 19, 2013 at 08:59:38AM -0800, Edgar Magana emag...@plumgrid.com wrote:

Do you have in mind any implementation, any BP? We could actually work on this together; all plugins will get the benefits of a better implementation.

Yes, let's work together. Here is my blueprint (it's somewhat old, so it needs to be updated):

https://blueprints.launchpad.net/neutron/+spec/fix-races-of-db-based-plugin
https://docs.google.com/file/d/0B4LNMvjOzyDuU2xNd0piS3JBMHM/edit

Although I've thought of status changes (adding more statuses) and a locking protocol so far, TaskFlow seems worth looking at before starting, and another possible approach is decoupling the backend process from the API call, as Salvatore suggested, like the NVP plugin. Even with the TaskFlow or decoupling approach, some kind of enhanced status-change/locking protocol will be necessary for the performance of creating many ports at once.

thanks,

Thanks,
Edgar

On 11/19/13 3:57 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Mon, Nov 18, 2013 at 03:55:49PM -0500, Robert Kukura rkuk...@redhat.com wrote:

On 11/18/2013 03:25 PM, Edgar Magana wrote:

Developers,

This topic has been discussed before, but I do not remember if we have a good solution or not.

The ML2 plugin addresses this by calling each MechanismDriver twice. The create_network_precommit() method is called as part of the DB transaction, and the create_network_postcommit() method is called after the transaction has been committed. Interactions with devices or controllers are done in the postcommit methods. If the postcommit method raises an exception, the plugin deletes that partially-created resource and returns the exception to the client. You might consider a similar approach in your plugin.

Splitting the work into two phases, pre/post, is a good approach.
But there still remains a race window: once the transaction is committed, the result is visible to the outside, so concurrent requests to the same resource will be racy. There is a window after pre_xxx_yyy() and before post_xxx_yyy() where other requests can be handled. The state machine needs to be enhanced, I think (plugins need modification), for example by adding more states like pending_{create, delete, update}. Also, we would like to consider serializing between operations on ports and subnets, or between operations on subnets and networks, depending on performance requirements. (Or carefully audit complex status changes, i.e. changing a port during subnet/network update/deletion.) I think it would be useful to establish a reference locking policy for the ML2 plugin for SDN controllers. Thoughts or comments? If this is considered useful and acceptable, I'm willing to help.

thanks,
Isaku Yamahata

-Bob

Basically, if concurrent API calls are sent to Neutron, all of them are sent to the plug-in level, where two actions have to be made:

1. DB transaction - not just for data persistence, but also to collect the information needed for the next action
2. Plug-in back-end implementation - in our case a call to the Python library that consequently calls the PLUMgrid REST GW (soon SAL)

For instance:

    def create_port(self, context, port):
        with context.session.begin(subtransactions=True):
            # Plugin DB - Port Create and Return port
            port_db = super(NeutronPluginPLUMgridV2,
                            self).create_port(context, port)
            device_id = port_db["device_id"]
            if port_db["device_owner"] == "network:router_gateway":
                router_db = self._get_router(context, device_id)
            else:
                router_db = None
            try:
                LOG.debug(_("PLUMgrid Library: create_port() called"))
                # Back-end implementation
                self._plumlib.create_port(port_db, router_db)
            except Exception:
                ...
The way we have implemented it at the plugin level in Havana (even in Grizzly) is that both actions are wrapped in the same transaction, which automatically rolls back any operation done to its original state, protecting mostly the DB from having any inconsistent state or leftover data if the back-end part fails.

The problem that we are experiencing is that when concurrent calls to the same API are sent, the operations at the plug-in back-end take long enough to make the next concurrent API call get stuck at the DB transaction level, which creates a hung state for the Neutron server, to the point that all concurrent API calls will fail. This can be fixed if we include some locking system, such as calling:

    from neutron.common import utils
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Hi, Isaku and Edgar,

As part of the effort to implement the L3 router service type framework, I have reworked the L3 plugin to introduce a 2-step process, precommit and postcommit, similar to ML2. If you plan to work on L3 code, we can collaborate.

https://blueprints.launchpad.net/neutron/+spec/l3-router-service-type-framework

Also, for advanced services such as FW and LBaaS, there already is state transition logic in the plugin. For example, a firewall instance can have CREATE, UPDATE and DELETE_PENDING states.

Thanks,
Gary

On Wed, Nov 20, 2013 at 8:55 AM, Edgar Magana emag...@plumgrid.com wrote:

Let me take a look and circle back to you in a bit. This is a very sensitive part of the code, so we need to handle any change properly.

Thanks,
Edgar

On 11/20/13 5:46 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Tue, Nov 19, 2013 at 08:59:38AM -0800, Edgar Magana emag...@plumgrid.com wrote:

Do you have in mind any implementation, any BP? We could actually work on this together; all plugins will get the benefits of a better implementation.

Yes, let's work together. Here is my blueprint (it's somewhat old, so it needs to be updated):

https://blueprints.launchpad.net/neutron/+spec/fix-races-of-db-based-plugin
https://docs.google.com/file/d/0B4LNMvjOzyDuU2xNd0piS3JBMHM/edit

Although I've thought of status changes (adding more statuses) and a locking protocol so far, TaskFlow seems worth looking at before starting, and another possible approach is decoupling the backend process from the API call, as Salvatore suggested, like the NVP plugin. Even with the TaskFlow or decoupling approach, some kind of enhanced status-change/locking protocol will be necessary for the performance of creating many ports at once.
thanks,

Thanks,
Edgar

On 11/19/13 3:57 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Mon, Nov 18, 2013 at 03:55:49PM -0500, Robert Kukura rkuk...@redhat.com wrote:

On 11/18/2013 03:25 PM, Edgar Magana wrote:

Developers,

This topic has been discussed before, but I do not remember if we have a good solution or not.

The ML2 plugin addresses this by calling each MechanismDriver twice. The create_network_precommit() method is called as part of the DB transaction, and the create_network_postcommit() method is called after the transaction has been committed. Interactions with devices or controllers are done in the postcommit methods. If the postcommit method raises an exception, the plugin deletes that partially-created resource and returns the exception to the client. You might consider a similar approach in your plugin.

Splitting the work into two phases, pre/post, is a good approach. But there still remains a race window: once the transaction is committed, the result is visible to the outside, so concurrent requests to the same resource will be racy. There is a window after pre_xxx_yyy() and before post_xxx_yyy() where other requests can be handled. The state machine needs to be enhanced, I think (plugins need modification), for example by adding more states like pending_{create, delete, update}. Also, we would like to consider serializing between operations on ports and subnets, or between operations on subnets and networks, depending on performance requirements. (Or carefully audit complex status changes, i.e. changing a port during subnet/network update/deletion.) I think it would be useful to establish a reference locking policy for the ML2 plugin for SDN controllers. Thoughts or comments? If this is considered useful and acceptable, I'm willing to help.

thanks,
Isaku Yamahata

-Bob

Basically, if concurrent API calls are sent to Neutron, all of them are sent to the plug-in level, where two actions have to be made:

1. DB transaction
Not just for data persistence, but also to collect the information needed for the next action.

2. Plug-in back-end implementation - in our case a call to the Python library that consequently calls the PLUMgrid REST GW (soon SAL). For instance:

    def create_port(self, context, port):
        with context.session.begin(subtransactions=True):
            # Plugin DB - Port Create and Return port
            port_db = super(NeutronPluginPLUMgridV2,
                            self).create_port(context, port)
            device_id = port_db["device_id"]
            if port_db["device_owner"] == "network:router_gateway":
                router_db = self._get_router(context, device_id)
            else:
                router_db = None
            try:
                LOG.debug(_("PLUMgrid Library: create_port() called"))
                # Back-end implementation
                self._plumlib.create_port(port_db, router_db)
            except Exception:
                ...

The way we have implemented it at the plugin level in Havana (even in Grizzly) is that both actions are wrapped in the same
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
On Mon, Nov 18, 2013 at 03:55:49PM -0500, Robert Kukura rkuk...@redhat.com wrote:

On 11/18/2013 03:25 PM, Edgar Magana wrote:

Developers,

This topic has been discussed before, but I do not remember if we have a good solution or not.

The ML2 plugin addresses this by calling each MechanismDriver twice. The create_network_precommit() method is called as part of the DB transaction, and the create_network_postcommit() method is called after the transaction has been committed. Interactions with devices or controllers are done in the postcommit methods. If the postcommit method raises an exception, the plugin deletes that partially-created resource and returns the exception to the client. You might consider a similar approach in your plugin.

Splitting the work into two phases, pre/post, is a good approach. But there still remains a race window: once the transaction is committed, the result is visible to the outside, so concurrent requests to the same resource will be racy. There is a window after pre_xxx_yyy() and before post_xxx_yyy() where other requests can be handled. The state machine needs to be enhanced, I think (plugins need modification), for example by adding more states like pending_{create, delete, update}. Also, we would like to consider serializing between operations on ports and subnets, or between operations on subnets and networks, depending on performance requirements. (Or carefully audit complex status changes, i.e. changing a port during subnet/network update/deletion.) I think it would be useful to establish a reference locking policy for the ML2 plugin for SDN controllers. Thoughts or comments? If this is considered useful and acceptable, I'm willing to help.

thanks,
Isaku Yamahata

-Bob

Basically, if concurrent API calls are sent to Neutron, all of them are sent to the plug-in level, where two actions have to be made:

1. DB transaction - not just for data persistence, but also to collect the information needed for the next action
2. Plug-in back-end implementation
In our case it is a call to the Python library that consequently calls the PLUMgrid REST GW (soon SAL). For instance:

    def create_port(self, context, port):
        with context.session.begin(subtransactions=True):
            # Plugin DB - Port Create and Return port
            port_db = super(NeutronPluginPLUMgridV2,
                            self).create_port(context, port)
            device_id = port_db["device_id"]
            if port_db["device_owner"] == "network:router_gateway":
                router_db = self._get_router(context, device_id)
            else:
                router_db = None
            try:
                LOG.debug(_("PLUMgrid Library: create_port() called"))
                # Back-end implementation
                self._plumlib.create_port(port_db, router_db)
            except Exception:
                ...

The way we have implemented it at the plugin level in Havana (even in Grizzly) is that both actions are wrapped in the same transaction, which automatically rolls back any operation done to its original state, protecting mostly the DB from having any inconsistent state or leftover data if the back-end part fails.

The problem that we are experiencing is that when concurrent calls to the same API are sent, the operations at the plug-in back-end take long enough to make the next concurrent API call get stuck at the DB transaction level, which creates a hung state for the Neutron server, to the point that all concurrent API calls will fail. This can be fixed if we include some locking system, such as calling:

    from neutron.common import utils

    @utils.synchronized('any-name', external=True)
    def create_port(self, context, port):
        ...

Obviously, this will create a serialization of all concurrent calls, which ends up in really bad performance. Does anyone have a better solution?
Thanks,
Edgar

--
Isaku Yamahata isaku.yamah...@gmail.com
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
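The precommit/postcommit split plus a transitional status that Bob and Isaku discuss above can be sketched with the standard library alone. This is an illustrative toy, not ML2's actual MechanismDriver API: the class and method names are invented, a dict plus a lock stands in for the real DB and its transaction, and the compensating delete mirrors what the ML2 plugin does when a postcommit call raises.

```python
import threading


class BackendError(Exception):
    pass


class TwoPhasePlugin:
    """Toy two-phase create: precommit records the resource with a
    PENDING_CREATE status inside the 'transaction', postcommit talks
    to the backend outside it, and failure triggers a compensating
    delete instead of rolling the whole transaction back."""

    def __init__(self, backend):
        self._db = {}                     # stands in for the Neutron DB
        self._db_lock = threading.Lock()  # stands in for the DB transaction
        self._backend = backend           # callable talking to the controller

    def create_network(self, net_id, data):
        # precommit: persist with a transitional status so concurrent
        # readers can tell the backend is not configured yet.
        with self._db_lock:
            if net_id in self._db:
                raise ValueError("already exists")
            self._db[net_id] = dict(data, status="PENDING_CREATE")
        # postcommit: the backend call happens after 'commit'.
        try:
            self._backend(net_id, data)
        except BackendError:
            with self._db_lock:
                del self._db[net_id]      # compensating delete
            raise
        with self._db_lock:
            self._db[net_id]["status"] = "ACTIVE"
        return self._db[net_id]
```

The window Isaku points out is visible here too: between releasing the lock and the backend call succeeding, other requests can observe the PENDING_CREATE record, which is exactly why he argues the state machine must model that state explicitly.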
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Isaku,

Do you have in mind any implementation, any BP? We could actually work on this together; all plugins will get the benefits of a better implementation.

Thanks,
Edgar

On 11/19/13 3:57 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote:

On Mon, Nov 18, 2013 at 03:55:49PM -0500, Robert Kukura rkuk...@redhat.com wrote: [...]

Splitting the work into two phases, pre/post, is a good approach, but a race window still remains. Once the transaction is committed, the result is visible to the outside, so concurrent requests to the same resource will be racy: there is a window after pre_xxx_yyy() and before post_xxx_yyy() where other requests can be handled.

The state machine needs to be enhanced, I think (plugins need modification), for example by adding more states like pending_{create, delete, update}. We should also consider serializing operations on ports and subnets, or on subnets and networks, depending on performance requirements (or carefully audit complex status changes, i.e. changing a port during subnet/network update/deletion).

I think it would be useful to establish a reference locking policy in the ML2 plugin for SDN controllers. Thoughts or comments? If this is considered useful and acceptable, I'm willing to help.
--
Isaku Yamahata isaku.yamah...@gmail.com
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
If you start adding these states you might really want to consider the following work that is going on in other projects. It surely appears that everyone is starting to hit the same problem (and joining efforts would produce a more beneficial result).

Relevant icehouse etherpads:

- https://etherpad.openstack.org/p/CinderTaskFlowFSM
- https://etherpad.openstack.org/p/icehouse-oslo-service-synchronization

And of course my obvious plug for taskflow (which is designed to be a useful library to help in all these usages):

- https://wiki.openstack.org/wiki/TaskFlow

The states u just mentioned start to line up with https://wiki.openstack.org/wiki/TaskFlow/States_of_Task_and_Flow

If this sounds like a useful way to go (joining efforts) then lets see how we can make it possible. IRC: #openstack-state-management is where I am usually at.

On 11/19/13 3:57 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote: [...]

Splitting the work into two phases, pre/post, is a good approach, but a race window still remains. Once the transaction is committed, the result is visible to the outside, so concurrent requests to the same resource will be racy: there is a window after pre_xxx_yyy() and before post_xxx_yyy() where other requests can be handled.
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
And also of course, nearly forgot a similar situation/review in heat: https://review.openstack.org/#/c/49440/ Except theirs was/is dealing with stack locking (a heat concept).

On 11/19/13 10:33 AM, Joshua Harlow harlo...@yahoo-inc.com wrote: [...]

On 11/19/13 3:57 AM, Isaku Yamahata isaku.yamah...@gmail.com wrote: [...]

Splitting the work into two phases, pre/post, is a good approach, but a race window still remains.
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
For what it's worth, we have considered this aspect from the perspective of the Neutron plugin my team maintains (NVP) during the past release cycle. The synchronous model that most plugins with a controller on the backend currently implement is simple and convenient, but has some flaws:

- reliability: the current approach, where the plugin orchestrates the backend, is not really optimal when it comes to ensuring your running configuration (backend/control plane) is in sync with your desired configuration (neutron/mgmt plane); moreover, in some cases, due to neutron internals, API calls to the backend are wrapped in a transaction too, leading to very long SQL transactions, which are quite dangerous indeed. It is not easy to recover from a failure due to an eventlet thread deadlocking with a mysql transaction, where by 'recover' I mean ensuring neutron and backend state are in sync.

- maintainability: since handling rollback in case of failures on the backend and/or the db is cumbersome, this often leads to spaghetti code which is very hard to maintain regardless of the effort (ok, I agree here that this also depends on how good the devs are - most of the guys in my team are very good, but unfortunately they have me too...).

- performance and scalability:
  - roundtrips to the backend take a non-negligible toll on the duration of an API call, whereas most Neutron API calls should probably just terminate at the DB, just like a nova boot call does not wait for the VM to be ACTIVE to return;
  - we need to keep some operations serialized in order to avoid the mentioned race issues.

For this reason we're progressively moving toward a change in the NVP plugin with a series of patches under this umbrella-blueprint [1].
For answering the issues mentioned by Isaku, we've been looking at a task management library with an efficient and reliable set of abstractions for ensuring operations are properly ordered, thus avoiding those races (I agree with the observation on the pre/post commit solution). We are currently looking at using celery [2] rather than taskflow, mostly because we already have expertise in using it in our applications, and it has very easy abstractions for workflow design, as well as for handling task failures. That said, I think we're still open to switching to taskflow should we become aware of some very good reason for using it.

Regards,
Salvatore

[1] https://blueprints.launchpad.net/neutron/+spec/nvp-async-backend-communication
[2] http://docs.celeryproject.org/en/master/index.html

On 19 November 2013 19:42, Joshua Harlow harlo...@yahoo-inc.com wrote:

And also of course, nearly forgot a similar situation/review in heat: https://review.openstack.org/#/c/49440/ Except theirs was/is dealing with stack locking (a heat concept).

On 11/19/13 10:33 AM, Joshua Harlow harlo...@yahoo-inc.com wrote: [...]

If this sounds like a useful way to go (joining efforts) then lets see how we can make it possible. IRC: #openstack-state-management is where I am usually at.
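Salvatore's "terminate at the DB" model above can be sketched as follows: the API call only records desired state and returns immediately, and a background worker pushes it to the backend and reconciles status afterwards. This is a hypothetical stdlib sketch of the shape of the NVP async blueprint, not its actual code; a real implementation would use celery or taskflow and persist status in the Neutron DB rather than a dict.

```python
import queue
import threading


class AsyncPlugin:
    """The API call writes desired state and returns without a backend
    round-trip; a worker thread syncs the backend and flips the status."""

    def __init__(self, backend):
        self.db = {}                       # stands in for the Neutron DB
        self._tasks = queue.Queue()
        self._backend = backend            # callable talking to the controller
        threading.Thread(target=self._worker, daemon=True).start()

    def create_port(self, port_id, data):
        # Record desired state; the caller gets an answer right away,
        # like nova returning before the VM is ACTIVE.
        self.db[port_id] = dict(data, status="PENDING_CREATE")
        self._tasks.put(port_id)
        return self.db[port_id]

    def _worker(self):
        while True:
            port_id = self._tasks.get()
            try:
                self._backend(port_id, self.db[port_id])
                self.db[port_id]["status"] = "ACTIVE"
            except Exception:
                self.db[port_id]["status"] = "ERROR"  # reconcile later
            finally:
                self._tasks.task_done()

    def wait(self):
        """Block until all queued backend work has been attempted."""
        self._tasks.join()
```

Note how backend failures no longer roll back the API call; they surface as an ERROR status the operator (or a retry loop) must reconcile, which is the reliability trade-off Salvatore describes.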
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Can u explain a little how using celery achieves workflow reliability and avoids races (or mitigates spaghetti code)?

To me celery acts as a way to distribute tasks, but does not deal with actually forming an easily understandable way of knowing that a piece of code that u design is actually going to go through the various state transitions (or state workflows) that u expect (this is a higher level mechanism that u can build on top of a distribution system). So this means that NVP (or neutron or other?) must be maintaining an orchestration/engine layer on top of celery to add on this additional set of code that 'drives' celery to accomplish a given workflow in a reliable manner.

This starts to sound pretty similar to what taskflow is doing: not being a direct competitor to a distributed task queue such as celery, but providing this higher level mechanism which adds on these benefits, since they are needed anyway. To me these benefits currently are (may get bigger in the future):

1. A way to define a workflow (in a way that is not tied to celery, since celery's '@task' decorator ties u to celery's internal implementation).
   - This includes ongoing work to determine how to easily define a state machine in a way that is relevant to cinder (and other projects).
2. A way to keep track of the state that the workflow goes through (this brings along resumption and progress information when u track at the right level).
3. A way to execute that workflow reliably (potentially using celery, rpc, local threads, other future hotness).
   - This becomes important when u ask yourself how did u plan on testing celery in the gate/jenkins/CI?
4. A way to guarantee that the workflow upon failure is *automatically* resumed by some other entity.
More details @ http://www.slideshare.net/harlowja/taskflow-27820295

From: Salvatore Orlando sorla...@nicira.com
Date: Tuesday, November 19, 2013 2:22 PM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Cc: Joshua Harlow harlo...@yahoo-inc.com, Isaku Yamahata isaku.yamah...@gmail.com, Robert Kukura rkuk...@redhat.com
Subject: Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

For what it's worth, we have considered this aspect from the perspective of the Neutron plugin my team maintains (NVP) during the past release cycle. The synchronous model that most plugins with a controller on the backend currently implement is simple and convenient, but has some flaws:

- reliability: the current approach, where the plugin orchestrates the backend, is not really optimal when it comes to ensuring your running configuration (backend/control plane) is in sync with your desired configuration (neutron/mgmt plane); moreover, in some cases, due to neutron internals, API calls to the backend are wrapped in a transaction too, leading to very long SQL transactions, which are quite dangerous indeed. It is not easy to recover from a failure due to an eventlet thread deadlocking with a mysql transaction, where by 'recover' I mean ensuring neutron and backend state are in sync.

- maintainability: since handling rollback in case of failures on the backend and/or the db is cumbersome, this often leads to spaghetti code which is very hard to maintain regardless of the effort (ok, I agree here that this also depends on how good the devs are - most of the guys in my team are very good, but unfortunately they have me too...).
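Points 2 and 4 in Joshua's list above (tracked state, automatic resumption after failure) can be illustrated with a minimal journaling runner. This is a hand-rolled sketch of the idea only, not taskflow's actual API; in a real system the journal would live in a database so another worker can pick the flow up after a crash.

```python
class Workflow:
    """Run named steps in order, journaling each completion so an
    interrupted run can be resumed by another worker from the journal."""

    def __init__(self, steps, journal=None):
        self.steps = steps      # list of (name, callable) pairs
        # The journal records which steps already completed; persisting
        # it (e.g. in a DB) is what makes resumption by another entity
        # possible.
        self.journal = journal if journal is not None else []

    def run(self):
        done = set(self.journal)
        for name, step in self.steps:
            if name in done:    # finished in a previous run: skip it
                continue
            step()
            self.journal.append(name)   # persist progress
        return list(self.journal)
```

A resumed run constructs the same `Workflow` with the journal recovered from storage; only the steps that never completed execute again.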
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Thanks for your reply Joshua; some more comments inline. However, I think I'm probably going off topic here, given the initial subject of this thread.

Talking about taskflow, and moving aside from the plugins, which don't have a lot of community-wide interest: do you reckon taskflow might be something that we might look at to improve reliability and efficiency in the nova/neutron interface? I have barely started to analyse how that could be done, but it is something which might deserve another thread of discussion.

Regards,
Salvatore

On 19 November 2013 23:56, Joshua Harlow harlo...@yahoo-inc.com wrote:

Can u explain a little how using celery achieves workflow reliability and avoids races (or mitigates spaghetti code)?

Celery surely does not do that. I have probably not been precise enough in my previous post. I meant to say that celery is merely the tool we are going to use for managing a distributed task queue.

To me celery acts as a way to distribute tasks, but does not deal with actually forming an easily understandable way of knowing that a piece of code that u design is actually going to go through the various state transitions (or state workflows) that u expect (this is a higher level mechanism that u can build on top of a distribution system). So this means that NVP (or neutron or other?) must be maintaining an orchestration/engine layer on top of celery to add on this additional set of code that 'drives' celery to accomplish a given workflow in a reliable manner.

That's pretty much correct. The neutron plugin will define workflows and manage state transitions.

This starts to sound pretty similar to what taskflow is doing: not being a direct competitor to a distributed task queue such as celery, but providing this higher level mechanism which adds on these benefits, since they are needed anyway. To me these benefits currently are (may get bigger in the future): 1.
A way to define a workflow (in a way that is not tied to celery, since celery's '@task' decorator ties u to celery's internal implementation).
   - This includes ongoing work to determine how to easily define a state machine in a way that is relevant to cinder (and other projects).
2. A way to keep track of the state that the workflow goes through (this brings along resumption and progress information when u track at the right level).
3. A way to execute that workflow reliably (potentially using celery, rpc, local threads, other future hotness).
   - This becomes important when u ask yourself how did u plan on testing celery in the gate/jenkins/CI?
4. A way to guarantee that the workflow upon failure is *automatically* resumed by some other entity.

Thanks for this clarification. I will surely look at how the NVP plugin can leverage taskflow; I surely don't want to reinvent the wheel - or, in other words, I surely don't want to write code if that code has already been written for me by somebody else.

More details @ http://www.slideshare.net/harlowja/taskflow-27820295

From: Salvatore Orlando sorla...@nicira.com
Date: Tuesday, November 19, 2013 2:22 PM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Cc: Joshua Harlow harlo...@yahoo-inc.com, Isaku Yamahata isaku.yamah...@gmail.com, Robert Kukura rkuk...@redhat.com
Subject: Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

For what it's worth, we have considered this aspect from the perspective of the Neutron plugin my team maintains (NVP) during the past release cycle.
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Hi Edgar,

We had a similar issue and worked around it with something like the following (which I believe is similar to what Aaron said):

@@ -45,6 +45,8 @@ SNAT_RULE_PROPERTY = {OS_TENANT_ROUTER_RULE_KEY: SNAT_RULE}

 class MidonetResourceNotFound(q_exc.NotFound):
     message = _('MidoNet %(resource_type)s %(id)s could not be found')

+from eventlet.semaphore import Semaphore
+PORT_ALLOC_SEM = Semaphore()

 class MidonetPluginV2(db_base_plugin_v2.QuantumDbPluginV2,
                       l3_db.L3_NAT_db_mixin):
@@ -428,21 +430,31 @@
         # set midonet port id to quantum port id and create a DB record.
         port_data['id'] = bridge_port.get_id()

-        session = context.session
-        with session.begin(subtransactions=True):
-            qport = super(MidonetPluginV2, self).create_port(context, port)
-            if is_compute_interface:
-                # get ip and mac from DB record.
-                fixed_ip = qport['fixed_ips'][0]['ip_address']
-                mac = qport['mac_address']
-
+        qport = None
+        with PORT_ALLOC_SEM:
+            session = context.session
+            with session.begin(subtransactions=True):
+                qport = super(MidonetPluginV2, self).create_port(context, port)
+                if is_compute_interface:
+                    # get ip and mac from DB record.
+                    id = qport['id']
+                    fixed_ip = qport['fixed_ips'][0]['ip_address']
+                    mac = qport['mac_address']
+
+        if qport and is_compute_interface:
+            try:
                 # create dhcp host entry under the bridge.
                 dhcp_subnets = bridge.get_dhcp_subnets()
                 if len(dhcp_subnets) > 0:
                     dhcp_subnets[0].add_dhcp_host().ip_addr(fixed_ip)\
                         .mac_addr(mac)\
                         .create()
-        return qport
+                return qport
+            except Exception:
+                self.delete_port(context, id)
+                return None
+        else:
+            return qport

     def update_port(self, context, id, port):

We are also looking to fix this for upstream icehouse. Also, I have just submitted a (regression) test for this in tempest: https://review.openstack.org/#/c/57355 Hope the test makes sense.
Thanks,
Tomoe

On Tue, Nov 19, 2013 at 5:25 AM, Edgar Magana emag...@plumgrid.com wrote:
> Developers, This topic has been discussed before but I do not remember if
> we have a good solution or not. [snip]

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
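Stripped of the MidoNet specifics, the workaround above combines two ideas: serialize the DB insert behind a module-level semaphore, then perform the backend call outside the transaction and compensate with a delete if it fails. A minimal editorial sketch of that pattern follows (all names hypothetical; plain threading stands in for eventlet, and a dict stands in for the Neutron DB):

```python
import threading

PORT_ALLOC_LOCK = threading.Lock()


class FakeDB(object):
    """Stand-in for the Neutron DB layer (hypothetical)."""

    def __init__(self):
        self.ports = {}

    def create_port(self, port_id, data):
        self.ports[port_id] = data
        return dict(data, id=port_id)

    def delete_port(self, port_id):
        self.ports.pop(port_id, None)


def create_port(db, backend_call, port_id, data):
    # Step 1: create the DB record under the lock, keeping the
    # transaction itself short.
    with PORT_ALLOC_LOCK:
        qport = db.create_port(port_id, data)
    # Step 2: the backend call happens outside the transaction; on
    # failure, compensate by deleting the record we just created.
    try:
        backend_call(qport)
    except Exception:
        db.delete_port(port_id)
        return None
    return qport
```

The key property is that the slow backend call no longer holds the DB transaction open; the cost is that a crash between the two steps can still leave an orphaned row, which is the kind of inconsistency Isaku points out elsewhere in the thread.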
[openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Developers,

This topic has been discussed before but I do not remember if we have a good solution or not. Basically, if concurrent API calls are sent to Neutron, all of them are sent to the plug-in level, where two actions have to be made:

1. DB transaction - not just for data persistence but also to collect the information needed for the next action
2. Plug-in back-end implementation - in our case a call to the Python library that consequently calls the PLUMgrid REST GW (soon SAL)

For instance:

def create_port(self, context, port):
    with context.session.begin(subtransactions=True):
        # Plugin DB - Port Create and Return port
        port_db = super(NeutronPluginPLUMgridV2,
                        self).create_port(context, port)
        device_id = port_db['device_id']
        if port_db['device_owner'] == 'network:router_gateway':
            router_db = self._get_router(context, device_id)
        else:
            router_db = None
        try:
            LOG.debug(_("PLUMgrid Library: create_port() called"))
            # Back-end implementation
            self._plumlib.create_port(port_db, router_db)
        except Exception:
            ...

The way we have implemented this at the plugin level in Havana (and even in Grizzly) is that both actions are wrapped in the same transaction, which automatically rolls back any operation to its original state, mostly protecting the DB from being left in an inconsistent state or with leftover data if the back-end part fails.

The problem we are experiencing is that when concurrent calls to the same API are sent, the operations at the plug-in back-end take long enough to make the next concurrent API call get stuck at the DB transaction level, which creates a hung state for the Neutron server to the point that all concurrent API calls fail.

This can be fixed if we include some locking system, such as calling:

from neutron.common import utils

@utils.synchronized('any-name', external=True)
def create_port(self, context, port):
    ...

Obviously, this will serialize all concurrent calls, which ends up giving really bad performance. Does anyone have a better solution?

Thanks,
Edgar
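The hang described above comes from holding the DB transaction (and whatever row or table locks it takes) open across the slow backend call, so every concurrent create queues behind it. A stripped-down editorial illustration, with a plain lock standing in for the transaction's locks (names hypothetical, not the actual plugin code):

```python
import threading

# Stands in for the locks an open DB transaction holds.
DB_LOCK = threading.Lock()


def create_port_transactional(backend_call, port):
    # The "transaction" stays open for the whole backend call, so
    # concurrent create_port calls fully serialize behind this lock.
    with DB_LOCK:
        port_db = {'id': port['id']}   # DB insert
        backend_call(port_db)          # slow REST call to the controller
        return port_db
```

With, say, a 2-second backend call, ten concurrent requests take roughly 20 seconds end to end, which is long enough to trip client-side timeouts.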
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
This actually doesn't solve the issue, because if you run multiple neutron-servers behind a load balancer you will still run into the same problem with the transaction on the database, I believe. We handle this issue in the NVP plugin by removing the transaction and attempting to manually delete the port if the REST call to NVP fails. In the (unlikely) case where the port cannot be deleted from the database, a background thread that syncs operational status from NVP to the Neutron database eventually moves the port to the error state, and we later garbage-collect ports in the error state.

On Mon, Nov 18, 2013 at 12:43 PM, Joshua Harlow harlo...@yahoo-inc.com wrote:

An idea: make the lock more granular. Instead of @utils.synchronized('any-name'), I wonder if u could do something like:

with utils.synchronized('any-name-$device-id'):
    # Code here

Then at least u won't be locking at the method level (which means no concurrency). Would that work?

From: Edgar Magana emag...@plumgrid.com
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Monday, November 18, 2013 12:25 PM
To: OpenStack List openstack-dev@lists.openstack.org
Subject: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

> Developers, This topic has been discussed before but I do not remember if
> we have a good solution or not. [snip]
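Joshua's suggestion, sketched: key the lock on the resource so that only calls touching the same device serialize, while unrelated calls proceed concurrently. The lock-registry helper below is hypothetical (Neutron's utils.synchronized takes a fixed name), shown with plain threading:

```python
import threading
from collections import defaultdict

_locks = defaultdict(threading.Lock)  # one lock per key
_registry_guard = threading.Lock()    # protects the registry itself


def lock_for(key):
    """Return the lock unique to `key`, creating it on first use."""
    with _registry_guard:
        return _locks[key]


def create_port(device_id, do_work):
    # Calls for different devices use different locks and run
    # concurrently; only calls for the same device_id serialize.
    with lock_for('create-port-%s' % device_id):
        return do_work(device_id)
```

Note that Aaron's caveat still applies: a process-local (or even file-based) lock does not help once several neutron-server processes sit behind a load balancer, because each process has its own locks.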
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
On 11/18/2013 03:25 PM, Edgar Magana wrote:
> Developers, This topic has been discussed before but I do not remember if
> we have a good solution or not.

The ML2 plugin addresses this by calling each MechanismDriver twice. The create_network_precommit() method is called as part of the DB transaction, and the create_network_postcommit() method is called after the transaction has been committed. Interactions with devices or controllers are done in the postcommit methods. If a postcommit method raises an exception, the plugin deletes the partially-created resource and returns the exception to the client. You might consider a similar approach in your plugin.

-Bob

> [snip]
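Schematically, the ML2 pattern Bob describes looks like this: driver precommit hooks run inside the transaction, postcommit hooks run after commit, and a postcommit failure triggers deletion of the partially-created resource. A toy model (not actual ML2 code; a dict stands in for the DB):

```python
class NoopDriver(object):
    def create_network_precommit(self, net_id):
        pass  # validate / write driver-private state inside the transaction

    def create_network_postcommit(self, net_id):
        pass  # talk to the device or controller, after commit


class Plugin(object):
    def __init__(self, drivers):
        self.drivers = drivers
        self.networks = {}  # stands in for the Neutron DB

    def create_network(self, net_id):
        # --- inside the DB transaction ---
        self.networks[net_id] = {'id': net_id}
        for driver in self.drivers:
            driver.create_network_precommit(net_id)
        # --- transaction commits here ---
        try:
            for driver in self.drivers:
                driver.create_network_postcommit(net_id)
        except Exception:
            # Postcommit failed: delete the partially-created resource
            # and re-raise so the client sees the error.
            del self.networks[net_id]
            raise
        return self.networks[net_id]
```

The DB transaction stays short because no controller interaction happens inside it; the trade-off is that cleanup after a postcommit failure is itself a best-effort operation.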
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
On 11/18/2013 02:25 PM, Edgar Magana wrote:
> The problem that we are experiencing is when concurrent calls to the same
> API are sent, the operations at the plug-in back-end take long enough to
> make the next concurrent API call get stuck at the DB transaction level
> [snip] Obviously, this will create a serialization of all concurrent
> calls, which ends up giving really bad performance. Does anyone have a
> better solution?

Can you break it up into smaller independent transactions so that the number of operations is reasonable? Can you use finer-grained serialization? Is there a way (possibly involving changing the API) that you can do as much as possible (hopefully including all the stuff that is likely to fail) before starting the database transaction?

Chris
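Chris's third question points at inverting the order: do the failure-prone backend work before opening the transaction, so the DB write is short and a backend error never leaves an orphaned row. A hedged sketch, with callables standing in for the backend and the DB (all names hypothetical):

```python
def create_port(backend_create, backend_delete, db_store, port_data):
    # Failure-prone backend call first: if it raises, the DB was
    # never touched, so there is nothing to roll back.
    backend_port = backend_create(port_data)
    try:
        # Short transaction: just persist what the backend returned.
        return db_store(backend_port)
    except Exception:
        # DB write failed: compensate by removing the backend object.
        backend_delete(backend_port)
        raise
```

This shrinks the transaction to a single write, at the cost of needing a compensating delete on the backend for the (rarer) case where the DB write fails.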
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Hi All,

Thank you everybody for your input. It is clear that any solution requires changes at the plugin level (we were trying to avoid that). So, I am wondering whether a refactor of this code is needed or not (maybe not). The ML2 solution is probably the best alternative right now, so we may go for it.

Any extra input is welcome!

Thanks,
Edgar

On 11/18/13 12:55 PM, Robert Kukura rkuk...@redhat.com wrote:
> The ML2 plugin addresses this by calling each MechanismDriver twice. The
> create_network_precommit() method is called as part of the DB transaction,
> and the create_network_postcommit() method is called after the transaction
> has been committed. [snip]
Re: [openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
On 11/18/2013 05:21 PM, Edgar Magana wrote:
> Thank you everybody for your input. It is clear that any solution requires
> changes at the plugin level (we were trying to avoid that). [snip] The ML2
> solution is probably the best alternative right now, so we may go for it.

Could be a good time to consider converting the plugin to an ML2 MechanismDriver. I'm happy to help work through the details of that if you are interested.

-Bob

> [snip]