Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
Right now I'm leaning toward "parent always does nothing" + PluginWorker: everything is forked, there is no special case for workers==0, and the "only one" case is designated explicitly. Of course, it's still early in the day and I haven't had any coffee.

I have updated the patch (https://review.openstack.org/#/c/189391/) to implement the above. I have it marked WIP because it doesn't have any tests, and because it modifies ServicePluginBase to add a get_processes() call even though almost no service plugins actually inherit from ServicePluginBase (they just implement its interface). The get_processes() contract in general could be fleshed out a bit as well. I just wanted to get something up for the purposes of discussion, so anyone interested in this particular problem should take a look and discuss. :)

Terry
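[Editor's note: to make the "plugin asks for its own worker" idea concrete, here is a minimal Python sketch of what the proposed contract might look like. PluginWorker and get_processes() follow the names used in the discussion; the actual shape of the change is in the review linked above, and everything else here is an assumption.]

    import os

    class PluginWorker(object):
        """Hypothetical worker running plugin work in its own forked process."""

        def __init__(self, plugin):
            self._plugin = plugin

        def start(self):
            # start() would only be called in the forked child, so any
            # connections or green threads created here are per-process
            # and never shared across a fork.
            print('PluginWorker running in pid %d' % os.getpid())
            self._plugin.run_once_only_sync()

    class MyServicePlugin(object):
        def get_processes(self):
            # The service framework would fork one process per worker
            # returned here and call start() after forking.
            return [PluginWorker(self)]

        def run_once_only_sync(self):
            pass  # e.g. a one-time sync with a back-end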
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
On Wed, Jun 10, 2015 at 2:25 PM, Neil Jerram neil.jer...@metaswitch.com wrote:
> On 08/06/15 22:02, Kevin Benton wrote:
>> This depends on what initialize is supposed to be doing. If it's just a one-time sync with a back-end, then I think calling it once in each child process might not be what we want. I left a comment on Terry's patch. I think we should just use the callback manager to have a pre-fork and post-fork event to let drivers/plugins do whatever is appropriate for them.
>
> Can you point me to more detail about the callback manager (or registry)? I haven't come across that yet.

http://docs.openstack.org/developer/neutron/devref/callbacks.html

> Thanks,
> Neil
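[Editor's note: for readers following the devref link, this is roughly how a driver might consume the pre-fork/post-fork notifications Kevin proposes. The subscribe() pattern matches the callbacks registry described in the devref, but the 'process' resource and 'after_fork' event are hypothetical - no such notifications exist yet; they are exactly what is being proposed.]

    from neutron.callbacks import registry

    # Hypothetical resource/event names; these notifications do not exist yet.
    PROCESS = 'process'
    AFTER_FORK = 'after_fork'

    def start_backend_sync(resource, event, trigger, **kwargs):
        # Would run in each child process after os.fork(), so sockets
        # and green threads created here belong to that child only.
        pass

    registry.subscribe(start_backend_sync, PROCESS, AFTER_FORK)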
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
On 08/06/15 22:02, Kevin Benton wrote:
> This depends on what initialize is supposed to be doing. If it's just a one-time sync with a back-end, then I think calling it once in each child process might not be what we want. I left a comment on Terry's patch. I think we should just use the callback manager to have a pre-fork and post-fork event to let drivers/plugins do whatever is appropriate for them.

Can you point me to more detail about the callback manager (or registry)? I haven't come across that yet.

Thanks,
Neil
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
There are two classes of behavior that need to be handled:

1) Some things can only be done after forking, like setting up connections or spawning threads.
2) Some things should only be done once, regardless of the number of forks, like syncing.

Even when you just want something to happen once, there is a good chance you need it to happen post-fork. For example, syncing between the OVSDB and Neutron databases requires a socket connection, and we don't want that going on 16 times.

Case 1 is a little complex due to how we launch api/rpc worker threads. The obvious place to signal that a fork is complete is in the RpcWorker/WorkerService start() methods, since they are the only code outside of openstack.common.service that is really called post-fork. The problem is the case where api_workers==rpc_workers==0. In that case, the parent process calls start() on both, so you end up with two calls to your post-fork initialization and only one process. It is easy enough to pass in whether or not start() should call the initialization, or whether we hold off and let the main process do it before calling waitall() - it's just a bit ugly (see my patch: https://review.openstack.org/#/c/189391/).

Another option for case 1 would be to kill the case where a single process handles both workers: always have the parent do nothing, and fork a process for each api/rpc worker, treating workers=0 as workers=1. Then start() can safely be used without hacking around the special case.

For case 2, the problem is deciding which process is *the one*. The fork() call happens in the weird bastardized eventlet-threading hybrid openstack.common ThreadGroup stuff, so who knows in what order things really happen. The easiest thing to detect as unique is the parent process, via some plugin pre-fork call that stores the parent's pid. The problem with using the parent process for the 'do it once' case is that we have to be able to guarantee that all the forking is really done, and it all happens via eventlet. Maybe an accumulator that fires off an event once api_workers + rpc_workers fork() events have been received? Anyway, it's messy.

Another option for case 2 would be to let the plugin specify that it needs its own worker process. If so, spawn one and call PluginWorker.start(), which initializes after the fork. Seems like it could be cleaner.

Right now I'm leaning toward "parent always does nothing" + PluginWorker: everything is forked, no special case for workers==0, and explicit designation of the "only one" case. Of course, it's still early in the day and I haven't had any coffee.

Terry
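[Editor's note: a sketch of the "accumulator" idea for case 2. The hook names are assumptions, as is the premise that each child can report its fork back to the parent, e.g. over a pipe.]

    import os

    class ForkAccumulator(object):
        def __init__(self, expected_forks, once_callback):
            # Created pre-fork, so this pid identifies the parent.
            self._parent_pid = os.getpid()
            self._expected = expected_forks  # api_workers + rpc_workers
            self._seen = 0
            self._once_callback = once_callback

        def on_fork_event(self):
            # Invoked in the parent once per child fork notification;
            # fires the once-only work when all workers have reported in.
            self._seen += 1
            if self._seen == self._expected:
                self._once_callback()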
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
Interestingly, [1] was filed a few moments ago:

[1] https://bugs.launchpad.net/neutron/+bug/1463129
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
From a driver's perspective, it would be simpler, and I think sufficient, to change ML2 to call initialize() on drivers after the forking, rather than requiring drivers to know about forking.

-Bob
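[Editor's note: a minimal sketch of Bob's suggestion, with ML2 internals heavily simplified: drivers are loaded (imported and constructed) pre-fork, but initialize() is deferred to a hypothetical post-fork hook.]

    class MechanismManager(object):
        def __init__(self, drivers):
            # Pre-fork: drivers are loaded but NOT initialized.
            self.drivers = drivers

        def initialize_post_fork(self):
            # Hypothetical hook, invoked from each worker's start() in
            # the child process, so any connections or threads a driver
            # creates in initialize() are per-child.
            for driver in self.drivers:
                driver.initialize()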
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
Right, I think there are use cases for both. I don't think it's a huge burden to have to know about it. I think it's actually quite important to understand when the initialization happens.

--
Russell Bryant

On 06/08/2015 05:02 PM, Kevin Benton wrote:
> This depends on what initialize is supposed to be doing. If it's just a one-time sync with a back-end, then I think calling it once in each child process might not be what we want. I left a comment on Terry's patch. I think we should just use the callback manager to have a pre-fork and post-fork event to let drivers/plugins do whatever is appropriate for them.
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
This depends on what initialize is supposed to be doing. If it's just a one-time sync with a back-end, then I think calling it once in each child process might not be what we want.

I left a comment on Terry's patch. I think we should just use the callback manager to have a pre-fork and post-fork event to let drivers/plugins do whatever is appropriate for them.

On Mon, Jun 8, 2015 at 1:00 PM, Robert Kukura kuk...@noironetworks.com wrote:
> From a driver's perspective, it would be simpler, and I think sufficient, to change ML2 to call initialize() on drivers after the forking, rather than requiring drivers to know about forking.
>
> -Bob
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
Sorry about the long delay.

> Even the LOG.error("KEVIN PID=%s network response: %s" % (os.getpid(), r.text)) line? Surely the server would have forked before that line was executed - so what could prevent it from executing once in each forked process, and hence generating multiple logs?

Yes, just once. I wasn't able to reproduce the behavior you ran into. Maybe eventlet has some protection for this? Can you provide small sample code for the logging driver that does reproduce the issue?

--
Kevin Benton
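[Editor's note: for anyone who cannot reach the paste link, the instrumentation Kevin describes amounts to something like the following - a reconstruction, not the actual diff: log the pid in initialize() and in an eventlet-spawned function, then check how many distinct pids show up.]

    import logging
    import os

    import eventlet

    LOG = logging.getLogger(__name__)

    def spawned_check():
        LOG.error("KEVIN PID=%s in eventlet-spawned function", os.getpid())

    class InstrumentedMechDriver(object):
        def initialize(self):
            # If forking duplicated this work, these lines would appear
            # once per worker pid rather than once in total.
            LOG.error("KEVIN PID=%s in initialize", os.getpid())
            eventlet.spawn(spawned_check)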
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
I'm not sure you can test this behaviour on your own, because it requires the VMware plugin and the eventlet handling of backend responses. But the issue was manifesting and had to be fixed with this mega-hack [1]. The issue was not about several workers executing the same code - the loopingcall was always started on a single thread. The issue I witnessed was that the other API workers just hang. There's probably something we need to understand about how eventlet can work safely with os.fork (I just think they're not really made to work together!).

Regardless, I did not spend too much time on it, because I thought that the multiple workers code might be rewritten anyway by the pecan switch activities you're doing.

Salvatore

[1] https://review.openstack.org/#/c/180145/
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
Hi Kevin,

Thanks for your response...

On 08/05/15 08:43, Kevin Benton wrote:
> I'm not sure I understand the behavior you are seeing. When your mechanism driver gets initialized and kicks off processing, all of that should be happening in the parent PID. I don't know why your child processes start executing code that wasn't invoked. Can you provide a pointer to the code or give a sample that reproduces the issue?

https://github.com/Metaswitch/calico/tree/master/calico/openstack

Basically, our driver's initialize method immediately kicks off a green thread to audit what is now in the Neutron DB, and to ensure that the other Calico components are consistent with that.

> I modified the linuxbridge mech driver to try to reproduce it: http://paste.openstack.org/show/216859/ In the output, I never received any of the init code output I added more than once, including the function spawned using eventlet.

Interesting. Even the LOG.error("KEVIN PID=%s network response: %s" % (os.getpid(), r.text)) line? Surely the server would have forked before that line was executed - so what could prevent it from executing once in each forked process, and hence generating multiple logs?

Thanks,
Neil
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
Hi Salvatore,

Thanks for your reply...

On 08/05/15 09:20, Salvatore Orlando wrote:
> Just like the Neutron plugin manager, the ML2 driver manager ensures drivers are loaded only once, regardless of the number of workers. What Kevin did proves that drivers are correctly loaded before forking (I reckon).

Yes, up to a point. It seems clear that we can rely on the following events being ordered:

1. Mechanism drivers are instantiated (__init__) and initialized (initialize).
2. The Neutron server forks (into a number of copies as dictated by api_workers and rpc_workers).
3. Mechanism driver entry points such as create_port_pre/postcommit are called.

However...

> However, forking is something to be careful about, especially when using eventlet. For the plugin my team maintains we were creating a periodic task during plugin initialisation. This led to an interesting condition where API workers were hanging [1]. This situation was fixed with a rather pedestrian fix - by adding a delay.
>
> [1] https://bugs.launchpad.net/vmware-nsx/+bug/1420278

Yes! This is precisely the same situation that I have. Currently I am also planning to 'fix' it by adding a delay of a few seconds. However, that is not an amazing fix, because if there is something that a mechanism driver needs to do on startup, it would probably rather do it as soon as possible; and on the other hand it involves guessing how long steps (1) and (2) above will take.

Readers may be wondering why a mechanism driver needs to do something on startup at all. In general, the answer is: so as to recheck the Neutron DB - i.e. any VMs/ports that should already exist - and ensure that the driver's downstream components are all correctly in sync with that. In Calico's case, that means auditing that the routing and iptables on each compute host match the current VM and security configuration. This need is implied by the existence of the _postcommit entry points: when a mechanism driver is implemented using those entry points, it is possible for the driver or downstream software to crash after the Neutron DB believes a transaction has been committed, leaving the dataplane state wrong. Clearly, then, when the driver or downstream software is restarted, it needs to resync against the standing Neutron DB.

> Generally speaking I would find it useful to have a way to identify an API worker, in order to designate a specific one for processing that should not be made redundant. On the other hand, I self-object to the above statement by saying that API workers are not supposed to do this kind of processing, which should be deferred to some other helper process.

+1 on both points :-)

There could be a post_fork() mechanism driver entry point. It wouldn't matter which worker or helper process called it; the requirement would be simply that it would only be called once, after all the forking has occurred.

Regards,
Neil
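[Editor's note: the 'delay of a few seconds' fix Neil mentions can be expressed as below. This is a sketch: the 10-second figure and the method names are guesses, and the fragility is the point.]

    import eventlet

    class CalicoLikeDriver(object):
        def initialize(self):
            # Fragile by design: relies on all forking finishing within
            # the delay, which is exactly why a real post-fork hook
            # would be better.
            eventlet.spawn_after(10, self.resync_with_neutron_db)

        def resync_with_neutron_db(self):
            pass  # audit Neutron DB vs. downstream components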
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
I'm not sure I understand the behavior you are seeing. When your mechanism driver gets initialized and kicks off processing, all of that should be happening in the parent PID. I don't know why your child processes start executing code that wasn't invoked. Can you provide a pointer to the code or give a sample that reproduces the issue?

I modified the linuxbridge mech driver to try to reproduce it: http://paste.openstack.org/show/216859/

In the output, I never received any of the init code output I added more than once, including the function spawned using eventlet. The only time I ever saw anything executed by a child process was actual API requests (e.g. the create_port method).

--
Kevin Benton
Re: [openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
Just like the Neutron plugin manager, the ML2 driver manager ensures drivers are loaded only once, regardless of the number of workers. What Kevin did proves that drivers are correctly loaded before forking (I reckon).

However, forking is something to be careful about, especially when using eventlet. For the plugin my team maintains we were creating a periodic task during plugin initialisation. This led to an interesting condition where API workers were hanging [1]. This situation was fixed with a rather pedestrian fix - by adding a delay.

Generally speaking I would find it useful to have a way to identify an API worker, in order to designate a specific one for processing that should not be made redundant. On the other hand, I self-object to the above statement by saying that API workers are not supposed to do this kind of processing, which should be deferred to some other helper process.

Salvatore

[1] https://bugs.launchpad.net/vmware-nsx/+bug/1420278
[openstack-dev] [neutron] Mechanism drivers and Neutron server forking?
Is there a design for how ML2 mechanism drivers are supposed to cope with the Neutron server forking?

What I'm currently seeing, with api_workers = 2, is:

- my mechanism driver gets instantiated and initialized, and immediately kicks off some processing that involves communicating over the network
- the Neutron server process then forks into multiple copies
- multiple copies of my driver's network processing then continue, and interfere badly with each other :-)

I think what I should do is:

- wait until any forking has happened
- then decide (somehow) which mechanism driver is going to kick off that processing, and do that.

But how can a mechanism driver know when the Neutron server forking has happened?

Thanks,
Neil
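[Editor's note: one partial answer to Neil's closing question, sketched under the assumption that detecting the fork lazily is acceptable: record the pid at initialize() time (pre-fork) and compare it on later entry points; a changed pid means the server has forked since initialization.]

    import os

    class ForkAwareDriver(object):
        def initialize(self):
            self._init_pid = os.getpid()  # parent pid, pre-fork

        def create_port_postcommit(self, context):
            if os.getpid() != self._init_pid:
                # We are now in a forked worker; per-process setup can
                # be done lazily here, guarded so it runs only once.
                pass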