Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
Hello,

Please tell me if your experience is similar to what I experienced:

1. I would see *at most one* "MySQL server has gone away" error for each process that was spawned as an API worker. I saw them within a minute of spawning the workers, and then I did not see these errors again until I restarted the server and spawned new processes.

2. I noted in patch set 7 the line of code that completely fixed this for me. Please confirm that you have applied a patch that includes this fix: https://review.openstack.org/#/c/37131/7/neutron/wsgi.py

3. I did not change anything with pool_recycle or idle_interval in my config files. All I did was set api_workers to the number of workers that I wanted to spawn. The line of code with my comment on it, linked above, was sufficient for me.

It could be that there is another cause for the errors you're seeing. For example, is there a max connections setting in MySQL that might be exceeded when you spawn multiple workers? More detail would be helpful.

Cheers,
Carl
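For anyone reproducing Carl's setup: the only configuration change he describes is setting api_workers. A hedged sketch of the relevant neutron.conf lines follows (the value is illustrative; also note that each worker keeps its own connection pool, so MySQL's max_connections needs to cover workers times pool size):

    [DEFAULT]
    # Number of separate worker processes to spawn for the API server.
    api_workers = 4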
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
Thanks, I'll give it a try.
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
Hi, sorry for the delay in response. I'm glad to look at it.

Can you be more specific about the error? Maybe paste the error you're seeing on paste.openstack.org? I don't find any reference to 2006; maybe I'm missing something.

Also, is the patch that you applied the most recent? With the final version of the patch it was no longer necessary for me to set pool_recycle or idle_interval.

Thanks,
Carl
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
Carl,

By 2006 I mean the "MySQL server has gone away" error code. The error message was still appearing when idle_timeout was set to 1, and the quantum API server did not work in my case.

Could you perhaps share your conf file from when you applied this patch?

Thanks.
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
Carl, Yingjun,

I'm still getting the 2006 error even after configuring idle_interval to 1. I applied the patch to the RDO Havana dist on CentOS 6.4.

Are there any other options I should be considering, such as min/max pool size or use_tpool?

Thanks.
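For context, the min/max pool options Zhongyue asks about correspond roughly to standard SQLAlchemy engine parameters (whether the Havana-era neutron.conf exposed every one of them is not confirmed here; the URL and values below are illustrative):

    from sqlalchemy import create_engine

    engine = create_engine(
        'mysql://neutron:secret@127.0.0.1/neutron',  # illustrative URL
        pool_size=10,       # steady-state pooled connections per process
        max_overflow=20,    # extra connections allowed under burst load
        pool_recycle=3600,  # recycle connections idle longer than an hour
    )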
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
This is a great lead on 'pool_recycle'. Thank you. Last night I was poking around in the SQLAlchemy pool code but hadn't yet come to a complete solution. I will do some testing on this today and hopefully have an updated patch out soon.

Carl

From: Yingjun Li <liyingjun1...@gmail.com>
Reply-To: OpenStack Development Mailing List <openstack-dev@lists.openstack.org>
Date: Thursday, September 5, 2013 8:28 PM
To: OpenStack Development Mailing List <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

+1 for Carl's patch, and I have abandoned my patch. About the `MySQL server gone away` problem, I fixed it by setting 'pool_recycle' to 1 in db/api.py.
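For anyone following along: 'pool_recycle' is a standard SQLAlchemy create_engine() parameter. A connection idle longer than pool_recycle seconds is discarded and reopened on the next checkout, so a value of 1 effectively forces a fresh connection almost every time. A minimal sketch of the workaround Yingjun describes (the connection URL is illustrative):

    from sqlalchemy import create_engine

    # A connection sitting in the pool for more than 1 second is thrown
    # away and transparently replaced the next time it is checked out.
    engine = create_engine(
        'mysql://neutron:secret@127.0.0.1/neutron',  # illustrative URL
        pool_recycle=1,
    )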
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
This pool_recycle parameter is already configurable using the idle_timeout configuration variable in neutron.conf. I tested this with a value of 1 as suggested, and it did get rid of the "MySQL server has gone away" messages.

This is a great clue, but I think I would like a long-term solution that allows the end user to still configure this as before. I'm currently thinking along the lines of calling something like pool.dispose() in each child immediately after it is spawned. I think this should invalidate all of the existing connections so that when a connection is checked out of the pool, a new one will be created fresh.

Thoughts? I'll be testing. Hopefully I'll have a fixed patch up soon.

Cheers,
Carl
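A minimal sketch of the dispose-after-fork approach Carl describes (this is not the literal patch; serve_api, api_workers, and the connection URL are illustrative). Disposing of the pool in each child means no worker ever reuses a MySQL socket it shares with its parent:

    import os

    from sqlalchemy import create_engine

    def serve_api(engine):
        """Hypothetical worker loop; the real server runs the WSGI app."""

    engine = create_engine('mysql://neutron:secret@127.0.0.1/neutron')
    api_workers = 4

    for _ in range(api_workers):
        pid = os.fork()
        if pid == 0:
            # Child: discard every pooled connection inherited from the
            # parent.  The child's first checkout then opens a fresh,
            # private connection instead of a shared socket.
            engine.pool.dispose()
            serve_api(engine)
            os._exit(0)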
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
Brian,

As far as I know, no consensus was reached. A problem was discovered that happens when spawning multiple processes: the MySQL connection seems to go away within 10-60 seconds in my testing, causing a seemingly random API call to fail. After that, it is okay. This must be due to some interaction between forking the process and the MySQL connection pool. This needs to be solved, but I haven't had the time to look into it this week. I'm not sure if the other proposal suffers from this problem.

Carl
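The interaction Carl suspects is a classic fork hazard; the sketch below illustrates the failure mode under that assumption (it is not neutron code). If the parent touches the database before forking, parent and child share the same socket to MySQL, and use of the connection from both processes corrupts it, surfacing later as error 2006, "MySQL server has gone away":

    import os

    from sqlalchemy import create_engine

    engine = create_engine('mysql://neutron:secret@127.0.0.1/neutron')  # illustrative URL
    engine.connect().close()  # a pooled connection now exists in the parent

    if os.fork() == 0:
        # The child inherits the parent's socket file descriptor.
        # Whichever process uses (or closes) the shared connection second
        # finds the protocol stream out of sync and sees error 2006.
        engine.connect().execute('SELECT 1')
        os._exit(0)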
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
Hi folks,

We have chosen https://review.openstack.org/#/c/37131/ as the patch to go forward with. We are also continuing the discussion in that patch.

Best,
Nachi
Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.
Was any consensus on this ever reached? It appears both reviews are still open. I'm partial to review 37131 as it attacks the problem more concisely and, as mentioned, combines the efforts of the two more effective patches. I would echo Carl's sentiments that it's an easy review minus the few minor behaviors discussed on the review thread today.

We feel very strongly about these making it into Havana -- being confined to a single neutron-server instance per cluster or region is a huge bottleneck: essentially the only controller process, with massive CPU churn in environments with constant instance churn or sudden large batches of new instance requests.

In Grizzly, this behavior caused addresses not to be issued to some instances during boot, because quantum-server thought the DHCP agents had timed out and were no longer available, when in reality they were just backlogged (waiting on quantum-server, it seemed).

Is it realistically looking like this patch will be cut for h3?

--
Brian Cline
Software Engineer III, Product Innovation
SoftLayer, an IBM Company
4849 Alpha Rd, Dallas, TX 75244
214.782.7876 direct | bcl...@softlayer.com
[openstack-dev] [Neutron] The three API server multi-worker process patches.
All,

We've known for a while now that some duplication of work happened with respect to adding multiple worker processes to the neutron-server. There were a few mistakes made which led to three patches being done independently of each other. Can we settle on one and accept it?

I have changed my patch at the suggestion of one of the other two authors, Peter Feiner, in an attempt to find common ground. It now uses OpenStack common code and is therefore more concise than any of the original three; it should be pretty easy to review.

I'll admit to some bias toward my own implementation, but most importantly, I would like for one of these implementations to land and start seeing broad usage in the community sooner rather than later.

Carl Baldwin

PS Here are the two remaining patches. The third has been abandoned.
https://review.openstack.org/#/c/37131/
https://review.openstack.org/#/c/36487/