Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-11-21 Thread Carl Baldwin
Hello,

Please tell me if your experience is similar to what I experienced:

1.  I would see *at most one* MySQL server has gone away error for
each process that was spawned as an API worker.  I saw them within a
minute of spawning the workers and then I did not see these errors
anymore until I restarted the server and spawned new processes.

2.  I noted in patch set 7 the line of code that completely fixed this
for me.  Please confirm that you have applied a patch that includes
this fix.

https://review.openstack.org/#/c/37131/7/neutron/wsgi.py

3.  I did not change anything with pool_recycle or idle_interval in my
config files.  All I did was set api_workers to the number of workers
that I wanted to spawn.  The line of code with my comment in it above
was sufficient for me.
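
For illustration, the only change I made to neutron.conf amounts to the
excerpt below (a minimal sketch -- I am assuming here that api_workers sits
in the [DEFAULT] section; check the patch under review for the exact option
definition):

    [DEFAULT]
    # Number of separate API worker processes to spawn for neutron-server.
    api_workers = 4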

It could be that there is another cause for the errors that you're
seeing.  For example, is there a max connections setting in mysql that
might be exceeded when you spawn multiple workers?  More detail would
be helpful.

Cheers,
Carl

On Wed, Nov 20, 2013 at 7:40 PM, Zhongyue Luo zhongyue@intel.com wrote:
 Carl,

 By 2006 I mean the MySQL server has gone away error code.

 The error message was still appearing when idle_timeout is set to 1 and the
 quantum API server did not work in my case.

 Could you perhaps share your conf file when applying this patch?

 Thanks.



 On Thu, Nov 21, 2013 at 3:34 AM, Carl Baldwin c...@ecbaldwin.net wrote:

 Hi, sorry for the delay in response.  I'm glad to look at it.

 Can you be more specific about the error?  Maybe paste the error you're
 seeing in paste.openstack.org?  I don't find any reference to 2006.
 Maybe I'm missing something.

 Also, is the patch that you applied the most recent?  With the final
 version of the patch it was no longer necessary for me to set
 pool_recycle or idle_interval.

 Thanks,
 Carl

 On Tue, Nov 19, 2013 at 7:14 PM, Zhongyue Luo zhongyue@intel.com
 wrote:
  Carl, Yingjun,
 
   I'm still getting the 2006 error even after setting idle_interval to 1.
 
  I applied the patch to the RDO havana dist on centos 6.4.
 
  Are there any other options I should be considering such as min/max pool
  size or use_tpool?
 
  Thanks.
 
 
 
  On Sat, Sep 7, 2013 at 3:33 AM, Baldwin, Carl (HPCS Neutron)
  carl.bald...@hp.com wrote:
 
  This pool_recycle parameter is already configurable using the
  idle_timeout
  configuration variable in neutron.conf.  I tested this with a value of
  1
  as suggested and it did get rid of the mysql server gone away messages.
 
  This is a great clue but I think I would like a long-term solution that
  allows the end-user to still configure this like they were before.
 
  I'm currently thinking along the lines of calling something like
  pool.dispose() in each child immediately after it is spawned.  I think
  this should invalidate all of the existing connections so that when a
  connection is checked out of the pool a new one will be created fresh.
 
  Thoughts?  I'll be testing.  Hopefully, I'll have a fixed patch up
  soon.
 
  Cheers,
  Carl
 
  From:  Yingjun Li liyingjun1...@gmail.com
  Reply-To:  OpenStack Development Mailing List
  openstack-dev@lists.openstack.org
  Date:  Thursday, September 5, 2013 8:28 PM
  To:  OpenStack Development Mailing List
  openstack-dev@lists.openstack.org
  Subject:  Re: [openstack-dev] [Neutron] The three API server
  multi-worker
  process patches.
 
 
   +1 for Carl's patch, and I have abandoned my patch.
  
   About the `MySQL server gone away` problem, I fixed it by setting
   'pool_recycle' to 1 in db/api.py.
  
   On Friday, September 6, 2013, Nachi Ueno wrote:
 
  Hi Folks
 
   We chose https://review.openstack.org/#/c/37131/ -- this is the patch to go
   on with.  We are also discussing it in that patch.
 
  Best
  Nachi
 
 
 
  2013/9/5 Baldwin, Carl (HPCS Neutron) carl.bald...@hp.com:
   Brian,
  
   As far as I know, no consensus was reached.
  
   A problem was discovered that happens when spawning multiple
   processes.
   The mysql connection seems to go away after between 10-60 seconds
   in
   my
   testing causing a seemingly random API call to fail.  After that, it
   is
   okay.  This must be due to some interaction between forking the
   process
    and the mysql connection pool.  This needs to be solved but I haven't had
    the time to look into it this week.
  
   I'm not sure if the other proposal suffers from this problem.
  
   Carl
  
   On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:
  
  Was any consensus on this ever reached? It appears both reviews are
   still
  open. I'm partial to review 37131 as it attacks the problem more
  concisely and, as mentioned, combines the efforts of the two more
  effective patches. I would echo Carl's sentiments that it's an easy
  review minus the few minor behaviors discussed on the review thread
  today.
  
  We feel very strongly about these making it into Havana -- being
   confined
  to a single neutron-server instance per cluster or region is a huge

Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-11-21 Thread Zhongyue Luo
Thanks, I'll give it a try.


On Fri, Nov 22, 2013 at 2:35 AM, Carl Baldwin c...@ecbaldwin.net wrote:

 Hello,

 Please tell me if your experience is similar to what I experienced:

 1.  I would see *at most one* MySQL server has gone away error for
 each process that was spawned as an API worker.  I saw them within a
 minute of spawning the workers and then I did not see these errors
 anymore until I restarted the server and spawned new processes.

 2.  I noted in patch set 7 the line of code that completely fixed this
 for me.  Please confirm that you have applied a patch that includes
 this fix.

 https://review.openstack.org/#/c/37131/7/neutron/wsgi.py

 3.  I did not change anything with pool_recycle or idle_interval in my
 config files.  All I did was set api_workers to the number of workers
 that I wanted to spawn.  The line of code with my comment in it above
 was sufficient for me.

 It could be that there is another cause for the errors that you're
 seeing.  For example, is there a max connections setting in mysql that
 might be exceeded when you spawn multiple workers?  More detail would
 be helpful.

 Cheers,
 Carl

 On Wed, Nov 20, 2013 at 7:40 PM, Zhongyue Luo zhongyue@intel.com
 wrote:
  Carl,
 
  By 2006 I mean the MySQL server has gone away error code.
 
  The error message was still appearing when idle_timeout is set to 1 and
 the
  quantum API server did not work in my case.
 
  Could you perhaps share your conf file when applying this patch?
 
  Thanks.
 
 
 
  On Thu, Nov 21, 2013 at 3:34 AM, Carl Baldwin c...@ecbaldwin.net
 wrote:
 
  Hi, sorry for the delay in response.  I'm glad to look at it.
 
  Can you be more specific about the error?  Maybe paste the error you're
  seeing in paste.openstack.org?  I don't find any reference to 2006.
  Maybe I'm missing something.
 
  Also, is the patch that you applied the most recent?  With the final
  version of the patch it was no longer necessary for me to set
  pool_recycle or idle_interval.
 
  Thanks,
  Carl
 
  On Tue, Nov 19, 2013 at 7:14 PM, Zhongyue Luo zhongyue@intel.com
  wrote:
   Carl, Yingjun,
  
    I'm still getting the 2006 error even after setting idle_interval to 1.
  
   I applied the patch to the RDO havana dist on centos 6.4.
  
   Are there any other options I should be considering such as min/max
 pool
   size or use_tpool?
  
   Thanks.
  
  
  
   On Sat, Sep 7, 2013 at 3:33 AM, Baldwin, Carl (HPCS Neutron)
   carl.bald...@hp.com wrote:
  
   This pool_recycle parameter is already configurable using the
   idle_timeout
   configuration variable in neutron.conf.  I tested this with a value
 of
   1
   as suggested and it did get rid of the mysql server gone away
 messages.
  
   This is a great clue but I think I would like a long-term solution
 that
   allows the end-user to still configure this like they were before.
  
   I'm currently thinking along the lines of calling something like
   pool.dispose() in each child immediately after it is spawned.  I
 think
   this should invalidate all of the existing connections so that when a
   connection is checked out of the pool a new one will be created
 fresh.
  
   Thoughts?  I'll be testing.  Hopefully, I'll have a fixed patch up
   soon.
  
   Cheers,
   Carl
  
   From:  Yingjun Li liyingjun1...@gmail.com
   Reply-To:  OpenStack Development Mailing List
   openstack-dev@lists.openstack.org
   Date:  Thursday, September 5, 2013 8:28 PM
   To:  OpenStack Development Mailing List
   openstack-dev@lists.openstack.org
   Subject:  Re: [openstack-dev] [Neutron] The three API server
   multi-worker
   process patches.
  
  
    +1 for Carl's patch, and I have abandoned my patch.
   
    About the `MySQL server gone away` problem, I fixed it by setting
    'pool_recycle' to 1 in db/api.py.
   
    On Friday, September 6, 2013, Nachi Ueno wrote:
  
   Hi Folks
  
    We chose https://review.openstack.org/#/c/37131/ -- this is the patch to
    go on with.  We are also discussing it in that patch.
  
   Best
   Nachi
  
  
  
   2013/9/5 Baldwin, Carl (HPCS Neutron) carl.bald...@hp.com:
Brian,
   
As far as I know, no consensus was reached.
   
A problem was discovered that happens when spawning multiple
processes.
The mysql connection seems to go away after between 10-60 seconds
in
my
testing causing a seemingly random API call to fail.  After that,
 it
is
okay.  This must be due to some interaction between forking the
process
 and the mysql connection pool.  This needs to be solved but I haven't had
 the time to look into it this week.
   
I'm not sure if the other proposal suffers from this problem.
   
Carl
   
On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:
   
   Was any consensus on this ever reached? It appears both reviews are
still
   open. I'm partial to review 37131 as it attacks the problem more
   concisely and, as mentioned, combines the efforts of the two more
   effective patches. I would echo Carl's

Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-11-20 Thread Carl Baldwin
Hi, sorry for the delay in response.  I'm glad to look at it.

Can you be more specific about the error?  Maybe paste the error you're
seeing in paste.openstack.org?  I don't find any reference to 2006.
Maybe I'm missing something.

Also, is the patch that you applied the most recent?  With the final
version of the patch it was no longer necessary for me to set
pool_recycle or idle_interval.

Thanks,
Carl

On Tue, Nov 19, 2013 at 7:14 PM, Zhongyue Luo zhongyue@intel.com wrote:
 Carl, Yingjun,

 I'm still getting the 2006 error even after setting idle_interval to 1.

 I applied the patch to the RDO havana dist on centos 6.4.

 Are there any other options I should be considering such as min/max pool
 size or use_tpool?

 Thanks.



 On Sat, Sep 7, 2013 at 3:33 AM, Baldwin, Carl (HPCS Neutron)
 carl.bald...@hp.com wrote:

 This pool_recycle parameter is already configurable using the idle_timeout
 configuration variable in neutron.conf.  I tested this with a value of 1
 as suggested and it did get rid of the mysql server gone away messages.

 This is a great clue but I think I would like a long-term solution that
 allows the end-user to still configure this like they were before.

 I'm currently thinking along the lines of calling something like
 pool.dispose() in each child immediately after it is spawned.  I think
 this should invalidate all of the existing connections so that when a
 connection is checked out of the pool a new one will be created fresh.

 Thoughts?  I'll be testing.  Hopefully, I'll have a fixed patch up soon.

 Cheers,
 Carl

 From:  Yingjun Li liyingjun1...@gmail.com
 Reply-To:  OpenStack Development Mailing List
 openstack-dev@lists.openstack.org
 Date:  Thursday, September 5, 2013 8:28 PM
 To:  OpenStack Development Mailing List
 openstack-dev@lists.openstack.org
 Subject:  Re: [openstack-dev] [Neutron] The three API server multi-worker
 process patches.


  +1 for Carl's patch, and I have abandoned my patch.
 
  About the `MySQL server gone away` problem, I fixed it by setting
  'pool_recycle' to 1 in db/api.py.
 
  On Friday, September 6, 2013, Nachi Ueno wrote:

 Hi Folks

  We chose https://review.openstack.org/#/c/37131/ -- this is the patch to go
  on with.  We are also discussing it in that patch.

 Best
 Nachi



 2013/9/5 Baldwin, Carl (HPCS Neutron) carl.bald...@hp.com:
  Brian,
 
  As far as I know, no consensus was reached.
 
  A problem was discovered that happens when spawning multiple processes.
  The mysql connection seems to go away after between 10-60 seconds in
  my
  testing causing a seemingly random API call to fail.  After that, it is
  okay.  This must be due to some interaction between forking the process
   and the mysql connection pool.  This needs to be solved but I haven't had
   the time to look into it this week.
 
  I'm not sure if the other proposal suffers from this problem.
 
  Carl
 
  On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:
 
 Was any consensus on this ever reached? It appears both reviews are
  still
 open. I'm partial to review 37131 as it attacks the problem more
 concisely and, as mentioned, combines the efforts of the two more
 effective patches. I would echo Carl's sentiments that it's an easy
 review minus the few minor behaviors discussed on the review thread
 today.
 
 We feel very strongly about these making it into Havana -- being
  confined
 to a single neutron-server instance per cluster or region is a huge
 bottleneck--essentially the only controller process with massive CPU
 churn in environments with constant instance churn, or sudden large
 batches of new instance requests.
 
 In Grizzly, this behavior caused addresses not to be issued to some
 instances during boot, due to quantum-server thinking the DHCP agents
 timed out and were no longer available, when in reality they were just
 backlogged (waiting on quantum-server, it seemed).
 
 Is it realistically looking like this patch will be cut for h3?
 
 --
 Brian Cline
 Software Engineer III, Product Innovation
 
 SoftLayer, an IBM Company
 4849 Alpha Rd, Dallas, TX 75244
 214.782.7876 direct  |  bcl...@softlayer.com
 
 
 -Original Message-
 From: Baldwin, Carl (HPCS Neutron) [mailto:carl.bald...@hp.com]
 Sent: Wednesday, August 28, 2013 3:04 PM
 To: Mark McClain
 Cc: OpenStack Development Mailing List
 Subject: [openstack-dev] [Neutron] The three API server multi-worker
 process patches.
 
 All,
 
 We've known for a while now that some duplication of work happened with
 respect to adding multiple worker processes to the neutron-server.
  There
 were a few mistakes made which led to three patches being done
 independently of each other.
 
 Can we settle on one and accept it?
 
 I have changed my patch at the suggestion of one of the other 2 authors,
 Peter Feiner, in an attempt to find common ground.  It now uses openstack
 common code and therefore it is more concise than any of the original
 three and should be pretty easy to review.  I'll admit to some bias
 toward
 my own implementation

Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-11-20 Thread Zhongyue Luo
Carl,

By 2006 I mean the MySQL server has gone away error code.

The error message was still appearing when idle_timeout is set to 1 and the
quantum API server did not work in my case.

Could you perhaps share your conf file when applying this patch?

Thanks.



On Thu, Nov 21, 2013 at 3:34 AM, Carl Baldwin c...@ecbaldwin.net wrote:

 Hi, sorry for the delay in response.  I'm glad to look at it.

 Can you be more specific about the error?  Maybe paste the error you're
 seeing in paste.openstack.org?  I don't find any reference to 2006.
 Maybe I'm missing something.

 Also, is the patch that you applied the most recent?  With the final
 version of the patch it was no longer necessary for me to set
 pool_recycle or idle_interval.

 Thanks,
 Carl

 On Tue, Nov 19, 2013 at 7:14 PM, Zhongyue Luo zhongyue@intel.com
 wrote:
  Carl, Yingjun,
 
  I'm still getting the 2006 error even after setting idle_interval to 1.
 
  I applied the patch to the RDO havana dist on centos 6.4.
 
  Are there any other options I should be considering such as min/max pool
  size or use_tpool?
 
  Thanks.
 
 
 
  On Sat, Sep 7, 2013 at 3:33 AM, Baldwin, Carl (HPCS Neutron)
  carl.bald...@hp.com wrote:
 
  This pool_recycle parameter is already configurable using the
 idle_timeout
  configuration variable in neutron.conf.  I tested this with a value of 1
  as suggested and it did get rid of the mysql server gone away messages.
 
  This is a great clue but I think I would like a long-term solution that
  allows the end-user to still configure this like they were before.
 
  I'm currently thinking along the lines of calling something like
  pool.dispose() in each child immediately after it is spawned.  I think
  this should invalidate all of the existing connections so that when a
  connection is checked out of the pool a new one will be created fresh.
 
  Thoughts?  I'll be testing.  Hopefully, I'll have a fixed patch up soon.
 
  Cheers,
  Carl
 
  From:  Yingjun Li liyingjun1...@gmail.com
  Reply-To:  OpenStack Development Mailing List
  openstack-dev@lists.openstack.org
  Date:  Thursday, September 5, 2013 8:28 PM
  To:  OpenStack Development Mailing List
  openstack-dev@lists.openstack.org
  Subject:  Re: [openstack-dev] [Neutron] The three API server
 multi-worker
  process patches.
 
 
   +1 for Carl's patch, and I have abandoned my patch.
  
   About the `MySQL server gone away` problem, I fixed it by setting
   'pool_recycle' to 1 in db/api.py.
  
   On Friday, September 6, 2013, Nachi Ueno wrote:
 
  Hi Folks
 
   We chose https://review.openstack.org/#/c/37131/ -- this is the patch to go
   on with.  We are also discussing it in that patch.
 
  Best
  Nachi
 
 
 
  2013/9/5 Baldwin, Carl (HPCS Neutron) carl.bald...@hp.com:
   Brian,
  
   As far as I know, no consensus was reached.
  
   A problem was discovered that happens when spawning multiple
 processes.
   The mysql connection seems to go away after between 10-60 seconds in
   my
   testing causing a seemingly random API call to fail.  After that, it
 is
   okay.  This must be due to some interaction between forking the
 process
    and the mysql connection pool.  This needs to be solved but I haven't had
    the time to look into it this week.
  
   I'm not sure if the other proposal suffers from this problem.
  
   Carl
  
   On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:
  
  Was any consensus on this ever reached? It appears both reviews are
   still
  open. I'm partial to review 37131 as it attacks the problem more
  concisely and, as mentioned, combines the efforts of the two more
  effective patches. I would echo Carl's sentiments that it's an easy
  review minus the few minor behaviors discussed on the review thread
  today.
  
  We feel very strongly about these making it into Havana -- being
   confined
  to a single neutron-server instance per cluster or region is a huge
  bottleneck--essentially the only controller process with massive CPU
  churn in environments with constant instance churn, or sudden large
  batches of new instance requests.
  
  In Grizzly, this behavior caused addresses not to be issued to some
  instances during boot, due to quantum-server thinking the DHCP agents
  timed out and were no longer available, when in reality they were just
  backlogged (waiting on quantum-server, it seemed).
  
  Is it realistically looking like this patch will be cut for h3?
  
  --
  Brian Cline
  Software Engineer III, Product Innovation
  
  SoftLayer, an IBM Company
  4849 Alpha Rd, Dallas, TX 75244
  214.782.7876 direct  |  bcl...@softlayer.com
  
  
  -Original Message-
  From: Baldwin, Carl (HPCS Neutron) [mailto:carl.bald...@hp.com]
  Sent: Wednesday, August 28, 2013 3:04 PM
  To: Mark McClain
  Cc: OpenStack Development Mailing List
  Subject: [openstack-dev] [Neutron] The three API server multi-worker
  process patches.
  
  All,
  
  We've known for a while now that some duplication of work happened
 with
  respect to adding multiple worker processes

Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-11-19 Thread Zhongyue Luo
Carl, Yingjun,

I'm still getting the 2006 error even after setting idle_interval to 1.

I applied the patch to the RDO havana dist on centos 6.4.

Are there any other options I should be considering such as min/max pool
size or use_tpool?

Thanks.



On Sat, Sep 7, 2013 at 3:33 AM, Baldwin, Carl (HPCS Neutron) 
carl.bald...@hp.com wrote:

 This pool_recycle parameter is already configurable using the idle_timeout
 configuration variable in neutron.conf.  I tested this with a value of 1
 as suggested and it did get rid of the mysql server gone away messages.

 This is a great clue but I think I would like a long-term solution that
 allows the end-user to still configure this like they were before.

 I'm currently thinking along the lines of calling something like
 pool.dispose() in each child immediately after it is spawned.  I think
 this should invalidate all of the existing connections so that when a
 connection is checked out of the pool a new one will be created fresh.

 Thoughts?  I'll be testing.  Hopefully, I'll have a fixed patch up soon.

 Cheers,
 Carl

 From:  Yingjun Li liyingjun1...@gmail.com
 Reply-To:  OpenStack Development Mailing List
 openstack-dev@lists.openstack.org
 Date:  Thursday, September 5, 2013 8:28 PM
 To:  OpenStack Development Mailing List openstack-dev@lists.openstack.org
 
 Subject:  Re: [openstack-dev] [Neutron] The three API server multi-worker
 process patches.


  +1 for Carl's patch, and I have abandoned my patch.
 
  About the `MySQL server gone away` problem, I fixed it by setting
  'pool_recycle' to 1 in db/api.py.
 
  On Friday, September 6, 2013, Nachi Ueno wrote:

 Hi Folks

  We chose https://review.openstack.org/#/c/37131/ -- this is the patch to go
  on with.  We are also discussing it in that patch.

 Best
 Nachi



 2013/9/5 Baldwin, Carl (HPCS Neutron) carl.bald...@hp.com:
  Brian,
 
  As far as I know, no consensus was reached.
 
  A problem was discovered that happens when spawning multiple processes.
  The mysql connection seems to go away after between 10-60 seconds in my
  testing causing a seemingly random API call to fail.  After that, it is
  okay.  This must be due to some interaction between forking the process
  and the mysql connection pool.  This needs to be solved but I haven't had
   the time to look into it this week.
 
  I'm not sure if the other proposal suffers from this problem.
 
  Carl
 
  On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:
 
 Was any consensus on this ever reached? It appears both reviews are still
  open. I'm partial to review 37131 as it attacks the problem more
  concisely and, as mentioned, combines the efforts of the two more
 effective patches. I would echo Carl's sentiments that it's an easy
 review minus the few minor behaviors discussed on the review thread
 today.
 
 We feel very strongly about these making it into Havana -- being confined
 to a single neutron-server instance per cluster or region is a huge
 bottleneck--essentially the only controller process with massive CPU
 churn in environments with constant instance churn, or sudden large
 batches of new instance requests.
 
 In Grizzly, this behavior caused addresses not to be issued to some
 instances during boot, due to quantum-server thinking the DHCP agents
 timed out and were no longer available, when in reality they were just
 backlogged (waiting on quantum-server, it seemed).
 
 Is it realistically looking like this patch will be cut for h3?
 
 --
 Brian Cline
 Software Engineer III, Product Innovation
 
 SoftLayer, an IBM Company
 4849 Alpha Rd, Dallas, TX 75244
 214.782.7876 direct  |  bcl...@softlayer.com
 
 
 -Original Message-
 From: Baldwin, Carl (HPCS Neutron) [mailto:carl.bald...@hp.com]
 Sent: Wednesday, August 28, 2013 3:04 PM
 To: Mark McClain
 Cc: OpenStack Development Mailing List
 Subject: [openstack-dev] [Neutron] The three API server multi-worker
 process patches.
 
 All,
 
 We've known for a while now that some duplication of work happened with
 respect to adding multiple worker processes to the neutron-server.  There
 were a few mistakes made which led to three patches being done
 independently of each other.
 
 Can we settle on one and accept it?
 
 I have changed my patch at the suggestion of one of the other 2 authors,
  Peter Feiner, in an attempt to find common ground.  It now uses openstack
 common code and therefore it is more concise than any of the original
 three and should be pretty easy to review.  I'll admit to some bias
 toward
 my own implementation but most importantly, I would like for one of these
 implementations to land and start seeing broad usage in the community
  sooner rather than later.
 
 Carl Baldwin
 
 PS Here are the two remaining patches.  The third has been abandoned.
 
 https://review.openstack.org/#/c/37131/
 https://review.openstack.org/#/c/36487/
 
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-09-06 Thread Baldwin, Carl (HPCS Neutron)
This is a great lead on 'pool_recycle'.  Thank you.  Last night I was
poking around in the sqlalchemy pool code but hadn't yet come to a
complete solution.  I will do some testing on this today and hopefully
have an updated patch out soon.
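
For anyone else who wants to try the workaround in the meantime, it amounts to
something like the following (a rough sketch only -- I am assuming an engine
created the way db/api.py does it, and the connection URL is a placeholder):

    import sqlalchemy

    # pool_recycle=1 makes SQLAlchemy discard any pooled connection older than
    # one second, so a connection inherited across fork() is re-opened almost
    # immediately instead of being reused.
    engine = sqlalchemy.create_engine(
        'mysql://neutron:secret@localhost/neutron',
        pool_recycle=1)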

Carl

From:  Yingjun Li liyingjun1...@gmail.com
Reply-To:  OpenStack Development Mailing List
openstack-dev@lists.openstack.org
Date:  Thursday, September 5, 2013 8:28 PM
To:  OpenStack Development Mailing List openstack-dev@lists.openstack.org
Subject:  Re: [openstack-dev] [Neutron] The three API server multi-worker
process patches.


+1 for Carl's patch, and I have abandoned my patch.

About the `MySQL server gone away` problem, I fixed it by setting
'pool_recycle' to 1 in db/api.py.

On Friday, September 6, 2013, Nachi Ueno wrote:

Hi Folks

We chose https://review.openstack.org/#/c/37131/ -- this is the patch to go
on with.  We are also discussing it in that patch.

Best
Nachi



2013/9/5 Baldwin, Carl (HPCS Neutron) carl.bald...@hp.com:
 Brian,

 As far as I know, no consensus was reached.

 A problem was discovered that happens when spawning multiple processes.
 The mysql connection seems to go away after between 10-60 seconds in my
 testing causing a seemingly random API call to fail.  After that, it is
 okay.  This must be due to some interaction between forking the process
 and the mysql connection pool.  This needs to be solved but I haven't had
 the time to look into it this week.

 I'm not sure if the other proposal suffers from this problem.

 Carl

 On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:

Was any consensus on this ever reached? It appears both reviews are still
 open. I'm partial to review 37131 as it attacks the problem more
 concisely and, as mentioned, combines the efforts of the two more
effective patches. I would echo Carl's sentiments that it's an easy
review minus the few minor behaviors discussed on the review thread
today.

We feel very strongly about these making it into Havana -- being confined
to a single neutron-server instance per cluster or region is a huge
bottleneck--essentially the only controller process with massive CPU
churn in environments with constant instance churn, or sudden large
batches of new instance requests.

In Grizzly, this behavior caused addresses not to be issued to some
instances during boot, due to quantum-server thinking the DHCP agents
timed out and were no longer available, when in reality they were just
backlogged (waiting on quantum-server, it seemed).

Is it realistically looking like this patch will be cut for h3?

--
Brian Cline
Software Engineer III, Product Innovation

SoftLayer, an IBM Company
4849 Alpha Rd, Dallas, TX 75244
214.782.7876 direct  |  bcl...@softlayer.com


-Original Message-
From: Baldwin, Carl (HPCS Neutron) [mailto:carl.bald...@hp.com]
Sent: Wednesday, August 28, 2013 3:04 PM
To: Mark McClain
Cc: OpenStack Development Mailing List
Subject: [openstack-dev] [Neutron] The three API server multi-worker
process patches.

All,

We've known for a while now that some duplication of work happened with
respect to adding multiple worker processes to the neutron-server.  There
were a few mistakes made which led to three patches being done
independently of each other.

Can we settle on one and accept it?

I have changed my patch at the suggestion of one of the other 2 authors,
  Peter Feiner, in an attempt to find common ground.  It now uses openstack
common code and therefore it is more concise than any of the original
three and should be pretty easy to review.  I'll admit to some bias
toward
my own implementation but most importantly, I would like for one of these
implementations to land and start seeing broad usage in the community
  sooner rather than later.

Carl Baldwin

PS Here are the two remaining patches.  The third has been abandoned.

https://review.openstack.org/#/c/37131/
https://review.openstack.org/#/c/36487/


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-09-06 Thread Baldwin, Carl (HPCS Neutron)
This pool_recycle parameter is already configurable using the idle_timeout
configuration variable in neutron.conf.  I tested this with a value of 1
as suggested and it did get rid of the mysql server gone away messages.

This is a great clue but I think I would like a long-term solution that
allows the end-user to still configure this like they were before.

I'm currently thinking along the lines of calling something like
pool.dispose() in each child immediately after it is spawned.  I think
this should invalidate all of the existing connections so that when a
connection is checked out of the pool a new one will be created fresh.
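
Roughly what I have in mind is sketched below.  This is illustrative only: I am
assuming a module-level SQLAlchemy engine like the one db/api.py builds, the
connection URL is a placeholder, and run_wsgi_server() is a hypothetical
stand-in for whatever the launcher actually runs in the child.

    import os
    import sqlalchemy

    engine = sqlalchemy.create_engine('mysql://neutron:secret@localhost/neutron')

    def spawn_api_worker():
        pid = os.fork()
        if pid == 0:
            # Child process: throw away every connection inherited from the
            # parent so the pool opens fresh ones instead of sharing the
            # parent's MySQL sockets.
            engine.pool.dispose()
            run_wsgi_server()
        return pid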

Thoughts?  I'll be testing.  Hopefully, I'll have a fixed patch up soon.

Cheers,
Carl

From:  Yingjun Li liyingjun1...@gmail.com
Reply-To:  OpenStack Development Mailing List
openstack-dev@lists.openstack.org
Date:  Thursday, September 5, 2013 8:28 PM
To:  OpenStack Development Mailing List openstack-dev@lists.openstack.org
Subject:  Re: [openstack-dev] [Neutron] The three API server multi-worker
process patches.


+1 for Carl's patch, and I have abandoned my patch.

About the `MySQL server gone away` problem, I fixed it by setting
'pool_recycle' to 1 in db/api.py.

On Friday, September 6, 2013, Nachi Ueno wrote:

Hi Folks

We chose https://review.openstack.org/#/c/37131/ -- this is the patch to go
on with.  We are also discussing it in that patch.

Best
Nachi



2013/9/5 Baldwin, Carl (HPCS Neutron) carl.bald...@hp.com:
 Brian,

 As far as I know, no consensus was reached.

 A problem was discovered that happens when spawning multiple processes.
 The mysql connection seems to go away after between 10-60 seconds in my
 testing causing a seemingly random API call to fail.  After that, it is
 okay.  This must be due to some interaction between forking the process
 and the mysql connection pool.  This needs to be solved but I haven't had
 the time to look into it this week.

 I'm not sure if the other proposal suffers from this problem.

 Carl

 On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:

Was any consensus on this ever reached? It appears both reviews are still
open. I'm partial to review 37131 as it attacks the problem more
concisely and, as mentioned, combines the efforts of the two more
effective patches. I would echo Carl's sentiments that it's an easy
review minus the few minor behaviors discussed on the review thread
today.

We feel very strongly about these making it into Havana -- being confined
to a single neutron-server instance per cluster or region is a huge
bottleneck--essentially the only controller process with massive CPU
churn in environments with constant instance churn, or sudden large
batches of new instance requests.

In Grizzly, this behavior caused addresses not to be issued to some
instances during boot, due to quantum-server thinking the DHCP agents
timed out and were no longer available, when in reality they were just
backlogged (waiting on quantum-server, it seemed).

Is it realistically looking like this patch will be cut for h3?

--
Brian Cline
Software Engineer III, Product Innovation

SoftLayer, an IBM Company
4849 Alpha Rd, Dallas, TX 75244
214.782.7876 direct  |  bcl...@softlayer.com


-Original Message-
From: Baldwin, Carl (HPCS Neutron) [mailto:carl.bald...@hp.com]
Sent: Wednesday, August 28, 2013 3:04 PM
To: Mark McClain
Cc: OpenStack Development Mailing List
Subject: [openstack-dev] [Neutron] The three API server multi-worker
process patches.

All,

We've known for a while now that some duplication of work happened with
respect to adding multiple worker processes to the neutron-server.  There
were a few mistakes made which led to three patches being done
independently of each other.

Can we settle on one and accept it?

I have changed my patch at the suggestion of one of the other 2 authors,
 Peter Feiner, in an attempt to find common ground.  It now uses openstack
common code and therefore it is more concise than any of the original
three and should be pretty easy to review.  I'll admit to some bias
toward
my own implementation but most importantly, I would like for one of these
implementations to land and start seeing broad usage in the community
sooner rather than later.

Carl Baldwin

PS Here are the two remaining patches.  The third has been abandoned.

https://review.openstack.org/#/c/37131/
https://review.openstack.org/#/c/36487/


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-09-05 Thread Baldwin, Carl (HPCS Neutron)
Brian,

As far as I know, no consensus was reached.

A problem was discovered that happens when spawning multiple processes.
The mysql connection seems to go away after between 10-60 seconds in my
testing causing a seemingly random API call to fail.  After that, it is
okay.  This must be due to some interaction between forking the process
and the mysql connection pool.  This needs to be solved but I haven't had
the time to look into it this week.
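
To make the suspicion concrete, the failure mode I have in mind looks roughly
like this (illustrative only; the connection URL is a placeholder and this is
not code from either patch):

    import os
    import sqlalchemy

    engine = sqlalchemy.create_engine('mysql://neutron:secret@localhost/neutron')
    engine.connect().close()   # the parent now holds a pooled MySQL connection

    if os.fork() == 0:
        # The child inherits the parent's pooled socket, so both processes end
        # up speaking the MySQL protocol over the same file descriptor and one
        # of them eventually sees "MySQL server has gone away".
        engine.execute('SELECT 1')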

I'm not sure if the other proposal suffers from this problem.

Carl

On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:

Was any consensus on this ever reached? It appears both reviews are still
open. I'm partial to review 37131 as it attacks the problem more
concisely and, as mentioned, combines the efforts of the two more
effective patches. I would echo Carl's sentiments that it's an easy
review minus the few minor behaviors discussed on the review thread today.

We feel very strongly about these making it into Havana -- being confined
to a single neutron-server instance per cluster or region is a huge
bottleneck--essentially the only controller process with massive CPU
churn in environments with constant instance churn, or sudden large
batches of new instance requests.

In Grizzly, this behavior caused addresses not to be issued to some
instances during boot, due to quantum-server thinking the DHCP agents
timed out and were no longer available, when in reality they were just
backlogged (waiting on quantum-server, it seemed).

Is it realistically looking like this patch will be cut for h3?

--
Brian Cline
Software Engineer III, Product Innovation

SoftLayer, an IBM Company
4849 Alpha Rd, Dallas, TX 75244
214.782.7876 direct  |  bcl...@softlayer.com
 

-Original Message-
From: Baldwin, Carl (HPCS Neutron) [mailto:carl.bald...@hp.com]
Sent: Wednesday, August 28, 2013 3:04 PM
To: Mark McClain
Cc: OpenStack Development Mailing List
Subject: [openstack-dev] [Neutron] The three API server multi-worker
process patches.

All,

We've known for a while now that some duplication of work happened with
respect to adding multiple worker processes to the neutron-server.  There
were a few mistakes made which led to three patches being done
independently of each other.

Can we settle on one and accept it?

I have changed my patch at the suggestion of one of the other 2 authors,
Peter Feiner, in an attempt to find common ground.  It now uses openstack
common code and therefore it is more concise than any of the original
three and should be pretty easy to review.  I'll admit to some bias toward
my own implementation but most importantly, I would like for one of these
implementations to land and start seeing broad usage in the community
sooner rather than later.

Carl Baldwin

PS Here are the two remaining patches.  The third has been abandoned.

https://review.openstack.org/#/c/37131/
https://review.openstack.org/#/c/36487/


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-09-05 Thread Nachi Ueno
Hi Folks

We chose https://review.openstack.org/#/c/37131/ -- this is the patch to go
on with.  We are also discussing it in that patch.

Best
Nachi



2013/9/5 Baldwin, Carl (HPCS Neutron) carl.bald...@hp.com:
 Brian,

 As far as I know, no consensus was reached.

 A problem was discovered that happens when spawning multiple processes.
 The mysql connection seems to go away after between 10-60 seconds in my
 testing causing a seemingly random API call to fail.  After that, it is
 okay.  This must be due to some interaction between forking the process
 and the mysql connection pool.  This needs to be solved but I haven't had
 the time to look into it this week.

 I'm not sure if the other proposal suffers from this problem.

 Carl

 On 9/4/13 3:34 PM, Brian Cline bcl...@softlayer.com wrote:

Was any consensus on this ever reached? It appears both reviews are still
open. I'm partial to review 37131 as it attacks the problem more
concisely and, as mentioned, combines the efforts of the two more
effective patches. I would echo Carl's sentiments that it's an easy
review minus the few minor behaviors discussed on the review thread today.

We feel very strongly about these making it into Havana -- being confined
to a single neutron-server instance per cluster or region is a huge
bottleneck--essentially the only controller process with massive CPU
churn in environments with constant instance churn, or sudden large
batches of new instance requests.

In Grizzly, this behavior caused addresses not to be issued to some
instances during boot, due to quantum-server thinking the DHCP agents
timed out and were no longer available, when in reality they were just
backlogged (waiting on quantum-server, it seemed).

Is it realistically looking like this patch will be cut for h3?

--
Brian Cline
Software Engineer III, Product Innovation

SoftLayer, an IBM Company
4849 Alpha Rd, Dallas, TX 75244
214.782.7876 direct  |  bcl...@softlayer.com


-Original Message-
From: Baldwin, Carl (HPCS Neutron) [mailto:carl.bald...@hp.com]
Sent: Wednesday, August 28, 2013 3:04 PM
To: Mark McClain
Cc: OpenStack Development Mailing List
Subject: [openstack-dev] [Neutron] The three API server multi-worker
process patches.

All,

We've known for a while now that some duplication of work happened with
respect to adding multiple worker processes to the neutron-server.  There
were a few mistakes made which led to three patches being done
independently of each other.

Can we settle on one and accept it?

I have changed my patch at the suggestion of one of the other 2 authors,
Peter Feiner, in an attempt to find common ground.  It now uses openstack
common code and therefore it is more concise than any of the original
three and should be pretty easy to review.  I'll admit to some bias toward
my own implementation but most importantly, I would like for one of these
implementations to land and start seeing broad usage in the community
sooner rather than later.

Carl Baldwin

PS Here are the two remaining patches.  The third has been abandoned.

https://review.openstack.org/#/c/37131/
https://review.openstack.org/#/c/36487/


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-09-04 Thread Brian Cline
Was any consensus on this ever reached? It appears both reviews are still open. 
I'm partial to review 37131 as it attacks the problem more concisely and, as 
mentioned, combines the efforts of the two more effective patches. I would echo 
Carl's sentiments that it's an easy review minus the few minor behaviors 
discussed on the review thread today.

We feel very strongly about these making it into Havana -- being confined to a 
single neutron-server instance per cluster or region is a huge 
bottleneck--essentially the only controller process with massive CPU churn in 
environments with constant instance churn, or sudden large batches of new 
instance requests.

In Grizzly, this behavior caused addresses not to be issued to some instances 
during boot, due to quantum-server thinking the DHCP agents timed out and were 
no longer available, when in reality they were just backlogged (waiting on 
quantum-server, it seemed).

Is it realistically looking like this patch will be cut for h3?

--
Brian Cline
Software Engineer III, Product Innovation

SoftLayer, an IBM Company
4849 Alpha Rd, Dallas, TX 75244
214.782.7876 direct  |  bcl...@softlayer.com
 

-Original Message-
From: Baldwin, Carl (HPCS Neutron) [mailto:carl.bald...@hp.com] 
Sent: Wednesday, August 28, 2013 3:04 PM
To: Mark McClain
Cc: OpenStack Development Mailing List
Subject: [openstack-dev] [Neutron] The three API server multi-worker process 
patches.

All,

We've known for a while now that some duplication of work happened with
respect to adding multiple worker processes to the neutron-server.  There
were a few mistakes made which led to three patches being done
independently of each other.

Can we settle on one and accept it?

I have changed my patch at the suggestion of one of the other 2 authors,
Peter Feiner, in an attempt to find common ground.  It now uses openstack
common code and therefore it is more concise than any of the original
three and should be pretty easy to review.  I'll admit to some bias toward
my own implementation but most importantly, I would like for one of these
implementations to land and start seeing broad usage in the community
sooner rather than later.

Carl Baldwin

PS Here are the two remaining patches.  The third has been abandoned.

https://review.openstack.org/#/c/37131/
https://review.openstack.org/#/c/36487/


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Neutron] The three API server multi-worker process patches.

2013-08-28 Thread Baldwin, Carl (HPCS Neutron)
All,

We've known for a while now that some duplication of work happened with
respect to adding multiple worker processes to the neutron-server.  There
were a few mistakes made which led to three patches being done
independently of each other.

Can we settle on one and accept it?

I have changed my patch at the suggestion of one of the other 2 authors,
Peter Feiner, in an attempt to find common ground.  It now uses openstack
common code and therefore it is more concise than any of the original
three and should be pretty easy to review.  I'll admit to some bias toward
my own implementation but most importantly, I would like for one of these
implementations to land and start seeing broad usage in the community
sooner rather than later.

Carl Baldwin

PS Here are the two remaining patches.  The third has been abandoned.

https://review.openstack.org/#/c/37131/
https://review.openstack.org/#/c/36487/


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev