Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-20 Thread Joshua Harlow
Looking at the conductor code, it still seems to me to provide a low-level database API 
that succumbs to the same races as the old direct db access did. A get call, followed 
by some response, followed by some python code, followed by an rpc update, followed by 
more code, is still susceptible to consistency and fragility issues.

The API provided is data oriented rather than action oriented. I would argue 
that a data-oriented API leads to lots of consistency issues with multiple 
conductors. An action/task-oriented API, if that is ever accomplished, allows the 
conductor to lock the resources being manipulated so that another 
conductor cannot alter the same resource at the same time.
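
To make the contrast concrete, a rough sketch (plain SQLAlchemy against a throwaway 
sqlite table, not nova code; the table and states are made up) of the difference 
between the get-then-update pattern and a single conditional update:

import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
instances = sa.Table(
    "instances", meta,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("vm_state", sa.String(16)),
)
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(instances.insert().values(id=1, vm_state="building"))

# Data-oriented pattern: read in one call, decide in python, write in a
# later call. Anything another conductor does in between is silently lost.
with engine.begin() as conn:
    state = conn.execute(
        sa.select(instances.c.vm_state).where(instances.c.id == 1)
    ).scalar()
# ... rpc round trip, python logic, time passes, maybe a delete happens ...
with engine.begin() as conn:
    conn.execute(
        instances.update().where(instances.c.id == 1).values(vm_state="active")
    )  # blindly overwrites whatever happened in the meantime

# Action-oriented pattern: one conditional update that only succeeds if the
# resource is still in the state we expect, so a concurrent change shows up
# as a failure we have to handle instead of a silent overwrite.
with engine.begin() as conn:
    result = conn.execute(
        instances.update()
        .where(instances.c.id == 1)
        .where(instances.c.vm_state == "building")
        .values(vm_state="active")
    )
    if result.rowcount != 1:
        raise RuntimeError("instance 1 changed or vanished underneath us")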

Nova currently has a lot of devoted and hard-to-follow logic for when resources 
are simultaneously manipulated (deleted while building, for example). Just look 
for *not found* exceptions being thrown in the conductor from *get/update* 
function calls and check where each exception is handled (are all of them? are 
all resources cleaned up??). These seem like examples of an API that is too low 
level and wouldn't be exposed by an action/task-oriented API. It appears that 
nova is trying to handle all of these exists / does-not-already-exist (or 
similar consistency violation) cases correctly, which is good, but having said 
logic scattered around sure doesn't inspire confidence in me that the right 
thing is being done under all scenarios. Does that not worry anyone else??

IMHO, adding task logic in the conductor on top of the already hard-to-follow 
logic for these scenarios worries me. That's why I previously 
thought (and others seem to think) that task logic, correct locking, and such ... 
should be located in a service that can devote its code to just doing said 
tasks reliably. Honestly, said code will be much more complex than a 
database-rpc access layer (especially when the races and simultaneous 
manipulation problems are not hidden/scattered but are dealt with in an upfront 
and easily auditable manner).
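
To sketch what I mean by upfront and auditable (just illustrative python, not real 
code from any project; the task names are made up): a task knows how to do its piece 
of work and how to undo it, so partial failures and deleted-while-building style 
surprises get unwound in one place rather than wherever a not-found happens to be 
caught:

class Task(object):
    def execute(self, resource):
        raise NotImplementedError

    def revert(self, resource):
        raise NotImplementedError


class AllocateNetwork(Task):
    def execute(self, resource):
        resource['network'] = 'net-1'   # pretend allocation

    def revert(self, resource):
        resource.pop('network', None)   # undo the allocation


def run_flow(tasks, resource):
    done = []
    try:
        for t in tasks:
            t.execute(resource)
            done.append(t)
    except Exception:
        # one auditable place to unwind partial work, instead of ad-hoc
        # cleanup scattered around get/update call sites
        for t in reversed(done):
            t.revert(resource)
        raise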

But maybe this is nothing new to folks and all of this is already being thought 
about (solutions do seem to be appearing and more discussion about said ideas 
is always beneficial).

Just my thoughts...

Sent from my really tiny device...

On Jul 19, 2013, at 5:30 PM, Peter Feiner pe...@gridcentric.ca wrote:

 On Fri, Jul 19, 2013 at 4:36 PM, Joshua Harlow harlo...@yahoo-inc.com wrote:
 This seems to me to be a good example of a library problem leaking 
 into the openstack architecture, right? That is IMHO a bad path to go down.
 
 I like to think of a world where this isn't a problem, design the correct 
 solution there, and fix the eventlet problem instead. Other large 
 applications don't fall back to rpc calls to get around database/eventlet 
 scaling issues afaik.
 
 Honestly I would almost just want to finally fix the eventlet problem (chris 
 b. I think has been working on it) and design a system that doesn't try to 
 work around a library's shortcomings. But maybe that's too much idealism, idk...
 
 Well, there are two problems that multiple nova-conductor processes
 fix. One is the bad interaction between eventlet and native code. The
 other is allowing multiprocessing.  That is, once nova-conductor
 starts to handle enough requests, enough time will be spent holding
 the GIL to make it a bottleneck; in fact I've had to scale keystone
 using multiple processes because of GIL contention (i.e., keystone was
 steadily at 100% CPU utilization when I was hitting OpenStack with
 enough requests). So running multiple processes isn't avoidable. Indeed, other
 software that strives for high concurrency, such as apache, uses
 multiple processes to avoid contention for per-process kernel
 resources like the mmap semaphore.
 
 This doesn't even touch on the synchronization issues that can happen when you 
 start pumping db traffic over an mq. E.g., an update is now queued behind 
 another update, and the second one conflicts with the first - where does 
 resolution happen when an async mq call is used? What about when you have X 
 conductors doing Y reads and Z updates; I don't even want to think about the 
 sync/races there (and so on...). Did you hit / check for any consistency 
 issues in your tests? Consistency issues under high load using multiple 
 conductors scare the bejezzus out of me.
 
 If a sequence of updates needs to be atomic, then they should be made
 in the same database transaction. Hence nova-conductor's interface
 isn't do_some_sql(query), it's a bunch of high-level nova operations
 that are implemented using transactions.
 


Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Day, Phil
 -Original Message-
 From: Dan Smith [mailto:d...@danplanet.com]
 Sent: 16 July 2013 14:51
 To: OpenStack Development Mailing List
 Cc: Day, Phil
 Subject: Re: [openstack-dev] Moving task flow to conductor - concern about
 scale
 
  In the original context of using Conductor as a database proxy then
  the number of conductor instances is directly related to the number of
  compute hosts I need them to serve.
 
 Just a point of note, as far as I know, the plan has always been to establish
 conductor as a thing that sits between the api and compute nodes. However,
 we started with the immediate need, which was the offloading of database
 traffic.


Like I said, I see the need for both a layer between the API and compute and a 
layer between compute and the DB - I just don't see them as having to be part of 
the same thing.

 
  What I'm not sure about is whether I would also want to have the same number of
  conductor instances for task control flow - historically even running
  2 schedulers has been a problem, so the thought of having 10's of
  them makes me very concerned at the moment.   However I can't see any
  way to specialise a conductor to only handle one type of request.
 
 Yeah, I don't think the way it's currently being done allows for 
 specialization.
 
 Since you were reviewing actual task code, can you offer any specifics about
 the thing(s) that concern you? I think that scaling conductor (and its tasks)
 horizontally is an important point we need to achieve, so if you see something
 that needs tweaking, please point it out.
 
 Based on what is there now and proposed soon, I think it's mostly fairly safe,
 straightforward, and really no different than what two computes do when
 working together for something like resize or migrate.


There's nothing I've seen so far that causes me alarm, but then again we're in 
the very early stages and haven't moved anything really complex.
However, I think there's an inherent big difference between scaling something 
which is stateless like a DB proxy and scaling a stateful entity like a task 
workflow component.  I'd also suggest that so far there is no real experience 
with the latter within the current code base; compute nodes (which are the 
main scaled-out component so far) work on well-defined subsets of the data.


  So I guess my question is, given that it may have to address two
  independent scale drivers, is putting task work flow and DB proxy
  functionality into the same service really the right thing to do - or
  should there be some separation between them.
 
 I think that we're going to need more than one task node, and so it seems
 appropriate to locate one scales-with-computes function with another.
 

I just don't buy into this line of thinking - I need more than one API node for 
HA as well - but that doesn't mean I therefore want to put anything else that 
needs more than one node in there.

I don't even think these do scale-with-compute in the same way: DB proxy 
scales with the number of compute hosts because each new host introduces an 
amount of DB load through its periodic tasks. Task work flow scales with the 
number of requests coming into the system to create / modify servers - and 
that's not directly related to the number of hosts.

So rather than asking what doesn't work / might not work in the future, I think 
the question should be: aside from them both being things that could be 
described as a conductor, what's the architectural reason for wanting to have 
these two separate groups of functionality in the same service?

If it's really just because the concept of conductor got used for a DB proxy 
layer before the task workflow, then we should either think of a new name for 
the latter or rename the former.

If they were separate services and it turns out that I can/want/need to run the 
same number of both, then I can pretty easily do that - but the current 
approach is removing what seems to me to be a very important degree of freedom 
around deployment on a large-scale system.

Cheers,
Phil




Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Day, Phil
Hi Josh,

My idea's really pretty simple - make DB proxy and Task workflow separate 
services, and allow people to co-locate them if they want to.

Cheers.
Phil

 -Original Message-
 From: Joshua Harlow [mailto:harlo...@yahoo-inc.com]
 Sent: 17 July 2013 14:57
 To: OpenStack Development Mailing List
 Cc: OpenStack Development Mailing List
 Subject: Re: [openstack-dev] Moving task flow to conductor - concern about
 scale
 
 Hi Phil,
 
 I understand and appreciate your concern and I think everyone is trying to keep
 that in mind. It still appears to me to be too early in this refactoring and task
 restructuring effort to tell where it may end up. I think that's also good news
 since we can get these kinds of ideas (componentized conductors, if you will) in
 place to handle your (and my) scaling concerns. It would be pretty neat if said
 conductors could be scaled at different rates depending on their component,
 although as you said we need to get much better at handling said patterns (as
 you said, just 2 schedulers is a pita right now). I believe we can do it, given
 the right kind of design and scaling principles we build in from the start
 (right now).
 
 Would like to hear more of your ideas so they get incorporated earlier rather
 than later.
 
 Sent from my really tiny device..
 
 On Jul 16, 2013, at 9:55 AM, Dan Smith d...@danplanet.com wrote:
 
  In the original context of using Conductor as a database proxy then
  the number of conductor instances is directly related to the number
  of compute hosts I need them to serve.
 
  Just a point of note, as far as I know, the plan has always been to
  establish conductor as a thing that sits between the api and compute
  nodes. However, we started with the immediate need, which was the
  offloading of database traffic.
 
  What I'm not sure about is whether I would also want to have the same number of
  conductor instances for task control flow - historically even running
  2 schedulers has been a problem, so the thought of having 10's of
  them makes me very concerned at the moment.   However I can't see any
  way to specialise a conductor to only handle one type of request.
 
  Yeah, I don't think the way it's currently being done allows for
  specialization.
 
  Since you were reviewing actual task code, can you offer any specifics
  about the thing(s) that concern you? I think that scaling conductor
  (and its tasks) horizontally is an important point we need to achieve,
  so if you see something that needs tweaking, please point it out.
 
  Based on what is there now and proposed soon, I think it's mostly
  fairly safe, straightforward, and really no different than what two
  computes do when working together for something like resize or migrate.
 
  So I guess my question is, given that it may have to address two
  independent scale drivers, is putting task work flow and DB proxy
  functionality into the same service really the right thing to do - or
  should there be some separation between them.
 
  I think that we're going to need more than one task node, and so it
  seems appropriate to locate one scales-with-computes function with
  another.
 
  Thanks!
 
  --Dan
 


Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Dan Smith
 There's nothing I've seen so far that causes me alarm,  but then
 again we're in the very early stages and haven't moved anything
 really complex.

The migrations (live, cold, and resize) are moving there now. These are
some of the more complex stateful operations I would expect conductor
to manage in the near term, and maybe ever.

 I just don't buy into this line of thinking - I need more than one
 API node for HA as well - but that doesn't mean that therefore I want
 to put anything else that needs more than one node in there.
 
 I don't even think these do scale-with-compute in the same way;  DB
 proxy scales with the number of compute hosts because each new host
 introduces an amount of DB load through its periodic tasks. Task
 work flow scales with the number of requests coming into the system
 to create / modify servers - and that's not directly related to the
 number of hosts.

Unlike API, the only incoming requests that generate load for the
conductor are things like migrations, which also generate database
traffic.

 So rather than asking what doesn't work / might not work in the
 future I think the question should be aside from them both being
 things that could be described as a conductor - what's the
 architectural reason for wanting to have these two separate groups of
 functionality in the same service ?

IMHO, the architectural reason is avoiding a proliferation of services and
the added complexity that comes with it. If one expects the
proxy workload to always overshadow the task workload, then making
these two things a single service makes things a lot simpler.

 If they were separate services and it turns out that I can/want/need
 to run the same number of both then I can pretty easily do that  -
 but the current approach is removing what seems to me to be a very
 important degree of freedom around deployment on a large scale system.

I guess the question, then, is whether other folks agree that the
scaling-separately problem is concerning enough to justify at least an
RPC topic split now which would enable the services to be separated
later if need be.

I would like to point out, however, that the functions are being split
into different interfaces currently. While that doesn't reach low
enough on the stack to allow hosting them in two different places, it
does provide organization such that if we later needed to split them, it
would be a relatively simple (hah) matter of coordinating an RPC
upgrade like anything else.
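
To make that concrete, here is a minimal sketch of what a topic split could look 
like from the caller's side (written against oslo.messaging for illustration, which 
is not the rpc code nova uses today; the topic names and method names are made up):

import oslo_messaging as messaging
from oslo_config import cfg

TRANSPORT = messaging.get_rpc_transport(cfg.CONF)

# DB-proxy style calls stay on the existing conductor topic ...
db_proxy = messaging.RPCClient(
    TRANSPORT, messaging.Target(topic='conductor', version='1.0'))

# ... while long-running task calls get their own topic. Today one
# nova-conductor process can consume both topics; later, two different
# binaries could each consume one without another round of RPC changes.
tasks = messaging.RPCClient(
    TRANSPORT, messaging.Target(topic='conductor_tasks', version='1.0'))


def instance_update(ctxt, instance_uuid, updates):
    # short, data-oriented proxy call: block for the result
    return db_proxy.call(ctxt, 'instance_update',
                         instance_uuid=instance_uuid, updates=updates)


def migrate_server(ctxt, instance_uuid, live=False):
    # stateful task: fire and forget, the task side owns the workflow
    tasks.cast(ctxt, 'migrate_server',
               instance_uuid=instance_uuid, live=live)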

--Dan



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Peter Feiner
On Fri, Jul 19, 2013 at 11:06 AM, Dan Smith d...@danplanet.com wrote:
 FWIW, I don't think anyone is suggesting a single conductor, and
 especially not a single database proxy.

This is a critical detail that I missed. Re-reading Phil's original email,
I see you're debating the ratio of nova-conductor DB proxies to
nova-conductor task flow managers.

I had assumed that some of the task management state would exist
in memory. Is it all going to exist in the database?

 Since these queries are made frequently (i.e., easily 100 times
 during instance creation) and while other global locks are held
 (e.g., in the case of nova-compute's ResourceTracker), most of what
 nova-compute does becomes serialized.

 I think your numbers are a bit off. When I measured it just before
 grizzly, an instance create was something like 20-30 database calls.
 Unless that's changed (a lot) lately ... :)

Ah, perhaps... at least I had the order of magnitude right :-) Even with 20-30
calls, when a bunch of instances are being booted in parallel and all of the
database calls are serialized, minutes are added to instance creation time.



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Joe Gordon
On Jul 19, 2013 9:57 AM, Day, Phil philip@hp.com wrote:

  -Original Message-
  From: Dan Smith [mailto:d...@danplanet.com]
  Sent: 19 July 2013 15:15
  To: OpenStack Development Mailing List
  Cc: Day, Phil
  Subject: Re: [openstack-dev] Moving task flow to conductor - concern about scale

   There's nothing I've seen so far that causes me alarm,  but then again
   we're in the very early stages and haven't moved anything really
   complex.

  The migrations (live, cold, and resize) are moving there now. These are some
  of the more complex stateful operations I would expect conductor to manage in
  the near term, and maybe ever.

   I just don't buy into this line of thinking - I need more than one API
   node for HA as well - but that doesn't mean that therefore I want to
   put anything else that needs more than one node in there.

   I don't even think these do scale-with-compute in the same way;  DB
   proxy scales with the number of compute hosts because each new host
   introduces an amount of DB load through its periodic tasks. Task
   work flow scales with the number of requests coming into the system
   to create / modify servers - and that's not directly related to the
   number of hosts.

  Unlike API, the only incoming requests that generate load for the conductor are
  things like migrations, which also generate database traffic.

   So rather than asking what doesn't work / might not work in the
   future I think the question should be aside from them both being
   things that could be described as a conductor - what's the
   architectural reason for wanting to have these two separate groups of
   functionality in the same service ?

  IMHO, the architectural reason is avoiding a proliferation of services and the
  added complexity that comes with it.


 IMO I don't think reducing the number of services is a good enough reason
 to group unrelated services (db-proxy, task_workflow).  Otherwise why
 aren't we arguing to just add all of these to the existing scheduler
 service?

  If one expects the proxy workload to
  always overshadow the task workload, then making these two things a single
  service makes things a lot simpler.

 Not if you have to run 40 services to cope with the proxy load, but don't
 want the risk/complexity of having 40 task workflow engines working in
 parallel.

   If they were separate services and it turns out that I can/want/need
   to run the same number of both then I can pretty easily do that  - but
   the current approach is removing what seems to me to be a very important
   degree of freedom around deployment on a large scale system.

  I guess the question, then, is whether other folks agree that the
  scaling-separately problem is concerning enough to justify at least an RPC
  topic split now which would enable the services to be separated later if need be.


 Yep - that's the key question.   And in the interest of keeping the system
 stable at scale while we roll through this I think we should be erring on
 the side of caution/keeping deployment options open rather than waiting to
 see if there's a problem.

++, unless there is some downside to an RPC topic split, this seems like a
reasonable precaution.


  I would like to point out, however, that the functions are being split into
  different interfaces currently. While that doesn't reach low enough on the stack
  to allow hosting them in two different places, it does provide organization such
  that if we later needed to split them, it would be a relatively simple (hah)
  matter of coordinating an RPC upgrade like anything else.

  --Dan



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Peter Feiner
On Fri, Jul 19, 2013 at 10:15 AM, Dan Smith d...@danplanet.com wrote:

  So rather than asking what doesn't work / might not work in the
  future I think the question should be aside from them both being
  things that could be described as a conductor - what's the
  architectural reason for wanting to have these two separate groups of
  functionality in the same service ?

 IMHO, the architectural reason is avoiding a proliferation of services and
 the added complexity that comes with it. If one expects the
 proxy workload to always overshadow the task workload, then making
 these two things a single service makes things a lot simpler.

I'd like to point out a low-level detail that makes scaling nova-conductor
at the process level extremely compelling: the database driver
blocking the eventlet thread serializes nova's database access.

Since the database connection driver is typically implemented in a
library beyond the purview of eventlet's monkeypatching (i.e., a
native python extension like _mysql.so), blocking database calls will
block all eventlet coroutines. Since most of what nova-conductor does
is access the database, a nova-conductor process's handling of
requests is effectively serial.

Nova-conductor is the gateway to the database for nova-compute
processes.  So permitting a single nova-conductor process would
effectively serialize all database queries during instance creation,
deletion, periodic instance refreshes, etc. Since these queries are
made frequently (i.e., easily 100 times during instance creation) and
while other global locks are held (e.g., in the case of nova-compute's
ResourceTracker), most of what nova-compute does becomes serialized.

In parallel performance experiments I've done, I have found that
running multiple nova-conductor processes is the best way to mitigate
the serialization of blocking database calls. Say I am booting N
instances in parallel (usually up to N=40). If I have a single
nova-conductor process, the duration of each nova-conductor RPC
increases linearly with N, which can add _minutes_ to instance
creation time (i.e., dozens of RPCs, some taking several seconds).
However, if I run N nova-conductor processes in parallel, then the
duration of the nova-conductor RPCs does not increase with N; since each
RPC is most likely handled by a different nova-conductor, the serial
execution within each process is moot.

Note that there are alternative methods for preventing the eventlet
thread from blocking during database calls. However, none of these
alternatives performed as well as multiple nova-conductor processes:

Instead of using the native database driver like _mysql.so, you can
use a pure-python driver like pymysql, by setting
sql_connection=mysql+pymysql://... in the [DEFAULT] section of
/etc/nova/nova.conf, which eventlet will monkeypatch to avoid
blocking. The problem with this approach is the vastly greater CPU
demand of the pure-python driver compared to the native driver. Since
the pure-python driver is so much more CPU intensive, the eventlet
thread spends most of its time talking to the database, which is
effectively the problem we had before!

Instead of making database calls from eventlet's thread, you can
submit them to eventlet's pool of worker threads and wait for the
results. Try this by setting dbapi_use_tpool=True in the [DEFAULT]
section of /etc/nova/nova.conf. The problem I found with this approach
was the overhead of synchronizing with the worker threads. In
particular, the time elapsed between the worker thread finishing and
the waiting coroutine being resumed was typically several times
greater than the duration of the database call itself.
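
To illustrate the thread-pool mechanism itself (a toy sketch, not nova code; in nova 
the dbapi_use_tpool option is what wires this up for the DB API), here is the 
difference between calling a blocking function directly from greenthreads and handing 
it to eventlet's tpool:

import time
import eventlet
from eventlet import tpool


def blocking_call():
    # Stand-in for a native DB driver call. Nothing is monkeypatched here,
    # so calling this directly stalls every greenthread in the process.
    time.sleep(0.5)
    return "rows"


def handle_request(use_tpool):
    if use_tpool:
        return tpool.execute(blocking_call)   # runs on a real OS thread
    return blocking_call()                    # blocks the whole process


def run(use_tpool):
    pool = eventlet.GreenPool()
    start = time.time()
    for _ in range(10):
        pool.spawn(handle_request, use_tpool)
    pool.waitall()
    return time.time() - start


print("direct calls: %.2fs" % run(False))   # ~5s: one request at a time
print("via tpool:    %.2fs" % run(True))    # ~0.5s plus overhead: overlapped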



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Dan Smith
 Nova-conductor is the gateway to the database for nova-compute
 processes.  So permitting a single nova-conductor process would
 effectively serialize all database queries during instance creation,
 deletion, periodic instance refreshes, etc.

FWIW, I don't think anyone is suggesting a single conductor, and
especially not a single database proxy.

 Since these queries are made frequently (i.e., easily 100 times
 during instance creation) and while other global locks are held
 (e.g., in the case of nova-compute's ResourceTracker), most of what
 nova-compute does becomes serialized.

I think your numbers are a bit off. When I measured it just before
grizzly, an instance create was something like 20-30 database calls.
Unless that's changed (a lot) lately ... :)

--Dan



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Robert Collins
On 19 July 2013 22:55, Day, Phil philip@hp.com wrote:
 Hi Josh,

 My idea's really pretty simple - make DB proxy and Task workflow separate 
 services, and allow people to co-locate them if they want to.

+1, for all the reasons discussed in this thread. I was weirded out
when I saw non-DB-proxy work being put into the same service. One
additional reason that hasn't been discussed is security: the more
complex the code in the service that actually connects to the DB, the
greater the risk of someone who shouldn't have direct access getting it
via a code bug.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Cloud Services



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-17 Thread Joshua Harlow
Hi Phil, 

I understand and appreciate your concern and I think everyone is trying to keep 
that in mind. It still appears to me to be too early in this refactoring and 
task restructuring effort to tell where it may end up. I think that's also 
good news since we can get these kinds of ideas (componentized conductors, if 
you will) in place to handle your (and my) scaling concerns. It would be pretty 
neat if said conductors could be scaled at different rates depending on their 
component, although as you said we need to get much better at handling 
said patterns (as you said, just 2 schedulers is a pita right now). I believe we 
can do it, given the right kind of design and scaling principles we build in 
from the start (right now).

Would like to hear more of your ideas so they get incorporated earlier rather 
than later.

Sent from my really tiny device..

On Jul 16, 2013, at 9:55 AM, Dan Smith d...@danplanet.com wrote:

 In the original context of using Conductor as a database proxy then
 the number of conductor instances is directly related to the number
 of compute hosts I need them to serve.
 
 Just a point of note, as far as I know, the plan has always been to
 establish conductor as a thing that sits between the api and compute
 nodes. However, we started with the immediate need, which was the
 offloading of database traffic.
 
 What I'm not sure about is whether I would also want to have the same number of
 conductor instances for task control flow - historically even running
 2 schedulers has been a problem, so the thought of having 10's of
 them makes me very concerned at the moment.   However I can't see any
 way to specialise a conductor to only handle one type of request.
 
 Yeah, I don't think the way it's currently being done allows for
 specialization.
 
 Since you were reviewing actual task code, can you offer any specifics
 about the thing(s) that concern you? I think that scaling conductor (and
 its tasks) horizontally is an important point we need to achieve, so if
 you see something that needs tweaking, please point it out.
 
 Based on what is there now and proposed soon, I think it's mostly fairly
 safe, straightforward, and really no different than what two computes do
 when working together for something like resize or migrate.
 
 So I guess my question is, given that it may have to address two
 independent scale drivers, is putting task work flow and DB proxy
 functionality into the same service really the right thing to do - or
 should there be some separation between them.
 
 I think that we're going to need more than one task node, and so it
 seems appropriate to locate one scales-with-computes function with
 another.
 
 Thanks!
 
 --Dan
 


[openstack-dev] Moving task flow to conductor - concern about scale

2013-07-16 Thread Day, Phil
Hi Folks,

Reviewing some of the changes to move control flows into conductor made me wonder 
about an issue that I haven't seen discussed so far (apologies if it was and 
I've missed it):

In the original context of using Conductor as a database proxy, the number 
of conductor instances is directly related to the number of compute hosts I 
need them to serve.   I don't have a feel for what this ratio is (as we haven't 
switched yet) but based on the discussions in Portland I have the expectation 
that even with the eventlet performance fix in place there could still need to 
be 10's of them for a large deployment.

What I'm not sure about is whether I would also want to have the same number of 
conductor instances for task control flow - historically even running 2 schedulers 
has been a problem, so the thought of having 10's of them makes me very concerned 
at the moment.   However I can't see any way to specialise a conductor to only 
handle one type of request.

So I guess my question is, given that it may have to address two independent 
scale drivers, is putting task work flow and DB proxy functionality into the 
same service really the right thing to do - or should there be some separation 
between them?

Don't get me wrong - I'm not against the concept of having the task work flow 
in a well-defined place - I'm just wondering if conductor is really the logical 
place to do it rather than, for example, making this part of an extended set 
of functionality for the scheduler (which is already a separate service with 
its own scaling properties).

Thoughts ?

Phil


Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-16 Thread Dan Smith
 In the original context of using Conductor as a database proxy then
 the number of conductor instances is directly related to the number
 of compute hosts I need them to serve. 

Just a point of note, as far as I know, the plan has always been to
establish conductor as a thing that sits between the api and compute
nodes. However, we started with the immediate need, which was the
offloading of database traffic.

 What I'm not sure about is whether I would also want to have the same number of
 conductor instances for task control flow - historically even running
 2 schedulers has been a problem, so the thought of having 10's of
 them makes me very concerned at the moment.   However I can't see any
 way to specialise a conductor to only handle one type of request.

Yeah, I don't think the way it's currently being done allows for
specialization.

Since you were reviewing actual task code, can you offer any specifics
about the thing(s) that concern you? I think that scaling conductor (and
its tasks) horizontally is an important point we need to achieve, so if
you see something that needs tweaking, please point it out.

Based on what is there now and proposed soon, I think it's mostly fairly
safe, straightforward, and really no different than what two computes do
when working together for something like resize or migrate.

 So I guess my question is, given that it may have to address two
 independent scale drivers, is putting task work flow and DB proxy
 functionality into the same service really the right thing to do - or
 should there be some separation between them.

I think that we're going to need more than one task node, and so it
seems appropriate to locate one scales-with-computes function with
another.

Thanks!

--Dan
