On Jun 17, 2013, at 3:50 PM, Chris Behrens <cbehr...@codestud.com> wrote:

> 
> On Jun 17, 2013, at 7:49 AM, Russell Bryant <rbry...@redhat.com> wrote:
> 
>> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>>> Looking into the scheduler a bit there's an issue of duplicated effort that 
>>> is a little puzzling.  The database table `compute_nodes' is being updated 
>>> periodically with data about capabilities and resources used (memory, 
>>> vcpus, ...) while at the same time a periodic RPC call is being made to the 
>>> scheduler sending pretty much the same data.
>>> 
>>> Does anyone know why we are updating the same data in two different places 
>>> using two different mechanisms?  Also, assuming we were to remove one of 
>>> these updates, which one should go?  (I thought at one point in time there 
>>> was a goal to create a database-free compute node, which would imply we 
>>> should remove the DB update.)
>> 
>> Have you looked around to see if any code is using the data from the db?
>> 
>> Having schedulers hit the db for the current state of all compute nodes
>> all of the time would be a large additional db burden that I think we
>> should avoid.  So, it makes sense to keep the rpc fanout_cast of current
>> stats to schedulers.
> 
> This is actually what the scheduler uses. :)   The fanout messages are too 
> infrequent and can be too laggy.  So, the scheduler was moved to using the DB 
> a long, long time ago… but it was very inefficient, at first, because it 
> looped through all instances.  So we added things we needed into compute_node 
> and compute_node_stats so we only had to look at the hosts.  You have to pull 
> the hosts anyway, so we pull the stats at the same time.
> 
> The problem is… when we stopped using certain data from the fanout messages, 
> we never removed it.  We should AT LEAST do this.  But… (see below)…
> 
>> 
>> The scheduler also does a fanout_cast to all compute nodes when it
>> starts up to trigger the compute nodes to populate the cache in the
>> scheduler.  It would be nice to never fanout_cast to all compute nodes
>> (given that there may be a *lot* of them).  We could replace this with
>> having the scheduler populate its cache from the database.
> 
> I think we should audit the remaining things that the scheduler uses from 
> these messages and move them to the DB.  I believe it's limited to the 
> hypervisor capabilities to compare against aggregates or some such.  I 
> believe it's things that change very rarely… so an alternative could be to 
> send fanout messages only when capabilities change!  We could always do that 
> as a first step.
> 
>> 
>> Removing the db usage completely would be nice if nothing is actually
>> using it, but we'd have to look into an alternative solution for
>> removing the scheduler fanout_cast to compute.
> 
> Relying on anything but the DB for current memory free, etc, is just too 
> laggy… so we need to stick with it, IMO.
> 
> - Chris

As Chris said, the reason it ended up using the DB this way is to quickly get 
up-to-date usage on hosts to the scheduler.  I certainly understand the point 
that it's a whole lot of increased load on the DB, but the RPC data was quite 
stale.  If there is interest in moving away from the DB updates, I think we 
have to either:

1) Send RPC updates to the scheduler on essentially every state change during a 
build.

or

2) Change the scheduler architecture so there is some "memory" of resources 
consumed between requests.  The scheduler would have to remember which hosts 
recent builds were assigned to.  This could be a bit of a data synchronization 
problem if you're talking about using multiple scheduler instances.
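To make option 2 concrete, here is a minimal sketch of what that "memory" could look like.  All names here (HostState, ConsumptionTracker, pick_host) are hypothetical, not actual Nova code: the scheduler keeps an in-process record of resources it has already handed out, so back-to-back requests don't over-commit a host before the next periodic DB/RPC update catches up.

```python
class HostState:
    """Last resource snapshot reported for a host via DB/RPC (hypothetical)."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb


class ConsumptionTracker:
    """Remembers RAM claimed since the last authoritative update."""

    def __init__(self):
        self._claimed = {}  # host name -> MB claimed but not yet reported

    def effective_free(self, host):
        # Reported free RAM minus what this scheduler has already promised.
        return host.free_ram_mb - self._claimed.get(host.name, 0)

    def claim(self, host, ram_mb):
        self._claimed[host.name] = self._claimed.get(host.name, 0) + ram_mb

    def refresh(self, host):
        # Called when a fresh DB/RPC update arrives: the report now reflects
        # previously claimed resources, so drop the local delta.
        self._claimed.pop(host.name, None)


def pick_host(hosts, tracker, ram_mb):
    # Choose the host with the most effective free RAM that still fits,
    # and record the claim so the next request sees the reduced capacity.
    candidates = [h for h in hosts if tracker.effective_free(h) >= ram_mb]
    if not candidates:
        return None
    best = max(candidates, key=tracker.effective_free)
    tracker.claim(best, ram_mb)
    return best
```

Note this only solves staleness within a single scheduler process; with multiple scheduler instances each tracker diverges, which is exactly the synchronization problem mentioned above.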

Brian
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
