Hi, I am new to this area. I have an idea but don't know whether it would work. A fanout_cast is expensive and the DB could be a burden. Could we maintain the stat data at the nodes and have the scheduler proactively ask the nodes for their stats when, and only when, it needs to do any scheduling? The assumption is that scheduling happens infrequently compared with the frequency of fanout_cast.
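To make the idea a bit more concrete, here is a rough Python sketch of what I mean. It is only an illustration under my own assumptions: `ComputeNode`, `PullScheduler`, `get_stats` and `claim` are hypothetical names, not existing Nova code, and the direct method call stands in for whatever RPC the scheduler would really use to ask a node.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    free_ram_mb: int
    free_vcpus: int


class ComputeNode:
    """Hypothetical node that keeps its stats locally and never broadcasts."""

    def __init__(self, name, free_ram_mb, free_vcpus):
        self.name = name
        self._stats = NodeStats(free_ram_mb, free_vcpus)
        self.queries = 0  # number of times a scheduler asked for stats

    def get_stats(self):
        # Stand-in for an RPC call answered by the node (assumption).
        self.queries += 1
        return NodeStats(self._stats.free_ram_mb, self._stats.free_vcpus)

    def claim(self, ram_mb, vcpus):
        # The node updates its own bookkeeping when a build lands on it.
        self._stats.free_ram_mb -= ram_mb
        self._stats.free_vcpus -= vcpus


class PullScheduler:
    """Hypothetical scheduler that asks nodes for stats only when scheduling."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def schedule(self, ram_mb, vcpus):
        # Pull fresh stats from every node, keep the ones that fit.
        candidates = []
        for node in self.nodes:
            stats = node.get_stats()
            if stats.free_ram_mb >= ram_mb and stats.free_vcpus >= vcpus:
                candidates.append((stats.free_ram_mb, node))
        if not candidates:
            return None
        # Pick the fitting node with the most free memory.
        _, best = max(candidates, key=lambda pair: pair[0])
        best.claim(ram_mb, vcpus)
        return best.name


if __name__ == "__main__":
    nodes = [ComputeNode("a", 2048, 2), ComputeNode("b", 4096, 4)]
    scheduler = PullScheduler(nodes)
    print(scheduler.schedule(1024, 1))
```

One obvious cost in this sketch is that every scheduling request turns into one round trip per node, so with a lot of nodes it may only pay off if scheduling really is rare.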
Best Regards.
-- Shane

Brian Elliott wrote on 2013-06-18:

> On Jun 17, 2013, at 3:50 PM, Chris Behrens <[email protected]> wrote:
>
>> On Jun 17, 2013, at 7:49 AM, Russell Bryant <[email protected]> wrote:
>>
>>> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>>>> Looking into the scheduler a bit there's an issue of duplicated effort
>>>> that is a little puzzling. The database table `compute_nodes' is being
>>>> updated periodically with data about capabilities and resources used
>>>> (memory, vcpus, ...) while at the same time a periodic RPC call is being
>>>> made to the scheduler sending pretty much the same data.
>>>>
>>>> Does anyone know why we are updating the same data in two different
>>>> places using two different mechanisms? Also, assuming we were to remove
>>>> one of these updates, which one should go? (I thought at one point in
>>>> time there was a goal to create a database free compute node which would
>>>> imply we should remove the DB update.)
>>>
>>> Have you looked around to see if any code is using the data from the db?
>>>
>>> Having schedulers hit the db for the current state of all compute nodes
>>> all of the time would be a large additional db burden that I think we
>>> should avoid. So, it makes sense to keep the rpc fanout_cast of current
>>> stats to schedulers.
>>
>> This is actually what the scheduler uses. :) The fanout messages are too
>> infrequent and can be too laggy. So, the scheduler was moved to using the
>> DB a long, long time ago, but it was very inefficient, at first, because
>> it looped through all instances. So we added things we needed into
>> compute_node and compute_node_stats so we only had to look at the hosts.
>> You have to pull the hosts anyway, so we pull the stats at the same time.
>>
>> The problem is, when we stopped using certain data from the fanout
>> messages, we never removed it. We should AT LEAST do this. But (see
>> below)...
>>
>>> The scheduler also does a fanout_cast to all compute nodes when it
>>> starts up to trigger the compute nodes to populate the cache in the
>>> scheduler. It would be nice to never fanout_cast to all compute nodes
>>> (given that there may be a *lot* of them). We could replace this with
>>> having the scheduler populate its cache from the database.
>>
>> I think we should audit the remaining things that the scheduler uses from
>> these messages and move them to the DB. I believe it's limited to the
>> hypervisor capabilities to compare against aggregates or some such. I
>> believe it's things that change very rarely, so an alternative can be to
>> only send fanout messages when capabilities change! We could always do
>> that as a first step.
>>
>>> Removing the db usage completely would be nice if nothing is actually
>>> using it, but we'd have to look into an alternative solution for
>>> removing the scheduler fanout_cast to compute.
>>
>> Relying on anything but the DB for current memory free, etc, is just
>> too laggy, so we need to stick with it, IMO.
>>
>> - Chris
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> [email protected]
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> As Chris said, the reason it ended up this way using the DB is to quickly
> get up to date usage on hosts to the scheduler. I certainly understand the
> point that it's a whole lot of increased load on the DB, but the RPC data
> was quite stale. If there is interest in moving away from the DB updates,
> I think we have to either:
>
> 1) Send RPC updates to scheduler on essentially every state change
>    during a build.
>
> or
>
> 2) Change the scheduler architecture so there is some "memory" of
>    resources consumed between requests. The scheduler would have to
>    remember which hosts recent builds were assigned to. This could be a
>    bit of a data synchronization problem if you're talking about using
>    multiple scheduler instances.
>
> Brian
