On 10 October 2015 at 23:47, Clint Byrum <cl...@fewbar.com> wrote:
> > Per before, my suggestion was that every scheduler tries to maintain a
> > copy of the cloud's state in memory (in much the same way, per the
> > previous example, as every router on the internet tries to make a route
> > table out of what it learns from BGP). They don't have to be perfect.
> > They don't have to be in sync. As long as there's some variability in
> > the decision making, they don't have to update when another scheduler
> > schedules something (and you can make the compute node send an
> > immediate update when a new VM is run, anyway). They all stand a good
> > chance of scheduling VMs well simultaneously.
>
> I'm quite in favor of eventual consistency and retries. Even if we had
> a system of perfect updating of all state records everywhere, it would
> break sometimes and I'd still want to not trust any record of state as
> being correct for the entire distributed system. However, there is an
> efficiency win gained by staying _close_ to correct. It is actually a
> function of the expected entropy. The more concurrent schedulers, the
> more entropy there will be to deal with.
... and the fewer the servers in total, the larger the entropy as a
proportion of the whole system (if that's a thing; it's a long time since
I did physical chemistry). But consider the use cases:

1. I have a small cloud, and I run two schedulers for redundancy. There's
a real possibility that, when the cloud is heavily loaded, the schedulers
occasionally make poor decisions. We'd have to work out how likely that
is, certainly.

2. I have a large cloud, and I run 20 schedulers for redundancy. There's
a good chance that any one scheduler's information is out of date. But
there could be several hundred hosts willing to satisfy a scheduling
request, and even among the hosts it's wrong about, there's a low chance
that any are close to the threshold where they would refuse the VM in
question. So the odds are good that it picks a host that's happy to
satisfy the request.

> > But to be fair, we're throwing made up numbers around at this point.
> > Maybe it's time to work out how to test this for scale in a harness -
> > which is the bit of work we all really need to do this properly, or
> > there's no proof we've actually helped - and leave people to code
> > their ideas up?
>
> I'm working on adding meters for rates and amounts of messages and
> queries that the system does right now for performance purposes. Rally,
> though, is the place where I'd go to ask "how fast can we schedule
> things right now?".

My only concern is that this means testing a real cloud at scale, and I
haven't got any more firstborn to sell for hardware, so I wonder if we
can fake up a compute node in our test harness. To make all of this a
bit less hand-wavy, I've put three rough sketches below: what I
understand the local-state scheduler to look like, a toy simulation of
use case 2, and a fake compute node for the harness.
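First, the scheduler. This is a minimal sketch of what I take the
proposal to mean, not nova's actual scheduler interface; every class and
method name here is invented. Each scheduler holds a private, possibly
stale map of host state, updated whenever a compute node reports in, and
it deliberately picks semi-randomly among the best few candidates so that
concurrent schedulers don't all dogpile the same host:

    import random

    class HostState:
        """One scheduler's (possibly stale) view of a compute host."""
        def __init__(self, name, free_ram_mb, free_vcpus):
            self.name = name
            self.free_ram_mb = free_ram_mb
            self.free_vcpus = free_vcpus

    class LocalStateScheduler:
        """Keeps a private, eventually-consistent map of host state.

        Updates arrive asynchronously (periodic reports, plus an
        immediate notification when a compute node starts a VM); we
        never wait for them and never assume the map is correct.
        """
        def __init__(self):
            self.hosts = {}  # name -> HostState, refreshed by update()

        def update(self, host_state):
            # Called whenever a compute node reports in.  Last write
            # wins; no locking across schedulers is ever attempted.
            self.hosts[host_state.name] = host_state

        def pick_host(self, ram_mb, vcpus, candidates=5):
            # Filter by what we *believe* fits, then choose randomly
            # among the top few rather than always taking the single
            # best host.  That randomness is the "variability" that
            # stops N concurrent schedulers all sending their next VM
            # to the same place.
            fits = [h for h in self.hosts.values()
                    if h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus]
            if not fits:
                raise LookupError("no host appears to fit; retry later")
            fits.sort(key=lambda h: h.free_ram_mb, reverse=True)
            return random.choice(fits[:candidates])

The compute node stays the authority on its own capacity: if our belief
turns out to be wrong, the boot bounces and we retry, which is exactly
the eventual-consistency-and-retries position.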
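Second, the made-up numbers, made explicit. This toy Monte Carlo model is
only meant to show that "good odds" in use case 2 is a measurable
quantity rather than a matter of taste: 300 hosts with room for 16
unit-sized VMs each, and each scheduler's view lagging reality by up to
two placements. Every number is invented; what's interesting is how the
retry rate moves as you vary the parameters, not the absolute value:

    import random

    def simulate(trials=10000, n_hosts=300, capacity=16,
                 mean_load=12, staleness=2):
        """Toy model: each host runs some number of unit-sized VMs.

        A scheduler's view of each host lags reality by up to
        `staleness` missed placements.  It picks randomly among the
        hosts it believes have room; the pick fails if the host is
        actually full.  All numbers invented.
        """
        failures = 0
        for _ in range(trials):
            actual = [min(capacity,
                          max(0, int(random.gauss(mean_load, 2))))
                      for _ in range(n_hosts)]
            # The scheduler believes each host holds up to `staleness`
            # fewer VMs than it really does (it missed recent builds).
            believed = [max(0, a - random.randint(0, staleness))
                        for a in actual]
            candidates = [i for i, b in enumerate(believed)
                          if b < capacity]
            if not candidates:
                failures += 1   # nowhere appears to have room at all
                continue
            choice = random.choice(candidates)
            if actual[choice] >= capacity:  # stale view led us astray
                failures += 1
        return failures / trials

    if __name__ == "__main__":
        print("retry rate: %.3f" % simulate())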
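Third, the fake compute node. If memory serves, nova already ships a fake
virt driver (nova.virt.fake.FakeDriver) that stands in for a hypervisor,
which might get us part of the way; but for a scheduler-scale harness
something even simpler could do. This is a sketch with invented names and
a plain callback where real compute nodes would use the message queue.
The crucial property is that the node itself is the final authority on
its capacity, and it sends an immediate report when it accepts a VM plus
a slower periodic one, so the schedulers' views are realistically stale:

    import random
    import threading
    import time

    class FakeComputeNode:
        """A compute node for the test harness: no hypervisor, just
        bookkeeping.  Accepts or rejects boot requests against its
        real capacity and reports state back to the harness."""

        def __init__(self, name, ram_mb, report, report_interval=5.0):
            self.name = name
            self.free_ram_mb = ram_mb
            self.lock = threading.Lock()
            self.report = report            # callback into the harness
            self.report_interval = report_interval

        def spawn(self, ram_mb):
            # The node is the single authority for its own capacity,
            # so this check is race-free even with 20 schedulers
            # upstream of it.
            with self.lock:
                if ram_mb > self.free_ram_mb:
                    return False        # scheduler was stale; retry
                self.free_ram_mb -= ram_mb
            self.report(self.name, self.free_ram_mb)  # immediate update
            return True

        def run_reporting(self):
            # Periodic report, jittered so thousands of fakes don't
            # all phone home in lockstep.
            while True:
                time.sleep(self.report_interval + random.random())
                with self.lock:
                    free = self.free_ram_mb
                self.report(self.name, free)

    if __name__ == "__main__":
        node = FakeComputeNode("fake-0", ram_mb=131072,
                               report=lambda n, free: print(n, free))
        node.spawn(4096)

Many of these should fit in one process (the reporting loops could be
multiplexed rather than given a thread each), which is considerably
cheaper than firstborn.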
--
Ian.