Ian Wells wrote:
On 10 October 2015 at 23:47, Clint Byrum <[email protected] <mailto:[email protected]>> wrote:

> Per before, my suggestion was that every scheduler tries to maintain a
> copy of the cloud's state in memory (in much the same way, per the
> previous example, as every router on the internet tries to make a route
> table out of what it learns from BGP). They don't have to be perfect.
> They don't have to be in sync. As long as there's some variability in
> the decision making, they don't have to update when another scheduler
> schedules something (and you can make the compute node send an immediate
> update when a new VM is run, anyway). They all stand a good chance of
> scheduling VMs well simultaneously.
>
> I'm quite in favor of eventual consistency and retries. Even if we had a
> system of perfect updating of all state records everywhere, it would
> break sometimes, and I'd still want to not trust any record of state as
> being correct for the entire distributed system. However, there is an
> efficiency win gained by staying _close_ to correct. It is actually a
> function of the expected entropy. The more concurrent schedulers, the
> more entropy there will be to deal with.

... and the fewer the servers in total, the larger the entropy as a
proportion of the whole system (if that's a thing - it's a long time
since I did physical chemistry). But consider the use cases:

1. I have a small cloud, and I run two schedulers for redundancy.
There's a good possibility that, when the cloud is loaded, the
schedulers occasionally make poor decisions. We'd have to consider how
likely that was, certainly.

2. I have a large cloud, and I run 20 schedulers for redundancy.
There's a good chance that a scheduler is out of date on its
information.
But there could be several hundred hosts willing to satisfy a scheduling
request, and even among the hosts it has out-of-date information about,
there's a low chance that any are close to the threshold where they
won't run the VM in question - so there are good odds it will pick a
host that's happy to satisfy the request.

> But to be fair, we're throwing made up numbers around at this point.
> Maybe it's time to work out how to test this for scale in a harness -
> which is the bit of work we all really need to do this properly, or
> there's no proof we've actually helped - and leave people to code their
> ideas up?

I'm working on adding meters for the rates and volumes of messages and
queries that the system does right now, for performance purposes. Rally,
though, is the place where I'd go to ask "how fast can we schedule
things right now?". My only concern is that we'd be testing a real cloud
at scale, and I haven't got any more firstborn to sell for hardware, so
I wonder if we can fake up a compute node in our test harness.
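To put a rough shape on the "made up numbers" point, here's a toy
simulation of schedulers working from stale copies of host state and
picking with some randomness. Every parameter in it is invented, and a
failed pick just gets counted rather than re-placed, so treat it as a
sketch of the experiment, not a result:

```python
import random

def simulate(hosts=20, capacity=10, schedulers=20,
             placements=180, refresh_every=25, seed=42):
    """Fraction of placements that land on a host which is actually
    full, i.e. would need a retry, when each scheduler works from a
    stale copy of per-host free capacity."""
    rng = random.Random(seed)
    actual = [capacity] * hosts                        # true free slots
    views = [list(actual) for _ in range(schedulers)]  # stale per-scheduler copies
    retries = 0
    for i in range(placements):
        s = rng.randrange(schedulers)   # whichever scheduler gets the request
        if i % refresh_every == 0:
            views[s] = list(actual)     # occasional state refresh
        candidates = [h for h in range(hosts) if views[s][h] > 0]
        if not candidates:              # view looks full: resync and count a retry
            views[s] = list(actual)
            retries += 1
            continue
        h = rng.choice(candidates)      # variability in the decision making
        views[s][h] -= 1
        if actual[h] > 0:
            actual[h] -= 1              # the host really had room
        else:
            retries += 1                # it didn't: this pick needs a retry
    return retries / placements

for n in (2, 20):
    print(n, "schedulers -> retry fraction", simulate(schedulers=n))
```

Crude as it is, something like this would at least let us argue about
the retry rate as a function of scheduler count and load, rather than
about adjectives.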
Does the openstack foundation have access to a scaling area that can be
used by the community for this kind of experimental work? It seems like
infra or others should be able to make that possible. Maybe we could
sacrifice a summit and, instead of spending the money on that, we (as a
community) could spend it on a really nice scale lab for the community ;)
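Short of a real scale lab, faking the compute side in-process might get
us most of the way. A minimal sketch - all names and numbers here are
invented, and a real harness would wire this to whatever message bus the
schedulers listen on:

```python
import threading

class FakeComputeNode:
    """An in-process stand-in for a compute node: it reports its
    resources the way a real node would, and accepts or refuses VM
    spawns against its actual free capacity - which is what exercises
    the scheduler's retry path."""

    def __init__(self, name, ram_mb=16384, vcpus=8):
        self.name = name
        self.free_ram_mb = ram_mb
        self.free_vcpus = vcpus
        self._lock = threading.Lock()

    def report_state(self):
        # What the node would broadcast to schedulers (and which may be
        # stale by the time anyone acts on it).
        return {"host": self.name,
                "free_ram_mb": self.free_ram_mb,
                "free_vcpus": self.free_vcpus}

    def spawn(self, ram_mb, vcpus):
        # Refuse if we're actually out of room.
        with self._lock:
            if ram_mb <= self.free_ram_mb and vcpus <= self.free_vcpus:
                self.free_ram_mb -= ram_mb
                self.free_vcpus -= vcpus
                return True
            return False

node = FakeComputeNode("fake-0")
print(node.spawn(ram_mb=4096, vcpus=2), node.report_state())
```

A few thousand of these in one process is cheap, so it might let us run
the 20-schedulers-against-hundreds-of-hosts experiment without selling
anyone's firstborn.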
--
Ian.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
