On 15 July 2015 at 19:25, Robert Collins <[email protected]> wrote:
> On 16 July 2015 at 02:18, Ed Leafe <[email protected]> wrote:
> ...
>> What I'd like to investigate is replacing the current design of having
>> the compute nodes communicating with the scheduler via message queues.
>> This design is overly complex and has several known scalability
>> issues. My thought is to replace this with a Cassandra [1] backend.
>> Compute nodes would update their state to Cassandra whenever they
>> change, and that data would be read by the scheduler to make its host
>> selection. When the scheduler chooses a host, it would post the claim
>> to Cassandra wrapped in a lightweight transaction, which would ensure
>> that no other scheduler has tried to claim those resources. When the
>> host has built the requested VM, it will delete the claim and update
>> Cassandra with its current state.
>
> +1 on doing an experiment.
>
> Some semi-random thoughts here. Well, not random at all, I've been
> mulling on this for a while.
>
> I think Kafka may fit our model for updating state significantly
> more closely than Cassandra does. It would be neat if we could do a
> few different sketchy implementations and head-to-head test them. I
> love Cassandra in a lot of ways, but "lightweight transaction" are two
> words that I'd really not expect to see in Cassandra (Yes, I know it
> has them in the official docs and design :)) - it's a full Paxos
> interaction to do SERIAL consistency, which is more work than either
> QUORUM or LOCAL_QUORUM. A sharded approach - there is only one compute
> node in question for the update needed - can be less work than either
> and still race free.
>
> I too very much want to see us move to brokerless RPC,
> systematically, for all the reasons :). You might need a little of
> that mixed in to the experiments, depending on the scale reached.
>
> In terms of quantification: are you looking to test scalability (e.g.
> scheduling some N events per second without races), [there are huge
> improvements possible by rewriting the current scheduler's innards to
> be less wasteful, but that doesn't address active-active setups],
> latency (e.g. 99th percentile time-to-schedule) or <...> ?

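To make the claim step quoted above concrete, here is a minimal sketch of a conditional ("compare and set") claim. It is an in-memory stand-in only: it does not use Cassandra or Kafka, and every name in it (ClaimStore, HostState, claim_if_unchanged, the RAM-only resource model) is hypothetical, invented for illustration. The lock stands in for whatever atomicity the real backend would provide, whether a lightweight transaction or a single writer per compute node as in the sharded idea.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class HostState:
    """Hypothetical per-host record: free RAM plus outstanding claims."""
    free_ram_mb: int
    claims: dict = field(default_factory=dict)  # claim_id -> ram_mb
    version: int = 0  # bumped on every successful write


class ClaimStore:
    """In-memory stand-in for the shared backend (not a real client)."""

    def __init__(self):
        self._hosts = {}
        self._lock = threading.Lock()  # stands in for the store's atomicity

    def report(self, host, free_ram_mb):
        """Compute node publishes its state for the first time."""
        with self._lock:
            self._hosts[host] = HostState(free_ram_mb)

    def read(self, host):
        """Scheduler reads the state it will base its decision on."""
        with self._lock:
            state = self._hosts[host]
            return state.version, state.free_ram_mb

    def claim_if_unchanged(self, host, seen_version, claim_id, ram_mb):
        """Apply the claim only if nobody wrote since `seen_version`."""
        with self._lock:
            state = self._hosts[host]
            if state.version != seen_version or state.free_ram_mb < ram_mb:
                return False  # lost the race or not enough room: re-read
            state.free_ram_mb -= ram_mb
            state.claims[claim_id] = ram_mb
            state.version += 1
            return True

    def release(self, host, claim_id, free_ram_mb):
        """Host finished building: drop the claim, publish fresh state."""
        with self._lock:
            state = self._hosts[host]
            state.claims.pop(claim_id, None)
            state.free_ram_mb = free_ram_mb
            state.version += 1


# Two schedulers read the same snapshot, then both try to claim.
store = ClaimStore()
store.report("node1", 4096)
version, _free = store.read("node1")
first = store.claim_if_unchanged("node1", version, "sched-a", 3072)
second = store.claim_if_unchanged("node1", version, "sched-b", 3072)
```

In this sketch the second scheduler is refused because the version it read is stale, so it has to re-read and choose again. Since each claim touches exactly one host record, the check could equally be serialized per compute node rather than through a cluster-wide consensus round.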
+1 for trying Kafka.

I have tried to write up my thoughts on the Kafka approach (and a few
related things) here:
https://review.openstack.org/#/c/191914/5/specs/backlog/approved/parallel-scheduler.rst,cm

It tries to describe what I want to prototype for the next scheduler.
It is also possibly one of the worst specs I have ever seen. There may
be some ideas worth nicking in there (there may not be!).

John

PS I also cover my wish for multiple schedulers living in Nova, long
term. (We already have 2.5 schedulers, depending on how you count
them.) I can see some of these schedulers being the "best" for a
subset of deployments.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
