On 07/15/15 20:40, Clint Byrum wrote:
What you describe is a spike. It's a grand plan, and you don't need
anyone's permission, so huzzah for the spike!
As far as what should be improved, I hear a lot that having multiple
schedulers does not scale well, so I'd suggest that as a primary target
(maybe measure the _current_ problem, and then set the target as a 10x
improvement over what we have now).
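To make "measure the current problem, then aim for 10x" concrete, one could record the same metric for the current scheduler and for the experimental one, then compare medians. This is a hypothetical sketch. `BenchmarkRun`, `meets_target`, and the choice of scheduling throughput (requests per second) as the metric are assumptions for illustration. They are not an agreed Nova benchmark.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class BenchmarkRun:
    """One hypothetical benchmark run: scheduling requests completed in a time window."""
    requests_scheduled: int
    duration_s: float

    @property
    def throughput(self) -> float:
        """Requests scheduled per second."""
        return self.requests_scheduled / self.duration_s


def median_throughput(runs: Iterable[BenchmarkRun]) -> float:
    # Median rather than mean, so a single noisy run does not skew the comparison.
    return statistics.median(run.throughput for run in runs)


def meets_target(baseline_runs: Iterable[BenchmarkRun],
                 candidate_runs: Iterable[BenchmarkRun],
                 factor: float = 10.0) -> Tuple[bool, float]:
    """Compare the experimental scheduler against the current one.

    Returns (target_met, observed_speedup).
    """
    baseline = median_throughput(baseline_runs)
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    speedup = median_throughput(candidate_runs) / baseline
    return speedup >= factor, speedup
```

Throughput is only one candidate metric. A latency percentile would fit the same comparison with the ratio inverted, since lower is better there.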
Things to consider while pushing on that goal:
* Do not let the resilience of the system backslide. The code is just now
starting to be fault tolerant when talking to RabbitMQ, so make sure
to also consider how tolerant of failures this will be. Cassandra is
typically chosen for its resilience and performance, and it also does
a neat trick: clients can tune the consistency level per query, so a
read can trade strong consistency (slower, since more replicas must
answer) for availability and speed (fewer replicas, possibly stale data).
That might be useful in the context of trying to push the performance
_UP_ for schedulers, while not breaking anything else.
* Consider the cost of introducing a brand new technology into the
deployer space. If there _is_ a way to get the desired improvement with,
say, just MySQL and some clever sharding, then that might be a smaller
pill to swallow for deployers.
+1000 to this part regarding introducing a new technology
Anyway, I wish you well on this endeavor and hope to see your results
soon!
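The tunable-consistency point in the first bullet above rests on a simple rule. With replication factor N, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below models only that arithmetic for a few single-data-center consistency levels. It is not a driver API, and the function names are hypothetical.

```python
# Consistency levels modeled in this sketch (single data center only).
FIXED_LEVELS = {"ONE": 1, "TWO": 2, "THREE": 3}


def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must acknowledge a read or write at `level`."""
    if level in FIXED_LEVELS:
        needed = FIXED_LEVELS[level]
    elif level == "QUORUM":
        needed = replication_factor // 2 + 1
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not modeled in this sketch: {level}")
    if needed > replication_factor:
        raise ValueError(f"{level} needs {needed} replicas, only {replication_factor} exist")
    return needed


def read_sees_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set overlaps every write replica set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


def replicas_that_may_be_down(level: str, replication_factor: int) -> int:
    """How many replicas can be unavailable while requests at `level` still succeed."""
    return replication_factor - replicas_required(level, replication_factor)
```

With a replication factor of 3, QUORUM reads plus QUORUM writes overlap (2 + 2 > 3). ONE reads tolerate two replicas being down but may return stale host state. For the scheduler, stale reads might be acceptable if the claim step itself guards against conflicting decisions, for example through a Paxos-based lightweight transaction. A decision made on stale data would then fail to apply rather than silently double-book a host.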
Excerpts from Ed Leafe's message of 2015-07-15 07:18:42 -0700:
Changing the architecture of a complex system such as Nova is never
easy, even when we know that the design isn't working as well as we
need it to. And it's even more frustrating because when the change is
complete, it's hard to know if the improvement, if any, was worth it.
So I had an idea: what if we ran a test of that architecture change
out-of-tree? In other words, create a separate deployment, and rip out
the parts that don't work well, replacing them with an alternative
design. There would be no Gerrit reviews or anything that would slow
down the work or add load to the already overloaded reviewers. Then we
could see if this modified system is a significant-enough improvement
to justify investing the time in implementing it in-tree. And, of
course, if the test doesn't show what was hoped for, it is scrapped
and we start thinking anew.
The important part of this process is defining up front what level of
improvement would make such a change worth considering, and what sort
of tests would demonstrate whether or not that level was met. I'd like
to discuss such an experiment next week at the Nova mid-cycle.
What I'd like to investigate is replacing the current design, in which
the compute nodes communicate with the scheduler via message queues.
This design is overly complex and has several known scalability
issues. My thought is to replace this with a Cassandra [1] backend.
Compute nodes would write their state to Cassandra whenever it
changes, and that data would be read by the scheduler to make its host
selection. When the scheduler chooses a host, it would post the claim
to Cassandra wrapped in a lightweight transaction, which would ensure
that no other scheduler has already claimed those resources. When the
host has built the requested VM, it would delete the claim and update
Cassandra with its current state.
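Here is a minimal sketch of the claim flow described above. It assumes a single outstanding claim per host, though the real design might claim specific resource amounts instead. `FakeClaimTable`, `schedule`, and `finish_build` are hypothetical names. The lock-protected dict is an in-memory stand-in for a Cassandra lightweight transaction such as `INSERT ... IF NOT EXISTS`, not a Cassandra client.

```python
import threading
from typing import Dict, Optional


class FakeClaimTable:
    """In-memory stand-in for Cassandra tables written with lightweight transactions.

    One lock makes each conditional write atomic, mimicking the compare-and-set
    guarantee of CQL's `INSERT ... IF NOT EXISTS`. This is a simulation only.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claims: Dict[str, str] = {}       # host -> request_id holding the claim
        self._free_ram_mb: Dict[str, int] = {}  # host -> last reported free RAM (MB)

    def update_state(self, host: str, free_ram_mb: int) -> None:
        """Compute node reports its current state (plain, unconditional write)."""
        with self._lock:
            self._free_ram_mb[host] = free_ram_mb

    def read_state(self) -> Dict[str, int]:
        """Scheduler reads a snapshot; it may be stale by the time it claims."""
        with self._lock:
            return dict(self._free_ram_mb)

    def insert_claim_if_not_exists(self, host: str, request_id: str) -> bool:
        """True if the claim was applied, False if another claim already exists."""
        with self._lock:
            if host in self._claims:
                return False
            self._claims[host] = request_id
            return True

    def delete_claim_if_owner(self, host: str, request_id: str) -> bool:
        """Remove the claim only if `request_id` still holds it."""
        with self._lock:
            if self._claims.get(host) != request_id:
                return False
            del self._claims[host]
            return True


def schedule(table: FakeClaimTable, request_id: str, ram_mb: int) -> Optional[str]:
    """Pick the host with the most free RAM that fits, claiming it atomically."""
    state = table.read_state()
    candidates = sorted((h for h, free in state.items() if free >= ram_mb),
                        key=lambda h: state[h], reverse=True)
    for host in candidates:
        if table.insert_claim_if_not_exists(host, request_id):
            return host  # this scheduler won the claim
        # Another scheduler claimed this host first; try the next candidate.
    return None


def finish_build(table: FakeClaimTable, host: str, request_id: str,
                 free_ram_after_mb: int) -> bool:
    """Compute node: publish its new state, then release its claim."""
    table.update_state(host, free_ram_after_mb)
    return table.delete_claim_if_owner(host, request_id)
```

One design choice worth flagging is that the sketch publishes the host's new state before deleting the claim. As a result, no scheduler ever sees the pre-build free RAM without a claim guarding it. The message above lists the two steps in the opposite order, and the experiment would need to settle which ordering is safe.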
One main motivation for using Cassandra over the current design is
that it will enable us to run multiple schedulers without increasing
the raciness of the system. Another is that it will greatly simplify a
lot of the internal plumbing we've set up to implement in Nova what we
would get out of the box with Cassandra. A third is that if this
proves to be a success, it could also be used further down the road
to simplify inter-cell communication (but this is getting ahead of
ourselves...). I've worked with Cassandra before and it has
been rock-solid to run and simple to set up. I've also had preliminary
technical reviews with the engineers at DataStax [2], the company
behind Cassandra, and they agreed that this was a good fit.
At this point I'm sure that most of you are filled with thoughts on
how this won't work, or how much trouble it will be to switch, or how
much more of a pain it will be, or how you hate non-relational DBs, or
any of a zillion other negative thoughts. FWIW, I have them too. But
instead of ranting, I would ask that we acknowledge for now that:
a) it will be disruptive and painful to switch something like this at
this point in Nova's development
b) it would have to provide *significant* improvement to make such a
change worthwhile
So what I'm asking from all of you is to help define the second part:
what we would want improved, and how to measure those benefits. In
other words, what results would you have to see in order to make you
reconsider your initial "nah, this'll never work" reaction, and start
to think that this would be a worthwhile change to make to Nova.
I'm also asking that, for now, you refrain from talking about why this
can't work. I know it'll be difficult to do that, since nobody likes
ranting about stuff more than I do, but right now it won't be helpful.
There will be plenty of time for that later, assuming that this
experiment yields anything worthwhile. Instead, think of the current
pain points in the scheduler design, and what sort of improvement you
would have to see in order to seriously consider undertaking this
change to Nova.
I've gotten the OK from my management to pursue this, and several
people in the community have expressed support for both the approach
and the experiment, even though most don't have spare cycles to
contribute. I'd love to have anyone who is interested become involved.
I hope that this will be a positive discussion at the Nova mid-cycle
next week. I know it will be a lively one. :)
[1] http://cassandra.apache.org/
[2] http://www.datastax.com/
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
--
Best Regards,
Maish Saidel-Keesing