On 9/28/2016 12:10 AM, Joshua Harlow wrote:
ACTION: we should make sure workarounds are advertised better
ACTION: we should have some document about "when cells"?
This is a difficult question to answer because "it depends." It's akin
to asking "how many nova-api/nova-conductor processes should I run?"
Well, what hardware is being used, how much traffic do you get, is it
bursty or sustained, are instances created and left alone or are they
torn down regularly, do you prune your database, what version of
RabbitMQ are you using, and so on.

I would expect the best answer(s) to this question are going to come
from the operators themselves. What I've seen with cellsv1 is that
someone will decide for themselves that they should put no more than X
computes in a cell and that information filters out to other operators.
That provides a starting point for a new deployment to tune from.

I don't think we need "don't go larger than N nodes" kind of advice,
but we should probably know what kinds of things we expect to be hot
spots: MySQL load (possibly indicated by system load or a high level of
DB conflicts), RabbitMQ load, or something along those lines.

Basically, these are the things to look out for that indicate you are
approaching a scale point where cells is going to help. That also helps
in defining what kinds of scaling issues cells won't help with, which
need to be addressed in other ways (such as optimizations).
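
To make that concrete, here is a rough sketch of the kind of check an
operator might script to watch those indicators. The hostnames,
credentials, and the particular counters are just placeholders and
assumptions on my part; it also assumes the RabbitMQ management plugin
is enabled and that pymysql and requests are installed:

#!/usr/bin/env python
# Rough sketch: poll a couple of the "hot spot" indicators mentioned above.
# Hostnames, credentials, and counters are placeholders, not recommendations.
import pymysql
import requests


def mysql_indicators(host="db.example.com", user="monitor", password="secret"):
    """Return a few MySQL status counters that hint at DB load/conflicts."""
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SHOW GLOBAL STATUS WHERE Variable_name IN "
                "('Threads_connected', 'Threads_running', 'Innodb_row_lock_waits')")
            return dict(cur.fetchall())
    finally:
        conn.close()


def rabbit_backlogs(host="rabbit.example.com", user="guest", password="guest"):
    """Return per-queue message backlogs from the RabbitMQ management API."""
    resp = requests.get("http://%s:15672/api/queues" % host,
                        auth=(user, password))
    resp.raise_for_status()
    return {q["name"]: q.get("messages", 0) for q in resp.json()}


if __name__ == "__main__":
    print("MySQL:", mysql_indicators())
    backlogs = rabbit_backlogs()
    # Steadily growing backlogs on the conductor/compute queues are the sort
    # of trend that suggests you're approaching a point where cells would help.
    print("Top RabbitMQ backlogs:",
          sorted(backlogs.items(), key=lambda kv: -kv[1])[:10])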

Big +1. If we can *somehow* get out of the pattern of guessing at the
overall system characteristics, I think it would be great for our
community's maturity and for each project. Even though I know such
things are hard, it scares the bejeezus out of me when we (as a group)
create software but can't give recommendations on its behavioral
characteristics (we aren't doing quantum physics here, the last time I
checked).

Just some ideas:

* Maybe Rally can help here?
* Fix a standard set of configuration options and test it at scale
(using the Intel lab?), then use Rally (or something else) to probe the
system characteristics, and give recommendations based on those
observations before releasing the software for general consumption.
This is basically what operators are going to have to do anyway to
qualify a release, especially if the community isn't doing it and/or is
shying away from doing it. (A rough sketch of what such a probe might
look like follows this list.)
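
To make that second bullet less hand-wavy, here is a crude, hand-rolled
sketch of the kind of probe I mean. This is not Rally itself, just a
stand-in; the cloud name, image/flavor/network IDs, and iteration count
are made-up placeholders, and it assumes openstacksdk is installed with
a matching clouds.yaml entry:

# Crude probe sketch (not Rally itself): time boot/delete cycles to see how
# latency changes as the load/instance count grows. The cloud name, image,
# flavor, and network IDs are placeholders; assumes openstacksdk + clouds.yaml.
import time

import openstack

conn = openstack.connect(cloud="scale-test")

IMAGE_ID = "replace-with-image-uuid"
FLAVOR_ID = "replace-with-flavor-uuid"
NETWORK_ID = "replace-with-network-uuid"

timings = []
for i in range(20):  # ramp this up to whatever the lab can handle
    start = time.time()
    server = conn.compute.create_server(
        name="probe-%d" % i,
        image_id=IMAGE_ID,
        flavor_id=FLAVOR_ID,
        networks=[{"uuid": NETWORK_ID}])
    conn.compute.wait_for_server(server)    # wait until ACTIVE
    timings.append(time.time() - start)
    conn.compute.delete_server(server)
    conn.compute.wait_for_delete(server)    # keep each iteration independent
    print("boot %d took %.1fs" % (i, timings[-1]))

# A boot time that climbs steadily with iteration count (or starts erroring)
# is the kind of observed characteristic that could feed a "when cells" doc.
print("min/avg/max: %.1f / %.1f / %.1f"
      % (min(timings), sum(timings) / len(timings), max(timings)))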

I just have a hard time accepting that tribal knowledge about scale,
which has to filter from operator to operator (yes, I know from
personal experience this is how things trickle down), is a good way to
go. It reminds me of medicine and its practices in the late 1800s, when
all sorts of quack science was happening; and IMHO we can do better
than this :)

Hmm, that reminds me that I'm running low on leeches...


Anyway, back to your regularly scheduled programming,

-Josh




--

Thanks,

Matt Riedemann

