On 11/12/2015 5:34 PM, Joshua Harlow wrote:
Ok, so the following is starting to form:
https://etherpad.openstack.org/p/remote-conductor-performance
Hopefully we can get to the bottom of this (especially for clouds that
run a large number of computes in a single cell, or in only one cell).
Andrew Laski wrote:
On 11/12/15 at 10:53am, Clint Byrum wrote:
Excerpts from Joshua Harlow's message of 2015-11-12 10:35:21 -0800:
Mike Dorman wrote:
> We do have a backlog story to investigate this more deeply, we just
> have not had the time to do it yet. For us, it’s been easier/faster
> to add more hardware to conductor to get over the hump temporarily.
>
> We kind of have that work earmarked for after the Liberty upgrade,
> in hopes that maybe it’ll be fixed there.
>
> If anybody else has done even some trivial troubleshooting already,
> it’d be great to get that info as a starting point, i.e. which
> specific calls to conductor are causing the load, etc.
>
> Mike
>
+1. I think we in the #openstack-performance channel really need to
investigate this, because I personally find it worrying to keep
hearing rumors about the remote conductor falling over. Please join
there and we can try to work out a plan for what to do about this
situation. It would be great if the nova people joined as well,
because in the end something in nova will likely need to be fixed or
changed to resolve what appears to be a problem for many operators.
Falling over is definitely a bad sign. ;)
The concept of pushing messages over a bus instead of just making local
calls shouldn't result in much extra load. Perhaps we just have too many
layers of unoptimized encapsulation. I have to wonder if something like
protobuf would help.
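
For a rough sense of what those extra layers can cost per call, here
is a minimal sketch; the field names and envelope keys below are only
illustrative of the general shape, not the exact oslo.messaging/nova
wire format:

# A rough, hypothetical sketch of what "layers of encapsulation" can
# cost: the same payload serialized bare vs. wrapped in nested
# RPC/versioned-object style envelopes. Field names and envelope keys
# are illustrative only, not the real wire format.
import json
import timeit

# A bare payload roughly the shape of an instance record (made-up fields).
payload = {
    "uuid": "3f6c0000-example", "host": "compute-17", "vm_state": "active",
    "flavor": {"vcpus": 4, "ram": 8192, "disk": 80},
    "metadata": {"owner": "team-a", "tier": "gold"},
}

# The inner RPC message: the payload wrapped as a versioned object plus
# method/args bookkeeping.
inner = {
    "method": "object_action",
    "args": {"objinst": {
        "versioned_object.name": "Instance",
        "versioned_object.version": "2.1",
        "versioned_object.data": payload,
        "versioned_object.changes": list(payload),
    }},
}

def encode_bare():
    return json.dumps(payload)

def encode_wrapped():
    # What a layered sender does per call: encode the inner message,
    # then encode it again as a string inside the transport envelope.
    return json.dumps({"oslo.version": "2.0",
                       "oslo.message": json.dumps(inner)})

print("bare payload:    %d bytes" % len(encode_bare()))
print("wrapped payload: %d bytes" % len(encode_wrapped()))
print("bare encode:     %.1f us/call" %
      (timeit.timeit(encode_bare, number=10000) * 100))
print("wrapped encode:  %.1f us/call" %
      (timeit.timeit(encode_wrapped, number=10000) * 100))
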
Falling over is also a very broad description and doesn't let us know
what the actual issue is.
In my experience the performance concern with conductor has been in
not understanding the ratio of conductor nodes to computes that is
necessary for our usage. Conductor doesn't add much extra load, but it
concentrates it on a smaller number of services. If we ran one conductor
per compute I suspect we would have no performance issues, but that's a
lot of capacity to use for this.
I am curious what conductor/compute ratios others are trying to
achieve, given equal hardware types for each, and what the barriers
are to getting there.
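
For framing that ratio question, here is a rough back-of-envelope
sketch; every input below is a hypothetical placeholder that you would
replace with measurements from your own deployment:

# Back-of-envelope sizing for the conductor/compute ratio. All the
# numbers are hypothetical; the point is the arithmetic, and the
# inputs are things you would measure in a real deployment.
import math

def conductor_nodes_needed(computes, calls_per_compute_per_sec,
                           mean_call_sec, workers_per_node,
                           target_utilization=0.7):
    """Estimate conductor nodes from offered RPC load vs. per-node capacity."""
    offered_load = computes * calls_per_compute_per_sec       # calls/sec total
    per_node_capacity = workers_per_node / mean_call_sec      # calls/sec/node
    usable_capacity = per_node_capacity * target_utilization  # leave headroom
    return max(1, int(math.ceil(offered_load / usable_capacity)))

# Example (all made up): 1000 computes each averaging 0.5 conductor
# calls/sec (periodic tasks, instance updates), ~50 ms per call,
# 16 workers per conductor node.
print(conductor_nodes_needed(computes=1000,
                             calls_per_compute_per_sec=0.5,
                             mean_call_sec=0.05,
                             workers_per_node=16))
# -> 3 conductor nodes at a ~70% utilization target
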
Cool, that's helpful for taking notes. I've posted some questions in there.
I also added this to the next performance team meeting agenda. I have a
conflict at that time so I might not be able to join, but I'm assuming
notes will be put back into the etherpad.
--
Thanks,
Matt Riedemann
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators