Re: [openstack-dev] [tc] [all] OpenStack moving both too fast and too slow at the same time

Dmitry Tantsur Wed, 10 May 2017 04:42:24 -0700

On 05/09/2017 07:59 PM, Joshua Harlow wrote:

Matt Riedemann wrote:
On 5/8/2017 1:10 PM, Octave J. Orgeron wrote:
I do agree that scalability and high-availability are definitely issues
for OpenStack when you dig deeper into the sub-components. There is a
lot of re-inventing of the wheel when you look at how distributed
services are implemented inside of OpenStack and deficiencies. For some
services you have a scheduler that can scale-out, but the conductor or
worker process doesn't. A good example is cinder, where cinder-volume
doesn't scale-out in a distributed manner and doesn't have a good
mechanism for recovering when an instance fails. All across the services
you see different methods for coordinating requests and tasks such as
rabbitmq, redis, memcached, tooz, mysql, etc. So for an operator, you
have to sift through those choices and configure the prerequisite
infrastructure. This is a good example of a problem that should be
solved with a single architecturally sound solution that all services
can standardize on.
There was an architecture workgroup specifically designed to understand
past architectural decisions in OpenStack, and what the differences are
in the projects, and how to address some of those issues, but from lack
of participation the group dissolved shortly after the Barcelona summit.
This is, again, another example of if you want to make these kinds of
massive changes, it's going to take massive involvement and leadership.
I agree on the 'massive changes, it's going to take massive involvement
and leadership.' though I am not sure how such changes and involvement
actually happen; especially nowadays, when companies with such leadership
are moving on to something else (k8s, mesos, or other...)
So knowing that, what are the options to actually make some kind of
change occur? IMHO it must be driven by PTLs (yes I know they are always
busy, too bad, so sad, lol). I'd like all the PTLs to get together and
restart the arch-wg and make it a *requirement* that PTLs actually show
up (and participate) in that group/meeting vs it just being a bunch of
senior(ish) folks, such as myself, that showed up. Then if PTLs do not
show up, I would start to say that the next time around they are running
for PTL said lack of participation in the wider openstack vision should
be known and potentially cause them to get kicked out (voted out?) of
being a PTL in the future.


Now we have someone to blame. Problem solved?

The problem in a lot of those cases comes down to development being
detached from the actual use cases customers and operators are going to
use in the real world. Having a distributed control plane with multiple
instances of the api, scheduler, coordinator, and other processes is
typically not testable without a larger hardware setup. When you get to
large scale deployments, you need an active/active setup for the control
plane. It's definitely not something you could develop for or test
against on a single laptop with devstack. Especially, if you want to use
more than a handful of the OpenStack services.
I've heard *crazy* things about actual use cases customers and operators
are doing because of the scaling limits that projects have (i.e. nova has
a limit of 300 compute nodes so ABC customer will then set up X * 300
clouds to reach Y compute nodes because of that limit).
IMHO I'm not even sure I would want to target said use-cases at all,
because they feel messed up to begin with (and it seems bad/dumb? to go
down the rabbit hole of targeting use-cases that were deployed to
band-aid over the initial problems that created those
use-cases/deployments in the first place).
I think we can all agree with this. Developers don't have a lab with
1000 nodes lying around to hack on. There was OSIC but that's gone. I've
been requesting help in Nova from companies to do scale testing and help
us out with knowing what the major issues are, and report those back in
a form so we can work on those issues. People will report there are
issues, but not do the profiling, or at least not report the results of
profiling, upstream to help us out. So again, this is really up to
companies that have the resources to do this kind of scale testing and
report back and help fix the issues upstream in the community. That
doesn't require OpenStack 2.0.
So how do we close that gap? The only way I really know is by having
people that can see the problems from the get-go, instead of having to
discover them at some later point (when it falls over and ABC customer
starts having Y clouds just to reach the target number of compute nodes
they want to reach). Now maybe the skill level in openstack (especially
in regards to distributed systems) is just too low and the only real way
to gather data is by having companies do scale testing (i.e. some kind of
architecting things to work after they are deployed); if so that's
sad...
-Josh

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


