Hey, I realise I've done a sort of point-by-point thing below - sorry. Let me say up front that I'm glad you're focused on what will help users and their needs - I am too. Hopefully we can figure out why we have different opinions about which things are key, and/or how we can get data to better understand our potential users.
On 28 November 2013 02:39, Jaromir Coufal <[email protected]> wrote:
> Important point here is, that we agree on starting with very basics - grow
> then. Which is great.
>
> The whole deployment workflow (not just UI) is all about user experience
> which is built on top of TripleO's approach. Here I see two important
> factors:
> - There are users who are having some needs and expectations.

Certainly. Do we have Personas for those people? (And have we done any validation of them?)

> - There is underlying concept of TripleO, which we are using for
> implementing features which are satisfying those needs.

Mmm, so the technical aspect of TripleO is about setting up a virtuous circle: improvements in deploying cluster software via OpenStack make deploying OpenStack better, and those of us working on deploying OpenStack will make deploying cluster software via OpenStack better in general, as part of solving 'deploying OpenStack' in a nice way.

> We are circling around and trying to approach the problem from wrong end -
> which is implementation point of view (how to avoid own scheduling).
>
> Let's try get out of the box and start with thinking about our audience
> first - what they expect, what they need. Then we go back, put our
> implementation thinking hat on and find out how we are going to re-use
> OpenStack components to achieve our goals. In the end we have detailed plan.

Certainly, +1.

> === Users ===
>
> I would like to start with our targeted audience first - without milestones,
> without implementation details.
>
> I think here is the main point where I disagree and which leads to different
> approaches. I don't think, that user of TripleO cares only about deploying
> infrastructure without any knowledge where the things go. This is overcloud
> user's approach - 'I want VM and I don't care where it runs'. Those are
> self-service users / cloud users.
> I know we are OpenStack on OpenStack, but
> we shouldn't go that far that we expect same behavior from undercloud users.
> I can tell you various examples of why the operator will care about where
> the image goes and what runs on specific node.

This may be where we disagree indeed :). Wearing my sysadmin hat (a little dusty, but it never really goes away :P) - I can tell you I spent a lot of time worrying about what went on what machine. But it was never actually what I was paid to do. What I was paid to do was to deliver infrastructure and services to the business. Everything that we could automate, that we could describe with policy and still get robust, reliable results - we did. It's how one runs many hundreds of machines with an ops team of 2.

Planning around failure domains, for example, is tedious work; it's needed at a purchasing level - you need to decide if you're buying three datacentres or one datacentre with internal redundancy - but once that's decided, the actual mechanics of ensuring that each HA service is spread across the (three datacentres) or (three separate zones in the one DC) are not interesting.

So - I'm sure that many sysadmins do manually assign work to machines to ensure a good result for performance or HA, but that's out of necessity, not desire.

> One quick example:
> I have three racks of homogenous hardware and I want to design it the way so
> that I have one control node in each, 3 storage nodes and the rest compute.
> With that smart deployment, I'll never know what my rack contains in the
> end. But if I have control over stuff, I can say that this node is
> controller, those three are storage and those are compute - I am happy from
> the very beginning.

Why does that layout make you happy? What is it about that setup that makes things work better for you?
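To make the failure-domain point concrete: the three-rack layout in the example above can be stated as a policy ("spread each HA service's replicas across racks; everything left over is compute") and applied by a tool, rather than assigned node by node. What follows is a minimal, hypothetical Python sketch of that idea. The `spread_across_zones` helper, the rack and node names, and the greedy "most free nodes first" rule are all illustrative assumptions; this is not how Nova, Heat or TripleO actually schedule.

```python
def spread_across_zones(nodes_by_zone, services):
    """Hypothetical policy-driven placement (illustration only).

    nodes_by_zone: dict of zone (e.g. rack) name -> list of free node names.
    services: dict of service name -> replica count; each replica of a
              service must land in a different zone.
    Returns (placements, leftover): placements maps each service to a list
    of (zone, node) pairs; leftover maps each zone to its unassigned nodes.
    """
    # Copy the input so the caller's lists are not mutated.
    free = {zone: list(nodes) for zone, nodes in nodes_by_zone.items()}
    placements = {}
    for service, replicas in services.items():
        if replicas > len(free):
            raise ValueError(
                f"{service}: {replicas} replicas but only {len(free)} zones")
        # Prefer zones with the most free nodes; ties break by zone name,
        # so the result is deterministic.
        zones = sorted(free, key=lambda z: (-len(free[z]), z))[:replicas]
        for zone in zones:
            if not free[zone]:
                raise ValueError(f"{service}: zone {zone} has no free nodes")
            placements.setdefault(service, []).append((zone, free[zone].pop(0)))
    return placements, free


racks = {
    "rack-a": ["a1", "a2", "a3", "a4"],
    "rack-b": ["b1", "b2", "b3", "b4"],
    "rack-c": ["c1", "c2", "c3", "c4"],
}
# Policy: 3 controllers and 3 storage nodes, one of each per rack;
# whatever is left over becomes compute.
placements, compute = spread_across_zones(racks, {"controller": 3, "storage": 3})
for service, nodes in placements.items():
    print(service, nodes)
print("compute", compute)
```

The operator's input here is only the policy (replica counts and "one per rack"); which specific node in each rack ends up as the controller is left to the tool.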
Note that in the absence of a sophisticated scheduler, some volumes with a redundancy of 3 will end up with all three copies in one rack: you won't get rack-can-fail safety on the delivered cloud workloads. (I mention this as one attempt to understand why knowing there is a control node / 3 storage / rest compute in each rack makes you happy.)

> Our targeted audience are sysadmins, operators. They hate 'magics'. They
> want to have control over things which they are doing. If we put in front of
> them workflow, where they click one button and they get cloud installed,
> they will get horrified.

I don't think this is a good characterisation of the sysadmin / operator mindset. They - like anyone - don't like surprises, and they often care intensely about delivering services well, with high performance and high availability. Tools that help them do that are appreciated; tools that are flaky - which a lot of abstract-all-the-details tools seem to be - get a bad rap in sysadmin circles.

> That's why I am very sure and convinced that we need to have ability for
> user to have control over stuff. What node is having what role. We can be
> smart, suggest and advice. But not hiding this functionality from user.
> Otherwise, I am afraid that we can fail.

I think having that degree of control is the failure. Our CloudOS team now has considerable experience deploying clouds using a high-touch system like you describe - and they are utterly convinced that it doesn't scale. Even at 20 nodes it is super tedious, and beyond that it's ridiculous.

> Furthermore, if we put lots of restrictions (like homogenous hardware) in
> front of users from the very beginning, we are discouraging people from
> using TripleO-UI. We are young project and trying to hit as broad audience
> as possible. If we do flexible enough approach to get large audience
> interested, solve their problems, we will get more feedback, we will get
> early adopters, we will get more contributors, etc.

Flexibility comes with a cost.
Right now we have a large audience interested in what we have, but we're delivering two separate things: a functional sysadminny interface with command-line scripts and Heat templates, and a GUI, which the Tuskar folk are building up, where we can offer a better interface. I agree that homogeneous hardware isn't a viable long-term constraint. But if we insist on fixing that issue first, we sacrifice our ability to learn about the usefulness of a simple, straightforward interface. We'll be doing a bunch of work - regardless of implementation - to deal with heterogeneity, when we could be bringing Swift and Cinder up to production readiness - which IMO will get many more folk on board for adoption.

> First, let's help cloud operator, who is having some nodes and wants to
> deploy OpenStack on them. He wants to have control which node is controller,
> which node is compute or storage. Then we can get smarter and guide.

Folk who want to manually install OpenStack on a couple of machines can already do so: we don't change the game for them by replacing a manual system with a manual system. My vision is that we should deliver something significantly better!

> === Milestones ===
>
> Based on different user behavior I am talking about, I suggest different
> milestones:
...

So, I have a suggestion. Let's create a set of all the things we want in the product eventually.

https://etherpad.openstack.org/p/tripleo-feature-map

From there we can assess several things for each item:

cost - estimated cost of an 'ok'(*) implementation - 0: expensive, multiple cycles; 9: cheap
benefit(us) - estimated benefit to our design learning from having a functional implementation - 0: learn nothing; 9: learn lots
benefit(users) - e.g.
estimated increase in the number of users whose needs TripleO will satisfy (as part of a holistic install) - 0: minimal increase; 9: huge increase

From there we can draw a cube: things that are cheap, that we learn a lot from, and that users benefit a lot from are no-brainers :) Things that are expensive, that we don't learn much from, and that users don't benefit much from are clearly things we don't want to do right now:

cost  b-us  b-users  do-when?
 0     0     0       never?
 9     9     9       right now
 5     5     5       sometime in the middle

But more interesting are combinations like:

cost  b-us  b-users  do-when?
 0     9     9       start now as a background task?
 9     2     2       do if we have nothing better
 9     0     9       right now
 9     9     0       also right now

So I dunno if this is a good idea - it's just an attempt to visualise the tradeoffs in a way that lets us be clear about what we're saying is good about a specific feature [think of it as a variation on planning poker].

(*): I mean an implementation we could live with for a while, vs whatever the ideal might be.

>
> === Implementation ===
>
> Above mentioned approach shouldn't lead to reimplementing scheduler. We can
> still use nova-scheduler, but we can take advantage of extra params (like
> unique identifier), so that we specify more concretely what goes where.

That is reimplementing the scheduler. In this case it's forcing sysadmins to be the scheduler, which is a waste of their time.

> More details should follow here - how to achieve above mentioned goals, like
> what should go through heat, what should go through nova, ironic, etc.
>
> But first, let's agree on approach and goals.

Totally agree!

-Rob

--
Robert Collins <[email protected]>
Distinguished Technologist
HP Converged Cloud

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
