Re: [openstack-dev] [TripleO] Summit session wrapup
On Sun Dec 1 00:27:30 2013, Tzu-Mainn Chen wrote: I think it's far more important that we list out requirements and create a design document that people agree upon first. Otherwise, we run the risk of focusing on feature X for release 1 without ensuring that our architecture supports feature Y for release 2.

+1 to this. I think that lifeless' https://etherpad.openstack.org/p/tripleo-feature-map pad might be a good way to get moving in that direction.

The point of disagreement here - which actually seems quite minor to me - is how far we want to go in defining heterogeneity. Are existing node attributes such as cpu and memory enough? Or do we need to go further? To take examples from this thread, some additional possibilities include: rack, network connectivity, etc. Presumably, such attributes will be user-defined and managed within TripleO itself.

I took the point of disagreement to be more about the allowance of manual control. Should a user be able to override the list of what gets provisioned, where?

And I don't think you always want heterogeneity. For example, if we treat 'rack' as one of those attributes, a system administrator might specifically want things NOT to share a rack, e.g. for redundancy.

That said, I suspect that many of us (myself included) have never designed a data center, so I worry that some of our examples might be a bit contrived. Not necessarily just for this conversation, but I think it'd be handy to have real-world stories here. I'm sure no two are identical, but it'd help make sure we're focused on real-world scenarios.

If that understanding is correct, it seems to me that the requirements are broadly in agreement, and that TripleO-defined node attributes is a feature that can easily be slotted into this sort of architecture. Whether it needs to come first... should be a different discussion (my gut feel is that it shouldn't come first, as it depends on everything else working, but maybe I'm wrong).
So to me, that question -- what should come first? -- is exactly what started this discussion. It didn't start out as a question of whether we should allow users to override the schedule, but as a question of where we should start building. Should we start off just letting the Nova scheduler do all the hard work for us and let overrides maybe come in later? Or should we start off requiring that everything is manual and later transition to using Nova? (I don't have a strong opinion either way, but I hope we land one way or the other soon.)

-- Matt Wagner
Software Engineer, Red Hat

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
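For concreteness, the "manual override" option Matt contrasts with scheduler-driven placement amounts to pinning registered nodes to roles/flavors by hand, so that a Nova-style scheduler only ever sees matching candidates. The following is a toy sketch with a made-up node model and mapping - it is not the Nova or Tuskar API, just an illustration of the idea:

```python
# Hypothetical sketch: operator pins groups of registered baremetal
# nodes to flavors "by hand" (keyed on MAC, as suggested later in this
# thread), so a plain scheduler can only place a role on matching nodes.
# All names here are illustrative, not real Nova/Ironic APIs.

FLAVOR_BY_MAC = {
    "00:11:22:33:44:01": "controller",
    "00:11:22:33:44:02": "controller",
    "00:11:22:33:44:03": "compute",
    "00:11:22:33:44:04": "storage",
}

def candidates(nodes, wanted_flavor):
    """Filter nodes down to those manually pinned to the wanted flavor."""
    return [n for n in nodes if FLAVOR_BY_MAC.get(n["mac"]) == wanted_flavor]

nodes = [{"mac": mac} for mac in FLAVOR_BY_MAC]
# the scheduler-equivalent step now only sees the pinned candidates
assert len(candidates(nodes, "controller")) == 2
assert len(candidates(nodes, "compute")) == 1
```

The trade-off the thread debates is exactly this: the filter gives the operator total control, but every placement decision becomes the operator's job.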
Re: [openstack-dev] [TripleO] Summit session wrapup
On 01/12/13 00:27 -0500, Tzu-Mainn Chen wrote: I think we may all be approaching the planning of this project in the wrong way, because of confusions such as:

Well, I think there is one small misunderstanding. I've never said that the manual way should be the primary workflow for us. I agree that we should lean toward as much automation and smartness as possible. But at the same time, I am adding that we need a manual fallback for the user to change that smart decision. The primary way would be to let TripleO decide where the stuff goes. I think we agree here.

That's a pretty fundamental requirement that both sides seem to agree upon - but that agreement got lost in the discussions of what feature should come in which release, etc. That seems backwards to me. I think it's far more important that we list out requirements and create a design document that people agree upon first. Otherwise, we run the risk of focusing on feature X for release 1 without ensuring that our architecture supports feature Y for release 2. To make this example more specific: it seems clear that everyone agrees that the current Tuskar design (where nodes must be assigned to racks, which are then used as the primary means of manipulation) is not quite correct. Instead, we'd like to introduce a philosophy where we assume that users don't want to deal with homogeneous nodes individually, instead letting TripleO make decisions for them.

I agree; getting buy-in on a design document up front is going to save us future anguish.

Regarding this - I think we may want to clarify what the purpose of our releases is at the moment. Personally, I don't think our current planning is about several individual product releases that we expect to be production-ready and usable by the world; I think it's about milestone releases which build towards a more complete product. From that perspective, if I were a prospective user, I would be less concerned with each release containing exactly what I need.
Instead, what I would want most out of the project is:
a) frequent stable releases (so I can be comfortable with the pace of development and the quality of code)
b) design documentation and wireframes (so I can be comfortable that the architecture will support features I need)
c) a roadmap (so I have an idea when my requirements will be met)

+1

-- Jordan O'Mara jomara at redhat.com
Red Hat Engineering, Raleigh
Re: [openstack-dev] [TripleO] Summit session wrapup
I think we may all be approaching the planning of this project in the wrong way, because of confusions such as:

Well, I think there is one small misunderstanding. I've never said that the manual way should be the primary workflow for us. I agree that we should lean toward as much automation and smartness as possible. But at the same time, I am adding that we need a manual fallback for the user to change that smart decision. The primary way would be to let TripleO decide where the stuff goes. I think we agree here.

That's a pretty fundamental requirement that both sides seem to agree upon - but that agreement got lost in the discussions of what feature should come in which release, etc. That seems backwards to me. I think it's far more important that we list out requirements and create a design document that people agree upon first. Otherwise, we run the risk of focusing on feature X for release 1 without ensuring that our architecture supports feature Y for release 2.

To make this example more specific: it seems clear that everyone agrees that the current Tuskar design (where nodes must be assigned to racks, which are then used as the primary means of manipulation) is not quite correct. Instead, we'd like to introduce a philosophy where we assume that users don't want to deal with homogeneous nodes individually, instead letting TripleO make decisions for them. When we have a bunch of heterogeneous nodes, we want to be able to break them up into several homogeneous groups, and assign different capabilities to each. But again, within each individual homogeneous group, we don't want users dealing with each individual node; instead, we want TripleO to take care of business.

The point of disagreement here - which actually seems quite minor to me - is how far we want to go in defining heterogeneity. Are existing node attributes such as cpu and memory enough? Or do we need to go further?
To take examples from this thread, some additional possibilities include: rack, network connectivity, etc. Presumably, such attributes will be user-defined and managed within TripleO itself. If that understanding is correct, it seems to me that the requirements are broadly in agreement, and that TripleO-defined node attributes is a feature that can easily be slotted into this sort of architecture. Whether it needs to come first... should be a different discussion (my gut feel is that it shouldn't come first, as it depends on everything else working, but maybe I'm wrong). In any case, if we can a) detail requirements without talking about releases and b) create a design architecture, I think that it'll be far easier to come up with a set of milestones that make developmental sense.

Folk that want to manually install OpenStack on a couple of machines can already do so: we don't change the game for them by replacing a manual system with a manual system. My vision is that we should deliver something significantly better!

We should! And we can. But I think we shouldn't deliver something that will discourage people from using TripleO. Especially at the beginning - see, user: we are taking our first steps here, the distribution is not perfect or exactly what you wanted, but you can make the change you need. You don't have to go away and come back in 6 months when we try to be smarter and address your case.

Regarding this - I think we may want to clarify what the purpose of our releases is at the moment. Personally, I don't think our current planning is about several individual product releases that we expect to be production-ready and usable by the world; I think it's about milestone releases which build towards a more complete product. From that perspective, if I were a prospective user, I would be less concerned with each release containing exactly what I need.
Instead, what I would want most out of the project is:
a) frequent stable releases (so I can be comfortable with the pace of development and the quality of code)
b) design documentation and wireframes (so I can be comfortable that the architecture will support features I need)
c) a roadmap (so I have an idea when my requirements will be met)

Mainn
Re: [openstack-dev] [TripleO] Summit session wrapup
On 2013/28/11 06:41, Robert Collins wrote: Certainly. Do we have Personas for those people? (And have we done any validation of them?)

We have a shorter paragraph for each. But not verified by any survey, so we don't have a very solid basis in this area right now, and I believe we are all trying to assume at the moment.

This may be where we disagree indeed :). Wearing my sysadmin hat (a little dusty, but never really goes away :P) - I can tell you I spent a lot of time worrying about what went on what machine. But it was never actually what I was paid to do. What I was paid to do was to deliver infrastructure and services to the business. Everything that we could automate, that we could describe with policy and still get robust, reliable results - we did. It's how one runs many hundred machines with an ops team of 2. Planning around failure domains, for example, is tedious work; it's needed at a purchasing level - you need to decide if you're buying three datacentres or one datacentre with internal redundancy, but once that's decided, the actual mechanics of ensuring that each HA service is spread across the (three datacentres) or (three separate zones in the one DC) is not interesting. So - I'm sure that many sysadmins do manually assign work to machines to ensure a good result from performance or HA concerns, but that's out of necessity, not desire.

Well, I think there is one small misunderstanding. I've never said that the manual way should be the primary workflow for us. I agree that we should lean toward as much automation and smartness as possible. But at the same time, I am adding that we need a manual fallback for the user to change that smart decision. The primary way would be to let TripleO decide where the stuff goes. I think we agree here. But I, as a sysadmin, want to see the distribution of stuff before I deploy. And if there is some failure in the automation logic, I need to have the possibility to change that. Not from scratch, but to make the change in the suggested distribution.
There always should be a way to do that manually. Let's imagine that TripleO will, by some mistake or intentionally, distribute nodes across my datacenter wrong (wrong for me, not necessarily for somebody else). What would I do? Would I let TripleO deploy it anyway? No. I would not use TripleO. But if there is something that I need to change and I have a way to do that, I will keep with TripleO, because it allows me to satisfy all I need. We can be smart, but we can't be the smartest and see all the reasons of all users.

Why does that layout make you happy? What is it about that setup where things will work better for you? Note that in the absence of a sophisticated scheduler you'll have some volumes with redundancy of 3 end up all in one rack: you won't get rack-can-fail safety on the delivered cloud workloads (I mention this as one attempt to understand why knowing there is a control node / 3 storage / rest compute in each rack makes you happy).

It doesn't have to make me happy, but somebody else might have strong reasoning for that (or any other setup which we didn't cover). We don't have to know it, but why can't we allow him to do this? One more time, I want to stress this out - I am not fighting for the absence of a sophisticated scheduler, I am fighting for allowing the user to control the stuff if he wants/needs to.

I think having that degree of control is a failure. Our CloudOS team has considerable experience now in deploying clouds using a high-touch system like you describe - and they are utterly convinced that it doesn't scale. Even at 20 nodes it is super tedious, and beyond that it's ridiculous.

Right. And are they convinced that an automated tool will do the best job for them? Do they trust it so strongly that they would deploy their whole datacenter without checking the correct distribution? Would they say - OK, I said I want 50 compute, 10 block storage, 3 control. As long as it will work, I don't care, be smart, do it for me. It all depends on the GUI design.
If we design it well enough, so that we allow the user to do quick bulk actions, even manual distribution can be easy. Even for 100 nodes... or more. (But I don't suggest we do that all manually.)

Flexibility comes with a cost. Right now we have a large audience interested in what we have, but we're delivering two separate things: we have a functional sysadminny interface with command line scripts and heat templates, and we have a GUI where we can offer a better interface which the tuskar folk are building up. I agree that homogeneous hardware isn't a viable long term constraint. But if we insist on fixing that issue first, we sacrifice our ability to learn about the usefulness of a simple, straightforward interface. We'll be doing a bunch of work - regardless of implementation - to deal with heterogeneity, when we could be bringing Swift and Cinder up to production readiness - which IMO will get many more folk onboard for adoption. I agree that
Re: [openstack-dev] [TripleO] Summit session wrapup
Hello, just a few notes from me:

https://etherpad.openstack.org/p/tripleo-feature-map sounds like a great idea, we should go through them one by one, maybe at a meeting. We should agree on what is doable for I, without violating the OpenStack way in some very ugly way. So do we want to be OpenStack on OpenStack or Almost OpenStack on OpenStack? Or what is the goal here?

So let's take a simple example: flat network, 2 racks (32 nodes), 2 controller nodes, 2 neutron nodes, 14 nova compute, 14 storage.

I. Manual way using Heat and the scheduler could be assigning every group of nodes to a special flavor by hand. Then the nova scheduler will take care of it.
1. How hard will it be to implement 'assigning specific nodes to a flavor'? (probably adding a condition for MAC address?) Or do you have some other idea how to do this in an almost clean way, without reimplementing the nova scheduler? (though this is probably messing with the scheduler)
2. How will this be implementable in the UI? Just assigning nodes to flavors and uploading a Heat template?

II. Having homogeneous hardware, all will be one flavor and then the nova scheduler will decide where to put what, when you tell heat e.g. I want to spawn 2 controller images.
1. How hard is it to set the policies, like we want to spread those nodes over all racks?
2. How will this be implementable in the UI? It is basically building a complex Heat template, right? So just uploading a Heat template?

III. Having more flavors
1. We will be able to set in Heat something like: I want a Nova compute node on compute_flavor (amazon c1, c3) with high priority, or on all_purpose_flavor (amazon m1) with normal priority. How hard is that?
2. How will this be implementable in the UI? Just uploading a Heat template?

IV. TripleO way
1. From the OOO name I infer we want to use OpenStack, that means using Heat, the Nova scheduler, etc. From my point of view, having a Heat template for deploying e.g.
Wordpress installation seems the same to me as having a Heat template to deploy OpenStack, it's just much more complex. Is this a valid assumption? If you think it's not, please explain why.

Radical idea: we could ask (e.g. on -operators) for a few potential users who'd be willing to let us interview them.

Yes please!!! Talking to jcoufal: being able to edit a Heat template in the UI, and being able to assign baremetals to flavors (later connected to a template catalog) - it could be all we need. Also, later visualizing what will happen when you actually stack-create the template, so we don't go in blindly, would be very much needed.

Kind regards, Ladislav

On 11/28/2013 06:41 AM, Robert Collins wrote: Hey, I realise I've done a sort of point-by-point thing below - sorry. Let me say that I'm glad you're focused on what will help users, and their needs - I am too. Hopefully we can figure out why we have different opinions about what things are key, and/or how we can get data to better understand our potential users.

On 28 November 2013 02:39, Jaromir Coufal jcou...@redhat.com wrote: Important point here is, that we agree on starting with very basics - grow then. Which is great. The whole deployment workflow (not just UI) is all about user experience which is built on top of TripleO's approach. Here I see two important factors: - There are users who are having some needs and expectations.

Certainly. Do we have Personas for those people? (And have we done any validation of them?)

- There is underlying concept of TripleO, which we are using for implementing features which are satisfying those needs.

mmm, so the technical aspect of TripleO is about setting up a virtuous circle: where improvements in deploying cluster software via OpenStack make deploying OpenStack better, and those of us working on deploying OpenStack will make deploying cluster software via OpenStack better in general, as part of solving 'deploying OpenStack' in a nice way.
We are circling around and trying to approach the problem from the wrong end - which is the implementation point of view (how to avoid our own scheduling). Let's try to get out of the box and start by thinking about our audience first - what they expect, what they need. Then we go back, put our implementation thinking hat on and find out how we are going to re-use OpenStack components to achieve our goals. In the end we have a detailed plan.

Certainly, +1.

=== Users ===

I would like to start with our targeted audience first - without milestones, without implementation details.

I think here is the main point where I disagree and which leads to different approaches. I don't think that the user of TripleO cares only about deploying infrastructure without any knowledge of where things go. This is the overcloud user's approach - 'I want a VM and I don't care where it runs'. Those are self-service users / cloud users. I know we are OpenStack on OpenStack, but we shouldn't go so far that we expect the same behavior from undercloud users. I can tell you various examples of why the
Re: [openstack-dev] [TripleO] Summit session wrapup
Hi Mark, thanks for your insight, I mostly agree. Just a few points below.

On 2013/27/11 21:54, Mark McLoughlin wrote: Hi Jarda, ... Yes, I buy this. And I think it's the point worth dwelling on. It would be quite a bit of work to substantiate the point with hard data - e.g. doing user testing of mockups with and without placement control - so we have to at least try to build some consensus without that.

I agree here. It will be a lot of work. I'd love to have that, but creating distinct designs, finding users for real testing and testing with them will consume a big amount of time, and in this agile approach we can't afford it. I believe that we are not very distinct in our goals and that we can get to consensus without that. There was a smaller confusion which I tried to clarify in my answer to Rob's response.

We could do some work on a more detailed description of the persona and their basic goals. This would clear up whether we're designing for the case where one persona owns the undercloud and there's another overcloud operator persona.

Yes, we need to have this written down. Or at least get to consensus, if we can quickly get there, and document it then. Whatever works and doesn't block us.

We could also look at other tools targeted at similar use cases and see what they do.

I looked, and they all do it in a very manual way (or at least those which I have seen, from Mirantis, Huawei, etc.) - and there is some reason for this. As I wrote in my answer to Robert, we can do much more, we can be smart, but we can't think that we are the smartest.

But yeah - my instinct is that all of that would show that we'd be fighting an uphill battle to persuade our users that this type of magic is what they want.

That's exactly my point. Thanks for saying that. We want to help them and feed them with a ready-to-deploy solution. But they need to have the feeling that they have things under control (maybe just check the solution and/or be allowed to change it). ...
=== Implementation ===

The above mentioned approach shouldn't lead to reimplementing the scheduler. We can still use nova-scheduler, but we can take advantage of extra params (like a unique identifier), so that we specify more concretely what goes where.

It's hard to see how what you describe doesn't ultimately mean we completely bypass the Nova scheduler. Yes, if you request placement on a specific node, it does still go through the scheduler ... but it doesn't do any actual scheduling. Maybe we should separate the discussion/design around control nodes and resource (i.e. compute/storage) nodes. Mostly because there should be a large ratio of the latter to the former, so you'd expect it to be less likely for such fine-grained control over resource nodes to be useful. e.g. maybe adding more compute nodes doesn't involve the user doing any placement, and we just let the nova scheduler choose from the available nodes which are suitable for compute workloads.

Yes, controller nodes will need to get better treatment, but I think not in our first steps. I believe that for now we are fine with going with a generic controller node which runs all controller services. I think what would be great to have is to let nova-scheduler do its job (dry-run), show the distribution and just confirm (or make some change in there).

-- Jarda
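Jarda's closing idea - let the scheduler do a dry-run, show the proposed distribution, and let the operator confirm or adjust - could be sketched as the following toy flow. The round-robin "scheduler" is a made-up stand-in (nothing here is the nova-scheduler API); the point is only the propose / override / confirm shape:

```python
# Toy sketch of the "dry-run, then confirm or override" workflow.
# dry_run_distribution() stands in for a real scheduler; deploy()
# stands in for handing the final distribution to Heat.

def dry_run_distribution(nodes, roles):
    """Propose a role for each node (here: naive round-robin) and
    return the proposal WITHOUT deploying anything."""
    return {node: roles[i % len(roles)] for i, node in enumerate(nodes)}

def deploy(plan, overrides=None):
    """Apply the operator's manual overrides, then 'deploy' (here we
    just return the final distribution instead of calling Heat)."""
    final = dict(plan)
    final.update(overrides or {})
    return final

plan = dry_run_distribution(["n1", "n2", "n3"], ["control", "compute"])
# operator inspects the proposal and overrides one decision before deploying
final = deploy(plan, overrides={"n3": "storage"})
assert final["n3"] == "storage"
```

This keeps the automation primary (the proposal comes from the scheduler) while preserving the manual fallback both sides of the thread seem to agree on.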
Re: [openstack-dev] [TripleO] Summit session wrapup
On 2013/27/11 16:37, James Slagle wrote: On Wed, Nov 27, 2013 at 8:39 AM, Jaromir Coufal jcou...@redhat.com wrote: V0: basic slick installer - flexibility and control first - enable user to auto-discover (or manually register) nodes - let user decide which node is going to be controller, which is going to be compute or storage - associate images with these nodes - deploy

I think you've made some good points about the user experience helping drive the design of what Tuskar is targeting. I think the conversation around how to design letting the user pick what to deploy where should continue. I wonder though, would it be possible to not have that in a V0? Basically make your V0 above even smaller (eliminating the middle 2 bullets), and just let nova figure it out, the same as what happens now when we run heat stack-create from the CLI. I see 2 possible reasons for trying this: - Gets us to something people can try even sooner - It may turn out we want this option in the long run ... a 'figure it all out for me' type of approach, so it wouldn't be wasted effort.

Hey James, well, as long as we end up with the possibility to have control over it in the Icehouse release, I am fine with that. (The 'control' I tried to explain more closely in my response to Robert's e-mail.)

As for the milestone approach: I just think that the more basic and traditional way for the user is to do stuff manually. And that's where I think we can start. That's the user's point of view. From the implementation point of view, there is already some magic in OpenStack, so it might be easier to start with that already existing magic, add manual support then, and then enhance the magic into a much smarter approach. In the end, most of the audience will see the result in the Icehouse release, so whether we start one way or another - whatever works. I just want to make sure that we are going to deliver a usable solution.
-- Jarda
Re: [openstack-dev] [TripleO] Summit session wrapup
Hi all, just a few thoughts (subjective opinions) regarding the whole debate:

* I think that having a manual approach for picking images for machines would make TripleO more usable in the beginning. I think it will take a good deal of time to get our smart solution working with the admin rather than against him [1], and the possibility of manual override is a good safety catch. E.g. one question that I wonder about - how would our smart flavor-based approach solve this situation: I have homogeneous nodes on which I want to deploy Cinder and Swift. Half of those nodes have better connectivity to the internet than the other half. I want Swift on the ones with better internet connectivity. How will I ensure such a deployment with a flavor-based approach? Could we use e.g. host aggregates defined on the undercloud for this? I think it will take time before our smart solution can understand such and similar conditions.

* On the other hand, I think relying on Nova to pick hosts feels like the more TripleO-spirited solution to me. It means using OpenStack to deploy OpenStack.

So I can't really lean towards one solution or the other. Maybe it's most important to make *something*, gather some feedback, and tweak what needs tweaking.

Cheers
Jirka

[1] http://i.technet.microsoft.com/dynimg/IC284957.jpg
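Jirka's connectivity example - homogeneous nodes, half with better uplinks, and Swift wanted on the well-connected half - reduces to grouping nodes by an operator-defined attribute, which is roughly the kind of thing his host-aggregates suggestion would express. A toy sketch with made-up attribute names (not the Nova host-aggregates API):

```python
# Toy sketch: split otherwise-homogeneous nodes into groups by an
# operator-defined attribute ("uplink" is invented for illustration),
# then dedicate each group to a service.

def split_by_attribute(nodes, attr):
    """Group node ids by the value of an operator-defined attribute."""
    groups = {}
    for node in nodes:
        groups.setdefault(node[attr], []).append(node["id"])
    return groups

nodes = [
    {"id": "n1", "uplink": "fast"}, {"id": "n2", "uplink": "fast"},
    {"id": "n3", "uplink": "slow"}, {"id": "n4", "uplink": "slow"},
]
groups = split_by_attribute(nodes, "uplink")
swift_nodes, cinder_nodes = groups["fast"], groups["slow"]
assert swift_nodes == ["n1", "n2"]
```

Whether the attribute lives in a host aggregate, a flavor extra spec, or TripleO-managed metadata is exactly the open question in the thread; the grouping logic itself is this simple either way.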
Re: [openstack-dev] [TripleO] Summit session wrapup
On 2013/27/11 00:00, Robert Collins wrote: On 26 November 2013 07:41, Jaromir Coufal jcou...@redhat.com wrote: Hey Rob, can we add 'Slick Overcloud deployment through the UI' to the list? There was no session about that, but we discussed it afterwards and agreed that it is high priority for Icehouse as well. I just want to keep it on the list, so we are aware of that.

Certainly. Please add a blueprint for that and I'll mark it up appropriately.

I will do.

Related to that we had a long chat in IRC that I was to follow up here, so - ... Tuskar is refocusing on getting the basics really right - slick basic install, and then work up. At the same time, just about every nova person I've spoken to (a /huge/ sample of three, but meh :)) has expressed horror that Tuskar is doing its own scheduling, and confusion about the need to manage flavors in such detail. So the discussion on IRC was about getting back to basics - a clean core design, and something where we aren't left with technical debt that we need to eliminate in order to move forward - which the scheduler stuff would be. So: my question/proposal was this: let's set a couple of MVPs.
0: slick install homogeneous nodes:
- ask about nodes and register them with nova baremetal / Ironic (can use those APIs directly)
- apply some very simple heuristics to turn that into a cloud:
  - 1 machine - all in one
  - 2 machines - separate hypervisor and the rest
  - 3 machines - two hypervisors and the rest
  - 4 machines - two hypervisors, HA the rest
  - 5+ - scale out hypervisors
- so total forms needed = 1 (gather hw details)
- internals: heat template with one machine flavor used

1: add support for heterogeneous nodes:
- for each service (storage, compute etc.) supply a list of flavors we're willing to have that service run on
- pass that into the heat template
- teach heat to deal with flavor-specific resource exhaustion by asking for a different flavor (or perhaps have nova accept multiple flavors and 'choose one that works'): details to be discussed with heat // nova at the right time.

2: add support for anti-affinity for HA setups:
- here we get into the question about short term deliverables vs long term desire, but at least we'll have a polished installer already.

-Rob

Important point here is that we agree on starting with the very basics - and grow from there. Which is great.

The whole deployment workflow (not just UI) is all about user experience, which is built on top of TripleO's approach. Here I see two important factors:
- There are *users* who are having some *needs and expectations*.
- There is an underlying *concept of TripleO*, which we are using for *implementing* features which satisfy those needs.

We are circling around and trying to approach the problem from the wrong end - which is the implementation point of view (how to avoid our own scheduling). Let's try to get out of the box and start by thinking about our audience first - what they expect, what they need. Then we go back, put our implementation thinking hat on and find out how we are going to re-use OpenStack components to achieve our goals. In the end we have a detailed plan.
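Robert's MVP-0 node-count heuristics are simple enough to transcribe literally. The following is just an illustrative restatement of the bullets from his mail, not actual Tuskar logic:

```python
# Literal transcription of Robert's MVP-0 heuristics: map a machine
# count to a role layout. Role names are invented labels for "the rest".

def mvp0_layout(n):
    """Return {role: count} for n registered machines, per the thread."""
    if n <= 0:
        raise ValueError("need at least one machine")
    if n == 1:
        return {"all-in-one": 1}
    if n == 2:
        return {"hypervisor": 1, "everything-else": 1}
    if n == 3:
        return {"hypervisor": 2, "everything-else": 1}
    if n == 4:
        return {"hypervisor": 2, "everything-else (HA)": 2}
    # 5+: keep the HA control plane, scale out hypervisors
    return {"hypervisor": n - 2, "everything-else (HA)": 2}

assert mvp0_layout(4) == {"hypervisor": 2, "everything-else (HA)": 2}
```

The appeal of MVP-0 is visible here: the whole placement policy is a ten-line pure function, and "total forms needed = 1" follows directly - there is nothing for the operator to decide.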
=== Users ===

I would like to start with our targeted audience first - without milestones, without implementation details.

I think here is the main point where I disagree and which leads to different approaches. I don't think that the user of TripleO cares *only* about deploying infrastructure without any knowledge of where things go. This is the overcloud user's approach - 'I want a VM and I don't care where it runs'. Those are self-service users / cloud users. I know we are OpenStack on OpenStack, but we shouldn't go so far that we expect the same behavior from undercloud users. I can tell you various examples of why the operator will care about where the image goes and what runs on a specific node.

/One quick example:/ I have three racks of homogeneous hardware and I want to design it so that I have one control node in each, 3 storage nodes and the rest compute. With that smart deployment, I'll never know what my rack contains in the end. But if I have control over stuff, I can say that this node is a controller, those three are storage and those are compute - I am happy from the very beginning.

Our targeted audience are sysadmins, operators. They hate 'magic'. They want to have control over the things which they are doing. If we put in front of them a workflow where they click one button and get a cloud installed, they will be horrified. That's why I am very sure and convinced that we need the ability for the user to have control over stuff - what node is having what role. We can be smart, suggest and advise. But not hide this functionality from the user. Otherwise, I am afraid that we can fail. Furthermore, if we put lots of restrictions (like homogeneous hardware) in front of users from the very beginning, we are discouraging people from using TripleO-UI. We are
Re: [openstack-dev] [TripleO] Summit session wrapup
On Wed, Nov 27, 2013 at 8:39 AM, Jaromir Coufal jcou...@redhat.com wrote: V0: basic slick installer - flexibility and control first - enable user to auto-discover (or manually register) nodes - let user decide which node is going to be controller, which is going to be compute or storage - associate images with these nodes - deploy

I think you've made some good points about the user experience helping drive the design of what Tuskar is targeting. I think the conversation around how to design letting the user pick what to deploy where should continue. I wonder though, would it be possible to not have that in a V0? Basically make your V0 above even smaller (eliminating the middle 2 bullets), and just let nova figure it out, the same as what happens now when we run heat stack-create from the CLI. I see 2 possible reasons for trying this: - Gets us to something people can try even sooner - It may turn out we want this option in the long run ... a 'figure it all out for me' type of approach, so it wouldn't be wasted effort.

-- James Slagle
Re: [openstack-dev] [TripleO] Summit session wrapup
Hi Jarda,

On Wed, 2013-11-27 at 14:39 +0100, Jaromir Coufal wrote:

I think here is the main point where I disagree and which leads to the different approaches. I don't think that a user of TripleO cares *only* about deploying infrastructure without any knowledge of where things go. This is the overcloud user's approach - 'I want a VM and I don't care where it runs'. Those are self-service users / cloud users. I know we are OpenStack on OpenStack, but we shouldn't go so far that we expect the same behavior from undercloud users.

Nice, I think you're getting really close to identifying the conflicting assumptions/viewpoints here. What OpenStack - and cloud, in general - does is provide a nice self-service abstraction between the owners of the underlying resources and the end user. We take an awful lot of placement control away from the self-service user in order to allow the operator to provide a usable, large-scale, multi-tenant service.

The difference with TripleO is that we assume the undercloud operator and the undercloud user are one and the same. At least, that's what I assume we're designing for. I don't think we're designing for a situation where there is an undercloud operator serving the needs of multiple overcloud operators, where it would be important for the undercloud operator to have ultimate control over placement. That's hardly the end of the story here, but it is one useful distinction that could justify why this case might be different from the usual application-deployment-on-IaaS case.

I can tell you various examples of why the operator will care about where the image goes and what runs on a specific node. /One quick example:/ I have three racks of homogeneous hardware and I want to design it so that I have one control node in each rack, 3 storage nodes, and the rest compute. With that smart deployment, I'll never know what my rack contains in the end.
But if I have control over things, I can say that this node is a controller, those three are storage, and those are compute - I am happy from the very beginning.

It is valid to ask why this knowledge is important to the user in this case and why it makes them happy. Challenging such assumptions can lead to design breakthroughs, I'm sure you agree. e.g. before AWS came along, you could imagine someone trying to shoot down the entire premise of IaaS with similar arguments. Or the whole 'they'd have asked for a faster horse' thing.

Our targeted audience are sysadmins and operators. They hate 'magic'. They want control over the things they are doing. If we put a workflow in front of them where they click one button and get a cloud installed, they will be horrified. That's why I am convinced that we need to give the user control over things - which node has which role. We can be smart, suggest, and advise, but not hide this functionality from the user. Otherwise, I am afraid that we can fail. Furthermore, if we put lots of restrictions (like homogeneous hardware) in front of users from the very beginning, we are discouraging people from using TripleO-UI. We are a young project trying to reach as broad an audience as possible. If we take a flexible enough approach to get a large audience interested and solve their problems, we will get more feedback, early adopters, more contributors, etc. First, let's help the cloud operator who has some nodes and wants to deploy OpenStack on them. He wants control over which node is a controller and which node is compute or storage. Then we can get smarter and guide.

Yes, I buy this. And I think it's the point worth dwelling on. It would be quite a bit of work to substantiate the point with hard data - e.g. doing user testing of mockups with and without placement control - so we have to at least try to build some consensus without that.
We could do some work on a more detailed description of the persona and their basic goals. This would clear up whether we're designing for the case where one persona owns the undercloud, with a separate overcloud operator persona. We could also look at other tools targeted at similar use cases and see what they do. But yeah - my instinct is that all of that would show we'd be fighting an uphill battle to persuade our users that this type of magic is what they want.

...

=== Implementation ===

The above-mentioned approach shouldn't lead to reimplementing the scheduler. We can still use nova-scheduler, but we can take advantage of extra params (like a unique identifier) so that we specify more concretely what goes where.

It's hard to see how what you describe doesn't ultimately mean we completely bypass the Nova scheduler. Yes, if you request placement on a specific node, it does still go through the scheduler ... but it doesn't do any actual scheduling. Maybe we should separate the discussion/design around control nodes and resource (i.e. compute/storage) nodes.
Re: [openstack-dev] [TripleO] Summit session wrapup
On 26 November 2013 07:41, Jaromir Coufal jcou...@redhat.com wrote:

Hey Rob, can we add 'Slick Overcloud deployment through the UI' to the list? There was no session about that, but we discussed it afterwards and agreed that it is high priority for Icehouse as well. I just want to keep it on the list, so we are aware of it.

Certainly. Please add a blueprint for that and I'll mark it up appropriately. Related to that, we had a long chat in IRC that I was to follow up here, so - ...

Tuskar is refocusing on getting the basics really right - a slick basic install - and then working up from there. At the same time, just about every nova person I've spoken to (a /huge/ sample of three, but meh :)) has expressed horror that Tuskar is doing its own scheduling, and confusion about the need to manage flavors in such detail. So the discussion on IRC was about getting back to basics - a clean core design that doesn't leave us with technical debt we'd need to eliminate in order to move forward - which the scheduler stuff would be.

So, my question/proposal was this: let's set a couple of MVPs.

0: slick install of homogeneous nodes:
- ask about nodes and register them with nova baremetal / Ironic (can use those APIs directly)
- apply some very simple heuristics to turn that into a cloud:
  - 1 machine: all in one
  - 2 machines: separate hypervisor and the rest
  - 3 machines: two hypervisors and the rest
  - 4 machines: two hypervisors, HA the rest
  - 5+: scale out hypervisors
- so total forms needed = 1 (gather hw details)
- internals: heat template with one machine flavor used

1: add support for heterogeneous nodes:
- for each service (storage, compute, etc.) supply a list of flavors we're willing to have run it
- pass that into the heat template
- teach heat to deal with flavor-specific resource exhaustion by asking for a different flavor (or perhaps have nova accept multiple flavors and 'choose one that works'): details to be discussed with heat // nova at the right time.
2: add support for anti-affinity for HA setups:
- here we get into the question of short-term deliverables vs long-term desires, but at least we'll have a polished installer already.

-Rob

-- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
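The MVP 0 heuristics above are simple enough to express directly. Here is a sketch of the node-count-to-role mapping as a hypothetical helper (the function and role names are illustrative, not actual Tuskar code):

```python
def plan_roles(node_count):
    """Map a registered-node count to role assignments using the
    simple MVP 0 heuristics: hypothetical sketch, not Tuskar code."""
    if node_count < 1:
        raise ValueError("need at least one registered node")
    if node_count == 1:
        return {"all-in-one": 1}               # 1 machine: all in one
    if node_count == 2:
        return {"hypervisor": 1, "other": 1}   # separate hypervisor and the rest
    if node_count == 3:
        return {"hypervisor": 2, "other": 1}   # two hypervisors and the rest
    if node_count == 4:
        return {"hypervisor": 2, "other-ha": 2}  # two hypervisors, HA the rest
    # 5+: keep the HA pair and scale out hypervisors
    return {"hypervisor": node_count - 2, "other-ha": 2}
```

The point of the sketch is that a single 'gather hardware details' form plus a mapping like this is enough for MVP 0; MVPs 1 and 2 then layer flavor lists and anti-affinity on top without the UI doing any scheduling of its own.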
Re: [openstack-dev] [TripleO] Summit session wrapup
Hey Rob, can we add 'Slick Overcloud deployment through the UI' to the list? There was no session about that, but we discussed it afterwards and agreed that it is high priority for Icehouse as well. I just want to keep it on the list, so we are aware of it. Thanks -- Jarda

On 2013/25/11 02:17, Robert Collins wrote:

I've now gone through and done the post-summit cleanup of blueprints and migration of design docs into blueprints as appropriate. We had 50-odd blueprints, many of which were really not effective blueprints - they described single work items with little coordination need, were not changelog items, etc. I've marked those obsolete. Blueprints are not a discussion forum - they are a place where [some] discussions can be captured, but anything initially filed there will take some time before folk notice it - and the lack of a discussion mechanism makes it very hard to reach consensus there. Could TripleO-interested folk please raise things here on the dev list initially, and we'll move to lower latency // higher bandwidth environments as needed?
From the summit we had the following outcomes:
- https://etherpad.openstack.org/p/icehouse-deployment-hardware-autodiscovery - needs to be done in Ironic
- https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-modelling-infrastructure-sla-services - needs more discussion to tease concerns out - in particular I want us to get to a problem statement that Nova core folk understand :)
- https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-ha-production-configuration - ready for folk to act on at any point
- https://blueprints.launchpad.net/tripleo/+spec/tripleo-tuskar-deployment-scaling-topologies - ready for folk to act on, but fairly shallow, since most of the answer was 'discuss with heat' :)
- https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-scaling-design - ready for folk to act on; the main thing was gathering a bunch of data so we can make good decisions from here on out

The stable branches decision has been documented in the wiki - all done.

Cheers, Rob
[openstack-dev] [TripleO] Summit session wrapup
I've now gone through and done the post-summit cleanup of blueprints and migration of design docs into blueprints as appropriate. We had 50-odd blueprints, many of which were really not effective blueprints - they described single work items with little coordination need, were not changelog items, etc. I've marked those obsolete. Blueprints are not a discussion forum - they are a place where [some] discussions can be captured, but anything initially filed there will take some time before folk notice it - and the lack of a discussion mechanism makes it very hard to reach consensus there. Could TripleO-interested folk please raise things here on the dev list initially, and we'll move to lower latency // higher bandwidth environments as needed?

From the summit we had the following outcomes:
- https://etherpad.openstack.org/p/icehouse-deployment-hardware-autodiscovery - needs to be done in Ironic
- https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-modelling-infrastructure-sla-services - needs more discussion to tease concerns out - in particular I want us to get to a problem statement that Nova core folk understand :)
- https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-ha-production-configuration - ready for folk to act on at any point
- https://blueprints.launchpad.net/tripleo/+spec/tripleo-tuskar-deployment-scaling-topologies - ready for folk to act on, but fairly shallow, since most of the answer was 'discuss with heat' :)
- https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-scaling-design - ready for folk to act on; the main thing was gathering a bunch of data so we can make good decisions from here on out

The stable branches decision has been documented in the wiki - all done.

Cheers, Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud