[openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Hi, In parallel to Jarda's updated wireframes, and based on various discussions over the past weeks, here are the updated Tuskar requirements for Icehouse: https://wiki.openstack.org/wiki/TripleO/TuskarIcehouseRequirements Any feedback is appreciated. Thanks! Tzu-Mainn Chen ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2014/05/02 15:27, Tzu-Mainn Chen wrote: Hi, In parallel to Jarda's updated wireframes, and based on various discussions over the past weeks, here are the updated Tuskar requirements for Icehouse: https://wiki.openstack.org/wiki/TripleO/TuskarIcehouseRequirements Any feedback is appreciated. Thanks! Tzu-Mainn Chen +1 looks good to me! -- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 11/12/13 21:42, Robert Collins wrote: On 12 December 2013 01:17, Jaromir Coufal jcou...@redhat.com wrote: On 2013/10/12 23:09, Robert Collins wrote: [snip] That's speculation. We don't know if they will or will not because we haven't given them a working system to test. Some part of that is speculation, some part of that is feedback from people who are doing deployments (of course it's just a very limited audience). Anyway, it is not just pure theory. Sure. Let me be more precise. There is a hypothesis that lack of direct control will be a significant adoption blocker for a primary group of users. I'm sorry for butting in, but I think I can see where your disagreement comes from and maybe explaining it will help resolve it. It's not a hypothesis, but a well-documented and researched fact, that transparency has a huge impact on the ease of use of any information artifact. In particular, the easier you can see what is actually happening and how your actions affect the outcome, the faster you can learn to use it and the more efficient you are in using it and resolving any problems with it. It's no surprise that closeness of mapping and hidden dependencies are two important cognitive dimensions that are often measured when assessing the usability of an artifact. Humans simply find it nice when they can tell what is happening, even if theoretically they don't need that knowledge when everything works correctly. This doesn't come from any direct requirements of Tuskar itself, and I am sure that all the workarounds that Robert gave will work somehow in every real-world problem that arises. But the whole will not necessarily be easy or pleasant to learn and use. I am aware that the requirement to be able to see what is happening is a fundamental problem, because it destroys one of the most important rules in system engineering -- separation of concerns. 
The parts in the upper layers should simply not care how the parts in the lower layers do their jobs, as long as they work properly. I know that it is a kind of a tradition in Open Source software to create software with the assumption that it's enough for it to do its job, and if every use case can be somehow done, directly or indirectly, then it's good enough. We have a lot of working tools designed with this principle in mind, such as CSS, autotools or our favorite git. They do their job, and they do it well (except when they break horribly). But I think we can put a little bit more effort into also ensuring that the common use cases are not just doable, but also easy to implement and maintain. And that means that we will sometimes have a requirement that comes from how people think, and not from any particular technical need. I know that it sounds like speculation, or theory, but I think we need to trust in Jarda's experience with usability and his judgement about what works better -- unless of course we are willing to learn all that ourselves, which may take quite some time. What is the point of having an expert, if we know better, after all? -- Radomir Dopieralski
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 13/12/13 09:41 -0500, Jay Dobies wrote: * ability to 'preview' changes going to the scheduler What does this give you? How detailed a preview do you need? What information is critical there? Have you seen the proposed designs for a heat template preview feature - would that be sufficient? Will will probably have a better answer to this, but I feel like at the very least this goes back to the psychology point raised earlier (I think in this thread, but if not, definitely one of the TripleO ones). A weird parallel is whenever I do a new install of Fedora. I never accept their default disk partitioning without electing to review/modify it. Even if I didn't expect to change anything, I want to see what they are going to give me. And then I compulsively review the summary of what actual changes will be applied in the follow-up screen that's displayed after I say I'm happy with the layout. Perhaps that's more a commentary on my own OCD and cynicism that I feel dirty accepting the magic defaults blindly. I love the idea of anaconda doing the heavy lifting of figuring out sane defaults for home/root/swap and so on (similarly, I love the idea of the Nova scheduler rationing out where instances are deployed), but I at least want to know I've seen it before it happens. I fully admit to not knowing how common that sort of thing is. I suspect I'm in the majority of geeks and tame by sysadmin standards, but I honestly don't know. So I acknowledge that my entire argument for the preview here is based on my own personality. Jay, I mirror your sentiments exactly here; the Fedora example is a good one, and is even more so the case when it comes to node allocation/details and proposed changes in a deployment scenario. Though 9/10 times the defaults the Nova scheduler chooses will be fine, there's a 'human' need to review them, changing as necessary. 
-will
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 13/12/13 19:06 +1300, Robert Collins wrote: On 13 December 2013 06:24, Will Foster wfos...@redhat.com wrote: I just wanted to add a few thoughts: Thank you! For some comparative information here from the field: I work extensively on deployments of large OpenStack implementations, most recently with a ~220 node / 9 rack deployment (scaling up to 42 racks / 1024 nodes soon). My primary role is of a Devops/Sysadmin nature, and not a specific development area, so rapid provisioning/tooling/automation is an area I almost exclusively work within (mostly API-driven, using Foreman/Puppet). The infrastructure our small team designs/builds supports our development and business. I am the target user base you'd probably want to cater to. Absolutely! I can tell you the philosophy and mechanics of Tuskar/OOO are great, something I'd love to start using extensively, but there are some needed aspects in the areas of control that I feel should be added (though arguably less for me and more for my ilk who are looking to expand their OpenStack footprint). * ability to 'preview' changes going to the scheduler What does this give you? How detailed a preview do you need? What information is critical there? Have you seen the proposed designs for a heat template preview feature - would that be sufficient? Thanks for the reply. Preview-wise it'd be useful to see node allocation prior to deployment - nothing too in-depth. I have not seen the heat template preview features; are you referring to the YAML templating[1] or something else[2]? I'd like to learn more. [1] - http://docs.openstack.org/developer/heat/template_guide/hot_guide.html [2] - https://github.com/openstack/heat-templates * ability to override/change some aspects within node assignment What would this be used to do? How often do those situations turn up? What's the impact if you can't do that? 
One scenario might be that autodiscovery does not pick up an available node in your pool of resources, or detects it incorrectly - you could manually change things as you like. Another (more common) scenario is that you don't have an isolated, flat network with which to deploy, and nodes are picked that you do not want included in the provisioning - you could remove those from the set of resources prior to launching overcloud creation. The impact would be that the tooling would seem inflexible to those lacking a thoughtfully prepared network/infrastructure, or, more commonly in cases where the existing network design is too inflexible, the usefulness and quick/seamless provisioning benefits would fall short. * ability to view at least minimal logging from within Tuskar UI Logging of what - the deployment engine? The heat event-log? Nova undercloud logs? Logs from the deployed instances? If it's not there in V1, but you can get, or already have, credentials for the [instances that hold the logs that you wanted], would that be a big adoption blocker, or just a nuisance? Logging of the deployment engine status during the bootstrapping process initially, and some rudimentary node success/failure indication. It should be simplistic enough not to rival existing monitoring/log systems, but at least provide deployment logs as the overcloud is being built and a general node/health 'check-in' that it's complete. Afterwards, as you mentioned, the logs are available on the deployed systems. Think of it as providing some basic written navigational signs for people crossing a small bridge before they get to the highway; there's continuity from start to finish and a clear sense of what's occurring. From my perspective, absence of this type of verbosity may impede adoption by new users (who are used to this type of information with deployment tooling). 
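The 'preview' workflow being asked for here (show the proposed node allocation, let the operator drop nodes that shouldn't be there, then launch) can be sketched in a few lines. This is purely an illustrative sketch under assumed names; none of the functions or fields below are Tuskar's or Heat's actual API:

```python
# Hypothetical sketch of a pre-deployment "preview": propose a
# node-to-role allocation for operator review, honouring exclusions.
# Nothing here is Tuskar's real API; all names are invented.

def preview_allocation(roles, free_nodes, excluded=()):
    """roles: list of (role_name, node_count); returns (plan, spare)."""
    pool = [n for n in free_nodes if n not in set(excluded)]
    plan, i = {}, 0
    for name, count in roles:
        if i + count > len(pool):
            raise ValueError("not enough nodes for role %r" % name)
        plan[name] = pool[i:i + count]   # proposed only; nothing deployed yet
        i += count
    return plan, pool[i:]                # plan + still-unallocated nodes

# Operator excludes a node that was picked up by mistake, reviews the
# proposed plan, and only then launches overcloud creation.
plan, spare = preview_allocation(
    roles=[("control", 1), ("compute", 2)],
    free_nodes=["node-a", "node-b", "node-c", "node-d"],
    excluded=["node-b"])
print(plan)   # {'control': ['node-a'], 'compute': ['node-c', 'node-d']}
```

The point of the sketch is only the shape of the interaction: a pure function that computes and displays the plan before any irreversible action is taken.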
Here's the main reason - most new adopters of OpenStack/IaaS are going to be running legacy/mixed hardware, and while they might have an initiative to explore and invest, and even a decent budget, most of them are not going to have completely identical hardware, isolated/flat networks and things set aside in such a way that blind auto-discovery/deployment will just work all the time. That's great information (and something I reasonably well expected, to a degree). We have a hard dependency on no wildcard DHCP servers in the environment (or we can't deploy). Autodiscovery is something we don't have yet, but certainly debugging deployment failures is a very important use case and one we need to improve both at the plumbing layer and in the stories around it in the UI. There will be a need to sometimes adjust, and those coming from a more vertically-scaling infrastructure (most large orgs.) will not have 100% matching standards in place of vendor, machine spec and network design, which may make Tuskar/OOO seem inflexible and 'one-way'. This may just be a carry-over or fear of the old ways of deployment, but nonetheless it is present. I'm not sure what you mean by matching standards here :). Ironic is
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Mon Dec 9 15:22:04 2013, Robert Collins wrote: On 9 December 2013 23:56, Jaromir Coufal jcou...@redhat.com wrote: Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory stats For registration it is just Management MAC address which is needed right? Or does Ironic need also IP? I think that MAC address might be enough, we can display IP in details of node later on. Ironic needs all the details I listed today. Management MAC is not currently used at all, but would be needed in future when we tackle IPMI IP managed by Neutron. I think what happened here is that two separate things we need got conflated. We need the IP address of the management (IPMI) interface, for power control, etc. We also need the MAC of the host system (*not* its IPMI/management interface) for PXE to serve it the appropriate content. -- Matt Wagner Software Engineer, Red Hat
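Matt's distinction - the IPMI address belongs to the management controller and is used for power control, while the MACs belong to the host's own NICs so PXE can serve the right boot content - can be made concrete with a small sketch. The field names below are illustrative only, not Ironic's actual schema:

```python
# Sketch of the per-node registration data discussed above (invented
# field names, not Ironic's real schema): power control needs the IPMI
# IP; PXE needs the host NIC MACs; scheduling needs disk/cpu/memory.

REQUIRED = ("ipmi_address", "nic_macs", "cpus", "memory_mb", "local_gb")

def validate_registration(node):
    """Reject a node record that is missing any of the details listed."""
    missing = [f for f in REQUIRED if not node.get(f)]
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(missing))
    return node

node = validate_registration({
    "ipmi_address": "10.0.0.5",         # management interface: power on/off
    "nic_macs": ["52:54:00:ab:cd:ef"],  # host NICs: matched at PXE boot
    "cpus": 8,
    "memory_mb": 16384,
    "local_gb": 500,
})
```

Keeping the two addresses in clearly separate fields is exactly the de-conflation Matt is arguing for.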
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Dec 10, 2013, at 5:09 PM, Robert Collins wrote: On 11 December 2013 05:42, Jaromir Coufal jcou...@redhat.com wrote: On 2013/09/12 23:38, Tzu-Mainn Chen wrote: The disagreement comes from whether we need manual node assignment or not. I would argue that we need to step back and take a look at the real use case: heterogeneous nodes. If there are literally no characteristics that differentiate nodes A and B, then why do we care which gets used for what? Why do we need to manually assign one? Ideally, we don't. But with this approach we would take away the user's ability to change or decide something. So, I think this is where the confusion is. Using the nova scheduler doesn't prevent change or control. It just ensures the change and control happen in the right place: the Nova scheduler has had years of work, of features and facilities being added to support HPC, HA and other such use cases. It should have everything we need [1], without going down to manual placement. For clarity: manual placement is when any of the user, Tuskar, or Heat query Ironic, select a node, and then use a scheduler hint to bypass the scheduler. The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having the already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. That's speculation. We don't know if they will or will not because we haven't given them a working system to test. Let's break the concern into two halves: A) Users who could have their needs met, but won't use TripleO because meeting their needs in this way is too hard/complex/painful. B) Users who have a need we cannot meet with the current approach. 
For category B users, their needs might be specific HA things - like the oft-discussed failure domains angle, where we need to split up HA clusters across power bars, aircon, switches etc. Clearly long term we want to support them, and the undercloud Nova scheduler is entirely capable of being informed about this, and we can evolve to a holistic statement over time. Let's get a concrete list of the cases we can think of today that won't be well supported initially, and we can figure out where to do the work to support them properly. For category A users, I think that we should get concrete examples, and evolve our design (architecture and UX) to make meeting those needs pleasant. What we shouldn't do is plan complex work without concrete examples that people actually need. Jay's example of some shiny new compute servers with special parts that need to be carved out was a great one - we can put that in category A, and figure out if it's easy enough, or obvious enough - and think about whether we document it or make it a guided workflow or $whatever. Somebody might argue - why do we care? If a user doesn't like the TripleO paradigm, he shouldn't use the UI and should use another tool. But the UI is not only about TripleO. Yes, it is the underlying concept, but we are working on the future *official* OpenStack deployment tool. We should care to enable people to deploy OpenStack - large/small scale, homo/heterogeneous hardware, typical or a bit more specific use-cases. The difficulty I'm having is that the discussion seems to assume that 'heterogeneous implies manual', but I don't agree that that implication is necessary! As an underlying paradigm of how to install a cloud - awesome idea, awesome concept, it works. But the user doesn't care about how it is being deployed for him. He cares about getting what he wants/needs. And we shouldn't go so far that we violently force him to treat his infrastructure as a cloud. 
I believe that the possibility to change/control - if needed - is very important and we should care. I propose that we make concrete use cases: 'Fred cannot use TripleO without manual assignment because XYZ'. Then we can assess how important XYZ is to our early adopters and go from there. And what is key for us is to *enable* users - not to prevent them from using our deployment tool because it doesn't work for their requirements. Totally agreed :) If we can agree on that, then I think it would be sufficient to say that we want a mechanism to allow UI users to deal with heterogeneous nodes, and that mechanism must use nova-scheduler. In my mind, that's what resource classes and node profiles are intended for. Not arguing on this point. Though that mechanism should also support cases where the user specifies a role for a node / removes a node from a role. The rest of the nodes, which I don't care about, should be handled by nova-scheduler. Why! What is a use case for removing a node from a role?
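Robert's definition upthread - manual placement means querying Ironic, picking a node yourself, and passing a scheduler hint so the scheduler's choice is bypassed - reduces to a small toy model. All names below are hypothetical; this is not Nova's scheduler code, just the shape of the two paths being debated:

```python
# Toy model of the two placement paths discussed in this thread
# (hypothetical names, not Nova's implementation): normal placement
# lets the scheduler pick any node matching the flavor; "manual
# placement" pre-selects a node and forces it via a hint.

def place(nodes, flavor, forced_node=None):
    """Return the id of the node an instance lands on."""
    if forced_node is not None:
        return forced_node               # hint bypasses the scheduler
    for node in nodes:                   # scheduler-style selection
        if node["flavor"] == flavor:
            return node["id"]
    raise LookupError("no node matches flavor %r" % flavor)

nodes = [{"id": "n1", "flavor": "baremetal"},
         {"id": "n2", "flavor": "baremetal"}]
print(place(nodes, "baremetal"))                    # scheduler picks n1
print(place(nodes, "baremetal", forced_node="n2"))  # operator forces n2
```

The argument in this thread is essentially whether the `forced_node` branch needs to exist at all in V1, or whether it can be layered on later.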
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Dec 11, 2013, at 3:42 PM, Robert Collins wrote: On 12 December 2013 01:17, Jaromir Coufal jcou...@redhat.com wrote: On 2013/10/12 23:09, Robert Collins wrote: The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having the already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. That's speculation. We don't know if they will or will not because we haven't given them a working system to test. Some part of that is speculation, some part of that is feedback from people who are doing deployments (of course it's just a very limited audience). Anyway, it is not just pure theory. Sure. Let me be more precise. There is a hypothesis that lack of direct control will be a significant adoption blocker for a primary group of users. I think it's safe to say that some users in the group 'sysadmins having to deploy an OpenStack cloud' will find it a bridge too far and not use a system without direct control. Call this group A. I think it's also safe to say that some users will not care in the slightest, because their deployment is too small for them to be particularly worried (e.g. about occasional downtime (but they would worry a lot about data loss)). Call this group B. I suspect we don't need to consider group C - folk who won't use a system if it *has* manual control, but that's only a suspicion. It may be that the side effect of adding direct control is to reduce usability below the threshold some folk need... To assess 'significant adoption blocker' we basically need to find the % of users who will care sufficiently that they don't use TripleO. How can we do that? 
We can do questionnaires, and get such folk to come talk with us, but that suffers from selection bias - group B can use the system with or without direct manual control, so they have little motivation to argue vigorously in any particular direction. Group A, however, have to argue, because they won't use the system at all without that feature, and they may want to use the system for other reasons, so that becomes a crucial aspect for them. A much better way IMO is to test it - to get a bunch of volunteers and see who responds positively to a demo *without* direct manual control. To do that we need a demoable thing, which might just be mockups that show a set of workflows (and include things like Jay's shiny-new-hardware use case in the demo). I rather suspect we're building that anyway as part of doing UX work, so maybe what we do is put a tweet or blog post up asking for sysadmins who a) have not yet deployed openstack, b) want to, and c) are willing to spend 20-30 minutes with us; walk them through a demo showing no manual control, and record what questions they ask, and whether they would want that product; and if not, then (a) what use cases they can't address with the mockups and (b) what other reasons they have for not using it. This is a bunch of work though! So, do we need to do that work? *If* we can layer manual control on later, then we could defer this testing until we are at the point where we can say 'the nova scheduled version is ready, now let's decide if we add the manual control'. OTOH, if we *cannot* layer manual control on later - if it has tentacles through too much of the code base - then we need to decide earlier, because it will be significantly harder to add later, and that may be too late of a ship date for vendors shipping on top of TripleO. 
So with that as a prelude, my technical sense is that we can layer manual scheduling on later: we provide an advanced screen, show the list of N instances we're going to ask for, and allow each instance to be directly customised with a node id selected from either the current node it's running on or an available node. It's significant work, both UI and plumbing, but it's not going to be made harder by the other work we're doing AFAICT. - My proposal is that we shelve this discussion until we have the nova/heat scheduled version in 'and now we polish' mode, and then pick it back up and assess user needs. An alternative argument is to say that group A is a majority of the userbase and that doing an automatic version is entirely unnecessary. That's also possible, but I'm extremely skeptical, given the huge cost of staff time, and the complete lack of interest my sysadmin friends (and my former sysadmin self) have in doing automatable things by hand. Let's break the concern into two halves: A) Users who could have their needs met, but won't use TripleO because meeting their needs in this way is too hard/complex/painful. B) Users who have a need we cannot meet with the current approach.
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 12/12/13 09:42 +1300, Robert Collins wrote: On 12 December 2013 01:17, Jaromir Coufal jcou...@redhat.com wrote: On 2013/10/12 23:09, Robert Collins wrote: The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having the already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. That's speculation. We don't know if they will or will not because we haven't given them a working system to test. Some part of that is speculation, some part of that is feedback from people who are doing deployments (of course it's just a very limited audience). Anyway, it is not just pure theory. Sure. Let me be more precise. There is a hypothesis that lack of direct control will be a significant adoption blocker for a primary group of users. I think it's safe to say that some users in the group 'sysadmins having to deploy an OpenStack cloud' will find it a bridge too far and not use a system without direct control. Call this group A. I think it's also safe to say that some users will not care in the slightest, because their deployment is too small for them to be particularly worried (e.g. about occasional downtime (but they would worry a lot about data loss)). Call this group B. I suspect we don't need to consider group C - folk who won't use a system if it *has* manual control, but that's only a suspicion. It may be that the side effect of adding direct control is to reduce usability below the threshold some folk need... To assess 'significant adoption blocker' we basically need to find the % of users who will care sufficiently that they don't use TripleO. How can we do that? 
We can do questionnaires, and get such folk to come talk with us, but that suffers from selection bias - group B can use the system with or without direct manual control, so they have little motivation to argue vigorously in any particular direction. Group A, however, have to argue, because they won't use the system at all without that feature, and they may want to use the system for other reasons, so that becomes a crucial aspect for them. A much better way IMO is to test it - to get a bunch of volunteers and see who responds positively to a demo *without* direct manual control. To do that we need a demoable thing, which might just be mockups that show a set of workflows (and include things like Jay's shiny-new-hardware use case in the demo). I rather suspect we're building that anyway as part of doing UX work, so maybe what we do is put a tweet or blog post up asking for sysadmins who a) have not yet deployed openstack, b) want to, and c) are willing to spend 20-30 minutes with us; walk them through a demo showing no manual control, and record what questions they ask, and whether they would want that product; and if not, then (a) what use cases they can't address with the mockups and (b) what other reasons they have for not using it. This is a bunch of work though! So, do we need to do that work? *If* we can layer manual control on later, then we could defer this testing until we are at the point where we can say 'the nova scheduled version is ready, now let's decide if we add the manual control'. OTOH, if we *cannot* layer manual control on later - if it has tentacles through too much of the code base - then we need to decide earlier, because it will be significantly harder to add later, and that may be too late of a ship date for vendors shipping on top of TripleO. 
So with that as a prelude, my technical sense is that we can layer manual scheduling on later: we provide an advanced screen, show the list of N instances we're going to ask for, and allow each instance to be directly customised with a node id selected from either the current node it's running on or an available node. It's significant work, both UI and plumbing, but it's not going to be made harder by the other work we're doing AFAICT. - My proposal is that we shelve this discussion until we have the nova/heat scheduled version in 'and now we polish' mode, and then pick it back up and assess user needs. An alternative argument is to say that group A is a majority of the userbase and that doing an automatic version is entirely unnecessary. That's also possible, but I'm extremely skeptical, given the huge cost of staff time, and the complete lack of interest my sysadmin friends (and my former sysadmin self) have in doing automatable things by hand. I just wanted to add a few thoughts: For some comparative information here from the field: I work extensively on deployments of large OpenStack implementations, most recently with a ~220 node / 9 rack deployment (scaling up to 42 racks / 1024 nodes soon). My primary role is of a Devops/Sysadmin nature, and not a specific development area, so rapid provisioning/tooling/automation is an area I almost exclusively work within.
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 12/12/2013 04:25 PM, Keith Basil wrote: On Dec 12, 2013, at 4:05 PM, Jay Dobies wrote: Maybe this is a valid use case? Cloud operator has several core service nodes of differing configuration types. [node1] -- balanced mix of disk/cpu/ram for general core services [node2] -- lots of disks for Ceilometer data storage [node3] -- low-end appliance-like box for a specialized/custom core service (SIEM box for example) All nodes [1,2,3] are in the same deployment grouping (core services). As such, this is a heterogeneous deployment grouping. Heterogeneity in this case is defined by differing roles and hardware configurations. This is a real use case. How do we handle this? This is the sort of thing I had been concerned with, but I think this is just a variation on Robert's GPU example. Rather than butcher it by paraphrasing, I'll just include the relevant part: The basic stuff we're talking about so far is just about saying each role can run on some set of undercloud flavors. If that new bit of kit has the same coarse metadata as other kit, Nova can't tell it apart. So the way to solve the problem is: a) teach Ironic about the specialness of the node (e.g. a tag 'GPU'); b) teach Nova that there is a flavor that maps to the presence of that specialness; and c) teach Nova that other flavors may not map to that specialness. Then in Tuskar whatever Nova configuration is needed to use that GPU is a special role ('GPU compute' for instance) and only that role would be given that flavor to use. That special config probably means being in a host aggregate, with an overcloud flavor that specifies that aggregate, which means at the TripleO level we need to put the aggregate in the config metadata for that role, and the admin does a one-time setup in the Nova Horizon UI to configure their GPU compute flavor. Yes, the core services example is a variation on the above. The idea of _undercloud_ flavor assignment (flavor to role mapping) escaped me when I read that earlier. 
It appears to be very elegant and provides another attribute for Tuskar's notion of resource classes. So +1 here. You mention three specific nodes, but what you're describing is more likely three concepts: - Balanced Nodes - High Disk I/O Nodes - Low-End Appliance Nodes They may have one node in each, but I think your example of three nodes is potentially *too* simplified to be considered a proper sample size. I'd guess there are more than three in play commonly, in which case the concepts breakdown starts to be more appealing. Correct - definitely more than three, I just wanted to illustrate the use case. I'm not sure I explained what I was getting at properly. I wasn't implying you thought it was limited to just three. I do the same thing, simplify down for discussion purposes (I've done so in my head about this very topic). But I think this may be a rare case where simplifying actually masks the concept rather than exposes it. Manual feels a bit more desirable in small sample groups, but when looking at larger sets of nodes, the flavor concept feels less odd than it does when defining a flavor for a single machine. That's all. :) Maybe that was clear already, but I wanted to make sure I didn't come off as attacking your example. It certainly wasn't my intention. The balanced vs. disk machine thing is the sort of thing I'd been thinking about for a while but hadn't found a good way to make concrete. I think the disk flavor in particular has quite a few use cases, especially until SSDs are ubiquitous. I'd want to flag those (in Jay terminology, the disk hotness) as hosting the data-intensive portions, but where I had previously been viewing that as manual allocation, it sounds like the approach is to properly categorize them for what they are and teach Nova how to use them. Robert - please correct me if I misread any of what your intention was; I don't want to drive people down the wrong path if I'm misinterpreting anything. 
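Robert's a/b/c recipe quoted above (tag the special node in inventory, map a flavor to that tag, keep other flavors off it) boils down to matching a flavor's requirements against node capabilities, loosely in the spirit of Nova's capabilities filtering. The code below is a toy illustration with invented structures, not Nova's or Ironic's actual data model:

```python
# Toy capability matching for the GPU / disk-heavy examples above
# (invented structures; not Nova's or Ironic's actual data model).

def nodes_for_flavor(nodes, required_caps):
    """Return ids of nodes whose capabilities satisfy the flavor."""
    return [n["id"] for n in nodes
            if all(n.get("capabilities", {}).get(k) == v
                   for k, v in required_caps.items())]

nodes = [
    {"id": "gpu-1",  "capabilities": {"gpu": "true"}},       # (a) tagged special
    {"id": "disk-1", "capabilities": {"big_disk": "true"}},  # disk-heavy box
    {"id": "std-1",  "capabilities": {}},                    # balanced node
]
gpu_flavor = {"gpu": "true"}         # (b) flavor mapping to the tag
disk_flavor = {"big_disk": "true"}

print(nodes_for_flavor(nodes, gpu_flavor))   # ['gpu-1']
print(nodes_for_flavor(nodes, disk_flavor))  # ['disk-1']
```

Step (c) - keeping generic flavors off the special nodes - is handled separately in the scheme Robert describes (e.g. via host aggregates), so in this toy a flavor with no requirements would still match every node.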
-k
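Robert's GPU recipe quoted above can be sketched concretely. The following is a toy model of the capability-matching idea — Ironic tags the node, one flavor's extra specs demand the tag, other flavors don't — not real Ironic or Nova data structures; all names (`gpu`, `baremetal-gpu`, the dict layouts) are illustrative:

```python
# Toy sketch of Robert's recipe: (a) Ironic knows the node's specialness
# (a 'gpu' tag), (b) one flavor maps to that specialness, (c) ordinary
# flavors do not. Dict layouts are hypothetical, not real Ironic/Nova APIs.

def flavor_matches_node(flavor_extra_specs, node_capabilities):
    """True if every capability the flavor demands is present on the node."""
    return all(
        node_capabilities.get(key) == value
        for key, value in flavor_extra_specs.items()
    )

# (a) node capabilities as recorded by Ironic
nodes = {
    "node-1": {"cpu_arch": "x86_64"},
    "node-2": {"cpu_arch": "x86_64", "gpu": "true"},
}

# (b) the special flavor demands the tag; (c) the plain flavor demands nothing
flavors = {
    "baremetal": {},
    "baremetal-gpu": {"gpu": "true"},
}

def schedulable_nodes(flavor_name):
    """Nodes the scheduler would consider for a given flavor."""
    specs = flavors[flavor_name]
    return sorted(
        name for name, caps in nodes.items()
        if flavor_matches_node(specs, caps)
    )
```

In this model only the 'GPU compute' role would be handed `baremetal-gpu`, so it can only land on node-2, while every other role's flavor can land anywhere — no manual placement needed.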
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/10/12 23:09, Robert Collins wrote: On 11 December 2013 05:42, Jaromir Coufal jcou...@redhat.com wrote: On 2013/09/12 23:38, Tzu-Mainn Chen wrote: The disagreement comes from whether we need manual node assignment or not. I would argue that we need to step back and take a look at the real use case: heterogeneous nodes. If there are literally no characteristics that differentiate nodes A and B, then why do we care which gets used for what? Why do we need to manually assign one? Ideally, we don't. But with this approach we would take out the possibility to change something or decide something from the user. So, I think this is where the confusion is. Using the nova scheduler doesn't prevent change or control. It just ensures the change and control happen in the right place: the Nova scheduler has had years of work, of features and facilities being added to support HPC, HA and other such use cases. It should have everything we need [1], without going down to manual placement. For clarity: manual placement is when any of the user, Tuskar, or Heat query Ironic, select a node, and then use a scheduler hint to bypass the scheduler. This is very well written. I am all for things going to right places. The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. That's speculation. We don't know if they will or will not because we haven't given them a working system to test. Some part of that is speculation, some part of that is feedback from people who are doing deployments (of course it's just a very limited audience). Anyway, it is not just pure theory. 
Let's break the concern into two halves: A) Users who could have their needs met, but won't use TripleO because meeting their needs in this way is too hard/complex/painful. B) Users who have a need we cannot meet with the current approach. For category B users, their needs might be specific HA things - like the oft discussed failure domains angle, where we need to split up HA clusters across power bars, aircon, switches etc. Clearly long term we want to support them, and the undercloud Nova scheduler is entirely capable of being informed about this, and we can evolve to a holistic statement over time. Let's get a concrete list of the cases we can think of today that won't be well supported initially, and we can figure out where to do the work to support them properly. My question is - can't we help them now? To enable users to use our app even when we don't have enough smartness to help them in an 'auto' way? For category A users, I think that we should get concrete examples, and evolve our design (architecture and UX) to make meeting those needs pleasant. +1... I tried to pull some operators into this discussion thread, will try to get more. What we shouldn't do is plan complex work without concrete examples that people actually need. Jay's example of some shiny new compute servers with special parts that need to be carved out was a great one - we can put that in category A, and figure out if it's easy enough, or obvious enough - and think about whether we document it or make it a guided workflow or $whatever. Somebody might argue - why do we care? If user doesn't like TripleO paradigm, he shouldn't use the UI and should use another tool. But the UI is not only about TripleO. Yes, it is underlying concept, but we are working on future *official* OpenStack deployment tool. We should care to enable people to deploy OpenStack - large/small scale, homo/heterogeneous hardware, typical or a bit more specific use-cases. 
The difficulty I'm having is that the discussion seems to assume that 'heterogeneous implies manual', but I don't agree that that implication is necessary! No, I don't agree with this either. Heterogeneous hardware can be very well managed automatically as well as homogeneous (classes, node profiles). As an underlying paradigm of how to install cloud - awesome idea, awesome concept, it works. But user doesn't care about how it is being deployed for him. He cares about getting what he wants/needs. And we shouldn't go that far that we violently force him to treat his infrastructure as cloud. I believe that possibility to change/control - if needed - is very important and we should care. I propose that we make concrete use cases: 'Fred cannot use TripleO without manual assignment because XYZ'. Then we can assess how important XYZ is to our early adopters and go from there. +1, yes. I will try to bug more relevant people, who could contribute in this area. And what is key for us is to *enable* users - not to prevent them from using our deployment tool, because it doesn't work for their requirements.
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/10/12 19:39, Tzu-Mainn Chen wrote: Ideally, we don't. But with this approach we would take out the possibility to change something or decide something from the user. The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. Somebody might argue - why do we care? If user doesn't like TripleO paradigm, he shouldn't use the UI and should use another tool. But the UI is not only about TripleO. Yes, it is underlying concept, but we are working on future *official* OpenStack deployment tool. We should care to enable people to deploy OpenStack - large/small scale, homo/heterogeneous hardware, typical or a bit more specific use-cases. I think this is a very important clarification, and I'm glad you made it. It sounds like manual assignment is actually a sub-requirement, and the feature you're arguing for is: supporting non-TripleO deployments. Mostly but not only. The other argument is - keeping control on stuff I am doing. Note that undercloud user is different from overcloud user. That might be a worthy goal, but I think it's a distraction for the Icehouse timeframe. Each new deployment strategy requires not only a new UI, but different deployment architectures that could have very little in common with each other. Designing them all to work in the same space is a recipe for disaster, a convoluted gnarl of code that doesn't do any one thing particularly well. To use an analogy: there's a reason why no one makes a flying boat car. 
I'm going to strongly advocate that for Icehouse, we focus exclusively on large scale TripleO deployments, working to make that UI and architecture as sturdy as we can. Future deployment strategies should be discussed in the future, and if they're not TripleO based, they should be discussed with the proper OpenStack group. One concern here is - it is quite likely that we get people excited about this approach - it will be a new boom - 'wow', there is automagic doing everything for me. But then the question would be reality - how many of those excited users will actually use TripleO for their real deployments (I mean in the early stages)? Would it be only a couple of them (because of covered use cases, concerns of maturity, lack of control)? Can we assure them that if anything goes wrong, they have control over it? -- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/10/12 19:39, Tzu-Mainn Chen wrote: Ideally, we don't. But with this approach we would take out the possibility to change something or decide something from the user. The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. Somebody might argue - why do we care? If user doesn't like TripleO paradigm, he shouldn't use the UI and should use another tool. But the UI is not only about TripleO. Yes, it is underlying concept, but we are working on future *official* OpenStack deployment tool. We should care to enable people to deploy OpenStack - large/small scale, homo/heterogeneous hardware, typical or a bit more specific use-cases. I think this is a very important clarification, and I'm glad you made it. It sounds like manual assignment is actually a sub-requirement, and the feature you're arguing for is: supporting non-TripleO deployments. Mostly but not only. The other argument is - keeping control on stuff I am doing. Note that undercloud user is different from overcloud user. Sure, but again, that argument seems to me to be a non-TripleO approach. I'm not saying that it's not a possible use case, I'm saying that you're advocating for a deployment strategy that fundamentally diverges from the TripleO philosophy - and as such, that strategy will likely require a separate UI, underlying architecture, etc, and should not be planned for in the Icehouse timeframe. That might be a worthy goal, but I think it's a distraction for the Icehouse timeframe. 
Each new deployment strategy requires not only a new UI, but different deployment architectures that could have very little in common with each other. Designing them all to work in the same space is a recipe for disaster, a convoluted gnarl of code that doesn't do any one thing particularly well. To use an analogy: there's a reason why no one makes a flying boat car. I'm going to strongly advocate that for Icehouse, we focus exclusively on large scale TripleO deployments, working to make that UI and architecture as sturdy as we can. Future deployment strategies should be discussed in the future, and if they're not TripleO based, they should be discussed with the proper OpenStack group. One concern here is - it is quite likely that we get people excited about this approach - it will be a new boom - 'wow', there is automagic doing everything for me. But then the question would be reality - how many of those excited users will actually use TripleO for their real deployments (I mean in the early stages)? Would it be only a couple of them (because of covered use cases, concerns of maturity, lack of control)? Can we assure them that if anything goes wrong, they have control over it? -- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/09/12 17:15, Tzu-Mainn Chen wrote: - As an infrastructure administrator, Anna wants to be able to unallocate a node from a deployment. Why? What's her motivation? One plausible one for me is 'a machine needs to be serviced so Anna wants to remove it from the deployment to avoid causing user visible downtime.' So let's say that: Anna needs to be able to take machines out of service so they can be maintained or disposed of. Node being serviced is a different user story for me. I believe we are still 'fighting' here with two approaches and I believe we need both. We can't only provide a way 'give us resources we will do a magic'. Yes, this is the preferred way - especially for large deployments, but we also need a fallback so that user can say - no, this node doesn't belong to the class, I don't want it there - unassign. Or I need to have this node there - assign. Just for clarification - the wireframes don't cover individual nodes being manually assigned, do they? I thought the concession to manual control was entirely through resource classes and node profiles, which are still parameters to be passed through to the nova-scheduler filter. To me, that's very different from manual assignment. Mainn It's all doable and wireframes are prepared for the manual assignment as well, Mainn. I just was not designing details for now, since we are going to focus on auto-distribution first. But I will cover this use case in later iterations of wireframes. Cheers -- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/09/12 21:22, Robert Collins wrote: Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory stats For registration it is just Management MAC address which is needed right? Or does Ironic need also IP? I think that MAC address might be enough, we can display IP in details of node later on. Ironic needs all the details I listed today. Management MAC is not currently used at all, but would be needed in future when we tackle IPMI IP managed by Neutron. OK, I will reflect that in wireframes for UI. * Auto-discovery during undercloud install process (M) * Monitoring * assignment, availability, status * capacity, historical statistics (M) Why is this under 'nodes'? I challenge the idea that it should be there. We will need to surface some stuff about nodes, but the underlying idea is to take a cloud approach here - so we're monitoring services, that happen to be on nodes. There is room to monitor nodes, as an undercloud feature set, but lets be very very specific about what is sitting at what layer. We need both - we need to track services but also state of nodes (CPU, RAM, Network bandwidth, etc). So in node detail you should be able to track both. Those are instance characteristics, not node characteristics. An instance is software running on a Node, and the amount of CPU/RAM/NIC utilisation is specific to that software while it's on that Node, not to future or past instances running on that Node. I think this is minor detail. Node has certain CPU/RAM/NIC capacity and instance is consuming it. Either way it is important for us to display this utilization in the UI as well as service statistics. * Resource nodes ^ nodes is again confusing layers - nodes are what things are deployed to, but they aren't the entry point Can you, please be a bit more specific here? I don't understand this note. By the way, can you get your email client to insert before the text you are replying to rather than HTML | marks? 
Hard to tell what I wrote and what you did :). Oh right, sure, sorry. Should be fixed ;) By that note I meant, that Nodes are not resources, Resource instances run on Nodes. Nodes are the generic pool of hardware we can deploy things onto. Well right, this is the terminology. From my point of view, resources for overcloud are the instances which are running on Nodes. Once we deploy the nodes with appropriate software they become Resource Nodes (from unallocated pool). If this terminology is confusing already then we should fix it. Any suggestions for improvements? * Unallocated nodes This implies an 'allocation' step, that we don't have - how about 'Idle nodes' or something. It can be auto-allocation. I don't see a problem with the 'unallocated' term. Ok, it's not a biggy. I do think it will frame things poorly and lead to an expectation about how TripleO works that doesn't match how it does, but we can change it later if I'm right, and if I'm wrong, well it won't be the first time :). I think we will figure it out in the other thread (where we talk about allocation). Anyway - I am interested in how differently you would formulate Unallocated / Resource / Management Nodes? Maybe yours is better :) -- Jarda
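The registration data Robert lists earlier in this message (IPMI address, a MAC per NIC, disk/cpu/memory stats) can be modelled as a small per-node record. This is an illustrative sketch only — the field names are hypothetical and are not Ironic's real schema:

```python
# Illustrative model of the per-node data Ironic wants at registration
# time, per Robert's list: IPMI address, one MAC per NIC, and basic
# hardware stats. Field names are hypothetical, not Ironic's actual API.

def validate_node(node):
    """Return the list of missing registration fields (empty if complete)."""
    required = ["ipmi_address", "nic_macs", "cpus", "memory_mb", "disk_gb"]
    return [field for field in required if not node.get(field)]

# A complete registration record in this toy schema
node = {
    "ipmi_address": "10.0.0.5",
    "nic_macs": ["52:54:00:aa:bb:01", "52:54:00:aa:bb:02"],
    "cpus": 8,
    "memory_mb": 16384,
    "disk_gb": 500,
}
```

The point of the exchange above is that a management MAC alone is not enough today: a UI that collects only the MAC would fail this kind of completeness check until the IPMI and hardware-stat fields are also gathered.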
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/09/12 23:38, Tzu-Mainn Chen wrote: Thanks for the explanation! I'm going to claim that the thread revolves around two main areas of disagreement. Then I'm going to propose a way through: a) Manual Node Assignment I think that everyone is agreed that automated node assignment through nova-scheduler is by far the most ideal case; there's no disagreement there. +1 The disagreement comes from whether we need manual node assignment or not. I would argue that we need to step back and take a look at the real use case: heterogeneous nodes. If there are literally no characteristics that differentiate nodes A and B, then why do we care which gets used for what? Why do we need to manually assign one? Ideally, we don't. But with this approach we would take out the possibility to change something or decide something from the user. The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. Somebody might argue - why do we care? If user doesn't like TripleO paradigm, he shouldn't use the UI and should use another tool. But the UI is not only about TripleO. Yes, it is underlying concept, but we are working on future *official* OpenStack deployment tool. We should care to enable people to deploy OpenStack - large/small scale, homo/heterogeneous hardware, typical or a bit more specific use-cases. As an underlying paradigm of how to install cloud - awesome idea, awesome concept, it works. But user doesn't care about how it is being deployed for him. He cares about getting what he wants/needs. 
And we shouldn't go that far that we violently force him to treat his infrastructure as cloud. I believe that possibility to change/control - if needed - is very important and we should care. And what is key for us is to *enable* users - not to prevent them from using our deployment tool, because it doesn't work for their requirements. If we can agree on that, then I think it would be sufficient to say that we want a mechanism to allow UI users to deal with heterogeneous nodes, and that mechanism must use nova-scheduler. In my mind, that's what resource classes and node profiles are intended for. Not arguing on this point. Though that mechanism should also support cases where the user specifies a role for a node / removes a node from a role. The rest of nodes which I don't care about should be handled by nova-scheduler. One possible objection might be: nova scheduler doesn't have the appropriate filter that we need to separate out two nodes. In that case, I would say that needs to be taken up with nova developers. Give it to Nova guys to fix it... What if that user's need would be undercloud specific requirement? Why should Nova guys care? What should our unhappy user do until then? Use other tool? Will he be willing to get back to use our tool once it is ready? I can also see other use-cases. It can be distribution based on power sockets, networking connections, etc. We can't think about all the ways which our user will need. b) Terminology It feels a bit like some of the disagreement comes from people using different words for the same thing. For example, the wireframes already detail a UI where Robert's roles come first, but I think that message was confused because I mentioned node types in the requirements. So could we come to some agreement on what the most exact terminology would be? I've listed some examples below, but I'm sure there are more. node type | role +1 role management node | ? resource node | ? 
unallocated | available | undeployed +1 unallocated create a node distribution | size the deployment * Distribute nodes resource classes | ? Service classes? node profiles | ? So when we talk about 'unallocated Nodes', the implication is that users 'allocate Nodes', but they don't: they size roles, and after doing all that there may be some Nodes that are - yes - unallocated, or have nothing scheduled to them. So... I'm not debating that we should have a list of free hardware - we totally should - I'm debating how we frame it. 'Available Nodes' or 'Undeployed machines' or whatever. The allocation can happen automatically, so from my point of view I don't see a big problem with the 'allocate' term. I just want to get away from talking about something ([manual] allocation) that we don't offer. We don't at the moment but we should :) -- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Thanks for the reply! Comments in-line: The disagreement comes from whether we need manual node assignment or not. I would argue that we need to step back and take a look at the real use case: heterogeneous nodes. If there are literally no characteristics that differentiate nodes A and B, then why do we care which gets used for what? Why do we need to manually assign one? Ideally, we don't. But with this approach we would take out the possibility to change something or decide something from the user. The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. Somebody might argue - why do we care? If user doesn't like TripleO paradigm, he shouldn't use the UI and should use another tool. But the UI is not only about TripleO. Yes, it is underlying concept, but we are working on future *official* OpenStack deployment tool. We should care to enable people to deploy OpenStack - large/small scale, homo/heterogeneous hardware, typical or a bit more specific use-cases. I think this is a very important clarification, and I'm glad you made it. It sounds like manual assignment is actually a sub-requirement, and the feature you're arguing for is: supporting non-TripleO deployments. That might be a worthy goal, but I think it's a distraction for the Icehouse timeframe. Each new deployment strategy requires not only a new UI, but different deployment architectures that could have very little in common with each other. Designing them all to work in the same space is a recipe for disaster, a convoluted gnarl of code that doesn't do any one thing particularly well. 
To use an analogy: there's a reason why no one makes a flying boat car. I'm going to strongly advocate that for Icehouse, we focus exclusively on large scale TripleO deployments, working to make that UI and architecture as sturdy as we can. Future deployment strategies should be discussed in the future, and if they're not TripleO based, they should be discussed with the proper OpenStack group. As an underlying paradigm of how to install cloud - awesome idea, awesome concept, it works. But user doesn't care about how it is being deployed for him. He cares about getting what he wants/needs. And we shouldn't go that far that we violently force him to treat his infrastructure as cloud. I believe that possibility to change/control - if needed - is very important and we should care. And what is key for us is to *enable* users - not to prevent them from using our deployment tool, because it doesn't work for their requirements. If we can agree on that, then I think it would be sufficient to say that we want a mechanism to allow UI users to deal with heterogeneous nodes, and that mechanism must use nova-scheduler. In my mind, that's what resource classes and node profiles are intended for. Not arguing on this point. Though that mechanism should support also cases, where user specifies a role for a node / removes node from a role. The rest of nodes which I don't care about should be handled by nova-scheduler. One possible objection might be: nova scheduler doesn't have the appropriate filter that we need to separate out two nodes. In that case, I would say that needs to be taken up with nova developers. Give it to Nova guys to fix it... What if that user's need would be undercloud specific requirement? Why should Nova guys care? What should our unhappy user do until then? Use other tool? Will he be willing to get back to use our tool once it is ready? I can also see other use-cases. It can be distribution based on power sockets, networking connections, etc. 
We can't think about all the ways which our user will need. In this case - it would be our job to make the Nova guys care and to work with them to develop the feature. Creating parallel services with the same fundamental purpose - I think that runs counter to what OpenStack is designed for. b) Terminology It feels a bit like some of the disagreement comes from people using different words for the same thing. For example, the wireframes already detail a UI where Robert's roles come first, but I think that message was confused because I mentioned node types in the requirements. So could we come to some agreement on what the most exact terminology would be? I've listed some examples below, but I'm sure there are more. node type | role +1 role management node | ? resource node | ? unallocated | available | undeployed +1 unallocated create a node distribution | size the deployment
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Thanks for the explanation! I'm going to claim that the thread revolves around two main areas of disagreement. Then I'm going to propose a way through: a) Manual Node Assignment I think that everyone is agreed that automated node assignment through nova-scheduler is by far the most ideal case; there's no disagreement there. The disagreement comes from whether we need manual node assignment or not. I would argue that we need to step back and take a look at the real use case: heterogeneous nodes. If there are literally no characteristics that differentiate nodes A and B, then why do we care which gets used for what? Why do we need to manually assign one? This is a better way of verbalizing my concerns. I suspect there are going to be quite a few heterogeneous environments built from legacy pieces in the near term and fewer built from the ground up with all new matching hotness. On the other side of it, instead of handling legacy hardware I was worried about the new hotness (not sure why I keep using that term) specialized for a purpose. This is exactly what Robert described in his GPU example. I think his explanation of how to use the scheduler to accommodate that makes a lot of sense, so I'm much less behind the idea of a strict manual assignment than I previously was. If we can agree on that, then I think it would be sufficient to say that we want a mechanism to allow UI users to deal with heterogeneous nodes, and that mechanism must use nova-scheduler. In my mind, that's what resource classes and node profiles are intended for. One possible objection might be: nova scheduler doesn't have the appropriate filter that we need to separate out two nodes. In that case, I would say that needs to be taken up with nova developers. b) Terminology It feels a bit like some of the disagreement comes from people using different words for the same thing. 
For example, the wireframes already detail a UI where Robert's roles come first, but I think that message was confused because I mentioned node types in the requirements. So could we come to some agreement on what the most exact terminology would be? I've listed some examples below, but I'm sure there are more. node type | role management node | ? resource node | ? unallocated | available | undeployed create a node distribution | size the deployment resource classes | ? node profiles | ? Mainn - Original Message - On 10 December 2013 09:55, Tzu-Mainn Chen tzuma...@redhat.com wrote: * created as part of undercloud install process By that note I meant, that Nodes are not resources, Resource instances run on Nodes. Nodes are the generic pool of hardware we can deploy things onto. I don't think resource nodes is intended to imply that nodes are resources; rather, it's supposed to indicate that it's a node where a resource instance runs. It's supposed to separate it from management node and unallocated node. So the question is are we looking at /nodes/ that have a /current role/, or are we looking at /roles/ that have some /current nodes/. My contention is that the role is the interesting thing, and the nodes are the incidental thing. That is, as a sysadmin, my hierarchy of concerns is something like:
A: are all services running
B: are any of them in a degraded state where I need to take prompt action to prevent a service outage [might mean many things: - software update/disk space criticals/a machine failed and we need to scale the cluster back up/too much load]
C: are there any planned changes I need to make [new software deploy, feature request from user, replacing a faulty machine]
D: are there long term issues sneaking up on me [capacity planning, machine obsolescence]
If we take /nodes/ as the interesting thing, and what they are doing right now as the incidental thing, it's much harder to map that onto the sysadmin concerns. 
If we start with /roles/ then we can answer:
A: by showing the list of roles and the summary stats (how many machines, service status aggregate), role level alerts (e.g. nova-api is not responding)
B: by showing the list of roles and more detailed stats (overall load, response times of services, tickets against services and a list of in trouble instances in each role - instances with alerts against them - low disk, overload, failed service, early-detection alerts from hardware)
C: probably out of our remit for now in the general case, but we need to enable some things here like replacing faulty machines
D: by looking at trend graphs for roles (not machines), but also by looking at the hardware in aggregate - breakdown by age of machines, summary data for tickets filed against instances that were deployed to a particular machine
C: and D: are (F) category work, but for all but the very last thing, it seems clear how to approach this from a roles perspective. I've tried to approach this using /nodes/ as the starting point, and after two terrible drafts
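Robert's concern A — answering "are all services running" from a roles-first view — amounts to aggregating instance health per role instead of walking nodes one by one. A minimal sketch of that aggregation, with made-up data shapes (none of this is real Tuskar code):

```python
from collections import defaultdict

# Toy sketch of the roles-first summary: concern A is answered by
# per-role machine counts and an aggregate service status, with the
# node each instance runs on treated as incidental. Data is made up.

instances = [
    {"role": "controller", "node": "node-1", "status": "ok"},
    {"role": "compute",    "node": "node-2", "status": "ok"},
    {"role": "compute",    "node": "node-3", "status": "degraded"},
]

def role_summary(instances):
    """Per-role machine count plus aggregate status (any bad instance flags the role)."""
    summary = defaultdict(lambda: {"machines": 0, "status": "ok"})
    for inst in instances:
        entry = summary[inst["role"]]
        entry["machines"] += 1
        if inst["status"] != "ok":
            entry["status"] = "degraded"
    return dict(summary)
```

A UI built on this shape leads with two rows ("controller: 1 machine, ok" and "compute: 2 machines, degraded") and only drills down to individual nodes for concern B.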
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 11 December 2013 05:42, Jaromir Coufal jcou...@redhat.com wrote: On 2013/09/12 23:38, Tzu-Mainn Chen wrote: The disagreement comes from whether we need manual node assignment or not. I would argue that we need to step back and take a look at the real use case: heterogeneous nodes. If there are literally no characteristics that differentiate nodes A and B, then why do we care which gets used for what? Why do we need to manually assign one? Ideally, we don't. But with this approach we would take out the possibility to change something or decide something from the user. So, I think this is where the confusion is. Using the nova scheduler doesn't prevent change or control. It just ensures the change and control happen in the right place: the Nova scheduler has had years of work, of features and facilities being added to support HPC, HA and other such use cases. It should have everything we need [1], without going down to manual placement. For clarity: manual placement is when any of the user, Tuskar, or Heat query Ironic, select a node, and then use a scheduler hint to bypass the scheduler. The 'easiest' way is to support bigger companies with huge deployments, tailored infrastructure, everything connected properly. But there are tons of companies/users who are running on old heterogeneous hardware. Very likely even more than the number of companies having already mentioned large deployments. And giving them only the way of 'setting up rules' in order to get the service on the node - this type of user is not gonna use our deployment system. That's speculation. We don't know if they will or will not because we haven't given them a working system to test. Let's break the concern into two halves: A) Users who could have their needs met, but won't use TripleO because meeting their needs in this way is too hard/complex/painful. B) Users who have a need we cannot meet with the current approach. 
For category B users, their needs might be specific HA things - like the oft-discussed failure domains angle, where we need to split up HA clusters across power bars, aircon, switches, etc. Clearly, long term we want to support them, and the undercloud Nova scheduler is entirely capable of being informed about this, and we can evolve to a holistic statement over time. Let's get a concrete list of the cases we can think of today that won't be well supported initially, and we can figure out where to do the work to support them properly. For category A users, I think that we should get concrete examples, and evolve our design (architecture and UX) to make meeting those needs pleasant. What we shouldn't do is plan complex work without concrete examples that people actually need. Jay's example of some shiny new compute servers with special parts that need to be carved out was a great one - we can put that in category A, and figure out if it's easy enough, or obvious enough - and think about whether we document it or make it a guided workflow or $whatever. Somebody might argue - why do we care? If a user doesn't like the TripleO paradigm, he shouldn't use the UI and should use another tool. But the UI is not only about TripleO. Yes, it is the underlying concept, but we are working on the future *official* OpenStack deployment tool. We should care about enabling people to deploy OpenStack - large/small scale, homo/heterogeneous hardware, typical or a bit more specific use cases. The difficulty I'm having is that the discussion seems to assume that 'heterogeneous implies manual', but I don't agree that that implication is necessary! As an underlying paradigm of how to install a cloud - awesome idea, awesome concept, it works. But the user doesn't care about how it is being deployed for him. He cares about getting what he wants/needs. And we shouldn't go so far as to force him to treat his infrastructure as a cloud. 
I believe that the possibility to change/control - if needed - is very important and we should care. I propose that we make concrete use cases: 'Fred cannot use TripleO without manual assignment because XYZ'. Then we can assess how important XYZ is to our early adopters and go from there. And what is key for us is to *enable* users - not to prevent them from using our deployment tool because it doesn't work for their requirements. Totally agreed :) If we can agree on that, then I think it would be sufficient to say that we want a mechanism to allow UI users to deal with heterogeneous nodes, and that mechanism must use nova-scheduler. In my mind, that's what resource classes and node profiles are intended for. Not arguing on this point. Though that mechanism should also support cases where the user specifies a role for a node / removes a node from a role. The rest of the nodes, which I don't care about, should be handled by nova-scheduler. Why? What is a use case for removing a role from a node while leaving that node in service? Let's be specific, always, when we're using categories of use
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 12/06/2013 09:39 PM, Tzu-Mainn Chen wrote: Thanks for the comments and questions! I fully expect that this list of requirements will need to be fleshed out, refined, and heavily modified, so the more the merrier. Comments inline: *** Requirements are assumed to be targeted for Icehouse, unless marked otherwise: (M) - Maybe Icehouse, dependency on other in-development features (F) - Future requirement, after Icehouse * NODES Note that everything in this section should be Ironic API calls. * Creation * Manual registration * hardware specs from Ironic based on mac address (M) Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory stats * IP auto populated from Neutron (F) Do you mean IPMI IP ? I'd say IPMI address managed by Neutron here. * Auto-discovery during undercloud install process (M) * Monitoring * assignment, availability, status * capacity, historical statistics (M) Why is this under 'nodes'? I challenge the idea that it should be there. We will need to surface some stuff about nodes, but the underlying idea is to take a cloud approach here - so we're monitoring services, that happen to be on nodes. There is room to monitor nodes, as an undercloud feature set, but lets be very very specific about what is sitting at what layer. That's a fair point. At the same time, the UI does want to monitor both services and the nodes that the services are running on, correct? I would think that a user would want this. Would it be better to explicitly split this up into two separate requirements? That was my understanding as well, that Tuskar would not only care about the services of the undercloud but the health of the actual hardware on which it's running. As I write that I think you're correct, two separate requirements feels much more explicit in how that's different from elsewhere in OpenStack. 
* Management node (where triple-o is installed) This should be plural :) - TripleO isn't a single service to be installed - We've got Tuskar, Ironic, Nova, Glance, Keystone, Neutron, etc. I misspoke here - this should be where the undercloud is installed. My current understanding is that our initial release will only support the undercloud being installed onto a single node, but my understanding could very well be flawed. * created as part of undercloud install process * can create additional management nodes (F) * Resource nodes ^ nodes is again confusing layers - nodes are what things are deployed to, but they aren't the entry point * searchable by status, name, cpu, memory, and all attributes from ironic * can be allocated as one of four node types Not by users though. We need to stop thinking of this as 'what we do to nodes' - Nova/Ironic operate on nodes, we operate on Heat templates. Right, I didn't mean to imply that users would be doing this allocation. But once Nova does this allocation, the UI does want to be aware of how the allocation is done, right? That's what this requirement meant. * compute * controller * object storage * block storage * Resource class - allows for further categorization of a node type * each node type specifies a single default resource class * allow multiple resource classes per node type (M) Whats a node type? Compute/controller/object storage/block storage. Is another term besides node type more accurate? * optional node profile for a resource class (M) * acts as filter for nodes that can be allocated to that class (M) I'm not clear on this - you can list the nodes that have had a particular thing deployed on them; we probably can get a good answer to being able to see what nodes a particular flavor can deploy to, but we don't want to be second guessing the scheduler.. 
Correct; the goal here is to provide a way through the UI to send additional filtering requirements that will eventually be passed into the scheduler, allowing the scheduler to apply additional filters. * nodes can be viewed by node types * additional group by status, hardware specification *Instances* - e.g. hypervisors, storage, block storage etc. * controller node type Again, need to get away from node type here. * each controller node will run all openstack services * allow each node to run specified service (F) * breakdown by workload (percentage of cpu used per node) (M) * Unallocated nodes This implies an 'allocation' step, that we don't have - how about 'Idle nodes' or something. Is it imprecise to say that nodes are allocated by the scheduler? Would something like 'active/idle' be better? * Archived nodes (F) * Will be
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 09/12/13 18:01, Jay Dobies wrote: I believe we are still 'fighting' here with two approaches and I believe we need both. We can't only provide a way 'give us resources we will do a magic'. Yes this is preferred way - especially for large deployments, but we also need a fallback so that user can say - no, this node doesn't belong to the class, I don't want it there - unassign. Or I need to have this node there - assign. +1 to this. I think there are still a significant amount of admins out there that are really opposed to magic and want that fine-grained control. Even if they don't use it that frequently, in my experience they want to know it's there in the event they need it (and will often dream up a case that they'll need it). +1 to the responses to the 'automagic' vs 'manual' discussion. The latter is in fact only really possible in small deployments. But that's not to say it is not a valid use case. Perhaps we need to split it altogether into two use cases. At least we should have a level of agreement here and register blueprints for both: for Icehouse the auto selection of which services go onto which nodes (i.e. allocation of services to nodes is entirely transparent). For post Icehouse allow manual allocation of services to nodes. This last bit may also coincide with any work being done in Ironic/Nova scheduler which will make this allocation prettier than the current force_nodes situation. I'm absolutely for pushing the magic approach as the preferred use. And in large deployments that's where people are going to see the biggest gain. The fine-grained approach can even be pushed off as a future feature. But I wouldn't be surprised to see people asking for it and I'd like to at least be able to say it's been talked about. - As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment. Why? This is super generic and could mean anything. I believe this has something to do with 'archived nodes'. 
But correct me if I am wrong. -- Jarda ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
- As an infrastructure administrator, Anna wants to be able to unallocate a node from a deployment. Why? What's her motivation? One plausible one for me is 'a machine needs to be serviced so Anna wants to remove it from the deployment to avoid causing user-visible downtime.' So let's say that: Anna needs to be able to take machines out of service so they can be maintained or disposed of. A node being serviced is a different user story for me. I believe we are still 'fighting' here with two approaches and I believe we need both. We can't only provide a way of 'give us resources, we will do the magic'. Yes, this is the preferred way - especially for large deployments, but we also need a fallback so that the user can say - no, this node doesn't belong to the class, I don't want it there - unassign. Or I need to have this node there - assign. Just for clarification - the wireframes don't cover individual nodes being manually assigned, do they? I thought the concession to manual control was entirely through resource classes and node profiles, which are still parameters to be passed through to the nova-scheduler filter. To me, that's very different from manual assignment. Mainn
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 9 December 2013 23:56, Jaromir Coufal jcou...@redhat.com wrote: On 2013/07/12 01:59, Robert Collins wrote: * Creation * Manual registration * hardware specs from Ironic based on mac address (M) Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory stats For registration it is just the management MAC address which is needed, right? Or does Ironic also need an IP? I think that the MAC address might be enough; we can display the IP in the details of the node later on. Ironic needs all the details I listed today. The management MAC is not currently used at all, but would be needed in the future when we tackle IPMI IP managed by Neutron. * Auto-discovery during undercloud install process (M) * Monitoring * assignment, availability, status * capacity, historical statistics (M) Why is this under 'nodes'? I challenge the idea that it should be there. We will need to surface some stuff about nodes, but the underlying idea is to take a cloud approach here - so we're monitoring services, that happen to be on nodes. There is room to monitor nodes, as an undercloud feature set, but let's be very, very specific about what is sitting at what layer. We need both - we need to track services but also the state of nodes (CPU, RAM, network bandwidth, etc). So in the node detail you should be able to track both. Those are instance characteristics, not node characteristics. An instance is software running on a Node, and the amount of CPU/RAM/NIC utilisation is specific to that software while it's on that Node, not to future or past instances running on that Node. * created as part of undercloud install process * can create additional management nodes (F) * Resource nodes ^ nodes is again confusing layers - nodes are what things are deployed to, but they aren't the entry point Can you, please, be a bit more specific here? I don't understand this note. By the way, can you get your email client to insert > before the text you are replying to rather than HTML | marks? 
Hard to tell what I wrote and what you did :). By that note I meant that Nodes are not resources; Resource instances run on Nodes. Nodes are the generic pool of hardware we can deploy things onto. * searchable by status, name, cpu, memory, and all attributes from ironic * can be allocated as one of four node types Not by users though. We need to stop thinking of this as 'what we do to nodes' - Nova/Ironic operate on nodes, we operate on Heat templates. Discussed in other threads, but I still believe (and I am not alone) that we need to allow 'force nodes'. I'll respond in the other thread :). * Unallocated nodes This implies an 'allocation' step, that we don't have - how about 'Idle nodes' or something. It can be auto-allocation. I don't see a problem with the 'unallocated' term. Ok, it's not a biggy. I do think it will frame things poorly and lead to an expectation about how TripleO works that doesn't match how it does, but we can change it later if I'm right, and if I'm wrong, well it won't be the first time :). -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
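[Editor's note] The registration details Robert lists earlier in the thread - IPMI address, a MAC for each NIC, and disk/cpu/memory stats - can be pictured as a simple payload. The sketch below is purely illustrative Python: the field names loosely echo Ironic's driver_info/ports/properties conventions of the time but are assumptions, not an exact API reference.

```python
# Illustrative sketch of the data Ironic wants at node registration time,
# per the thread: IPMI address + MAC for each NIC + disk/cpu/memory stats.
# Field names are assumptions modeled on Ironic conventions, not its API.

REQUIRED_PROPERTIES = ("cpus", "memory_mb", "local_gb", "cpu_arch")

def make_node_registration(ipmi_address, nic_macs, cpus, memory_mb, local_gb,
                           cpu_arch="x86_64"):
    """Build a registration payload for one node."""
    if not nic_macs:
        raise ValueError("at least one NIC MAC address is required")
    return {
        "driver_info": {"ipmi_address": ipmi_address},
        "ports": [{"address": mac} for mac in nic_macs],  # one port per NIC
        "properties": {
            "cpus": cpus,
            "memory_mb": memory_mb,
            "local_gb": local_gb,
            "cpu_arch": cpu_arch,
        },
    }

def is_complete(node):
    """Check the payload carries everything the thread says is needed."""
    return (bool(node["driver_info"].get("ipmi_address"))
            and bool(node["ports"])
            and all(k in node["properties"] for k in REQUIRED_PROPERTIES))

node = make_node_registration("10.0.0.5", ["52:54:00:aa:bb:cc"],
                              cpus=8, memory_mb=16384, local_gb=100)
assert is_complete(node)
```

Note how a MAC-only registration (the question raised above) would fail the completeness check: per Robert, the IPMI address and hardware stats are required today as well.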
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 10 December 2013 09:55, Tzu-Mainn Chen tzuma...@redhat.com wrote: * created as part of undercloud install process By that note I meant, that Nodes are not resources, Resource instances run on Nodes. Nodes are the generic pool of hardware we can deploy things onto. I don't think resource nodes is intended to imply that nodes are resources; rather, it's supposed to indicate that it's a node where a resource instance runs. It's supposed to separate it from management node and unallocated node. So the question is are we looking at /nodes/ that have a /current role/, or are we looking at /roles/ that have some /current nodes/. My contention is that the role is the interesting thing, and the nodes is the incidental thing. That is, as a sysadmin, my hierarchy of concerns is something like: A: are all services running B: are any of them in a degraded state where I need to take prompt action to prevent a service outage [might mean many things: - software update/disk space criticals/a machine failed and we need to scale the cluster back up/too much load] C: are there any planned changes I need to make [new software deploy, feature request from user, replacing a faulty machine] D: are there long term issues sneaking up on me [capacity planning, machine obsolescence] If we take /nodes/ as the interesting thing, and what they are doing right now as the incidental thing, it's much harder to map that onto the sysadmin concerns. If we start with /roles/ then can answer: A: by showing the list of roles and the summary stats (how many machines, service status aggregate), role level alerts (e.g. 
nova-api is not responding) B: by showing the list of roles and more detailed stats (overall load, response times of services, tickets against services and a list of in trouble instances in each role - instances with alerts against them - low disk, overload, failed service, early-detection alerts from hardware C: probably out of our remit for now in the general case, but we need to enable some things here like replacing faulty machines D: by looking at trend graphs for roles (not machines), but also by looking at the hardware in aggregate - breakdown by age of machines, summary data for tickets filed against instances that were deployed to a particular machine C: and D: are (F) category work, but for all but the very last thing, it seems clear how to approach this from a roles perspective. I've tried to approach this using /nodes/ as the starting point, and after two terrible drafts I've deleted the section. I'd love it if someone could show me how it would work:) * Unallocated nodes This implies an 'allocation' step, that we don't have - how about 'Idle nodes' or something. It can be auto-allocation. I don't see problem with 'unallocated' term. Ok, it's not a biggy. I do think it will frame things poorly and lead to an expectation about how TripleO works that doesn't match how it does, but we can change it later if I'm right, and if I'm wrong, well it won't be the first time :). I'm interested in what the distinction you're making here is. I'd rather get things defined correctly the first time, and it's very possible that I'm missing a fundamental definition here. So we have: - node - a physical general purpose machine capable of running in many roles. Some nodes may have hardware layout that is particularly useful for a given role. - role - a specific workload we want to map onto one or more nodes. Examples include 'undercloud control plane', 'overcloud control plane', 'overcloud storage', 'overcloud compute' etc. 
- instance - A role deployed on a node - this is where work actually happens. - scheduling - the process of deciding which role is deployed on which node. The way TripleO works is that we define a Heat template that lays out policy: 5 instances of 'overcloud control plane please', '20 hypervisors' etc. Heat passes that to Nova, which pulls the image for the role out of Glance, picks a node, and deploys the image to the node. Note in particular the order: Heat -> Nova -> Scheduler -> Node chosen. The user action is not 'allocate a Node to overcloud control plane', it is 'size the control plane through Heat'. So when we talk about 'unallocated Nodes', the implication is that users 'allocate Nodes', but they don't: they size roles, and after doing all that there may be some Nodes that are - yes - unallocated, or have nothing scheduled to them. So... I'm not debating that we should have a list of free hardware - we totally should - I'm debating how we frame it. 'Available Nodes' or 'Undeployed machines' or whatever. I just want to get away from talking about something ([manual] allocation) that we don't offer. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
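[Editor's note] Robert's point - the user sizes roles, a scheduler picks nodes, and "unallocated" is only a side effect - can be sketched as a deliberately simplified toy model. This is not how Heat or Nova are implemented; it only illustrates the ordering he describes.

```python
# Toy model of the flow described above: the user sizes roles
# ("5 control plane, 20 hypervisors"); a scheduler -- never the user --
# decides which node each instance lands on. Purely illustrative.

def schedule(role_sizes, free_nodes):
    """Map sized roles onto nodes; return instances plus leftover nodes."""
    pool = list(free_nodes)
    instances = []
    for role, count in role_sizes.items():
        for _ in range(count):
            if not pool:
                raise RuntimeError("not enough nodes for role %r" % role)
            instances.append({"role": role, "node": pool.pop(0)})
    # Whatever remains is "unallocated" only as a side effect:
    # the user never allocated any node directly.
    return instances, pool

instances, idle = schedule(
    {"overcloud control plane": 2, "overcloud compute": 3},
    ["node-%d" % i for i in range(7)])
assert len(instances) == 5 and len(idle) == 2
```

In this framing the leftover `idle` list is the "list of free hardware" Robert agrees should exist; the only debate is what the UI calls it.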
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
So the question is are we looking at /nodes/ that have a /current role/, or are we looking at /roles/ that have some /current nodes/. My contention is that the role is the interesting thing, and the nodes is the incidental thing. That is, as a sysadmin, my hierarchy of concerns is something like: A: are all services running B: are any of them in a degraded state where I need to take prompt action to prevent a service outage [might mean many things: - software update/disk space criticals/a machine failed and we need to scale the cluster back up/too much load] C: are there any planned changes I need to make [new software deploy, feature request from user, replacing a faulty machine] D: are there long term issues sneaking up on me [capacity planning, machine obsolescence] If we take /nodes/ as the interesting thing, and what they are doing right now as the incidental thing, it's much harder to map that onto the sysadmin concerns. If we start with /roles/ then can answer: A: by showing the list of roles and the summary stats (how many machines, service status aggregate), role level alerts (e.g. nova-api is not responding) B: by showing the list of roles and more detailed stats (overall load, response times of services, tickets against services and a list of in trouble instances in each role - instances with alerts against them - low disk, overload, failed service, early-detection alerts from hardware C: probably out of our remit for now in the general case, but we need to enable some things here like replacing faulty machines D: by looking at trend graphs for roles (not machines), but also by looking at the hardware in aggregate - breakdown by age of machines, summary data for tickets filed against instances that were deployed to a particular machine C: and D: are (F) category work, but for all but the very last thing, it seems clear how to approach this from a roles perspective. 
I've tried to approach this using /nodes/ as the starting point, and after two terrible drafts I've deleted the section. I'd love it if someone could show me how it would work:) * Unallocated nodes This implies an 'allocation' step, that we don't have - how about 'Idle nodes' or something. It can be auto-allocation. I don't see problem with 'unallocated' term. Ok, it's not a biggy. I do think it will frame things poorly and lead to an expectation about how TripleO works that doesn't match how it does, but we can change it later if I'm right, and if I'm wrong, well it won't be the first time :). I'm interested in what the distinction you're making here is. I'd rather get things defined correctly the first time, and it's very possible that I'm missing a fundamental definition here. So we have: - node - a physical general purpose machine capable of running in many roles. Some nodes may have hardware layout that is particularly useful for a given role. - role - a specific workload we want to map onto one or more nodes. Examples include 'undercloud control plane', 'overcloud control plane', 'overcloud storage', 'overcloud compute' etc. - instance - A role deployed on a node - this is where work actually happens. - scheduling - the process of deciding which role is deployed on which node. This glossary is really handy to make sure we're all speaking the same language. The way TripleO works is that we defined a Heat template that lays out policy: 5 instances of 'overcloud control plane please', '20 hypervisors' etc. Heat passes that to Nova, which pulls the image for the role out of Glance, picks a node, and deploys the image to the node. Note in particular the order: Heat - Nova - Scheduler - Node chosen. The user action is not 'allocate a Node to 'overcloud control plane', it is 'size the control plane through heat'. 
So when we talk about 'unallocated Nodes', the implication is that users 'allocate Nodes', but they don't: they size roles, and after doing all that there may be some Nodes that are - yes - unallocated, I'm not sure if I should ask this here or to your point above, but what about multi-role nodes? Is there any piece in here that says 'The policy wants 5 instances but I can fit two of them on this existing underutilized node and three of them on unallocated nodes', or, since it's all at the image level, do you get just what's in the image, and is that the finest level of granularity? or have nothing scheduled to them. So... I'm not debating that we should have a list of free hardware - we totally should - I'm debating how we frame it. 'Available Nodes' or 'Undeployed machines' or whatever. I just want to get away from talking about something ([manual] allocation) that we don't offer. My only concern here is that we're not talking about cloud users, we're talking about admins adminning (we'll pretend it's a word, come with me) a cloud. To a cloud user, 'give me some power so I can do some stuff' is a safe use case if I trust the cloud I'm running on. I trust
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 10 December 2013 10:57, Jay Dobies jason.dob...@redhat.com wrote: So we have: - node - a physical general purpose machine capable of running in many roles. Some nodes may have hardware layout that is particularly useful for a given role. - role - a specific workload we want to map onto one or more nodes. Examples include 'undercloud control plane', 'overcloud control plane', 'overcloud storage', 'overcloud compute' etc. - instance - A role deployed on a node - this is where work actually happens. - scheduling - the process of deciding which role is deployed on which node. This glossary is really handy to make sure we're all speaking the same language. The way TripleO works is that we defined a Heat template that lays out policy: 5 instances of 'overcloud control plane please', '20 hypervisors' etc. Heat passes that to Nova, which pulls the image for the role out of Glance, picks a node, and deploys the image to the node. Note in particular the order: Heat - Nova - Scheduler - Node chosen. The user action is not 'allocate a Node to 'overcloud control plane', it is 'size the control plane through heat'. So when we talk about 'unallocated Nodes', the implication is that users 'allocate Nodes', but they don't: they size roles, and after doing all that there may be some Nodes that are - yes - unallocated, I'm not sure if I should ask this here or to your point above, but what about multi-role nodes? Is there any piece in here that says The policy wants 5 instances but I can fit two of them on this existing underutilized node and three of them on unallocated nodes or since it's all at the image level you get just what's in the image and that's the finest-level of granularity? The way we handle that today is to create a composite role that says 'overcloud-compute+cinder storage', for instance - because image is the level of granularity. 
If/when we get automatic container subdivision - see the other really interesting long-term thread - we could subdivide, but I'd still do that using image as the level of granularity, it's just that we'd have the host image + the container images. or have nothing scheduled to them. So... I'm not debating that we should have a list of free hardware - we totally should - I'm debating how we frame it. 'Available Nodes' or 'Undeployed machines' or whatever. I just want to get away from talking about something ([manual] allocation) that we don't offer. My only concern here is that we're not talking about cloud users, we're talking about admins adminning (we'll pretend it's a word, come with me) a cloud. To a cloud user, give me some power so I can do some stuff is a safe use case if I trust the cloud I'm running on. I trust that the cloud provider has taken the proper steps to ensure that my CPU isn't in New York and my storage in Tokyo. Sure :) To the admin setting up an overcloud, they are the ones providing that trust to eventual cloud users. That's where I feel like more visibility and control are going to be desired/appreciated. I admit what I just said isn't at all concrete. Might even be flat out wrong. I was never an admin, I've just worked on sys management software long enough to have the opinion that their levels of OCD are legendary. I can't shake this feeling that someone is going to slap some fancy new jacked-up piece of hardware onto the network and have a specific purpose they are going to want to use it for. But maybe that's antiquated thinking on my part. I think concrete use cases are the only way we'll get light at the end of the tunnel. So lets say someone puts a new bit of fancy kit onto their network and wants it for e.g. GPU VM instances only. Thats a reasonable desire. The basic stuff we're talking about so far is just about saying each role can run on some set of undercloud flavors. 
If that new bit of kit has the same coarse metadata as other kit, Nova can't tell it apart. So the way to solve the problem is: - a) teach Ironic about the specialness of the node (e.g. a tag 'GPU') - b) teach Nova that there is a flavor that maps to the presence of that specialness, and c) teach Nova that other flavors may not map to that specialness then in Tuskar whatever Nova configuration is needed to use that GPU is a special role ('GPU compute' for instance) and only that role would be given that flavor to use. That special config is probably being in a host aggregate, with an overcloud flavor that specifies that aggregate, which means at the TripleO level we need to put the aggregate in the config metadata for that role, and the admin does a one-time setup in the Nova Horizon UI to configure their GPU compute flavor. This isn't 'manual allocation' to me - it's surfacing the capabilities from the bottom ('has GPU') and the constraints from the top ('needs GPU') and letting Nova and Heat sort it out. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged
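[Editor's note] The GPU workflow Robert outlines - tag the specialness in Ironic, map one flavor to it, keep other flavors off it - amounts to matching capabilities surfaced from the bottom against constraints declared at the top. A minimal sketch follows; the tag and flavor names are invented for illustration, and this is not Nova's actual filter code.

```python
# Minimal sketch of "capabilities from the bottom, constraints from the
# top": nodes expose tags (e.g. 'GPU', as Ironic would surface them) and
# flavors declare required/forbidden tags. Names are invented; this is
# an illustration of the matching logic, not Nova's filter scheduler.

def nodes_for_flavor(nodes, flavor):
    """Return names of nodes whose tags satisfy the flavor's constraints."""
    return [name for name, tags in nodes.items()
            if flavor["requires"] <= tags          # all required tags present
            and not (flavor["excludes"] & tags)]   # no forbidden tags present

nodes = {
    "node-1": {"GPU"},   # the shiny new kit
    "node-2": set(),
    "node-3": set(),
}
gpu_flavor = {"requires": {"GPU"}, "excludes": set()}
plain_flavor = {"requires": set(), "excludes": {"GPU"}}  # keep plain compute off GPU kit

assert nodes_for_flavor(nodes, gpu_flavor) == ["node-1"]
assert sorted(nodes_for_flavor(nodes, plain_flavor)) == ["node-2", "node-3"]
```

The point of the sketch: no one manually assigned node-1 to anything. The node advertised 'has GPU', the 'GPU compute' role's flavor demanded it, and the matching fell out automatically - which is exactly why Robert argues this isn't manual allocation.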
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Thanks for the explanation! I'm going to claim that the thread revolves around two main areas of disagreement. Then I'm going to propose a way through: a) Manual Node Assignment I think that everyone is agreed that automated node assignment through nova-scheduler is by far the most ideal case; there's no disagreement there. The disagreement comes from whether we need manual node assignment or not. I would argue that we need to step back and take a look at the real use case: heterogeneous nodes. If there are literally no characteristics that differentiate nodes A and B, then why do we care which gets used for what? Why do we need to manually assign one? If we can agree on that, then I think it would be sufficient to say that we want a mechanism to allow UI users to deal with heterogeneous nodes, and that mechanism must use nova-scheduler. In my mind, that's what resource classes and node profiles are intended for. One possible objection might be: nova scheduler doesn't have the appropriate filter that we need to separate out two nodes. In that case, I would say that needs to be taken up with nova developers. b) Terminology It feels a bit like some of the disagreement comes from people using different words for the same thing. For example, the wireframes already detail a UI where Robert's roles come first, but I think that message was confused because I mentioned node types in the requirements. So could we come to some agreement on what the most exact terminology would be? I've listed some examples below, but I'm sure there are more.

node type | role
management node | ?
resource node | ?
unallocated | available | undeployed
create a node distribution | size the deployment
resource classes | ?
node profiles | ?

Mainn - Original Message - On 10 December 2013 09:55, Tzu-Mainn Chen tzuma...@redhat.com wrote: * created as part of undercloud install process By that note I meant, that Nodes are not resources, Resource instances run on Nodes. 
Nodes are the generic pool of hardware we can deploy things onto. I don't think resource nodes is intended to imply that nodes are resources; rather, it's supposed to indicate that it's a node where a resource instance runs. It's supposed to separate it from management node and unallocated node. So the question is are we looking at /nodes/ that have a /current role/, or are we looking at /roles/ that have some /current nodes/. My contention is that the role is the interesting thing, and the nodes is the incidental thing. That is, as a sysadmin, my hierarchy of concerns is something like: A: are all services running B: are any of them in a degraded state where I need to take prompt action to prevent a service outage [might mean many things: - software update/disk space criticals/a machine failed and we need to scale the cluster back up/too much load] C: are there any planned changes I need to make [new software deploy, feature request from user, replacing a faulty machine] D: are there long term issues sneaking up on me [capacity planning, machine obsolescence] If we take /nodes/ as the interesting thing, and what they are doing right now as the incidental thing, it's much harder to map that onto the sysadmin concerns. If we start with /roles/ then can answer: A: by showing the list of roles and the summary stats (how many machines, service status aggregate), role level alerts (e.g. 
nova-api is not responding)
B: by showing the list of roles and more detailed stats (overall load, response times of services, tickets against services), and a list of in-trouble instances in each role - instances with alerts against them: low disk, overload, failed service, early-detection alerts from hardware
C: probably out of our remit for now in the general case, but we need to enable some things here, like replacing faulty machines
D: by looking at trend graphs for roles (not machines), but also by looking at the hardware in aggregate - breakdown by age of machines, summary data for tickets filed against instances that were deployed to a particular machine

C and D are (F) category work, but for all but the very last thing, it seems clear how to approach this from a roles perspective. I've tried to approach this using /nodes/ as the starting point, and after two terrible drafts I've deleted the section. I'd love it if someone could show me how it would work :)

* Unallocated nodes

This implies an 'allocation' step that we don't have - how about 'Idle nodes' or something?

It can be auto-allocation. I don't see a problem with the 'unallocated' term.

Ok, it's not a biggy. I do think it will frame things poorly and lead to an expectation about how TripleO works that doesn't match how it does, but we can change it later if I'm right, and if I'm wrong, well, it won't be the first time :). I'm interested in what the distinction you're making
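Robert's role-first answers to A and B above amount to aggregating per-instance state up to the role level. A toy sketch of that roll-up - the field names and sample data are invented for illustration:

```python
# Toy sketch of rolling per-instance status up to role-level summary stats,
# in the spirit of points A/B above. Field names are invented, not from any
# real Tuskar or Nova API.
from collections import defaultdict

instances = [
    {"role": "nova-api", "host": "n1", "state": "ok"},
    {"role": "nova-api", "host": "n2", "state": "alert"},
    {"role": "glance",   "host": "n3", "state": "ok"},
]

summary = defaultdict(lambda: {"count": 0, "alerts": []})
for inst in instances:
    s = summary[inst["role"]]
    s["count"] += 1
    if inst["state"] != "ok":
        s["alerts"].append(inst["host"])

for role, s in sorted(summary.items()):
    status = "DEGRADED" if s["alerts"] else "OK"
    print(f"{role}: {s['count']} instance(s), {status}")
```

The role list with aggregate status is what the operator scans first; the per-instance alert lists are the drill-down.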
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/06/12 21:26, Tzu-Mainn Chen wrote:

* can be allocated as one of four node types

It's pretty clear from the current verbiage, but I'm going to ask anyway: one and only one?

Yep, that's right! Confirming: one and only one.

My gut reaction is that we want to bite this off sooner rather than later. This will have data model and API implications that, even if we don't commit to it for Icehouse, should still be on our minds, so it might make sense to make it a first-class thing and just nail it down now.

That is entirely correct, which is one reason it's on the list of requirements. The forthcoming API design will have to account for it. Not recreating the entire data model between releases is a key goal :)

Well yeah, that's why we should try to think longer-term, and the wireframes cover a bit more than might land in Icehouse - so that we are aware of the future direction and don't have to completely rebuild the underlying models later on.

* optional node profile for a resource class (M)
* acts as filter for nodes that can be allocated to that class (M)

To my understanding, once this is in Icehouse, we'll have to support upgrades. If this filtering is pushed off, could we get into a situation where an allocation created in Icehouse would no longer be valid in Icehouse+1 once these filters are in place? If so, we might want to prioritize getting them in place earlier rather than eat the headache of addressing these sorts of integrity issues later.

Hm, can you be a bit more specific about how an allocation created in I might no longer be valid in I+1?

That's true. The problem is that, to my understanding, the filters we'd need in nova-scheduler are not yet fully in place. I think at the moment there are 'extra params' which we might use to some extent, but yes, AFAIK a piece is still missing for filtered scheduling in nova. I also think this is an issue we'll need to address no matter what.
Even once filters exist, if a user applies a filter *after* nodes are allocated, we'll need to do something clever if the already-allocated nodes don't meet the filter criteria.

Well, here is the thing: once nodes are allocated, you can get a warning that the nodes in the resource class no longer fulfill the criteria (if the criteria were changed), but that's all. It will be up to the user to decide whether to keep them in or unallocate them. The profiles matter at the moment the 'which node can get in' decision is being made.

* nodes can be viewed by node types
* additional group by status, hardware specification
* controller node type
* each controller node will run all openstack services
* allow each node to run specified service (F)
* breakdown by workload (percentage of cpu used per node) (M)
* Unallocated nodes

Is there more still being fleshed out here? Things like:
* Listing unallocated nodes
* Unallocating a previously allocated node (does this make it a vanilla resource or does it retain the resource type? is this the only way to change a node's resource type?)

If we use a policy-based approach then yes, this is correct: first unallocate a node and then increase the number of resources in another class. But I believe we need to keep control over the infrastructure and not rely only on policies. So I hope we can get to something like 'reallocate'/'allocate manually', which will force a node to be part of a specific class.

* Unregistering nodes from Tuskar's inventory (I put this under unallocated under the assumption that the workflow will be an explicit unallocate before unregister; I'm not sure if this is the same as archive below).

Ah, you're entirely right. I'll add these to the list.

* Archived nodes (F)

Can you elaborate a bit more on what this is?

To be honest, I'm a bit fuzzy about this myself; Jarda mentioned that there was an OpenStack service in the process of being planned that would handle this requirement. Jarda, can you detail a bit?
So the thing is based on historical data. At the moment there is no service which keeps this type of data (might be a new project?). Since Tuskar will not only be deploying but also monitoring your deployment, it is important to have historical data available. If a user removes some nodes from the infrastructure, they would lose all that data and we would not be able to generate graphs. That's why archived nodes = nodes which were registered in the past but are no longer available.

-- Jarda
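Jarda's earlier point about changed profiles - already-allocated nodes only trigger a warning, and unallocating remains the operator's call - is simple to sketch. The function and node/profile shapes here are invented for illustration:

```python
# Sketch of the 'warn, don't evict' behaviour discussed above: when a
# resource-class profile changes, already-allocated nodes that no longer
# match are flagged for the operator, not removed automatically.
# Names and data shapes are invented for illustration.

def check_allocations(nodes, profile_passes):
    """Return names of allocated nodes that no longer satisfy the profile.

    profile_passes is whatever predicate the class profile defines; the
    nodes stay allocated regardless, and the UI only surfaces a warning.
    """
    return [n["name"] for n in nodes if not profile_passes(n)]

allocated = [{"name": "node-1", "ram_mb": 4096},
             {"name": "node-2", "ram_mb": 16384}]
warnings = check_allocations(allocated, lambda n: n["ram_mb"] >= 8192)
print(warnings)  # -> ['node-1']
```

By contrast, the same predicate applied at allocation time acts as a hard filter on which nodes can get in.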
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/06/12 22:55, Matt Wagner wrote:

- As an infrastructure administrator, Anna wants to review the distribution of the nodes that she has assigned before kicking off the Deploy task.

What does she expect to see on the review screen that she didn't see on the previous screens, if anything? Is this just a summation, or is she expecting to see things like which node will get which role? (I'd argue for the former; I don't know that we can predict the latter.)

At the beginning, just a summation. Later (when we have nova-scheduler reservation) we can show the real distribution of which node is taking which role.

- As an infrastructure administrator, Anna wants to monitor the deployment process of all of the nodes that she has assigned.

I think there's an implied "...through the UI" here, versus tailing log files to watch state. Does she just expect to see states like Pending, Deploying, or Finished, versus, say, having the full logs shown in the UI? (I'd vote 'yes'.)

For the simplified view - yes, only changes of state and a progress bar. However, the log should be available.

- As an infrastructure administrator, Anna needs to be able to troubleshoot any errors that may occur during the deployment of nodes process.

I'm not sure that the "...through the UI" implication I mentioned above extends here (IMHO). I assume that if things fail, Anna might be okay with us showing a message that $foo failed on $bar, and she should try looking in /var/log/$baz for full details. Does that seem fair? (At least early on.)

As said above, for simplified views it is ok to say $foo failed on $bar, but she should be able to track down the problem - a logs section in the UI.

- As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment.

Why does she want to view the history of past nodes? Note that I'm not arguing against this; it's just not abundantly clear to me what she'll be using this information for.
Does she want a history to check off an Audit log checkbox, or will she be looking to extract certain data from this history?

The short answer is: graphs - history of utilization of the class, etc.

-- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/07/12 02:20, Robert Collins wrote:

- As an infrastructure administrator, Anna needs to assign a role to each of the necessary nodes in her OpenStack deployment. The nodes could be either controller, compute, networking, or storage resources depending on the needs of this deployment.

Definitely not: she needs to deliver a running cloud. Manually saying 'machine X is a compute node' is confusing an implementation with a need. She needs to know that her cloud will have enough capacity to meet her users' needs; she needs to know that it will be resilient against a wide set of failures (and this might be a dial, with different clouds having different uptime guarantees); she may need to ensure that some specific hardware configuration is used for storage, as a performance optimisation. None of those needs imply assigning roles to machines.

Yes, in an ideal world and in large deployments. But there may be cases when Anna will need to say: deploy storage to this specific node. I'm not arguing against the policy-based approach, but we also need to cover manual control (forcing a node to take some role).

- As an infrastructure administrator, Anna wants to monitor the deployment process of all of the nodes that she has assigned.

I don't think she wants to do that. I think she wants to be told if there is a problem that needs her intervention to solve - e.g. bad IPMI details for a node, or a node not responding when asked to boot via PXE.

I think by this user story Liz wanted to capture that Anna wants to see whether the deployment process is still in progress or has finished/failed, etc. Which I agree with. I don't think she will sit and watch what is happening.

- As an infrastructure administrator, Anna wants to be able to unallocate a node from a deployment.

Why? What's her motivation? One plausible one for me is 'a machine needs to be serviced, so Anna wants to remove it from the deployment to avoid causing user-visible downtime.'
So let's say that: Anna needs to be able to take machines out of service so they can be maintained or disposed of.

A node being serviced is a different user story for me. I believe we are still 'fighting' here with two approaches, and I believe we need both. We can't only provide a 'give us resources, we will do the magic' way. Yes, this is the preferred way - especially for large deployments - but we also need a fallback so that the user can say: no, this node doesn't belong to the class, I don't want it there - unassign. Or: I need to have this node there - assign.

- As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment.

Why? This is super generic and could mean anything.

I believe this has something to do with 'archived nodes'. But correct me if I am wrong.

-- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 2013/07/12 01:59, Robert Collins wrote:

* Creation
* Manual registration
* hardware specs from Ironic based on mac address (M)

Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory stats.

For registration it is just the management MAC address which is needed, right? Or does Ironic also need an IP? I think the MAC address might be enough; we can display the IP in the node details later on.

* IP auto populated from Neutron (F)

Do you mean the IPMI IP? I'd say IPMI address managed by Neutron here.

+1

* Auto-discovery during undercloud install process (M)
* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)

Why is this under 'nodes'? I challenge the idea that it should be there. We will need to surface some stuff about nodes, but the underlying idea is to take a cloud approach here - so we're monitoring services that happen to be on nodes. There is room to monitor nodes, as an undercloud feature set, but let's be very, very specific about what is sitting at what layer.

We need both - we need to track services but also the state of nodes (CPU, RAM, network bandwidth, etc). So in the node detail you should be able to track both.

* Management node (where triple-o is installed)

This should be plural :) - TripleO isn't a single service to be installed - we've got Tuskar, Ironic, Nova, Glance, Keystone, Neutron, etc.

* created as part of undercloud install process
* can create additional management nodes (F)
* Resource nodes

^ nodes is again confusing layers - nodes are what things are deployed to, but they aren't the entry point.

Can you please be a bit more specific here? I don't understand this note.

* searchable by status, name, cpu, memory, and all attributes from ironic
* can be allocated as one of four node types

Not by users though. We need to stop thinking of this as 'what we do to nodes' - Nova/Ironic operate on nodes, we operate on Heat templates.
Discussed in other threads, but I still believe (and I am not alone) that we need to allow 'force nodes'.

* Unallocated nodes

This implies an 'allocation' step that we don't have - how about 'Idle nodes' or something?

It can be auto-allocation. I don't see a problem with the 'unallocated' term.

* defaulted, with no option to change
* allow modification (F)
* review distribution map (F)
* notification when a deployment is ready to go or whenever something changes

Is this an (M)?

It might be M, but with higher priority. I see it in the middle, but if we have to decide, it can be M.

-- Jarda
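The registration exchange above (Ironic wants IPMI details plus a MAC, and the UI should ask for as little as possible) suggests a small sanity check on the manual-registration payload before enrollment. A hedged sketch - the field names are illustrative, not Ironic's actual API:

```python
# Sketch of validating the minimal manual-registration payload the thread
# converges on: IPMI address + management MAC, with further hardware specs
# fetched from Ironic later. Field names are invented, not Ironic's API.
import re

REQUIRED = ("ipmi_address", "mac_address")
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$", re.I)

def validate_registration(payload):
    """Return (ok, message) for a manual node-registration payload."""
    missing = [f for f in REQUIRED if not payload.get(f)]
    if missing:
        return False, "missing: " + ", ".join(missing)
    if not MAC_RE.match(payload["mac_address"]):
        return False, "malformed MAC address"
    return True, "ok"

ok, msg = validate_registration({"ipmi_address": "10.0.0.5",
                                 "mac_address": "52:54:00:ab:cd:ef"})
print(ok, msg)  # -> True ok
```

Whether the IPMI IP is entered manually or managed by Neutron (as suggested above) only changes where that one field comes from.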
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 12/09/2013 11:56 AM, Jaromir Coufal wrote:

On 2013/07/12 01:59, Robert Collins wrote:

* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)

Why is this under 'nodes'? I challenge the idea that it should be there. We will need to surface some stuff about nodes, but the underlying idea is to take a cloud approach here - so we're monitoring services that happen to be on nodes. There is room to monitor nodes, as an undercloud feature set, but let's be very, very specific about what is sitting at what layer.

We need both - we need to track services but also the state of nodes (CPU, RAM, network bandwidth, etc). So in the node detail you should be able to track both.

I agree. Monitoring services and monitoring nodes are both valid features for Tuskar. I think splitting it into two separate requirements, as Mainn suggested, would make a lot of sense.

* searchable by status, name, cpu, memory, and all attributes from ironic
* can be allocated as one of four node types

Not by users though. We need to stop thinking of this as 'what we do to nodes' - Nova/Ironic operate on nodes, we operate on Heat templates.

Discussed in other threads, but I still believe (and I am not alone) that we need to allow 'force nodes'.

Yeah, having both approaches would be nice. Instead of using the existing 'force nodes' implementation, wouldn't it be better/cleaner to implement support for it in Nova and Heat?

Imre
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Mainn,

Thanks for pulling this together.

* NODES
* Management node (where triple-o is installed)
* created as part of undercloud install process

I think getting the undercloud installed/deployed should be a requirement for Icehouse. I'm not sure if you meant that or were assuming it would already be done :). I'd like to see a simpler process than building the seed VM, starting it, deploying the undercloud, etc. But that's something we can work to define if others agree as well.

* can create additional management nodes (F)

By this, do you mean using the undercloud to scale itself? e.g., using nova on the undercloud to launch an additional undercloud compute node, etc. I like that concept, and don't see any reason why it wouldn't be technically possible.

* DEPLOYMENT ACTION
* Heat template generated on the fly
* hardcoded images
* allow image selection (F)

So, I think this may be what Robert was getting at, but I think this one should be M, or possibly even committed to Icehouse. It's very likely we're going to need to update which image is used for the deployment, e.g., if you build a new image to pick up a security update. IIRC, the image is just referenced by name in the template, so maybe the process is just:

* build the new image
* rename/delete the old image
* upload the new image with the required name (overcloud-compute, overcloud-control)

However, a proper image selection workflow would still be preferable.

-- James Slagle --
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Fri, Dec 6, 2013 at 4:55 PM, Matt Wagner matt.wag...@redhat.com wrote: - As an infrastructure administrator, Anna expects that the management node for the deployment services is already up and running and the status of this node is shown in the UI. The 'management node' here is the undercloud node that Anna is interacting with, as I understand it. (Someone correct me if I'm wrong.) So it's not a bad idea to show its status, but I guess the mere fact that she's using it will indicate that it's operational. That's how I read it as well, which assumes that you're using the undercloud to manage itself. FWIW, based on the OpenStack personas I think that Anna would be the one doing the undercloud setup. So, maybe this use case should be: - As an infrastructure administrator, Anna wants to install the undercloud so she can use the UI. That piece is going to be a pretty big part of the entire deployment process, so I think having a use case for it makes sense. Nice work on the use cases Liz, thanks for pulling them together. -- -- James Slagle --
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Dec 9, 2013, at 4:29 AM, Jaromir Coufal jcou...@redhat.com wrote: On 2013/06/12 22:55, Matt Wagner wrote: - As an infrastructure administrator, Anna wants to review the distribution of the nodes that she has assigned before kicking off the Deploy task. What does she expect to see here on the review screen that she didn't see on the previous screens, if anything? Is this just a summation, or is she expecting to see things like which node will get which role? (I'd argue for the former; I don't know that we can predict the latter.) At the beginning, just summation. Later (when we have nova-scheduler reservation) we can get the real distribution of which node is taking which role. Yes, the idea is that Anna wants to see some representation of what the distribution of nodes would be (how many would be assigned to each profile) before kicking off the deploy action. - As an infrastructure administrator, Anna wants to monitor the deployment process of all of the nodes that she has assigned. I think there's an implied ...through the UI here, versus tailing log files to watch state. Does she just expect to see states like Pending, Deploying, or Finished, versus, say, having the full logs shown in the UI? (I'd vote 'yes'.) For simplified view - yes, only change of states and progress bar. However log should be available. I'd vote 'yes' as well. These are definitely design decisions we should be making based on what we know of our end user. Although some use cases like troubleshooting might point towards using logs, this one definitely seems like a UI addition. I'll update the use case to be more specific. [1] - As an infrastructure administrator, Anna needs to be able to troubleshoot any errors that may occur during the deployment of nodes process. I'm not sure that the ...through the UI implication I mentioned above extends here. 
(IMHO) I assume that if things fail, Anna might be okay with us showing a message that $foo failed on $bar, and she should try looking in /var/log/$baz for full details. Does that seem fair? (At least early on.)

As said above, for simplified views it is ok to say $foo failed on $bar, but she should be able to track down the problem - a logs section in the UI.

Yes, this is meant to be through the UI. I've updated the use case. [1]

- As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment.

Why does she want to view the history of past nodes? Note that I'm not arguing against this; it's just not abundantly clear to me what she'll be using this information for. Does she want a history to check off an Audit log checkbox, or will she be looking to extract certain data from this history?

The short answer is: graphs - history of utilization of the class, etc.

I've updated this one to be more specific about why historic nodes are important to Anna. [1]

Thanks for all of the feedback,
Liz

[1] https://wiki.openstack.org/wiki/TripleO/Tuskar/IcehouseUserStories

-- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Dec 9, 2013, at 4:57 AM, Jaromir Coufal jcou...@redhat.com wrote:

On 2013/07/12 02:20, Robert Collins wrote:

- As an infrastructure administrator, Anna needs to assign a role to each of the necessary nodes in her OpenStack deployment. The nodes could be either controller, compute, networking, or storage resources depending on the needs of this deployment.

Definitely not: she needs to deliver a running cloud. Manually saying 'machine X is a compute node' is confusing an implementation with a need. She needs to know that her cloud will have enough capacity to meet her users' needs; she needs to know that it will be resilient against a wide set of failures (and this might be a dial, with different clouds having different uptime guarantees); she may need to ensure that some specific hardware configuration is used for storage, as a performance optimisation. None of those needs imply assigning roles to machines.

Yes, in an ideal world and in large deployments. But there may be cases when Anna will need to say: deploy storage to this specific node. I'm not arguing against the policy-based approach, but we also need to cover manual control (forcing a node to take some role).

Perhaps the use case is that Anna would want to define the different capacities that her cloud deployment will need? You're both right, though: we don't want to force the user to manually select which nodes will run which services, but we should allow it for cases in which it's needed. I've updated the use case as an attempt to clear this up. [1]

- As an infrastructure administrator, Anna wants to monitor the deployment process of all of the nodes that she has assigned.

I don't think she wants to do that. I think she wants to be told if there is a problem that needs her intervention to solve - e.g. bad IPMI details for a node, or a node not responding when asked to boot via PXE.
I think by this user story Liz wanted to capture that Anna wants to see whether the deployment process is still in progress or has finished/failed, etc. Which I agree with. I don't think she will sit and watch what is happening.

Yes, definitely. I've updated this use case to reflect reality: Anna would not sit there and actively monitor, but rather she would want to make sure that there weren't any errors during the deployment process. [1]

- As an infrastructure administrator, Anna wants to be able to unallocate a node from a deployment.

Why? What's her motivation? One plausible one for me is 'a machine needs to be serviced, so Anna wants to remove it from the deployment to avoid causing user-visible downtime.' So let's say that: Anna needs to be able to take machines out of service so they can be maintained or disposed of.

A node being serviced is a different user story for me. I believe we are still 'fighting' here with two approaches, and I believe we need both. We can't only provide a 'give us resources, we will do the magic' way. Yes, this is the preferred way - especially for large deployments - but we also need a fallback so that the user can say: no, this node doesn't belong to the class, I don't want it there - unassign. Or: I need to have this node there - assign.

This is a great question, Robert. I think the reason you bring up for Anna wanting to remove a node is actually more of a Disable node action. This way she could potentially bring it back up after the maintenance is done. I will add some more details to this use case to try to clarify. [1]

- As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment.

Why? This is super generic and could mean anything.

I believe this has something to do with 'archived nodes'. But correct me if I am wrong.

I was assuming it would be in case the user wants to go back to view the history of a certain node.
Potentially the user could bring an archived node back online? Although maybe at this point it would just be rediscovered? Thanks, Liz [1] https://wiki.openstack.org/wiki/TripleO/Tuskar/IcehouseUserStories -- Jarda
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 06/12/13 04:31, Tzu-Mainn Chen wrote:

Hey all,

I've attempted to spin out the requirements behind Jarda's excellent wireframes (http://lists.openstack.org/pipermail/openstack-dev/2013-December/020944.html). Hopefully this can add some perspective on both the wireframes and the needed changes to the tuskar-api. All comments are welcome!

Thanks,
Tzu-Mainn Chen

***

Requirements are assumed to be targeted for Icehouse, unless marked otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse

* NODES
  * Creation
    * Manual registration
      * hardware specs from Ironic based on mac address (M)
      * IP auto populated from Neutron (F)
    * Auto-discovery during undercloud install process (M)
  * Monitoring
    * assignment, availability, status
    * capacity, historical statistics (M)
  * Management node (where triple-o is installed)
    * created as part of undercloud install process
    * can create additional management nodes (F)
  * Resource nodes
    * searchable by status, name, cpu, memory, and all attributes from ironic
    * can be allocated as one of four node types
      * compute
      * controller
      * object storage
      * block storage
    * Resource class - allows for further categorization of a node type
      * each node type specifies a single default resource class
      * allow multiple resource classes per node type (M)
      * optional node profile for a resource class (M)
        * acts as filter for nodes that can be allocated to that class (M)
    * nodes can be viewed by node types
      * additional group by status, hardware specification
    * controller node type
      * each controller node will run all openstack services
        * allow each node to run specified service (F)
      * breakdown by workload (percentage of cpu used per node) (M)
  * Unallocated nodes
  * Archived nodes (F)
    * Will be separate openstack service (F)
* DEPLOYMENT
  * multiple deployments allowed (F)
    * initially just one
  * deployment specifies a node distribution across node types
    * node distribution can be updated after creation
  * deployment configuration, used for initial creation only
    * defaulted, with no option to change
    * allow modification (F)
  * review distribution map (F)
  * notification when a deployment is ready to go or whenever something changes
* DEPLOYMENT ACTION
  * Heat template generated on the fly
    * hardcoded images
      * allow image selection (F)
    * pre-created template fragments for each node type
    * node type distribution affects generated template

Sorry, I'm a bit late to the discussion. FYI, there are two sides to the previous points: 1) a temporary solution using merge.py from tuskar and the tripleo-heat-templates repo (Icehouse, imo), and 2) doing it 'properly' with the merge functionality pushed into heat (F, imo).

For 1), various bits are in play, if interested: /#/c/56947/ (Make merge.py invokable), /#/c/58823/ (Make merge.py installable) and /#/c/52045/ (WIP: sketch of what using merge.py looks like for tuskar) - this last one needs updating and thought. Also /#/c/58229/ and /#/c/57210/, which need some more thought.

  * nova scheduler allocates nodes
    * filters based on resource class and node profile information (M)
  * Deployment action can create or update
  * status indicator to determine overall state of deployment
    * status indicator for nodes as well
    * status includes 'time left' (F)
* NETWORKS (F)
* IMAGES (F)
* LOGS (F)
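For readers unfamiliar with the merge.py approach mentioned above: it composes the deployment template by concatenating per-role fragments and duplicating resources to match the requested node counts. The fragment below is purely illustrative (not copied from tripleo-heat-templates; the resource and property values are assumptions), but it shows the shape of what gets merged, including the hardcoded image name discussed elsewhere in the thread:

```yaml
# Illustrative only -- not verbatim tripleo-heat-templates content.
# merge.py-style composition concatenates fragments like this and duplicates
# the resource (NovaCompute0, NovaCompute1, ...) per the node distribution.
HeatTemplateFormatVersion: '2012-12-12'
Resources:
  NovaCompute0:
    Type: OS::Nova::Server
    Properties:
      image: overcloud-compute   # hardcoded image name, per the thread
      flavor: baremetal
```

Pushing this merge step into Heat proper (option 2 above) would remove the need for the external concatenation tool.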
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Dec 9, 2013, at 8:58 AM, James Slagle james.sla...@gmail.com wrote: On Fri, Dec 6, 2013 at 4:55 PM, Matt Wagner matt.wag...@redhat.com wrote: - As an infrastructure administrator, Anna expects that the management node for the deployment services is already up and running and the status of this node is shown in the UI. The 'management node' here is the undercloud node that Anna is interacting with, as I understand it. (Someone correct me if I'm wrong.) So it's not a bad idea to show its status, but I guess the mere fact that she's using it will indicate that it's operational. That's how I read it as well, which assumes that you're using the undercloud to manage itself. FWIW, based on the OpenStack personas I think that Anna would be the one doing the undercloud setup. So, maybe this use case should be: - As an infrastructure administrator, Anna wants to install the undercloud so she can use the UI. That piece is going to be a pretty big part of the entire deployment process, so I think having a use case for it makes sense. +1. I've added this as the very first use case. Nice work on the use cases Liz, thanks for pulling them together. Thanks to all for the great discussion on these use cases. The questions/comments that they've generated is exactly what I was hoping for. I will continue to make updates and refine these[1] based on discussions. Of course, feel free to add to/change these yourself as well. Liz [1] https://wiki.openstack.org/wiki/TripleO/Tuskar/IcehouseUserStories -- -- James Slagle --
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Dec 6, 2013, at 8:20 PM, Robert Collins robe...@robertcollins.net wrote:

On 7 December 2013 09:31, Liz Blanchard lsure...@redhat.com wrote:

This list is great, thanks very much for taking the time to write it up! I think a big part of the user experience design is taking a step back and understanding the requirements from an end user's point of view - what would they want to accomplish by using this UI? This might influence the design in certain ways, so I've taken a cut at a set of user stories for the Icehouse timeframe based on these requirements that I hope will be useful during discussions. Based on the OpenStack Personas[1], I think that Anna would be the main consumer of the TripleO UI, but please let me know if you think otherwise.

- As an infrastructure administrator, Anna needs to deploy or update a set of resources that will run OpenStack. (This isn't a very specific use case, but more of the larger end goal of Anna coming into the UI.)

- As an infrastructure administrator, Anna expects that the management node for the deployment services is already up and running and the status of this node is shown in the UI.

- As an infrastructure administrator, Anna wants to be able to quickly see the set of unallocated nodes that she could use for her deployment of OpenStack. Ideally, she would not have to manually tell the system about these nodes. If she needs to manually register nodes for whatever reason, Anna would only want to have to define the essential data needed to register them.

I want to challenge this one. There are two concerns conflated: A) seeing available resources for scaling up her cloud, and B) minimising effort to enroll additional resources. B) is a no-brainer. For A) though, as phrased, we're talking about seeing a set of individual items - but actually, wouldn't aggregated capacity be more useful, with optional drill-down? '400 cores, 2TB RAM, 1PB of disk'

Good point.
I will update this to read that the user wants to see the available capacity and have the option to drill in further. [1]

- As an infrastructure administrator, Anna needs to assign a role to each of the necessary nodes in her OpenStack deployment. The nodes could be either controller, compute, networking, or storage resources depending on the needs of this deployment.

Definitely not: she needs to deliver a running cloud. Manually saying 'machine X is a compute node' is confusing an implementation with a need. She needs to know that her cloud will have enough capacity to meet her users' needs; she needs to know that it will be resilient against a wide set of failures (and this might be a dial, with different clouds having different uptime guarantees); she may need to ensure that some specific hardware configuration is used for storage, as a performance optimisation. None of those needs imply assigning roles to machines.

- As an infrastructure administrator, Anna wants to review the distribution of the nodes that she has assigned before kicking off the Deploy task.

If by distribution you mean the top-level stats (15 control nodes, 200 hypervisors, etc.) - then I agree. If you mean 'node X will be a hypervisor' - I thoroughly disagree. What does that do for her?

We are in agreement, I'd expect the former. I've updated the use case to be more specific. [1]

- As an infrastructure administrator, Anna wants to monitor the deployment process of all of the nodes that she has assigned.

I don't think she wants to do that. I think she wants to be told if there is a problem that needs her intervention to solve - e.g. bad IPMI details for a node, or a node not responding when asked to boot via PXE.

- As an infrastructure administrator, Anna needs to be able to troubleshoot any errors that may occur during the deployment of nodes process.

Definitely.

- As an infrastructure administrator, Anna wants to monitor the availability and status of each node in her deployment.
Yes, with the caveat that I think instance is the key thing here for now; there is a lifecycle aspect where being able to say 'machine X is having persistent network issues' is very important, and as a long-term thing we should totally aim at that.

- As an infrastructure administrator, Anna wants to be able to unallocate a node from a deployment.

Why? What's her motivation? One plausible one for me is 'a machine needs to be serviced, so Anna wants to remove it from the deployment to avoid causing user-visible downtime.' So let's say that: Anna needs to be able to take machines out of service so they can be maintained or disposed of.

- As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment.

Why? This is super generic and could mean anything.

- As an infrastructure administrator, Anna needs to be notified of any important changes to nodes that are in the OpenStack
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Disclaimer: I'm very new to the project, so apologies if some of my questions have been already answered or flat out don't make sense. As I proofread, some of my comments may drift a bit past basic requirements, so feel free to tell me to take certain questions out of this thread into specific discussion threads if I'm getting too detailed.

*** Requirements are assumed to be targeted for Icehouse, unless marked otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse

* NODES
  * Creation
    * Manual registration
      * hardware specs from Ironic based on mac address (M)
      * IP auto populated from Neutron (F)
    * Auto-discovery during undercloud install process (M)
  * Monitoring
    * assignment, availability, status
    * capacity, historical statistics (M)
  * Management node (where triple-o is installed)
    * created as part of undercloud install process
    * can create additional management nodes (F)
  * Resource nodes
    * searchable by status, name, cpu, memory, and all attributes from ironic
    * can be allocated as one of four node types

It's pretty clear by the current verbiage but I'm going to ask anyway: one and only one?

      * compute
      * controller
      * object storage
      * block storage
    * Resource class - allows for further categorization of a node type
      * each node type specifies a single default resource class
      * allow multiple resource classes per node type (M)

My gut reaction is that we want to bite this off sooner rather than later. This will have data model and API implications that, even if we don't commit to it for Icehouse, should still be in our minds during it, so it might make sense to make it a first-class thing to just nail down now.

    * optional node profile for a resource class (M)
      * acts as filter for nodes that can be allocated to that class (M)

To my understanding, once this is in Icehouse, we'll have to support upgrades.
If this filtering is pushed off, could we get into a situation where an allocation created in Icehouse would no longer be valid in Icehouse+1 once these filters are in place? If so, we might want to make it more of a priority to get them in place earlier and not eat the headache of addressing these sorts of integrity issues later.

    * nodes can be viewed by node types
      * additional group by status, hardware specification
    * controller node type
      * each controller node will run all openstack services
      * allow each node to run specified service (F)
      * breakdown by workload (percentage of cpu used per node) (M)
  * Unallocated nodes

Is there more still being fleshed out here? Things like:
* Listing unallocated nodes
* Unallocating a previously allocated node (does this make it a vanilla resource or does it retain the resource type? is this the only way to change a node's resource type?)
* Unregistering nodes from Tuskar's inventory (I put this under unallocated under the assumption that the workflow will be an explicit unallocate before unregister; I'm not sure if this is the same as archive below).

  * Archived nodes (F)

Can you elaborate a bit more on what this is?
    * Will be separate openstack service (F)

* DEPLOYMENT
  * multiple deployments allowed (F)
    * initially just one
  * deployment specifies a node distribution across node types
    * node distribution can be updated after creation
  * deployment configuration, used for initial creation only
    * defaulted, with no option to change
    * allow modification (F)
  * review distribution map (F)
  * notification when a deployment is ready to go or whenever something changes

* DEPLOYMENT ACTION
  * Heat template generated on the fly
    * hardcoded images
      * allow image selection (F)
    * pre-created template fragments for each node type
    * node type distribution affects generated template
  * nova scheduler allocates nodes
    * filters based on resource class and node profile information (M)
  * Deployment action can create or update
  * status indicator to determine overall state of deployment
    * status indicator for nodes as well
    * status includes 'time left' (F)

* NETWORKS (F)
* IMAGES (F)
* LOGS (F)
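The 'Heat template generated on the fly' item above boils down to: take a pre-created template fragment per node type, repeat it according to the node distribution, and emit a single template. Here is a minimal sketch of that idea; the fragment contents, flavor/image names, and function names are illustrative, not Tuskar's actual code:

```python
# Sketch only: assemble a Heat-style template from per-node-type fragments.
# FRAGMENTS and all property values below are invented for illustration.
FRAGMENTS = {
    "compute": {
        "type": "OS::Nova::Server",
        "properties": {"flavor": "baremetal-compute", "image": "overcloud-compute"},
    },
    "controller": {
        "type": "OS::Nova::Server",
        "properties": {"flavor": "baremetal-control", "image": "overcloud-control"},
    },
}

def build_template(distribution):
    """distribution maps node type -> instance count, e.g. {'compute': 4}.

    Returns a dict shaped like a HOT template: each node type's fragment
    is stamped out once per requested instance.
    """
    resources = {}
    for node_type, count in distribution.items():
        for i in range(count):
            # Copy the fragment so instances don't share one dict.
            resources["%s%d" % (node_type, i)] = dict(FRAGMENTS[node_type])
    return {"heat_template_version": "2013-05-23", "resources": resources}

template = build_template({"controller": 1, "compute": 2})
print(sorted(template["resources"]))  # ['compute0', 'compute1', 'controller0']
```

This also shows why "node type distribution affects generated template" is naturally an update operation: rerunning `build_template` with a new distribution yields the template for a stack update.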
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Thanks for the comments! Responses inline:

Disclaimer: I'm very new to the project, so apologies if some of my questions have been already answered or flat out don't make sense. As I proofread, some of my comments may drift a bit past basic requirements, so feel free to tell me to take certain questions out of this thread into specific discussion threads if I'm getting too detailed.

*** Requirements are assumed to be targeted for Icehouse, unless marked otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse

* NODES
  * Creation
    * Manual registration
      * hardware specs from Ironic based on mac address (M)
      * IP auto populated from Neutron (F)
    * Auto-discovery during undercloud install process (M)
  * Monitoring
    * assignment, availability, status
    * capacity, historical statistics (M)
  * Management node (where triple-o is installed)
    * created as part of undercloud install process
    * can create additional management nodes (F)
  * Resource nodes
    * searchable by status, name, cpu, memory, and all attributes from ironic
    * can be allocated as one of four node types

It's pretty clear by the current verbiage but I'm going to ask anyway: one and only one?

Yep, that's right!

      * compute
      * controller
      * object storage
      * block storage
    * Resource class - allows for further categorization of a node type
      * each node type specifies a single default resource class
      * allow multiple resource classes per node type (M)

My gut reaction is that we want to bite this off sooner rather than later. This will have data model and API implications that, even if we don't commit to it for Icehouse, should still be in our minds during it, so it might make sense to make it a first-class thing to just nail down now.

That is entirely correct, which is one reason it's on the list of requirements. The forthcoming API design will have to account for it.
Not recreating the entire data model between releases is a key goal :)

    * optional node profile for a resource class (M)
      * acts as filter for nodes that can be allocated to that class (M)

To my understanding, once this is in Icehouse, we'll have to support upgrades. If this filtering is pushed off, could we get into a situation where an allocation created in Icehouse would no longer be valid in Icehouse+1 once these filters are in place? If so, we might want to make it more of a priority to get them in place earlier and not eat the headache of addressing these sorts of integrity issues later.

That's true. The problem is that, to my understanding, the filters we'd need in nova-scheduler are not yet fully in place. I also think that this is an issue we'll need to address no matter what. Even once filters exist, if a user applies a filter *after* nodes are allocated, we'll need to do something clever if the already-allocated nodes don't meet the filter criteria.

    * nodes can be viewed by node types
      * additional group by status, hardware specification
    * controller node type
      * each controller node will run all openstack services
      * allow each node to run specified service (F)
      * breakdown by workload (percentage of cpu used per node) (M)
  * Unallocated nodes

Is there more still being fleshed out here? Things like:
* Listing unallocated nodes
* Unallocating a previously allocated node (does this make it a vanilla resource or does it retain the resource type? is this the only way to change a node's resource type?)
* Unregistering nodes from Tuskar's inventory (I put this under unallocated under the assumption that the workflow will be an explicit unallocate before unregister; I'm not sure if this is the same as archive below).

Ah, you're entirely right. I'll add these to the list.

  * Archived nodes (F)

Can you elaborate a bit more on what this is?
To be honest, I'm a bit fuzzy about this myself; Jarda mentioned that there was an OpenStack service in the process of being planned that would handle this requirement. Jarda, can you detail a bit?

Thanks again for the comments!

Mainn
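The 'node profile acts as filter' discussion above is easiest to see with a toy matcher. This is only an illustration - the attribute names are invented, and the real filtering would live in nova-scheduler, not the UI:

```python
# Sketch only: a node profile as a set of minimum requirements.
# Attribute names ("cpus", "memory_mb") are hypothetical.
def matches_profile(node, profile):
    """True if the node meets every minimum stated in the profile."""
    return all(node.get(attr, 0) >= minimum for attr, minimum in profile.items())

nodes = [
    {"name": "node-1", "cpus": 8, "memory_mb": 16384},
    {"name": "node-2", "cpus": 2, "memory_mb": 4096},
]
profile = {"cpus": 4, "memory_mb": 8192}
eligible = [n["name"] for n in nodes if matches_profile(n, profile)]
print(eligible)  # ['node-1']
```

It also makes the integrity worry concrete: if a stricter profile is applied *after* allocation, already-allocated nodes can fail the match, which is exactly the Icehouse/Icehouse+1 question raised in the thread.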
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On Dec 5, 2013, at 9:31 PM, Tzu-Mainn Chen tzuma...@redhat.com wrote:

Hey all, I've attempted to spin out the requirements behind Jarda's excellent wireframes (http://lists.openstack.org/pipermail/openstack-dev/2013-December/020944.html). Hopefully this can add some perspective on both the wireframes and the needed changes to the tuskar-api.

This list is great, thanks very much for taking the time to write this up! I think a big part of the User Experience design is to take a step back and understand the requirements from an end user's point of view…what would they want to accomplish by using this UI? This might influence the design in certain ways, so I've taken a cut at a set of user stories for the Icehouse timeframe based on these requirements that I hope will be useful during discussions. Based on the OpenStack Personas[1], I think that Anna would be the main consumer of the TripleO UI, but please let me know if you think otherwise.

- As an infrastructure administrator, Anna needs to deploy or update a set of resources that will run OpenStack (This isn't a very specific use case, but more of the larger end goal of Anna coming into the UI.)
- As an infrastructure administrator, Anna expects that the management node for the deployment services is already up and running and the status of this node is shown in the UI.
- As an infrastructure administrator, Anna wants to be able to quickly see the set of unallocated nodes that she could use for her deployment of OpenStack. Ideally, she would not have to manually tell the system about these nodes. If she needs to manually register nodes for whatever reason, Anna would only want to have to define the essential data needed to register these nodes.
- As an infrastructure administrator, Anna needs to assign a role to each of the necessary nodes in her OpenStack deployment. The nodes could be either controller, compute, networking, or storage resources depending on the needs of this deployment.
- As an infrastructure administrator, Anna wants to review the distribution of the nodes that she has assigned before kicking off the Deploy task.
- As an infrastructure administrator, Anna wants to monitor the deployment process of all of the nodes that she has assigned.
- As an infrastructure administrator, Anna needs to be able to troubleshoot any errors that may occur during the deployment of nodes process.
- As an infrastructure administrator, Anna wants to monitor the availability and status of each node in her deployment.
- As an infrastructure administrator, Anna wants to be able to unallocate a node from a deployment.
- As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment.
- As an infrastructure administrator, Anna needs to be notified of any important changes to nodes that are in the OpenStack deployment. She does not want to be spammed with non-important notifications.

Please feel free to comment, change, or add to this list.

[1] https://docs.google.com/document/d/16rkiXWxxgzGT47_Wc6hzIPzO2-s2JWAPEKD0gP2mt7E/edit?pli=1#

Thanks,
Liz

All comments are welcome!
Thanks,
Tzu-Mainn Chen

*** Requirements are assumed to be targeted for Icehouse, unless marked otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse

* NODES
  * Creation
    * Manual registration
      * hardware specs from Ironic based on mac address (M)
      * IP auto populated from Neutron (F)
    * Auto-discovery during undercloud install process (M)
  * Monitoring
    * assignment, availability, status
    * capacity, historical statistics (M)
  * Management node (where triple-o is installed)
    * created as part of undercloud install process
    * can create additional management nodes (F)
  * Resource nodes
    * searchable by status, name, cpu, memory, and all attributes from ironic
    * can be allocated as one of four node types
      * compute
      * controller
      * object storage
      * block storage
    * Resource class - allows for further categorization of a node type
      * each node type specifies a single default resource class
      * allow multiple resource classes per node type (M)
    * optional node profile for a resource class (M)
      * acts as filter for nodes that can be allocated to that class (M)
    * nodes can be viewed by node types
      * additional group by status, hardware specification
    * controller node type
      * each controller node will run all openstack services
      * allow each node to run specified service (F)
      * breakdown by workload (percentage of cpu used per node) (M)
  * Unallocated nodes
  * Archived nodes (F)
    * Will be separate openstack service (F)

* DEPLOYMENT
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
That looks really good, thanks for putting that together! I'm going to put together a wiki page that consolidates the various Tuskar planning documents - requirements, user stories, wireframes, etc - so it's easier to see the whole planning picture.

Mainn

- Original Message -
[snip]
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
The relevant wiki page is here:
https://wiki.openstack.org/wiki/TripleO/Tuskar#Icehouse_Planning

- Original Message -

That looks really good, thanks for putting that together! I'm going to put together a wiki page that consolidates the various Tuskar planning documents - requirements, user stories, wireframes, etc - so it's easier to see the whole planning picture.

Mainn

- Original Message -
[snip]
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 7 December 2013 09:31, Liz Blanchard lsure...@redhat.com wrote:

This list is great, thanks very much for taking the time to write this up! I think a big part of the User Experience design is to take a step back and understand the requirements from an end user's point of view…what would they want to accomplish by using this UI? This might influence the design in certain ways, so I've taken a cut at a set of user stories for the Icehouse timeframe based on these requirements that I hope will be useful during discussions. Based on the OpenStack Personas[1], I think that Anna would be the main consumer of the TripleO UI, but please let me know if you think otherwise.

- As an infrastructure administrator, Anna needs to deploy or update a set of resources that will run OpenStack (This isn't a very specific use case, but more of the larger end goal of Anna coming into the UI.)
- As an infrastructure administrator, Anna expects that the management node for the deployment services is already up and running and the status of this node is shown in the UI.
- As an infrastructure administrator, Anna wants to be able to quickly see the set of unallocated nodes that she could use for her deployment of OpenStack. Ideally, she would not have to manually tell the system about these nodes. If she needs to manually register nodes for whatever reason, Anna would only want to have to define the essential data needed to register these nodes.

I want to challenge this one. There are two concerns conflated: A) seeing available resources for scaling up her cloud, and B) minimising effort to enroll additional resources. B) is a no-brainer. For A) though, as phrased, we're talking about seeing a set of individual items: but actually, wouldn't aggregated capacity be more useful, with optional drill down - '400 cores, 2TB RAM, 1PB of disk'?

- As an infrastructure administrator, Anna needs to assign a role to each of the necessary nodes in her OpenStack deployment.
The nodes could be either controller, compute, networking, or storage resources depending on the needs of this deployment.

Definitely not: she needs to deliver a running cloud. Manually saying 'machine X is a compute node' is confusing an implementation with a need. She needs to know that her cloud will have enough capacity to meet her users' needs; she needs to know that it will be resilient against a wide set of failures (and this might be a dial, with different clouds having different uptime guarantees); she may need to ensure that some specific hardware configuration is used for storage, as a performance optimisation. None of those needs imply assigning roles to machines.

- As an infrastructure administrator, Anna wants to review the distribution of the nodes that she has assigned before kicking off the Deploy task.

If by distribution you mean the top-level stats (15 control nodes, 200 hypervisors, etc.) - then I agree. If you mean 'node X will be a hypervisor' - I thoroughly disagree. What does that do for her?

- As an infrastructure administrator, Anna wants to monitor the deployment process of all of the nodes that she has assigned.

I don't think she wants to do that. I think she wants to be told if there is a problem that needs her intervention to solve - e.g. bad IPMI details for a node, or a node not responding when asked to boot via PXE.

- As an infrastructure administrator, Anna needs to be able to troubleshoot any errors that may occur during the deployment of nodes process.

Definitely.

- As an infrastructure administrator, Anna wants to monitor the availability and status of each node in her deployment.

Yes, with the caveat that I think instance is the key thing here for now; there is a lifecycle aspect where being able to say 'machine X is having persistent network issues' is very important, and as a long-term thing we should totally aim at that.

- As an infrastructure administrator, Anna wants to be able to unallocate a node from a deployment.

Why?
What's her motivation? One plausible one for me is 'a machine needs to be serviced, so Anna wants to remove it from the deployment to avoid causing user-visible downtime.' So let's say that: Anna needs to be able to take machines out of service so they can be maintained or disposed of.

- As an infrastructure administrator, Anna wants to be able to view the history of nodes that have been in a deployment.

Why? This is super generic and could mean anything.

- As an infrastructure administrator, Anna needs to be notified of any important changes to nodes that are in the OpenStack deployment. She does not want to be spammed with non-important notifications.

What sort of changes do you mean here?

Thanks for putting this together, I love Personas as a way to make designs concrete and connected to user needs.

-Rob

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
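The aggregated-capacity view suggested earlier in this message ('400 cores, 2TB RAM, 1PB of disk', with optional drill-down to individual nodes) is just a sum over the unallocated nodes' hardware specs. A sketch, with field names assumed for illustration rather than taken from Ironic's actual schema:

```python
# Sketch only: roll per-node specs up into one capacity summary.
# Field names (cores, ram_gb, disk_gb) are hypothetical.
def aggregate(nodes):
    """Sum hardware specs across nodes; missing fields count as zero."""
    totals = {"cores": 0, "ram_gb": 0, "disk_gb": 0}
    for node in nodes:
        for key in totals:
            totals[key] += node.get(key, 0)
    return totals

unallocated = [
    {"cores": 16, "ram_gb": 64, "disk_gb": 2000},
    {"cores": 8, "ram_gb": 32, "disk_gb": 1000},
]
print(aggregate(unallocated))  # {'cores': 24, 'ram_gb': 96, 'disk_gb': 3000}
```

The drill-down is then just the `unallocated` list itself, shown on demand, so both the summary view and the per-node view come from the same data.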
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 7 December 2013 10:55, Matt Wagner matt.wag...@redhat.com wrote:

The 'management node' here is the undercloud node that Anna is interacting with, as I understand it. (Someone correct me if I'm wrong.) So it's not a bad idea to show its status, but I guess the mere fact that she's using it will indicate that it's operational.

There are potentially many such nodes, and Anna will be interacting with some of them; I don't think we can make too many assumptions about what the UI working implies.

- As an infrastructure administrator, Anna needs to be able to troubleshoot any errors that may occur during the deployment of nodes process.

I'm not sure that the "through the UI" implication I mentioned above extends here. (IMHO) I assume that if things fail, Anna might be okay with us showing a message that $foo failed on $bar, and she should try looking in /var/log/$baz for full details. Does that seem fair? (At least early on.)

I don't think we necessarily need to do anything here other than make sure the system is a) well documented and b) Anna has all the normal sysadmin access to the infrastructure. Her needs can be met by us getting out of the way gracefully, at least in the short term.

-Rob
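Matt's '$foo failed on $bar, look in /var/log/$baz' suggestion is essentially a templated failure summary the UI can render without understanding the failure itself. A hypothetical helper (names and message wording invented):

```python
# Sketch only: surface a failure with a pointer to the real logs,
# rather than trying to diagnose it in the UI.
def failure_summary(service, node, log_path):
    """Render a short, actionable error message for the UI."""
    return "%s failed on %s; see %s for full details." % (service, node, log_path)

msg = failure_summary("nova-compute", "node-3", "/var/log/nova/nova-compute.log")
print(msg)  # nova-compute failed on node-3; see /var/log/nova/nova-compute.log for full details.
```

This matches the 'get out of the way gracefully' stance: the UI names the failing service and node, and defers the actual troubleshooting to Anna's normal sysadmin access.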
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Thanks for the comments and questions! I fully expect that this list of requirements will need to be fleshed out, refined, and heavily modified, so the more the merrier. Comments inline:

*** Requirements are assumed to be targeted for Icehouse, unless marked otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse

* NODES

Note that everything in this section should be Ironic API calls.

  * Creation
    * Manual registration
      * hardware specs from Ironic based on mac address (M)

Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory stats.

      * IP auto populated from Neutron (F)

Do you mean IPMI IP? I'd say IPMI address managed by Neutron here.

    * Auto-discovery during undercloud install process (M)
  * Monitoring
    * assignment, availability, status
    * capacity, historical statistics (M)

Why is this under 'nodes'? I challenge the idea that it should be there. We will need to surface some stuff about nodes, but the underlying idea is to take a cloud approach here - so we're monitoring services that happen to be on nodes. There is room to monitor nodes, as an undercloud feature set, but let's be very, very specific about what is sitting at what layer.

That's a fair point. At the same time, the UI does want to monitor both services and the nodes that the services are running on, correct? I would think that a user would want this. Would it be better to explicitly split this up into two separate requirements?

  * Management node (where triple-o is installed)

This should be plural :) - TripleO isn't a single service to be installed - we've got Tuskar, Ironic, Nova, Glance, Keystone, Neutron, etc.

I misspoke here - this should be where the undercloud is installed. My current understanding is that our initial release will only support the undercloud being installed onto a single node, but my understanding could very well be flawed.
>>       * created as part of undercloud install process
>>       * can create additional management nodes (F)
>>    * Resource nodes
>
> ^ nodes is again confusing layers - nodes are what things are deployed to, but they aren't the entry point.
>
>>       * searchable by status, name, cpu, memory, and all attributes from ironic
>>       * can be allocated as one of four node types
>
> Not by users though. We need to stop thinking of this as 'what we do to nodes' - Nova/Ironic operate on nodes, we operate on Heat templates.

Right, I didn't mean to imply that users would be doing this allocation. But once Nova does this allocation, the UI does want to be aware of how the allocation is done, right? That's what this requirement meant.

>>          * compute
>>          * controller
>>          * object storage
>>          * block storage
>>       * Resource class - allows for further categorization of a node type
>>          * each node type specifies a single default resource class
>>          * allow multiple resource classes per node type (M)
>
> What's a node type?

Compute/controller/object storage/block storage. Is another term besides 'node type' more accurate?

>>          * optional node profile for a resource class (M)
>>             * acts as filter for nodes that can be allocated to that class (M)
>
> I'm not clear on this - you can list the nodes that have had a particular thing deployed on them; we probably can get a good answer to being able to see what nodes a particular flavor can deploy to, but we don't want to be second-guessing the scheduler.

Correct; the goal here is to provide a way through the UI to send additional filtering requirements that will eventually be passed into the scheduler, allowing the scheduler to apply additional filters.

>>       * nodes can be viewed by node types
>>          * additional group by status, hardware specification
>
> *Instances* - e.g. hypervisors, storage, block storage etc.
>
>>       * controller node type
>
> Again, need to get away from 'node type' here.
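The clarification above, that a node profile is extra filtering input eventually handed to the scheduler rather than something the UI decides itself, can be sketched as a simple pre-filter. This is a hypothetical illustration (made-up data shapes, not the Nova filter-scheduler interface):

```python
# Sketch of a node profile acting as a filter over registered nodes.
# In the real flow these constraints would be passed through to the Nova
# scheduler; filtering here is only to illustrate the semantics.

def matches_profile(node, profile):
    """True if the node satisfies every minimum in the profile."""
    return all(node.get(key, 0) >= minimum for key, minimum in profile.items())

nodes = [
    {"name": "node-1", "cpus": 4,  "memory_mb": 8192},
    {"name": "node-2", "cpus": 16, "memory_mb": 65536},
]
block_storage_profile = {"cpus": 8, "memory_mb": 32768}

eligible = [n["name"] for n in nodes if matches_profile(n, block_storage_profile)]
print(eligible)  # ['node-2']
```

Expressed this way, the profile never allocates anything; it only narrows the candidate set the scheduler is allowed to choose from, which keeps the UI out of the second-guessing business Rob warns about.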
>>          * each controller node will run all openstack services
>>             * allow each node to run specified service (F)
>>          * breakdown by workload (percentage of cpu used per node) (M)
>>    * Unallocated nodes
>
> This implies an 'allocation' step, that we don't have - how about 'Idle nodes' or something?

Is it imprecise to say that nodes are allocated by the scheduler? Would something like 'active/idle' be better?

>>    * Archived nodes (F)
>>       * Will be separate openstack service (F)
>> * DEPLOYMENT
>>    * multiple deployments allowed (F)
>>       * initially just one
>>    * deployment specifies a node distribution across node types
>
> I can't parse this. Deployments specify how many instances to deploy in what roles (e.g. 2 control, 2 storage, 4 block storage,
[openstack-dev] [TripleO][Tuskar] Icehouse Requirements
Hey all,

I've attempted to spin out the requirements behind Jarda's excellent wireframes (http://lists.openstack.org/pipermail/openstack-dev/2013-December/020944.html). Hopefully this can add some perspective on both the wireframes and the needed changes to the tuskar-api.

All comments are welcome!

Thanks,
Tzu-Mainn Chen

*** Requirements are assumed to be targeted for Icehouse, unless marked otherwise:
   (M) - Maybe Icehouse, dependency on other in-development features
   (F) - Future requirement, after Icehouse

* NODES
   * Creation
      * Manual registration
         * hardware specs from Ironic based on mac address (M)
         * IP auto populated from Neutron (F)
      * Auto-discovery during undercloud install process (M)
   * Monitoring
      * assignment, availability, status
      * capacity, historical statistics (M)
   * Management node (where triple-o is installed)
      * created as part of undercloud install process
      * can create additional management nodes (F)
   * Resource nodes
      * searchable by status, name, cpu, memory, and all attributes from ironic
      * can be allocated as one of four node types
         * compute
         * controller
         * object storage
         * block storage
      * Resource class - allows for further categorization of a node type
         * each node type specifies a single default resource class
         * allow multiple resource classes per node type (M)
      * optional node profile for a resource class (M)
         * acts as filter for nodes that can be allocated to that class (M)
      * nodes can be viewed by node types
         * additional group by status, hardware specification
      * controller node type
         * each controller node will run all openstack services
            * allow each node to run specified service (F)
         * breakdown by workload (percentage of cpu used per node) (M)
   * Unallocated nodes
   * Archived nodes (F)
      * Will be separate openstack service (F)
* DEPLOYMENT
   * multiple deployments allowed (F)
      * initially just one
   * deployment specifies a node distribution across node types
      * node distribution can be updated after creation
   * deployment configuration, used for initial creation only
      * defaulted, with no option to change
      * allow modification (F)
   * review distribution map (F)
   * notification when a deployment is ready to go or whenever something changes
* DEPLOYMENT ACTION
   * Heat template generated on the fly
      * hardcoded images
         * allow image selection (F)
      * pre-created template fragments for each node type
      * node type distribution affects generated template
   * nova scheduler allocates nodes
      * filters based on resource class and node profile information (M)
   * Deployment action can create or update
   * status indicator to determine overall state of deployment
      * status indicator for nodes as well
      * status includes 'time left' (F)
* NETWORKS (F)
* IMAGES (F)
* LOGS (F)
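The DEPLOYMENT ACTION items above - a Heat template generated on the fly from pre-created per-node-type fragments, shaped by the deployment's node distribution - can be sketched roughly as follows. Fragment contents and the merge logic are hypothetical, not the actual Tuskar implementation:

```python
# Rough sketch of on-the-fly template assembly: one pre-created fragment
# per node type, multiplied out by the deployment's node distribution.
# Flavor names and the assembly strategy are illustrative only.

FRAGMENTS = {
    "controller":    {"type": "OS::Nova::Server", "flavor": "baremetal-control"},
    "compute":       {"type": "OS::Nova::Server", "flavor": "baremetal-compute"},
    "block storage": {"type": "OS::Nova::Server", "flavor": "baremetal-block"},
}

def generate_template(distribution):
    """Build a Heat-like resources dict from a {node_type: count} map."""
    resources = {}
    for node_type, count in distribution.items():
        fragment = FRAGMENTS[node_type]
        for i in range(count):
            name = "%s-%d" % (node_type.replace(" ", "-"), i)
            resources[name] = dict(fragment)  # one copy per requested instance
    return {"resources": resources}

template = generate_template({"controller": 2, "compute": 4})
print(len(template["resources"]))  # 6 resources: 2 controller + 4 compute
```

This also shows why "node type distribution affects generated template" and "nova scheduler allocates nodes" are separate bullets: the generated template only says how many of each role to deploy; which physical nodes receive them is left entirely to the scheduler (with the resource class / node profile filters as optional inputs).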