Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 13/12/13 09:41 -0500, Jay Dobies wrote:
>>> * ability to 'preview' changes going to the scheduler
>>
>> What does this give you? How detailed a preview do you need? What
>> information is critical there? Have you seen the proposed designs for
>> a heat template preview feature - would that be sufficient?
>
> Will will probably have a better answer to this, but I feel like at the
> very least this goes back to the psychology point raised earlier (I
> think in this thread, but if not, definitely one of the TripleO ones).
>
> A weird parallel is whenever I do a new install of Fedora. I never
> accept their default disk partitioning without electing to
> review/modify it. Even if I didn't expect to change anything, I want to
> see what they are going to give me. And then I compulsively review the
> summary of what actual changes will be applied in the follow-up screen
> that's displayed after I say I'm happy with the layout.
>
> Perhaps that's more a commentary on my own OCD and cynicism, in that I
> feel dirty accepting the magic defaults blindly. I love the idea of
> anaconda doing the heavy lifting of figuring out sane defaults for
> home/root/swap and so on (similarly, I love the idea of the Nova
> scheduler rationing out where instances are deployed), but I at least
> want to know I've seen it before it happens.
>
> I fully admit to not knowing how common that sort of thing is. I
> suspect I'm in the majority of geeks and tame by sysadmin standards,
> but I honestly don't know. So I acknowledge that my entire argument for
> the preview here is based on my own personality.

Jay, I mirror your sentiments exactly here; the Fedora example is a good
one, and it is even more the case when it comes to node
allocation/details and proposed changes in a deployment scenario. Though
nine times out of ten the defaults the Nova scheduler chooses will be
fine, there's a 'human' need to review them, changing as necessary.

-will
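(For concreteness, a minimal sketch of the kind of pre-deployment
preview being asked for here, assuming python-heatclient's stack-preview
call; the endpoint, token, template name, and resource fields below are
illustrative placeholders, not Tuskar's actual plumbing:)

    # Sketch only: assumes python-heatclient; endpoint/token/template
    # are hypothetical placeholders.
    from heatclient.client import Client

    heat = Client('1', endpoint='http://undercloud:8004/v1/TENANT_ID',
                  token='AUTH_TOKEN')

    # Ask Heat what it *would* create, without creating anything.
    stack = heat.stacks.preview(
        stack_name='overcloud',
        template=open('overcloud.yaml').read(),
    )

    # The returned representation lists the resources Heat would build;
    # exact fields vary by Heat version, so treat this as illustrative.
    for res in stack.to_dict().get('resources', []):
        print(res.get('resource_type'), res.get('resource_name'))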
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 13/12/13 19:06 +1300, Robert Collins wrote:
> On 13 December 2013 06:24, Will Foster wrote:
>> I just wanted to add a few thoughts:
>
> Thank you!
>
>> For some comparative information here "from the field": I work
>> extensively on deployments of large OpenStack implementations, most
>> recently with a ~220 node / 9 rack deployment (scaling up to 42 racks
>> / 1024 nodes soon). My primary role is of a Devops/Sysadmin nature,
>> not a specific development area, so rapid
>> provisioning/tooling/automation is an area I almost exclusively work
>> within (mostly API-driven, using Foreman/Puppet). The infrastructure
>> our small team designs/builds supports our development and business.
>> I am the target user base you'd probably want to cater to.
>
> Absolutely!
>
>> I can tell you the philosophy and mechanics of Tuskar/OOO are great,
>> something I'd love to start using extensively, but there are some
>> needed aspects in the areas of control that I feel should be added
>> (though arguably less for me and more for my ilk who are looking to
>> expand their OpenStack footprint).
>>
>> * ability to 'preview' changes going to the scheduler
>
> What does this give you? How detailed a preview do you need? What
> information is critical there? Have you seen the proposed designs for
> a heat template preview feature - would that be sufficient?

Thanks for the reply. Preview-wise, it'd be useful to see node
allocation prior to deployment - nothing too in-depth. I have not seen
the heat template preview features; are you referring to the YAML
templating [1] or something else [2]? I'd like to learn more.

[1] - http://docs.openstack.org/developer/heat/template_guide/hot_guide.html
[2] - https://github.com/openstack/heat-templates

>> * ability to override/change some aspects within node assignment
>
> What would this be used to do? How often do those situations turn up?
> What's the impact if you can't do that?

One scenario might be that autodiscovery does not pick up an available
node in your pool of resources, or detects it incorrectly - you could
manually change things as you like. Another (more common) scenario is
that you don't have an isolated, flat network to deploy on, and nodes
are picked that you do not want included in the provisioning - you could
remove those from the set of resources prior to launching overcloud
creation. Without this, the tooling would seem inflexible to those
lacking a thoughtfully prepared network/infrastructure; more commonly,
in cases where the existing network design is too inflexible, the
usefulness and quick/seamless provisioning benefits would fall short.

>> * ability to view at least minimal logging from within Tuskar UI
>
> Logging of what - the deployment engine? The heat event-log? Nova
> undercloud logs? Logs from the deployed instances? If it's not there
> in V1, but you can get, or already have, credentials for the
> [instances that hold the logs that you wanted], would that be a big
> adoption blocker, or just a nuisance?

Logging of the deployment engine status during the bootstrapping process
initially, plus some rudimentary node success/failure indication. It
should be simplistic enough not to rival existing monitoring/log
systems, but it should at least provide deployment logs as the overcloud
is being built and a general node health 'check-in' that it's complete.
Afterwards, as you mentioned, the logs are available on the deployed
systems. Think of it as providing some basic written navigational signs
for people crossing a small bridge before they get to the highway:
there's continuity from start to finish and a clear sense of what's
occurring.
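(A minimal sketch of that kind of progress feed, assuming
python-heatclient and the Heat event log Robert mentions; the endpoint,
token, and stack name are placeholders. It just tails events until the
stack settles:)

    # Sketch: tail the Heat event log as a rough deployment-progress
    # feed; endpoint/token/stack name are hypothetical placeholders.
    import time

    from heatclient.client import Client

    heat = Client('1', endpoint='http://undercloud:8004/v1/TENANT_ID',
                  token='AUTH_TOKEN')

    seen = set()
    while True:
        for event in heat.events.list('overcloud'):
            if event.id not in seen:  # print each event only once
                seen.add(event.id)
                print(event.event_time, event.resource_name,
                      event.resource_status,
                      event.resource_status_reason)
        status = heat.stacks.get('overcloud').stack_status
        if status.endswith('COMPLETE') or status.endswith('FAILED'):
            break
        time.sleep(10)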
>> From my perspective, absence of this type of verbosity may impede
>> adoption by new users (who are used to this type of information from
>> deployment tooling). Here's the main reason: most new adopters of
>> OpenStack/IaaS are going to be running legacy/mixed hardware, and
>> while they might have an initiative to explore and invest, and even a
>> decent budget, most of them are not going to have completely
>> identical hardware, isolated/flat networks, and things set aside in
>> such a way that blind auto-discovery/deployment will just work all
>> the time.
>
> That's great information (and something I reasonably well expected, to
> a degree). We have a hard dependency on no wildcard DHCP servers in
> the environment (or we can't deploy). Autodiscovery is something we
> don't have yet, but certainly debugging deployment failures is a very
> important use case, and one we need to improve both at the plumbing
> layer and in the stories around it in the UI.
>
>> There will be a need to sometimes adjust, and those coming from a
>> more vertically-scaling infrastructure (most large orgs) will not
>> have 100% matching standards in place for vendor, machine spec and
>> network design, which may make Tuskar/OOO seem inflexible and
>> 'one-way'. This may just be a carry-over or fear of the old ways of
>> deployment but
Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements
On 12/12/13 09:42 +1300, Robert Collins wrote:
> On 12 December 2013 01:17, Jaromir Coufal wrote:
>> On 2013/10/12 23:09, Robert Collins wrote:
>>
>>>> The 'easiest' way is to support bigger companies with huge
>>>> deployments, tailored infrastructure, everything connected
>>>> properly. But there are tons of companies/users who are running on
>>>> old heterogeneous hardware. Very likely even more than the number
>>>> of companies having the already mentioned large deployments. And
>>>> giving them only the way of 'setting up rules' in order to get the
>>>> service on the node - this type of user is not gonna use our
>>>> deployment system.
>>>
>>> That's speculation. We don't know if they will or will not because
>>> we haven't given them a working system to test.
>>
>> Some part of that is speculation, some part of that is feedback from
>> people who are doing deployments (of course it's just a very limited
>> audience). Anyway, it is not just pure theory.
>
> Sure. Let me be more precise. There is a hypothesis that lack of
> direct control will be a significant adoption blocker for a primary
> group of users.
>
> I think it's safe to say that some users in the group 'sysadmins
> having to deploy an OpenStack cloud' will find it a bridge too far and
> not use a system without direct control. Call this group A.
>
> I think it's also safe to say that some users will not care in the
> slightest, because their deployment is too small for them to be
> particularly worried (e.g. about occasional downtime - though they
> would worry a lot about data loss). Call this group B.
>
> I suspect we don't need to consider group C - folk who won't use a
> system if it *has* manual control - but that's only a suspicion. It
> may be that the side effect of adding direct control is to reduce
> usability below the threshold some folk need...
>
> To assess 'significant adoption blocker' we basically need to find the
> % of users who will care sufficiently that they don't use TripleO. How
> can we do that? We can do questionnaires, and get such folk to come
> talk with us, but that suffers from selection bias: group B can use
> the system with or without direct manual control, so they have little
> motivation to argue vigorously in any particular direction. Group A,
> however, have to argue, because they won't use the system at all
> without that feature, and they may want to use the system for other
> reasons, so it becomes a crucial aspect for them.
>
> A much better way IMO is to test it: get a bunch of volunteers and see
> who responds positively to a demo *without* direct manual control. To
> do that we need a demoable thing, which might just be mockups that
> show a set of workflows (and include things like Jay's
> shiny-new-hardware use case in the demo). I rather suspect we're
> building that anyway as part of doing UX work, so maybe what we do is
> put a tweet or blog post up asking for sysadmins who a) have not yet
> deployed OpenStack, b) want to, and c) are willing to spend 20-30
> minutes with us; walk them through a demo showing no manual control,
> and record what questions they ask and whether they would like to have
> that product; and if not, then (a) what use cases they can't address
> with the mockups and (b) what other reasons they have for not using
> it.
>
> This is a bunch of work though! So, do we need to do that work? *If*
> we can layer manual control on later, then we could defer this testing
> until we are at the point where we can say 'the nova scheduled version
> is ready, now let's decide if we add the manual control'.
> OTOH, if we *cannot* layer manual control on later - if it has
> tentacles through too much of the code base - then we need to decide
> earlier, because it will be significantly harder to add later, and
> that may be too late for vendors shipping on top of TripleO.
>
> So with that as a prelude, my technical sense is that we can layer
> manual scheduling on later: we provide an advanced screen, show the
> list of N instances we're going to ask for, and allow each instance to
> be directly customised with a node id selected from either the current
> node it's running on or an available node. It's significant work, both
> UI and plumbing, but it's not going to be made harder by the other
> work we're doing AFAICT.
>
> -> My proposal is that we shelve this discussion until we have the
> nova/heat scheduled version in 'and now we polish' mode, and then pick
> it back up and assess user needs.
>
> An alternative argument is to say that group A is a majority of the
> userbase and that doing an automatic version is entirely unnecessary.
> That's also possible, but I'm extremely skeptical, given the huge cost
> of staff time and the complete lack of interest my sysadmin friends
> (and my former sysadmin self) have in doing automatable things by
> hand.

I just wanted to add a few thoughts:

For some comparative information here "from the field": I work
extensively on deployments of large OpenStack implementations, most
recently with a ~220 node / 9 rack deployment (scaling up to 42 racks /
1024 nodes soon). My primary role is of a Devops/Sysadmin nature, not a
specific development area, so rapid provisioning/tooling/automation is
an area I almost exclusively work within (mostly API-driven, using
Foreman/Puppet).
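(To make Robert's "advanced screen" idea concrete, a sketch with
hypothetical names throughout: each planned instance either defers to
the Nova scheduler or is pinned to an operator-chosen node. The
'zone:host' availability-zone syntax shown in the comment is one
plausible plumbing mechanism for the pinned case, not a statement of how
Tuskar would actually do it:)

    # Sketch of the data an 'advanced screen' could hold: node_id=None
    # defers to the Nova scheduler; a concrete id pins the instance.
    plan = [
        {'name': 'overcloud-control-0', 'role': 'control',
         'node_id': 'node-7'},            # pinned by the operator
        {'name': 'overcloud-compute-0', 'role': 'compute',
         'node_id': None},                # scheduler decides
        {'name': 'overcloud-compute-1', 'role': 'compute',
         'node_id': None},
    ]

    for inst in plan:
        if inst['node_id'] is not None:
            # One plausible mechanism: Nova's admin-only 'zone:host'
            # availability-zone syntax forces a specific host.
            placement = 'nova:%s' % inst['node_id']
        else:
            placement = '(scheduler decides)'
        print('%-22s %-8s %s' % (inst['name'], inst['role'], placement))

Because the manual choice is just extra data layered over the automatic
plan, leaving every node_id as None reduces to the pure
scheduler-driven path, which is what makes deferring the feature
plausible.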