Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-18 Thread Will Foster

On 13/12/13 09:41 -0500, Jay Dobies wrote:

* ability to 'preview' changes going to the scheduler


What does this give you? How detailed a preview do you need? What
information is critical there? Have you seen the proposed designs for
a heat template preview feature - would that be sufficient?


Will will probably have a better answer to this, but I feel like at the
very least this goes back to the psychology point raised earlier (I
think in this thread, but if not, definitely one of the TripleO ones).


A weird parallel is whenever I do a new install of Fedora. I never 
accept their default disk partitioning without electing to 
review/modify it. Even if I didn't expect to change anything, I want 
to see what they are going to give me. And then I compulsively review 
the summary of what actual changes will be applied in the follow up 
screen that's displayed after I say I'm happy with the layout.


Perhaps that's more a commentary on my own OCD and cynicism that I 
feel dirty accepting the magic defaults blindly. I love the idea of 
anaconda doing the heavy lifting of figuring out sane defaults for 
home/root/swap and so on (similarly, I love the idea of Nova scheduler 
rationing out where instances are deployed), but I at least want to 
know I've seen it before it happens.


I fully admit to not knowing how common that sort of thing is. I
suspect I'm in the majority among geeks and tame by sysadmin standards,
but I honestly don't know. So I acknowledge that my entire argument
for the preview here is based on my own personality.




Jay,

I mirror your sentiments exactly here; the Fedora example is a good
one, and it's even more the case when it comes to node allocation/details
and proposed changes in a deployment scenario.  Nine times out of ten the
defaults the Nova scheduler chooses will be fine, but there's a 'human'
need to review them, changing as necessary.

-will





Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-16 Thread Will Foster

On 13/12/13 19:06 +1300, Robert Collins wrote:

On 13 December 2013 06:24, Will Foster  wrote:


I just wanted to add a few thoughts:


Thank you!


For some comparative information here "from the field": I work
extensively on deployments of large OpenStack implementations,
most recently with a ~220-node / 9-rack deployment (scaling up to 42 racks /
1024 nodes soon).  My primary role is of a DevOps/Sysadmin nature rather
than a specific development area, so rapid provisioning/tooling/automation
is an area I almost exclusively work within (mostly API-driven, using
Foreman/Puppet).  The infrastructure our small team designs/builds
supports our development and business.

I am the target user base you'd probably want to cater to.


Absolutely!


I can tell you the philosophy and mechanics of Tuskar/OOO are great -
something I'd love to start using extensively - but there are some aspects
in the area of control that I feel should be added (though arguably
less for me and more for my ilk who are looking to expand their OpenStack
footprint).

* ability to 'preview' changes going to the scheduler


What does this give you? How detailed a preview do you need? What
information is critical there? Have you seen the proposed designs for
a heat template preview feature - would that be sufficient?


Thanks for the reply.  Preview-wise it'd be useful to see node
allocation prior to deployment - nothing too in-depth.
I have not seen the heat template preview features; are you referring
to the YAML templating[1] or something else[2]?  I'd like to learn
more.

[1] -
http://docs.openstack.org/developer/heat/template_guide/hot_guide.html
[2] - https://github.com/openstack/heat-templates
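
To make the 'preview' ask concrete, the kind of pre-flight check I have
in mind would look roughly like the sketch below: a dry-run against the
undercloud Heat before anything is actually created.  This is only a
sketch - it assumes the proposed preview call ends up exposed through
python-heatclient much like the existing stack calls, and the endpoint,
template file and 'compute_count' parameter are all made-up placeholders.

  import os
  from heatclient.client import Client

  # Undercloud Heat endpoint and token are placeholders pulled from the
  # environment, not a real deployment.
  heat = Client('1', endpoint=os.environ['HEAT_URL'],
                token=os.environ['OS_AUTH_TOKEN'])

  with open('overcloud.yaml') as f:
      template = f.read()

  # Ask Heat what it *would* create, without creating anything.
  preview = heat.stacks.preview(stack_name='overcloud',
                                template=template,
                                parameters={'compute_count': 4})

  # Field names in the preview body may differ once the feature lands;
  # the point is just seeing the proposed resources before committing.
  for res in getattr(preview, 'resources', []):
      print('%s  %s' % (res.get('resource_name'), res.get('resource_type')))

Something at that level, surfaced in the UI, would satisfy the 'human
review' step I'm describing.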




* ability to override/change some aspects within node assignment


What would this be used to do? How often do those situations turn up?
Whats the impact if you can't do that?


One scenario might be that autodiscovery does not pick up an available
node in your pool of resources, or detects it incorrectly - you could
manually change things as you like.  Another (more common)
scenario is that you don't have an isolated, flat network with which
to deploy, and nodes are picked up that you do not want included in the
provisioning - you could remove those from the set of resources prior
to launching overcloud creation (see the rough sketch below).  The impact
would be that the tooling would seem inflexible to those lacking a
thoughtfully prepared network/infrastructure; more commonly, in cases
where the existing network design is too inflexible, the usefulness and
quick/seamless provisioning benefits would fall short.
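
For the 'remove those from the set of resources' step, I'm picturing
something as simple as flagging the unwanted nodes before kicking off
the overcloud.  A rough sketch, assuming Ironic is the registration
backend on the undercloud (it may well be nova-baremetal in a given
setup; credentials and UUIDs are placeholders):

  from ironicclient import client as ironic_client

  # Undercloud credentials are placeholders.
  ironic = ironic_client.get_client(1,
                                    os_username='admin',
                                    os_password='secret',
                                    os_tenant_name='admin',
                                    os_auth_url='http://undercloud:5000/v2.0')

  # Nodes autodiscovery registered but which we do not want the
  # scheduler to touch: flag them as in maintenance so they drop out of
  # the pool before overcloud creation starts.
  do_not_use = ['<node-uuid-1>', '<node-uuid-2>']
  for uuid in do_not_use:
      ironic.node.set_maintenance(uuid, 'true')

The exact mechanism matters much less than having a point in the
workflow where that kind of exclusion is possible.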




* ability to view at least minimal logging from within Tuskar UI


Logging of what - the deployment engine? The heat event-log? Nova
undercloud logs? Logs from the deployed instances? If it's not there
in V1, but you can get, or already have credentials for the [instances
that hold the logs that you wanted] would that be a big adoption
blocker, or just a nuisance?



Logging of the deployment engine status during the bootstrapping
process initially, and some rudimentary node success/failure
indication.  It should be simplistic enough not to rival existing
monitoring/log systems, but at least provide deployment logs as the
overcloud is being built and a general node/health 'check-in' that it's
complete.

Afterwards, as you mentioned, the logs are available on the deployed
systems.  Think of it as providing some basic written navigational signs
for people crossing a small bridge before they get to the highway:
there's continuity from start -> finish and a clear sense of what's
occurring.  From my perspective, absence of this type of verbosity may
impede adoption by new users (who are used to this type of
information with deployment tooling).
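
To give a sense of how little I mean by 'rudimentary': most of that
information is already known to the undercloud Heat while the stack
builds, so even a loop along these lines (a rough sketch; endpoint,
token and stack name are placeholders) would cover it if surfaced in
the UI:

  import os
  import time
  from heatclient.client import Client

  heat = Client('1', endpoint=os.environ['HEAT_URL'],
                token=os.environ['OS_AUTH_TOKEN'])

  # Crude progress/"check-in" loop while the overcloud builds; a real UI
  # would obviously only show events it hasn't shown yet.
  stack = heat.stacks.get('overcloud')
  while stack.stack_status == 'CREATE_IN_PROGRESS':
      for event in heat.events.list('overcloud'):
          print('%s %s %s: %s' % (event.event_time, event.resource_name,
                                  event.resource_status,
                                  event.resource_status_reason))
      time.sleep(30)
      stack = heat.stacks.get('overcloud')

  print('final status: %s (%s)' % (stack.stack_status,
                                   stack.stack_status_reason))

Anything much beyond that I'd leave to proper monitoring/log systems.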




Here's the main reason - most new adopters of OpenStack/IaaS are going to be
running legacy/mixed hardware, and while they might have an initiative to
explore and invest, and even a decent budget, most of them are not going to
have completely identical hardware, isolated/flat networks and things set
aside in such a way that blind auto-discovery/deployment will just work all
the time.


That's great information (and something I had reasonably expected, to
a degree). We have a hard dependency on no wildcard DHCP servers in
the environment (or we can't deploy). Autodiscovery is something we
don't have yet, but certainly debugging deployment failures is a very
important use case and one we need to improve, both at the plumbing
layer and in the stories around it in the UI.


There will be a need to sometimes adjust, and those coming from a more
vertically-scaling infrastructure (most large orgs) will not have
100% matching standards in place for vendor, machine spec and network design,
which may make Tuskar/OOO seem inflexible and 'one-way'.  This may just be a
carry-over or fear of the old ways of deployment but

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-12 Thread Will Foster

On 12/12/13 09:42 +1300, Robert Collins wrote:

On 12 December 2013 01:17, Jaromir Coufal  wrote:

On 2013/10/12 23:09, Robert Collins wrote:



The 'easiest' way is to support bigger companies with huge deployments,
tailored infrastructure, everything connected properly.

But there are tons of companies/users who are running on old heterogeneous
hardware. Very likely even more than the number of companies having the
already mentioned large deployments. And giving them only the way of
'setting up rules' in order to get the service on the node - this type of
user is not gonna use our deployment system.



That's speculation. We don't know if they will or will not, because we
haven't given them a working system to test.


Some part of that is speculation, some part of that is feedback from people
who are doing deployments (of course it's just a very limited audience).
Anyway, it is not just pure theory.


Sure. Let me be more precise. There is a hypothesis that lack of
direct control will be a significant adoption blocker for a primary
group of users.

I think it's safe to say that some users in the group 'sysadmins
having to deploy an OpenStack cloud' will find it a bridge too far and
not use a system without direct control. Call this group A.

I think it's also safe to say that some users will not care in the
slightest, because their deployment is too small for them to be
particularly worried (e.g. about occasional downtime (but they would
worry a lot about data loss)). Call this group B.

I suspect we don't need to consider group C - folk who won't use a
system if it *has* manual control - but that's only a suspicion. It may
be that the side effect of adding direct control is to reduce
usability below the threshold some folk need...

To assess 'significant adoption blocker' we basically need to find the
% of users who will care sufficiently that they don't use TripleO.

How can we do that? We can do questionnaires, and get such folk to
come talk with us, but that suffers from selection bias - group B can
use the system with or without direct manual control, so they have little
motivation to argue vigorously in any particular direction. Group A,
however, have to argue, because they won't use the system at all without
that feature, and they may want to use the system for other reasons,
so that becomes a crucial aspect for them.

A much better way IMO is to test it - to get a bunch of volunteers and
see who responds positively to a demo *without* direct manual control.

To do that we need a demoable thing, which might just be mockups that
show a set of workflows (and include things like Jay's
shiny-new-hardware use case in the demo).

I rather suspect we're building that anyway as part of doing UX work,
so maybe what we do is put a tweet or blog post up asking for
sysadmins who a) have not yet deployed OpenStack, b) want to, and c)
are willing to spend 20-30 minutes with us, walk them through a demo
showing no manual control, and record what questions they ask,
whether they would like to have that product, and if not, then
(a) what use cases they can't address with the mockups and (b) what
other reasons they have for not using it.

This is a bunch of work though!

So, do we need to do that work?

*If* we can layer manual control on later, then we could defer this
testing until we are at the point where we can say 'the nova scheduled
version is ready, now let's decide if we add the manual control'.

OTOH, if we *cannot* layer manual control on later - if it has
tentacles through too much of the code base - then we need to decide
earlier, because it will be significantly harder to add later, and that
may be too late for the ship dates of vendors shipping on top of TripleO.

So with that as a prelude, my technical sense is that we can layer
manual scheduling on later: we provide an advanced screen, show the
list of N instances we're going to ask for, and allow each instance to
be directly customised with a node id selected from either the current
node it's running on or an available node. It's significant work, both
UI and plumbing, but it's not going to be made harder by the other
work we're doing AFAICT.
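
Just to make 'directly customised with a node id' concrete on the
plumbing side (a rough sketch only, not necessarily the mechanism
Tuskar/Heat would actually use; all credentials and IDs below are
placeholders):

  from novaclient.v1_1 import client as nova_client

  # Undercloud credentials and IDs are placeholders.
  nova = nova_client.Client('admin', 'secret', 'admin',
                            'http://undercloud:5000/v2.0')
  image_id = '<overcloud-compute-image>'
  flavor_id = '<baremetal-flavor>'

  # Default path: the Nova scheduler picks the node.
  nova.servers.create(name='overcloud-compute-0',
                      image=image_id, flavor=flavor_id)

  # 'Advanced screen' path: pin this one instance to an operator-chosen
  # node via the admin-only az:host form; everything else still goes
  # through the scheduler unchanged.
  chosen_node = '<hostname-of-chosen-node>'
  nova.servers.create(name='overcloud-compute-1',
                      image=image_id, flavor=flavor_id,
                      availability_zone='nova:%s' % chosen_node)

The manual case sits alongside the scheduled one rather than replacing
it, which is why I think it can be layered on later.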

-> My proposal is that we shelve this discussion until we have the
nova/heat scheduled version in 'and now we polish' mode, and then pick
it back up and assess user needs.

An alternative argument is to say that group A is a majority of the
userbase and that doing an automatic version is entirely unnecessary.
That's also possible, but I'm extremely skeptical, given the huge cost
of staff time and the complete lack of interest my sysadmin friends
(and my former sysadmin self) have in doing automatable things by
hand.


I just wanted to add a few thoughts:

For some comparative information here "from the field": I work
extensively on deployments of large OpenStack implementations,
most recently with a ~220-node / 9-rack deployment (scaling up to
42 racks / 1024 nodes soon).  My primary role is of a DevOps/Sysadmin
nature, and no