Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2014-02-05 Thread Jaromir Coufal

On 2014/05/02 15:27, Tzu-Mainn Chen wrote:

Hi,

In parallel to Jarda's updated wireframes, and based on various discussions
over the past weeks, here are the updated Tuskar requirements for Icehouse:

https://wiki.openstack.org/wiki/TripleO/TuskarIcehouseRequirements

Any feedback is appreciated.  Thanks!

Tzu-Mainn Chen


+1 looks good to me!

-- Jarda

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-19 Thread Radomir Dopieralski
On 11/12/13 21:42, Robert Collins wrote:
 On 12 December 2013 01:17, Jaromir Coufal jcou...@redhat.com wrote:
 On 2013/10/12 23:09, Robert Collins wrote:

[snip]

 That's speculation. We don't know if they will or will not because we
 haven't given them a working system to test.

 Some part of that is speculation, some part of that is feedback from people
 who are doing deployments (of course it's just a very limited audience).
 Anyway, it is not just pure theory.
 
 Sure. Let me be more precise. There is a hypothesis that lack of
 direct control will be a significant adoption blocker for a primary
 group of users.

I'm sorry for butting in, but I think I can see where your disagreement
comes from, and maybe explaining it will help resolve it.

It's not a hypothesis, but a well documented and researched fact, that
transparency has a huge impact on the ease of use of any information
artifact. In particular, the easier you can see what is actually
happening and how your actions affect the outcome, the faster you can
learn to use it and the more efficient you are in using it and resolving
any problems with it. It's no surprise that closeness of mapping and
hidden dependencies are two important cognitive dimensions that are
often measured when assessing the usability of an artifact. Humans simply
find it nice when they can tell what is happening, even if theoretically
they don't need that knowledge when everything works correctly.

This doesn't come from any direct requirements of Tuskar itself, and I
am sure that all the workarounds that Robert gave will work somehow in
every real-world problem that arises. But the whole will not necessarily
be easy or pleasant to learn and use. I am aware that the requirement to
be able to see what is happening is a fundamental problem, because it
destroys one of the most important rules in system engineering --
separation of concerns. The parts in the upper layers should simply not
care how the parts in the lower layers do their jobs, as long as they
work properly.

I know that it is a kind of tradition in Open Source software to
create software with the assumption that it's enough for it to do its
job, and if every use case can be somehow done, directly or indirectly,
then it's good enough. We have a lot of working tools designed with this
principle in mind, such as CSS, autotools or our favorite git. They do
their job, and they do it well (except when they break horribly). But I
think we can put a little bit more effort into also ensuring that the
common use cases are not just doable, but also easy to implement and
maintain. And that means that we will sometimes have a requirement that
comes from how people think, and not from any particular technical need.
I know that it sounds like speculation, or theory, but I think we need
to trust in Jarda's experience with usability and his judgement about
what works better -- unless of course we are willing to learn all that
ourselves, which may take quite some time.

What is the point of having an expert, if we know better, after all?
-- 
Radomir Dopieralski




Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-18 Thread Will Foster

On 13/12/13 09:41 -0500, Jay Dobies wrote:

* ability to 'preview' changes going to the scheduler


What does this give you? How detailed a preview do you need? What
information is critical there? Have you seen the proposed designs for
a heat template preview feature - would that be sufficient?


Will will probably have a better answer to this, but I feel like at 
very least this goes back to the psychology point raised earlier (I 
think in this thread, but if not, definitely one of the TripleO ones).


A weird parallel is whenever I do a new install of Fedora. I never 
accept their default disk partitioning without electing to 
review/modify it. Even if I didn't expect to change anything, I want 
to see what they are going to give me. And then I compulsively review 
the summary of what actual changes will be applied in the follow up 
screen that's displayed after I say I'm happy with the layout.


Perhaps that's more a commentary on my own OCD and cynicism that I 
feel dirty accepting the magic defaults blindly. I love the idea of 
anaconda doing the heavy lifting of figuring out sane defaults for 
home/root/swap and so on (similarly, I love the idea of Nova scheduler 
rationing out where instances are deployed), but I at least want to 
know I've seen it before it happens.


I fully admit to not knowing how common that sort of thing is. I 
suspect I'm in the majority of geeks and tame by sys admin standards, 
but I honestly don't know. So I acknowledge that my entire argument 
for the preview here is based on my own personality.




Jay,

I mirror your sentiments exactly here. The Fedora example is a good
one, and it is even more the case when it comes to node allocation/details
and proposed changes in a deployment scenario.  Nine times out of ten the
defaults the Nova scheduler chooses will be fine, but there's a 'human'
need to review them, changing as necessary.

-will





Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-16 Thread Will Foster

On 13/12/13 19:06 +1300, Robert Collins wrote:

On 13 December 2013 06:24, Will Foster wfos...@redhat.com wrote:


I just wanted to add a few thoughts:


Thank you!


For some comparative information here from the field: I work
extensively on deployments of large OpenStack implementations,
most recently with a ~220-node/9-rack deployment (scaling up to 42 racks / 1024
nodes soon).  My primary role is of a DevOps/sysadmin nature, not a
specific development area, so rapid provisioning/tooling/automation is an
area I almost exclusively work within (mostly API-driven,
using Foreman/Puppet).  The infrastructure our small team designs/builds
supports our development and business.

I am the target user base you'd probably want to cater to.


Absolutely!


I can tell you the philosophy and mechanics of Tuskar/OOO are great,
something I'd love to start using extensively but there are some needed
aspects in the areas of control that I feel should be added (though arguably
less for me and more for my ilk who are looking to expand their OpenStack
footprint).

* ability to 'preview' changes going to the scheduler


What does this give you? How detailed a preview do you need? What
information is critical there? Have you seen the proposed designs for
a heat template preview feature - would that be sufficient?


Thanks for the reply.  Preview-wise it'd be useful to see node
allocation prior to deployment - nothing too in-depth.
I have not seen the heat template preview features; are you referring
to the YAML templating[1] or something else[2]?  I'd like to learn
more.

[1] -
http://docs.openstack.org/developer/heat/template_guide/hot_guide.html
[2] - https://github.com/openstack/heat-templates
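
As a rough illustration of the kind of preview asked for above (node
allocation shown before anything is deployed), here is a minimal Python
sketch. The function name, the role/count input and the naive in-order
assignment are assumptions made for illustration only; this is not the Heat
template preview feature, and it does not reflect how the Nova scheduler
actually places instances.

```python
def preview_allocation(role_counts, free_nodes):
    """Return a proposed plan without deploying anything.

    role_counts: dict mapping a role name to the number of nodes wanted.
    free_nodes:  list of node ids currently available (hypothetical ids).
    Returns (plan, shortfall): plan maps role -> list of node ids,
    shortfall maps role -> number of nodes that could not be found.
    """
    plan = {}
    shortfall = {}
    remaining = list(free_nodes)  # copy, so the caller's list is untouched
    for role, wanted in role_counts.items():
        # Naive placement: take the next free nodes in order.
        plan[role] = remaining[:wanted]
        remaining = remaining[wanted:]
        missing = wanted - len(plan[role])
        if missing > 0:
            shortfall[role] = missing
    return plan, shortfall


if __name__ == "__main__":
    plan, shortfall = preview_allocation(
        {"control": 1, "compute": 3}, ["n1", "n2", "n3"]
    )
    for role, nodes in plan.items():
        print(role, "->", ", ".join(nodes))
    for role, missing in shortfall.items():
        print("WARNING:", role, "is short by", missing, "node(s)")
```

The point of the sketch is only that the operator sees the proposed mapping,
and any shortfall, before confirming.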




* ability to override/change some aspects within node assignment


What would this be used to do? How often do those situations turn up?
What's the impact if you can't do that?


One scenario might be that autodiscovery does not pick up an available
node in your pool of resources, or detects it incorrectly - you could
manually change things as you like.  Another (more common)
scenario is that you don't have an isolated, flat network with which
to deploy, and nodes are picked that you do not want included in the
provisioning - you could remove those from the set of resources prior
to launching overcloud creation.  The impact would be that the tooling
would seem inflexible to those lacking a thoughtfully prepared
network/infrastructure; more commonly, in cases where the existing
network design is too inflexible, the usefulness and quick/seamless
provisioning benefits would fall short.
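
The second scenario (removing unwanted nodes from the set of resources
before launching overcloud creation) could look something like the Python
sketch below. The node records and the `deployable_pool` helper are
hypothetical and exist only to make the idea concrete; no Tuskar or Ironic
API is implied.

```python
def deployable_pool(discovered, excluded_ids):
    """Drop operator-excluded nodes from the discovered pool.

    discovered:   list of dicts, each with at least an "id" key.
    excluded_ids: ids the operator does not want provisioned.
    Returns (kept, unknown): the remaining nodes, plus any excluded ids
    that did not match a discovered node (likely typos worth flagging).
    """
    excluded = set(excluded_ids)
    known = {node["id"] for node in discovered}
    kept = [node for node in discovered if node["id"] not in excluded]
    return kept, sorted(excluded - known)


if __name__ == "__main__":
    pool = [{"id": "n1"}, {"id": "n2"}, {"id": "n3"}]
    kept, unknown = deployable_pool(pool, ["n2", "n9"])
    print("deploying to:", [node["id"] for node in kept])
    print("exclusions that matched nothing:", unknown)
```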




* ability to view at least minimal logging from within Tuskar UI


Logging of what - the deployment engine? The heat event-log? Nova
undercloud logs? Logs from the deployed instances? If it's not there
in V1, but you can get, or already have credentials for the [instances
that hold the logs that you wanted] would that be a big adoption
blocker, or just a nuisance?



Logging of the deployment engine status during the bootstrapping
process initially, and some rudimentary node success/failure
indication.  It should be simple enough not to rival existing monitoring/log
systems, but at least provide deployment logs as the overcloud is being
built and a general node/health 'check-in' that it's complete.

Afterwards, as you mentioned, the logs are available on the deployed
systems.  Think of it as providing some basic written navigational signs
for people crossing a small bridge before they get to the highway:
there's continuity from start to finish and a clear sense of what's
occurring.  From my perspective, absence of this type of verbosity may
impede adoption by new users (who are used to this type of
information with deployment tooling).




Here's the main reason - most new adopters of OpenStack/IaaS are going to be
running legacy/mixed hardware, and while they might have an initiative to
explore and invest and even a decent budget, most of them are not going to
have completely identical hardware, isolated/flat networks and things set
aside in such a way that blind auto-discovery/deployment will just work all
the time.


That's great information (and something I reasonably well expected, to
a degree). We have a hard dependency on no wildcard DHCP servers in
the environment (or we can't deploy). Autodiscovery is something we
don't have yet, but certainly debugging deployment failures is a very
important use case and one we need to improve both at the plumbing
layer and in the stories around it in the UI.


There will be a need to sometimes adjust, and those coming from a more
vertically-scaling infrastructure (most large orgs.) will not have
100% matching standards in place of vendor, machine spec and network design
which may make Tuskar/OOO seem inflexible and 'one-way'.  This may just be a
carry-over or fear of the old ways of deployment but nonetheless it
is present.


I'm not sure what you mean by matching standards here :). Ironic is

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-13 Thread Jay Dobies

* ability to 'preview' changes going to the scheduler


What does this give you? How detailed a preview do you need? What
information is critical there? Have you seen the proposed designs for
a heat template preview feature - would that be sufficient?


Will will probably have a better answer to this, but I feel like at very 
least this goes back to the psychology point raised earlier (I think in 
this thread, but if not, definitely one of the TripleO ones).


A weird parallel is whenever I do a new install of Fedora. I never 
accept their default disk partitioning without electing to review/modify 
it. Even if I didn't expect to change anything, I want to see what they 
are going to give me. And then I compulsively review the summary of what 
actual changes will be applied in the follow up screen that's displayed 
after I say I'm happy with the layout.


Perhaps that's more a commentary on my own OCD and cynicism that I feel 
dirty accepting the magic defaults blindly. I love the idea of anaconda 
doing the heavy lifting of figuring out sane defaults for home/root/swap 
and so on (similarly, I love the idea of Nova scheduler rationing out 
where instances are deployed), but I at least want to know I've seen it 
before it happens.


I fully admit to not knowing how common that sort of thing is. I suspect 
I'm in the majority of geeks and tame by sys admin standards, but I 
honestly don't know. So I acknowledge that my entire argument for the 
preview here is based on my own personality.




Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-13 Thread Matt Wagner
On Mon Dec  9 15:22:04 2013, Robert Collins wrote:
 On 9 December 2013 23:56, Jaromir Coufal jcou...@redhat.com wrote:

 Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory
 stats

 For registration it is just Management MAC address which is needed right? Or
 does Ironic need also IP? I think that MAC address might be enough, we can
 display IP in details of node later on.

 Ironic needs all the details I listed today. Management MAC is not
 currently used at all, but would be needed in future when we tackle
 IPMI IP managed by Neutron.

I think what happened here is that two separate things we need got
conflated.

We need the IP address of the management (IPMI) interface, for power
control, etc.

We also need the MAC of the host system (*not* its IPMI/management
interface) for PXE to serve it the appropriate content.
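
To keep the two values from being conflated again, the distinction can be
written down as a small Python sketch. The `NodeRegistration` record and its
field names are assumptions for illustration; this is not Ironic's actual
schema or API. It only encodes what is stated in this thread: an IP address
for the IPMI interface, a MAC address for each host NIC, and
disk/cpu/memory stats.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import List

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


@dataclass
class NodeRegistration:
    """Hypothetical registration record, not Ironic's real schema."""
    ipmi_address: str      # IP of the management (IPMI) interface: power control
    host_macs: List[str]   # MACs of the host's own NICs: used for PXE
    cpus: int = 0
    memory_mb: int = 0
    disk_gb: int = 0

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record looks sane."""
        errors = []
        try:
            ipaddress.ip_address(self.ipmi_address)
        except ValueError:
            errors.append("ipmi_address must be an IP address")
        if not self.host_macs:
            errors.append("at least one host NIC MAC is required for PXE")
        for mac in self.host_macs:
            if not MAC_RE.match(mac.lower()):
                errors.append(f"invalid MAC address: {mac}")
        return errors


if __name__ == "__main__":
    node = NodeRegistration("10.0.0.5", ["52:54:00:12:34:56"], 8, 16384, 500)
    print(node.validate())
```

Passing a MAC where the IPMI IP belongs is reported as an error, which is
exactly the mix-up described above.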


-- 
Matt Wagner
Software Engineer, Red Hat





Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-12 Thread Keith Basil
On Dec 10, 2013, at 5:09 PM, Robert Collins wrote:

 On 11 December 2013 05:42, Jaromir Coufal jcou...@redhat.com wrote:
 On 2013/09/12 23:38, Tzu-Mainn Chen wrote:
 The disagreement comes from whether we need manual node assignment or not.
 I would argue that we
 need to step back and take a look at the real use case: heterogeneous
 nodes.  If there are literally
 no characteristics that differentiate nodes A and B, then why do we care
 which gets used for what?  Why
 do we need to manually assign one?
 
 
 Ideally, we don't. But with this approach we would take out the possibility
 to change something or decide something from the user.
 
 So, I think this is where the confusion is. Using the nova scheduler
 doesn't prevent change or control. It just ensures the change and
 control happen in the right place: the Nova scheduler has had years of
 work, of features and facilities being added to support HPC, HA and
 other such use cases. It should have everything we need [1], without
 going down to manual placement. For clarity: manual placement is when
 any of the user, Tuskar, or Heat query Ironic, select a node, and then
 use a scheduler hint to bypass the scheduler.
 
 The 'easiest' way is to support bigger companies with huge deployments,
 tailored infrastructure, everything connected properly.
 
 But there are tons of companies/users who are running on old heterogeneous
 hardware. Very likely even more than the number of companies having already
 mentioned large deployments. And giving them only the way of 'setting up
 rules' in order to get the service on the node - this type of user is not
 gonna use our deployment system.
 
 That's speculation. We don't know if they will or will not because we
 haven't given them a working system to test.
 
 Let's break the concern into two halves:
 A) Users who could have their needs met, but won't use TripleO because
 meeting their needs in this way is too hard/complex/painful.
 
 B) Users who have a need we cannot meet with the current approach.
 
 For category B users, their needs might be specific HA things - like
 the oft discussed failure domains angle, where we need to split up HA
 clusters across power bars, aircon, switches etc. Clearly long term we
 want to support them, and the undercloud Nova scheduler is entirely
 capable of being informed about this, and we can evolve to a holistic
 statement over time. Let's get a concrete list of the cases we can
 think of today that won't be well supported initially, and we can
 figure out where to do the work to support them properly.
 
 For category A users, I think that we should get concrete examples,
 and evolve our design (architecture and UX) to make meeting those
 needs pleasant.
 
 What we shouldn't do is plan complex work without concrete examples
 that people actually need. Jay's example of some shiny new compute
 servers with special parts that need to be carved out was a great one
 - we can put that in category A, and figure out if it's easy enough,
 or obvious enough - and think about whether we document it or make it
 a guided workflow or $whatever.
 
 Somebody might argue - why do we care? If user doesn't like TripleO
 paradigm, he shouldn't use the UI and should use another tool. But the UI is
 not only about TripleO. Yes, it is underlying concept, but we are working on
 future *official* OpenStack deployment tool. We should care to enable people
 to deploy OpenStack - large/small scale, homo/heterogeneous hardware,
 typical or a bit more specific use-cases.
 
 The difficulty I'm having is that the discussion seems to assume that
 'heterogeneous implies manual', but I don't agree that that
 implication is necessary!
 
 As an underlying paradigm of how to install cloud - awesome idea, awesome
 concept, it works. But user doesn't care about how it is being deployed for
 him. He cares about getting what he wants/needs. And we shouldn't go that
 far that we violently force him to treat his infrastructure as cloud. I
 believe that possibility to change/control - if needed - is very important
 and we should care.
 
 I propose that we make concrete use cases: 'Fred cannot use TripleO
 without manual assignment because XYZ'. Then we can assess how
 important XYZ is to our early adopters and go from there.
 
 And what is key for us is to *enable* users - not to prevent them from using
 our deployment tool, because it doesn't work for their requirements.
 
 Totally agreed :)
 
 If we can agree on that, then I think it would be sufficient to say that
 we want a mechanism to allow
 UI users to deal with heterogeneous nodes, and that mechanism must use
 nova-scheduler.  In my mind,
 that's what resource classes and node profiles are intended for.
 
 
 Not arguing on this point. Though that mechanism should support also cases,
 where user specifies a role for a node / removes node from a role. The rest
 of nodes which I don't care about should be handled by nova-scheduler.
 
 Why! What is a use case for removing a 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-12 Thread Keith Basil
On Dec 11, 2013, at 3:42 PM, Robert Collins wrote:

 On 12 December 2013 01:17, Jaromir Coufal jcou...@redhat.com wrote:
 On 2013/10/12 23:09, Robert Collins wrote:
 
 The 'easiest' way is to support bigger companies with huge deployments,
 tailored infrastructure, everything connected properly.
 
 But there are tons of companies/users who are running on old
 heterogeneous
 hardware. Very likely even more than the number of companies having
 already
 mentioned large deployments. And giving them only the way of 'setting up
 rules' in order to get the service on the node - this type of user is not
 gonna use our deployment system.
 
 
 That's speculation. We don't know if they will or will not because we
 haven't given them a working system to test.
 
 Some part of that is speculation, some part of that is feedback from people
 who are doing deployments (of course it's just a very limited audience).
 Anyway, it is not just pure theory.
 
 Sure. Let me be more precise. There is a hypothesis that lack of
 direct control will be a significant adoption blocker for a primary
 group of users.
 
 I think it's safe to say that some users in the group 'sysadmins
 having to deploy an OpenStack cloud' will find it a bridge too far and
 not use a system without direct control. Call this group A.
 
 I think it's also safe to say that some users will not care in the
 slightest, because their deployment is too small for them to be
 particularly worried (e.g. about occasional downtime (but they would
 worry a lot about data loss)). Call this group B.
 
 I suspect we don't need to consider group C - folk who won't use a
 system if it *has* manual control, but that's only a suspicion. It may
 be that the side effect of adding direct control is to reduce
 usability below the threshold some folk need...
 
 To assess 'significant adoption blocker' we basically need to find the
 % of users who will care sufficiently that they don't use TripleO.
 
 How can we do that? We can do questionnaires, and get such folk to
 come talk with us, but that suffers from selection bias - group B can
 use the system with or without direct manual control, so have little
 motivation to argue vigorously in any particular direction. Group A
 however have to argue because they won't use the system at all without
 that feature, and they may want to use the system for other reasons,
 so that becomes a crucial aspect for them.
 
 A much better way IMO is to test it - to get a bunch of volunteers and
 see who responds positively to a demo *without* direct manual control.
 
 To do that we need a demoable thing, which might just be mockups that
 show a set of workflows (and include things like Jay's
 shiny-new-hardware use case in the demo).
 
 I rather suspect we're building that anyway as part of doing UX work,
 so maybe what we do is put a tweet or blog post up asking for
 sysadmins who a) have not yet deployed openstack, b) want to, and c)
 are willing to spend 20-30 minutes with us, walk them through a demo
 showing no manual control, and record what questions they ask, and
 whether they would like to have that product to use, and if not, then
 (a) what use cases they can't address with the mockups and (b) what
 other reasons they have for not using it.
 
 This is a bunch of work though!
 
 So, do we need to do that work?
 
 *If* we can layer manual control on later, then we could defer this
 testing until we are at the point where we can say 'the nova scheduled
 version is ready, now let's decide if we add the manual control'.
 
 OTOH, if we *cannot* layer manual control on later - if it has
 tentacles through too much of the code base, then we need to decide
 earlier, because it will be significantly harder to add later and that
 may be too late of a ship date for vendors shipping on top of TripleO.
 
 So with that as a prelude, my technical sense is that we can layer
 manual scheduling on later: we provide an advanced screen, show the
 list of N instances we're going to ask for and allow each instance to
 be directly customised with a node id selected from either the current
 node it's running on or an available node. It's significant work both
 UI and plumbing, but it's not going to be made harder by the other
 work we're doing AFAICT.
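
 The "advanced screen" idea above can be sketched in Python as a thin layer
 on top of whatever the scheduler proposed. The `apply_overrides` function
 and its inputs are hypothetical and only illustrate the rule stated in the
 paragraph: each instance may be pinned to the node it is already on, or to
 an available node, and nothing else.

```python
def apply_overrides(scheduled, overrides, available):
    """Layer operator-chosen placements on top of an automatic plan.

    scheduled: dict instance -> node id chosen automatically.
    overrides: dict instance -> node id pinned by the operator.
    available: iterable of node ids that are currently free.
    Returns a new dict; the inputs are left unmodified.
    """
    final = dict(scheduled)
    free = set(available)
    for instance, node in overrides.items():
        if instance not in final:
            raise KeyError(f"unknown instance: {instance}")
        current = final[instance]
        if node == current:
            continue  # pinning to the current node is always allowed
        if node not in free:
            raise ValueError(f"node {node} is not available for {instance}")
        free.discard(node)   # the chosen node is now taken
        free.add(current)    # the previously assigned node is released
        final[instance] = node
    return final


if __name__ == "__main__":
    plan = {"control-0": "n1", "compute-0": "n2"}
    print(apply_overrides(plan, {"compute-0": "n3"}, ["n3", "n4"]))
```

 Because the override step only consumes the scheduler's output, it can be
 added after the automatic version exists, which is the layering argued for
 here.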
 
 - My proposal is that we shelve this discussion until we have the
 nova/heat scheduled version in 'and now we polish' mode, and then pick
 it back up and assess user needs.
 
 An alternative argument is to say that group A is a majority of the
 userbase and that doing an automatic version is entirely unnecessary.
 That's also possible, but I'm extremely skeptical, given the huge cost
 of staff time, and the complete lack of interest my sysadmin friends
 (and my former sysadmin self) have in doing automatable things by
 hand.
 
 Let's break the concern into two halves:
 A) Users who could have their needs met, but won't use TripleO because
 meeting their needs in this way is too hard/complex/painful.
 
 B) Users who have a 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-12 Thread Will Foster

On 12/12/13 09:42 +1300, Robert Collins wrote:

On 12 December 2013 01:17, Jaromir Coufal jcou...@redhat.com wrote:

On 2013/10/12 23:09, Robert Collins wrote:



The 'easiest' way is to support bigger companies with huge deployments,
tailored infrastructure, everything connected properly.

But there are tons of companies/users who are running on old
heterogeneous
hardware. Very likely even more than the number of companies having
already
mentioned large deployments. And giving them only the way of 'setting up
rules' in order to get the service on the node - this type of user is not
gonna use our deployment system.



That's speculation. We don't know if they will or will not because we
haven't given them a working system to test.


Some part of that is speculation, some part of that is feedback from people
who are doing deployments (of course it's just a very limited audience).
Anyway, it is not just pure theory.


Sure. Let me be more precise. There is a hypothesis that lack of
direct control will be a significant adoption blocker for a primary
group of users.

I think it's safe to say that some users in the group 'sysadmins
having to deploy an OpenStack cloud' will find it a bridge too far and
not use a system without direct control. Call this group A.

I think it's also safe to say that some users will not care in the
slightest, because their deployment is too small for them to be
particularly worried (e.g. about occasional downtime (but they would
worry a lot about data loss)). Call this group B.

I suspect we don't need to consider group C - folk who won't use a
system if it *has* manual control, but that's only a suspicion. It may
be that the side effect of adding direct control is to reduce
usability below the threshold some folk need...

To assess 'significant adoption blocker' we basically need to find the
% of users who will care sufficiently that they don't use TripleO.

How can we do that? We can do questionnaires, and get such folk to
come talk with us, but that suffers from selection bias - group B can
use the system with or without direct manual control, so have little
motivation to argue vigorously in any particular direction. Group A
however have to argue because they won't use the system at all without
that feature, and they may want to use the system for other reasons,
so that becomes a crucial aspect for them.

A much better way IMO is to test it - to get a bunch of volunteers and
see who responds positively to a demo *without* direct manual control.

To do that we need a demoable thing, which might just be mockups that
show a set of workflows (and include things like Jay's
shiny-new-hardware use case in the demo).

I rather suspect we're building that anyway as part of doing UX work,
so maybe what we do is put a tweet or blog post up asking for
sysadmins who a) have not yet deployed openstack, b) want to, and c)
are willing to spend 20-30 minutes with us, walk them through a demo
showing no manual control, and record what questions they ask, and
whether they would like to have that product to use, and if not, then
(a) what use cases they can't address with the mockups and (b) what
other reasons they have for not using it.

This is a bunch of work though!

So, do we need to do that work?

*If* we can layer manual control on later, then we could defer this
testing until we are at the point where we can say 'the nova scheduled
version is ready, now let's decide if we add the manual control'.

OTOH, if we *cannot* layer manual control on later - if it has
tentacles through too much of the code base, then we need to decide
earlier, because it will be significantly harder to add later and that
may be too late of a ship date for vendors shipping on top of TripleO.

So with that as a prelude, my technical sense is that we can layer
manual scheduling on later: we provide an advanced screen, show the
list of N instances we're going to ask for and allow each instance to
be directly customised with a node id selected from either the current
node it's running on or an available node. It's significant work both
UI and plumbing, but it's not going to be made harder by the other
work we're doing AFAICT.

- My proposal is that we shelve this discussion until we have the
nova/heat scheduled version in 'and now we polish' mode, and then pick
it back up and assess user needs.

An alternative argument is to say that group A is a majority of the
userbase and that doing an automatic version is entirely unnecessary.
That's also possible, but I'm extremely skeptical, given the huge cost
of staff time, and the complete lack of interest my sysadmin friends
(and my former sysadmin self) have in doing automatable things by
hand.


I just wanted to add a few thoughts:

For some comparative information here from the field I work
extensively on deployments of large OpenStack implementations,
most recently with a ~220node/9rack deployment (scaling up to 
42racks / 1024 nodes soon).  My primary role is of a Devops/Sysadmin 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-12 Thread Keith Basil
On Dec 12, 2013, at 4:05 PM, Jay Dobies wrote:

 Maybe this is a valid use case?
 
 Cloud operator has several core service nodes of differing configuration
 types.
 
 [node1]  -- balanced mix of disk/cpu/ram for general core services
 [node2]  -- lots of disks for Ceilometer data storage
 [node3]  -- low-end appliance-like box for a specialized/custom core service
             (SIEM box for example)
 
 All nodes[1,2,3] are in the same deployment grouping (core services).  As such,
 this is a heterogeneous deployment grouping.  Heterogeneity in this case is
 defined by differing roles and hardware configurations.
 
 This is a real use case.
 
 How do we handle this?
 
 This is the sort of thing I had been concerned with, but I think this is just 
 a variation on Robert's GPU example. Rather than butcher it by paraphrasing, 
 I'll just include the relevant part:
 
 
 The basic stuff we're talking about so far is just about saying each
 role can run on some set of undercloud flavors. If that new bit of kit
 has the same coarse metadata as other kit, Nova can't tell it apart.
 So the way to solve the problem is:
 a) teach Ironic about the specialness of the node (e.g. a tag 'GPU')
 b) teach Nova that there is a flavor that maps to the presence of
    that specialness, and
 c) teach Nova that other flavors may not map to that specialness
 
 then in Tuskar whatever Nova configuration is needed to use that GPU
 is a special role ('GPU compute' for instance) and only that role
 would be given that flavor to use. That special config is probably
 being in a host aggregate, with an overcloud flavor that specifies
 that aggregate, which means at the TripleO level we need to put the
 aggregate in the config metadata for that role, and the admin does a
 one-time setup in the Nova Horizon UI to configure their GPU compute
 flavor.
 

Yes, the core services example is a variation on the above.  The idea
of _undercloud_ flavor assignment (flavor to role mapping) escaped me
when I read that earlier.

It appears to be very elegant and provides another attribute for Tuskar's
notion of resource classes.  So +1 here.
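
For anyone else who missed the flavor-to-role idea on first read, here is a
minimal Python sketch of steps a) to c) from Robert's description. Every
name in it (`Node`, `Flavor`, `candidates`, the tag strings, the role names)
is hypothetical; it models the matching logic only and is not how Nova,
Ironic or Tuskar implement it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    tags: frozenset = frozenset()   # a) specialness recorded on the node


@dataclass(frozen=True)
class Flavor:
    name: str
    required_tags: frozenset = frozenset()    # b) flavor maps to the specialness
    forbidden_tags: frozenset = frozenset()   # c) other flavors must not match it

    def matches(self, node):
        has_required = self.required_tags <= node.tags
        has_forbidden = bool(self.forbidden_tags & node.tags)
        return has_required and not has_forbidden


def candidates(role, role_flavors, flavors, nodes):
    """Return ids of nodes that any flavor assigned to the role may use."""
    names = role_flavors[role]
    return [n.id for n in nodes if any(flavors[f].matches(n) for f in names)]


if __name__ == "__main__":
    gpu = frozenset({"GPU"})
    nodes = [Node("n1"), Node("n2", gpu)]
    flavors = {
        "gpu": Flavor("gpu", required_tags=gpu),
        "general": Flavor("general", forbidden_tags=gpu),
    }
    role_flavors = {"gpu-compute": ["gpu"], "compute": ["general"]}
    for role in role_flavors:
        print(role, "->", candidates(role, role_flavors, flavors, nodes))
```

The same shape would cover the core services example: replace the 'GPU' tag
with something like a high-disk or appliance tag and give each role only the
flavors it should use.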


 You mention three specific nodes, but what you're describing is more likely 
 three concepts:
 - Balanced Nodes
 - High Disk I/O Nodes
 - Low-End Appliance Nodes
 
 They may have one node in each, but I think your example of three nodes is 
 potentially *too* simplified to be considered as proper sample size. I'd 
 guess there are more than three in play commonly, in which case the concepts 
 breakdown starts to be more appealing.

Correct - definitely more than three, I just wanted to illustrate the use case.

 I think the disk flavor in particular has quite a few use cases, especially 
 until SSDs are ubiquitous. I'd want to flag those (in Jay terminology, the 
 disk hotness) as hosting the data-intensive portions, but where I had 
 previously been viewing that as manual allocation, it sounds like the 
 approach is to properly categorize them for what they are and teach Nova how 
 to use them.
 
 Robert - Please correct me if I misread any of what your intention was, I 
 don't want to drive people down the wrong path if I'm misinterpreting 
 anything.

-k


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-12 Thread Jay Dobies



On 12/12/2013 04:25 PM, Keith Basil wrote:

On Dec 12, 2013, at 4:05 PM, Jay Dobies wrote:


Maybe this is a valid use case?

Cloud operator has several core service nodes of differing configuration
types.

[node1]  -- balanced mix of disk/cpu/ram for general core services
[node2]  -- lots of disks for Ceilometer data storage
[node3]  -- low-end appliance like box for a specialized/custom core service
 (SIEM box for example)

All nodes[1,2,3] are in the same deployment grouping (core services).  As such,
this is a heterogeneous deployment grouping.  Heterogeneity in this case is
defined by differing roles and hardware configurations.

This is a real use case.

How do we handle this?


This is the sort of thing I had been concerned with, but I think this is just a 
variation on Robert's GPU example. Rather than butcher it by paraphrasing, I'll 
just include the relevant part:


The basic stuff we're talking about so far is just about saying each
role can run on some set of undercloud flavors. If that new bit of kit
has the same coarse metadata as other kit, Nova can't tell it apart.
So the way to solve the problem is:
- a) teach Ironic about the specialness of the node (e.g. a tag 'GPU')
- b) teach Nova that there is a flavor that maps to the presence of
that specialness, and
   c) teach Nova that other flavors may not map to that specialness

then in Tuskar whatever Nova configuration is needed to use that GPU
is a special role ('GPU compute' for instance) and only that role
would be given that flavor to use. That special config is probably
being in a host aggregate, with an overcloud flavor that specifies
that aggregate, which means at the TripleO level we need to put the
aggregate in the config metadata for that role, and the admin does a
one-time setup in the Nova Horizon UI to configure their GPU compute
flavor.
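
The a/b/c steps above can be sketched as a toy model. This is only an illustration of the matching idea, not Nova or Ironic code: the `Node`, `Flavor`, and `candidates` names, the flavor names, and the tag sets are all hypothetical and stand in for whatever metadata Ironic and Nova would actually carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a registered node and its tags (step a).
    name: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class Flavor:
    # Hypothetical stand-in for an undercloud flavor.
    name: str
    required_tags: frozenset = frozenset()   # step b: maps to the specialness
    forbidden_tags: frozenset = frozenset()  # step c: must not map to it

    def matches(self, node):
        has_required = self.required_tags <= node.tags
        has_forbidden = bool(self.forbidden_tags & node.tags)
        return has_required and not has_forbidden


def candidates(role_flavors, role, nodes):
    """Return sorted names of nodes matched by any flavor given to the role."""
    flavors = role_flavors[role]
    return sorted(n.name for n in nodes if any(f.matches(n) for f in flavors))


nodes = [Node("n1"), Node("n2", frozenset({"GPU"})), Node("n3")]
gpu = Flavor("baremetal.gpu", required_tags=frozenset({"GPU"}))
plain = Flavor("baremetal.plain", forbidden_tags=frozenset({"GPU"}))

# Only the special role is given the special flavor.
role_flavors = {"GPU compute": [gpu], "compute": [plain]}

print(candidates(role_flavors, "GPU compute", nodes))
print(candidates(role_flavors, "compute", nodes))
```

In this model nobody picks a node by hand: the operator only decides which flavors a role may use, and the tag on the node does the rest.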



Yes, the core services example is a variation on the above.  The idea
of _undercloud_ flavor assignment (flavor to role mapping) escaped me
when I read that earlier.

It appears to be very elegant and provides another attribute for Tuskar's
notion of resource classes.  So +1 here.



You mention three specific nodes, but what you're describing is more likely 
three concepts:
- Balanced Nodes
- High Disk I/O Nodes
- Low-End Appliance Nodes

They may have one node in each, but I think your example of three nodes is 
potentially *too* simplified to be considered as proper sample size. I'd guess 
there are more than three in play commonly, in which case the concepts 
breakdown starts to be more appealing.


Correct - definitely more than three, I just wanted to illustrate the use case.


I'm not sure I explained what I was getting at properly. I wasn't implying 
you thought it was limited to just three. I do the same thing, simplify 
down for discussion purposes (I've done so in my head about this very 
topic).


But I think this may be a rare case where simplifying actually masks the 
concept rather than exposes it. Manual feels a bit more desirable in 
small sample groups but when looking at larger sets of nodes, the flavor 
concept feels less odd than it does when defining a flavor for a single 
machine.


That's all. :) Maybe that was clear already, but I wanted to make sure I 
didn't come off as attacking your example. It certainly wasn't my 
intention. The balanced v. disk machine thing is the sort of thing I'd 
been thinking for a while but hadn't found a good way to make concrete.



I think the disk flavor in particular has quite a few use cases, especially until SSDs 
are ubiquitous. I'd want to flag those (in Jay terminology, the disk hotness) 
as hosting the data-intensive portions, but where I had previously been viewing that as 
manual allocation, it sounds like the approach is to properly categorize them for what 
they are and teach Nova how to use them.

Robert - Please correct me if I misread any of what your intention was, I don't 
want to drive people down the wrong path if I'm misinterpreting anything.


-k




Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-11 Thread Jaromir Coufal

On 2013/10/12 23:09, Robert Collins wrote:

On 11 December 2013 05:42, Jaromir Coufal jcou...@redhat.com wrote:

On 2013/09/12 23:38, Tzu-Mainn Chen wrote:

The disagreement comes from whether we need manual node assignment or not.
I would argue that we need to step back and take a look at the real use case:
heterogeneous nodes.  If there are literally no characteristics that
differentiate nodes A and B, then why do we care which gets used for what?
Why do we need to manually assign one?



Ideally, we don't. But with this approach we would take out the possibility
to change something or decide something from the user.


So, I think this is where the confusion is. Using the nova scheduler
doesn't prevent change or control. It just ensures the change and
control happen in the right place: the Nova scheduler has had years of
work, of features and facilities being added to support HPC, HA and
other such use cases. It should have everything we need [1], without
going down to manual placement. For clarity: manual placement is when
any of the user, Tuskar, or Heat query Ironic, select a node, and then
use a scheduler hint to bypass the scheduler.
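
The distinction drawn above can be sketched with a toy scheduler. This is a simplified model under stated assumptions, not the Nova scheduler: `schedule`, its `force_node` argument, and the filter functions are hypothetical names used only to show where the decision is made in each case.

```python
def schedule(nodes, filters, force_node=None):
    """Pick a node name for a deployment.

    With force_node set, the caller has already chosen a node and the
    filters are skipped entirely: that is 'manual placement' in the
    sense described above. Otherwise the first node passing every
    filter is returned, or None if nothing fits.
    """
    if force_node is not None:
        return force_node
    for node in nodes:
        if all(f(node) for f in filters):
            return node["name"]
    return None


def is_free(node):
    return node["free"]


def has_big_disk(node):
    # Hypothetical threshold, chosen only for the example.
    return node["disk_gb"] >= 2000


nodes = [
    {"name": "a", "disk_gb": 500, "free": True},
    {"name": "b", "disk_gb": 4000, "free": True},
]

# Control expressed as rules: the scheduler still makes the choice.
print(schedule(nodes, [is_free, has_big_disk]))

# Manual placement: the rules are bypassed, even though 'a' fails them.
print(schedule(nodes, [is_free, has_big_disk], force_node="a"))
```

The second call shows the cost of bypassing: nothing checks that the chosen node actually satisfies the constraints the rest of the system relies on.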

This is very well written. I am all for things going to right places.


The 'easiest' way is to support bigger companies with huge deployments,
tailored infrastructure, everything connected properly.

But there are tons of companies/users who are running on old heterogeneous
hardware. Very likely even more than the number of companies having already
mentioned large deployments. And giving them only the way of 'setting up
rules' in order to get the service on the node - this type of user is not
gonna use our deployment system.


That's speculation. We don't know if they will or will not because we
haven't given them a working system to test.

Some part of that is speculation, some part of that is feedback from 
people who are doing deployments (of course it's just a very limited 
audience). Anyway, it is not just pure theory.



Let's break the concern into two halves:
A) Users who could have their needs met, but won't use TripleO because
meeting their needs in this way is too hard/complex/painful.

B) Users who have a need we cannot meet with the current approach.

For category B users, their needs might be specific HA things - like
the oft discussed failure domains angle, where we need to split up HA
clusters across power bars, aircon, switches etc. Clearly long term we
want to support them, and the undercloud Nova scheduler is entirely
capable of being informed about this, and we can evolve to a holistic
statement over time. Let's get a concrete list of the cases we can
think of today that won't be well supported initially, and we can
figure out where to do the work to support them properly.
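
As one illustration of the failure-domain case mentioned above, here is a minimal sketch of spreading an HA cluster across power domains. It assumes each node is labelled with a single domain; the `spread` function and the `pdu-*` labels are hypothetical and are not an existing Nova filter.

```python
from collections import defaultdict


def spread(nodes, count):
    """Pick up to `count` node names, using distinct failure domains first.

    `nodes` is a list of (name, domain) pairs. Domains are visited
    round-robin in sorted order, so a second node from the same domain
    is only taken once every other domain has contributed one.
    """
    by_domain = defaultdict(list)
    for name, domain in nodes:
        by_domain[domain].append(name)

    pools = [by_domain[d] for d in sorted(by_domain)]
    picked = []
    depth = 0
    while len(picked) < count and any(depth < len(p) for p in pools):
        for pool in pools:
            if depth < len(pool) and len(picked) < count:
                picked.append(pool[depth])
        depth += 1
    return picked


nodes = [("n1", "pdu-a"), ("n2", "pdu-a"), ("n3", "pdu-b"), ("n4", "pdu-c")]
print(spread(nodes, 3))
```

A three-member cluster here lands on three different power bars; asking for a fourth member falls back to reusing `pdu-a` rather than failing.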

My question is - can't we help them now? To enable users to use our app 
even when we don't yet have enough smartness to help them the 'auto' way?



For category A users, I think that we should get concrete examples,
and evolve our design (architecture and UX) to make meeting those
needs pleasant.
+1... I tried to pull some operators into this discussion thread, will 
try to get more.



What we shouldn't do is plan complex work without concrete examples
that people actually need. Jay's example of some shiny new compute
servers with special parts that need to be carved out was a great one
- we can put that in category A, and figure out if it's easy enough,
or obvious enough - and think about whether we document it or make it
a guided workflow or $whatever.


Somebody might argue - why do we care? If user doesn't like TripleO
paradigm, he shouldn't use the UI and should use another tool. But the UI is
not only about TripleO. Yes, it is underlying concept, but we are working on
future *official* OpenStack deployment tool. We should care to enable people
to deploy OpenStack - large/small scale, homo/heterogeneous hardware,
typical or a bit more specific use-cases.


The difficulty I'm having is that the discussion seems to assume that
'heterogeneous implies manual', but I don't agree that that
implication is necessary!
No, I don't agree with this either. Heterogeneous hardware can be very 
well managed automatically as well as homogeneous (classes, node profiles).



As an underlying paradigm of how to install cloud - awesome idea, awesome
concept, it works. But user doesn't care about how it is being deployed for
him. He cares about getting what he wants/needs. And we shouldn't go that
far that we violently force him to treat his infrastructure as cloud. I
believe that possibility to change/control - if needed - is very important
and we should care.


I propose that we make concrete use cases: 'Fred cannot use TripleO
without manual assignment because XYZ'. Then we can assess how
important XYZ is to our early adopters and go from there.
+1, yes. I will try to bug more relevant people who could contribute in 
this area.



And what is key for us is to *enable* users - not to prevent them from 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-11 Thread Jaromir Coufal



On 2013/10/12 19:39, Tzu-Mainn Chen wrote:


Ideally, we don't. But with this approach we would take out the
possibility to change something or decide something from the user.

The 'easiest' way is to support bigger companies with huge deployments,
tailored infrastructure, everything connected properly.

But there are tons of companies/users who are running on old
heterogeneous hardware. Very likely even more than the number of
companies having already mentioned large deployments. And giving them
only the way of 'setting up rules' in order to get the service on the
node - this type of user is not gonna use our deployment system.

Somebody might argue - why do we care? If user doesn't like TripleO
paradigm, he shouldn't use the UI and should use another tool. But the
UI is not only about TripleO. Yes, it is underlying concept, but we are
working on future *official* OpenStack deployment tool. We should care
to enable people to deploy OpenStack - large/small scale,
homo/heterogeneous hardware, typical or a bit more specific use-cases.


I think this is a very important clarification, and I'm glad you made it.  It
sounds like manual assignment is actually a sub-requirement, and the feature
you're arguing for is: supporting non-TripleO deployments.
Mostly but not only. The other argument is - keeping control on stuff I 
am doing. Note that undercloud user is different from overcloud user.



That might be a worthy goal, but I think it's a distraction for the Icehouse
timeframe.  Each new deployment strategy requires not only a new UI, but
different deployment architectures that could have very little in common with
each other.  Designing them all to work in the same space is a recipe for
disaster, a convoluted gnarl of code that doesn't do any one thing particularly
well.  To use an analogy: there's a reason why no one makes a flying boat car.

I'm going to strongly advocate that for Icehouse, we focus exclusively on large
scale TripleO deployments, working to make that UI and architecture as sturdy
as we can.  Future deployment strategies should be discussed in the future, and
if they're not TripleO based, they should be discussed with the proper
OpenStack group.

One concern here is - it is quite likely that we get people excited 
about this approach - it will be a new boom - 'wow', there is automagic 
doing everything for me. But then the question would be reality - how 
many of those excited users will actually use TripleO for their real 
deployments (I mean in the early stages)? Would it be only a couple of 
them (because of covered use cases, concerns of maturity, lack of 
control scarcity)? Can we assure them that if anything goes wrong, they 
have control over it?


-- Jarda



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-11 Thread Tzu-Mainn Chen
 On 2013/10/12 19:39, Tzu-Mainn Chen wrote:
 
  Ideally, we don't. But with this approach we would take out the
  possibility to change something or decide something from the user.
 
  The 'easiest' way is to support bigger companies with huge deployments,
  tailored infrastructure, everything connected properly.
 
  But there are tons of companies/users who are running on old
  heterogeneous hardware. Very likely even more than the number of
  companies having already mentioned large deployments. And giving them
  only the way of 'setting up rules' in order to get the service on the
  node - this type of user is not gonna use our deployment system.
 
  Somebody might argue - why do we care? If user doesn't like TripleO
  paradigm, he shouldn't use the UI and should use another tool. But the
  UI is not only about TripleO. Yes, it is underlying concept, but we are
  working on future *official* OpenStack deployment tool. We should care
  to enable people to deploy OpenStack - large/small scale,
  homo/heterogeneous hardware, typical or a bit more specific use-cases.
 
  I think this is a very important clarification, and I'm glad you made it.
  It sounds
  like manual assignment is actually a sub-requirement, and the feature
  you're arguing
  for is: supporting non-TripleO deployments.

 Mostly but not only. The other argument is - keeping control on stuff I
 am doing. Note that undercloud user is different from overcloud user.

Sure, but again, that argument seems to me to be a non-TripleO approach.  I'm
not saying that it's not a possible use case, I'm saying that you're advocating
for a deployment strategy that fundamentally diverges from the TripleO
philosophy - and as such, that strategy will likely require a separate UI,
underlying architecture, etc, and should not be planned for in the Icehouse
timeframe.

  That might be a worthy goal, but I think it's a distraction for the
  Icehouse timeframe.
  Each new deployment strategy requires not only a new UI, but different
  deployment
  architectures that could have very little common with each other.
  Designing them all
  to work in the same space is a recipe for disaster, a convoluted gnarl of
  code that
  doesn't do any one thing particularly well.  To use an analogy: there's a
  reason why
  no one makes a flying boat car.
 
  I'm going to strongly advocate that for Icehouse, we focus exclusively on
  large scale
  TripleO deployments, working to make that UI and architecture as sturdy as
  we can.  Future
  deployment strategies should be discussed in the future, and if they're not
  TripleO based,
  they should be discussed with the proper OpenStack group.
 One concern here is - it is quite likely that we get people excited
 about this approach - it will be a new boom - 'wow', there is automagic
 doing everything for me. But then the question would be reality - how
 many of those excited users will actually use TripleO for their real
 deployments (I mean in the early stages)? Would it be only a couple of
 them (because of covered use cases, concerns of maturity, lack of
 control scarcity)? Can we assure them that if anything goes wrong, they
 have control over it?
 -- Jarda
 


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-10 Thread Jaromir Coufal


On 2013/09/12 17:15, Tzu-Mainn Chen wrote:


- As an infrastructure administrator, Anna wants to be able to 
unallocate a node from a deployment.

Why? What's her motivation? One plausible one for me is 'a machine
needs to be serviced so Anna wants to remove it from the deployment to
avoid causing user visible downtime.'  So let's say that: Anna needs to
be able to take machines out of service so they can be maintained or
disposed of.

Node being serviced is a different user story for me.

I believe we are still 'fighting' here with two approaches and I
believe we need both. We can't only provide a way 'give us
resources we will do a magic'. Yes this is preferred way -
especially for large deployments, but we also need a fallback so
that user can say - no, this node doesn't belong to the class, I
don't want it there - unassign. Or I need to have this node there
- assign.

Just for clarification - the wireframes don't cover individual nodes 
being manually assigned, do they?  I thought the concession to manual 
control was entirely through resource classes and node profiles, which 
are still parameters to be passed through to the nova-scheduler 
filter.  To me, that's very different from manual assignment.


Mainn
It's all doable and wireframes are prepared for the manual assignment as 
well, Mainn. I just was not designing details for now, since we are 
going to focus on auto-distribution first. But I will cover this use 
case in later iterations of wireframes.


Cheers
-- Jarda


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-10 Thread Jaromir Coufal



On 2013/09/12 21:22, Robert Collins wrote:

Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory
stats

For registration it is just Management MAC address which is needed right? Or
does Ironic need also IP? I think that MAC address might be enough, we can
display IP in details of node later on.


Ironic needs all the details I listed today. Management MAC is not
currently used at all, but would be needed in future when we tackle
IPMI IP managed by Neutron.
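
For the UI side, the details listed above (IPMI address, a MAC per NIC, and disk/cpu/memory stats) could be collected and checked with something like the following. This is a hedged sketch of a form-validation helper, not the Ironic API: `NodeRegistration`, its field names, and the units are assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Colon-separated MAC address, e.g. 52:54:00:aa:bb:01.
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


@dataclass
class NodeRegistration:
    # Hypothetical record of what an operator enters for one node.
    ipmi_address: str
    nic_macs: list
    cpus: int
    memory_mb: int
    disk_gb: int

    def problems(self):
        """Return a list of human-readable validation problems."""
        issues = []
        if not self.ipmi_address:
            issues.append("missing IPMI address")
        if not self.nic_macs:
            issues.append("at least one NIC MAC is required")
        for mac in self.nic_macs:
            if not MAC_RE.match(mac.lower()):
                issues.append(f"malformed MAC: {mac}")
        for name in ("cpus", "memory_mb", "disk_gb"):
            if getattr(self, name) <= 0:
                issues.append(f"{name} must be positive")
        return issues


good = NodeRegistration("10.0.0.5", ["52:54:00:AA:bb:01"], 8, 16384, 500)
bad = NodeRegistration("", ["nope"], 0, 1024, 10)
print(good.problems())
print(bad.problems())
```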

OK, I will reflect that in wireframes for UI.



 * Auto-discovery during undercloud install process (M)

* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)

Why is this under 'nodes'? I challenge the idea that it should be
there. We will need to surface some stuff about nodes, but the
underlying idea is to take a cloud approach here - so we're monitoring
services, that happen to be on nodes. There is room to monitor nodes,
as an undercloud feature set, but let's be very very specific about
what is sitting at what layer.

We need both - we need to track services but also state of nodes (CPU, RAM,
Network bandwidth, etc). So in node detail you should be able to track both.


Those are instance characteristics, not node characteristics. An
instance is software running on a Node, and the amount of CPU/RAM/NIC
utilisation is specific to that software while it's on that Node, not
to future or past instances running on that Node.
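
The layering described above can be made concrete with a small model. It assumes one instance per node and uses hypothetical `Node` and `Instance` classes; the point is only that capacity belongs to the node while utilisation belongs to whatever instance currently runs on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    # Software deployed onto a node; usage figures belong here.
    role: str
    cpu_used: float
    ram_used_mb: int


@dataclass
class Node:
    # Hardware; only capacity is a property of the node itself.
    name: str
    cpu_total: int
    ram_total_mb: int
    instance: Optional[Instance] = None

    def utilisation(self):
        """Fraction of node capacity used by the current instance."""
        if self.instance is None:
            return {"cpu": 0.0, "ram": 0.0}
        return {
            "cpu": self.instance.cpu_used / self.cpu_total,
            "ram": self.instance.ram_used_mb / self.ram_total_mb,
        }


node = Node("n1", cpu_total=8, ram_total_mb=16384)
node.instance = Instance("compute", cpu_used=2.0, ram_used_mb=4096)
print(node.utilisation())

# Removing the instance resets utilisation; the capacity is unchanged.
node.instance = None
print(node.utilisation())
```

A UI can still show both on the node detail page, as long as it is clear that the utilisation figures follow the instance rather than the hardware.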

I think this is a minor detail. Node has certain CPU/RAM/NIC capacity and 
instance is consuming it. Either way it is important for us to display 
this utilization in the UI as well as service statistics.



 * Resource nodes

 ^ nodes is again confusing layers - nodes are
what things are deployed to, but they aren't the entry point

Can you please be a bit more specific here? I don't understand this note.


By the way, can you get your email client to insert > before the text
you are replying to rather than HTML | marks? Hard to tell what I
wrote and what you did :).

Oh right, sure, sorry. Should be fixed ;)


By that note I meant, that Nodes are not resources, Resource instances
run on Nodes. Nodes are the generic pool of hardware we can deploy
things onto.
Well right, this is the terminology. From my point of view, resources 
for overcloud are the instances which are running on Nodes. Once we 
deploy the nodes with appropriate software they become Resource Nodes 
(from unallocated pool). If this terminology is confusing already then 
we should fix it. Any suggestions for improvements?



 * Unallocated nodes

This implies an 'allocation' step, that we don't have - how about
'Idle nodes' or something.

It can be auto-allocation. I don't see problem with 'unallocated' term.


Ok, it's not a biggy. I do think it will frame things poorly and lead
to an expectation about how TripleO works that doesn't match how it
does, but we can change it later if I'm right, and if I'm wrong, well
it won't be the first time :).
I think we will figure it out in the other thread (where we talk about 
allocation). Anyway - I am interested in how differently you would 
formulate Unallocated / Resource / Management Nodes? Maybe yours is better :)


-- Jarda



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-10 Thread Jaromir Coufal

On 2013/09/12 23:38, Tzu-Mainn Chen wrote:

Thanks for the explanation!

I'm going to claim that the thread revolves around two main areas of
disagreement.  Then I'm going to propose a way through:

a) Manual Node Assignment

I think that everyone is agreed that automated node assignment through
nova-scheduler is by far the most ideal case; there's no disagreement there.


+1


The disagreement comes from whether we need manual node assignment or not.  I
would argue that we need to step back and take a look at the real use case:
heterogeneous nodes.  If there are literally no characteristics that
differentiate nodes A and B, then why do we care which gets used for what?  Why
do we need to manually assign one?


Ideally, we don't. But with this approach we would take out the 
possibility to change something or decide something from the user.


The 'easiest' way is to support bigger companies with huge deployments, 
tailored infrastructure, everything connected properly.


But there are tons of companies/users who are running on old 
heterogeneous hardware. Very likely even more than the number of 
companies having already mentioned large deployments. And giving them 
only the way of 'setting up rules' in order to get the service on the 
node - this type of user is not gonna use our deployment system.


Somebody might argue - why do we care? If user doesn't like TripleO 
paradigm, he shouldn't use the UI and should use another tool. But the 
UI is not only about TripleO. Yes, it is underlying concept, but we are 
working on future *official* OpenStack deployment tool. We should care 
to enable people to deploy OpenStack - large/small scale, 
homo/heterogeneous hardware, typical or a bit more specific use-cases.


As an underlying paradigm of how to install cloud - awesome idea, 
awesome concept, it works. But user doesn't care about how it is being 
deployed for him. He cares about getting what he wants/needs. And we 
shouldn't go that far that we violently force him to treat his 
infrastructure as cloud. I believe that possibility to change/control - 
if needed - is very important and we should care.


And what is key for us is to *enable* users - not to prevent them from 
using our deployment tool, because it doesn't work for their requirements.




If we can agree on that, then I think it would be sufficient to say that we
want a mechanism to allow UI users to deal with heterogeneous nodes, and that
mechanism must use nova-scheduler.  In my mind, that's what resource classes
and node profiles are intended for.


Not arguing on this point. Though that mechanism should support also 
cases, where user specifies a role for a node / removes node from a 
role. The rest of nodes which I don't care about should be handled by 
nova-scheduler.



One possible objection might be: nova scheduler doesn't have the appropriate
filter that we need to separate out two nodes.  In that case, I would say that
needs to be taken up with nova developers.


Give it to Nova guys to fix it... What if that user's need would be 
undercloud specific requirement?  Why should Nova guys care? What should 
our unhappy user do until then? Use other tool? Will he be willing to 
get back to use our tool once it is ready?


I can also see other use-cases. It can be distribution based on power 
sockets, networking connections, etc. We can't think about all the ways 
which our user will need.




b) Terminology

It feels a bit like some of the disagreement comes from people using different
words for the same thing.  For example, the wireframes already detail a UI
where Robert's roles come first, but I think that message was confused because
I mentioned node types in the requirements.

So could we come to some agreement on what the most exact terminology would be?
I've listed some examples below, but I'm sure there are more.

node type | role

+1 role


management node | ?
resource node | ?
unallocated | available | undeployed

+1 unallocated


create a node distribution | size the deployment

* Distribute nodes


resource classes | ?

Service classes?


node profiles | ?




So when we talk about 'unallocated Nodes', the implication is that
users 'allocate Nodes', but they don't: they size roles, and after
doing all that there may be some Nodes that are - yes - unallocated,
or have nothing scheduled to them. So... I'm not debating that we
should have a list of free hardware - we totally should - I'm debating
how we frame it. 'Available Nodes' or 'Undeployed machines' or
whatever.
The allocation can happen automatically, so from my point of view I 
don't see big problem with 'allocate' term.



I just want to get away from talking about something
([manual] allocation) that we don't offer.

We don't at the moment but we should :)

-- Jarda



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-10 Thread Tzu-Mainn Chen
Thanks for the reply!  Comments in-line:

  The disagreement comes from whether we need manual node assignment or not.
  I would argue that we
  need to step back and take a look at the real use case: heterogeneous
  nodes.  If there are literally
  no characteristics that differentiate nodes A and B, then why do we care
  which gets used for what?  Why
  do we need to manually assign one?
 
 Ideally, we don't. But with this approach we would take out the
 possibility to change something or decide something from the user.
 
 The 'easiest' way is to support bigger companies with huge deployments,
 tailored infrastructure, everything connected properly.
 
 But there are tons of companies/users who are running on old
 heterogeneous hardware. Very likely even more than the number of
 companies having already mentioned large deployments. And giving them
 only the way of 'setting up rules' in order to get the service on the
 node - this type of user is not gonna use our deployment system.
 
 Somebody might argue - why do we care? If user doesn't like TripleO
 paradigm, he shouldn't use the UI and should use another tool. But the
 UI is not only about TripleO. Yes, it is underlying concept, but we are
 working on future *official* OpenStack deployment tool. We should care
 to enable people to deploy OpenStack - large/small scale,
 homo/heterogeneous hardware, typical or a bit more specific use-cases.

I think this is a very important clarification, and I'm glad you made it.  It
sounds like manual assignment is actually a sub-requirement, and the feature
you're arguing for is: supporting non-TripleO deployments.

That might be a worthy goal, but I think it's a distraction for the Icehouse
timeframe.  Each new deployment strategy requires not only a new UI, but
different deployment architectures that could have very little in common with
each other.  Designing them all to work in the same space is a recipe for
disaster, a convoluted gnarl of code that doesn't do any one thing particularly
well.  To use an analogy: there's a reason why no one makes a flying boat car.

I'm going to strongly advocate that for Icehouse, we focus exclusively on large
scale TripleO deployments, working to make that UI and architecture as sturdy
as we can.  Future deployment strategies should be discussed in the future, and
if they're not TripleO based, they should be discussed with the proper
OpenStack group.


 As an underlying paradigm of how to install cloud - awesome idea,
 awesome concept, it works. But user doesn't care about how it is being
 deployed for him. He cares about getting what he wants/needs. And we
 shouldn't go that far that we violently force him to treat his
 infrastructure as cloud. I believe that possibility to change/control -
 if needed - is very important and we should care.
 
 And what is key for us is to *enable* users - not to prevent them from
 using our deployment tool, because it doesn't work for their requirements.
 
 
  If we can agree on that, then I think it would be sufficient to say that we
  want a mechanism to allow
  UI users to deal with heterogeneous nodes, and that mechanism must use
  nova-scheduler.  In my mind,
  that's what resource classes and node profiles are intended for.
 
 Not arguing on this point. Though that mechanism should support also
 cases, where user specifies a role for a node / removes node from a
 role. The rest of nodes which I don't care about should be handled by
 nova-scheduler.
 
  One possible objection might be: nova scheduler doesn't have the
  appropriate filter that we need to
  separate out two nodes.  In that case, I would say that needs to be taken
  up with nova developers.
 
 Give it to Nova guys to fix it... What if that user's need would be
 undercloud specific requirement?  Why should Nova guys care? What should
 our unhappy user do until then? Use other tool? Will he be willing to
 get back to use our tool once it is ready?
 
 I can also see other use-cases. It can be distribution based on power
 sockets, networking connections, etc. We can't think about all the ways
 which our user will need.

In this case - it would be our job to make the Nova guys care and to work with
them to develop the feature.  Creating parallel services with the same
fundamental purpose - I think that runs counter to what OpenStack is designed
for.

 
  b) Terminology
 
  It feels a bit like some of the disagreement comes from people using
  different words for the same thing.
  For example, the wireframes already detail a UI where Robert's roles come
  first, but I think that message
  was confused because I mentioned node types in the requirements.
 
  So could we come to some agreement on what the most exact terminology would
  be?  I've listed some examples below,
  but I'm sure there are more.
 
  node type | role
 +1 role
 
  management node | ?
  resource node | ?
  unallocated | available | undeployed
 +1 unallocated
 
  create a node distribution | size the 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-10 Thread Jay Dobies

Thanks for the explanation!

I'm going to claim that the thread revolves around two main areas of 
disagreement.  Then I'm going
to propose a way through:

a) Manual Node Assignment

I think that everyone is agreed that automated node assignment through 
nova-scheduler is by
far the most ideal case; there's no disagreement there.

The disagreement comes from whether we need manual node assignment or not.  I 
would argue that we
need to step back and take a look at the real use case: heterogeneous nodes.  
If there are literally
no characteristics that differentiate nodes A and B, then why do we care which 
gets used for what?  Why
do we need to manually assign one?


This is a better way of verbalizing my concerns. I suspect there are 
going to be quite a few heterogeneous environments built from legacy 
pieces in the near term and fewer built from the ground up with all new 
matching hotness.


On the other side of it, instead of handling legacy hardware I was 
worried about the new hotness (not sure why I keep using that term) 
specialized for a purpose. This is exactly what Robert described in his 
GPU example. I think his explanation of how to use the scheduler to 
accommodate that makes a lot of sense, so I'm much less behind the idea 
of a strict manual assignment than I previously was.



If we can agree on that, then I think it would be sufficient to say that we 
want a mechanism to allow
UI users to deal with heterogeneous nodes, and that mechanism must use 
nova-scheduler.  In my mind,
that's what resource classes and node profiles are intended for.

One possible objection might be: nova scheduler doesn't have the appropriate 
filter that we need to
separate out two nodes.  In that case, I would say that needs to be taken up 
with nova developers.


b) Terminology

It feels a bit like some of the disagreement comes from people using different
words for the same thing.
For example, the wireframes already detail a UI where Robert's roles come
first, but I think that message
was confused because I mentioned node types in the requirements.

So could we come to some agreement on what the most exact terminology would be? 
 I've listed some examples below,
but I'm sure there are more.

node type | role
management node | ?
resource node | ?
unallocated | available | undeployed
create a node distribution | size the deployment
resource classes | ?
node profiles | ?

Mainn

- Original Message -

On 10 December 2013 09:55, Tzu-Mainn Chen tzuma...@redhat.com wrote:

* created as part of undercloud install process



By that note I meant, that Nodes are not resources, Resource instances
run on Nodes. Nodes are the generic pool of hardware we can deploy
things onto.


I don't think resource nodes is intended to imply that nodes are
resources; rather, it's supposed to
indicate that it's a node where a resource instance runs.  It's supposed to
separate it from management node
and unallocated node.


So the question is are we looking at /nodes/ that have a /current
role/, or are we looking at /roles/ that have some /current nodes/.

My contention is that the role is the interesting thing, and the nodes
is the incidental thing. That is, as a sysadmin, my hierarchy of
concerns is something like:
  A: are all services running
  B: are any of them in a degraded state where I need to take prompt
action to prevent a service outage [might mean many things: - software
update/disk space criticals/a machine failed and we need to scale the
cluster back up/too much load]
  C: are there any planned changes I need to make [new software deploy,
feature request from user, replacing a faulty machine]
  D: are there long term issues sneaking up on me [capacity planning,
machine obsolescence]

If we take /nodes/ as the interesting thing, and what they are doing
right now as the incidental thing, it's much harder to map that onto
the sysadmin concerns. If we start with /roles/ then we can answer:
  A: by showing the list of roles and the summary stats (how many
machines, service status aggregate), role level alerts (e.g. nova-api
is not responding)
  B: by showing the list of roles and more detailed stats (overall
load, response times of services, tickets against services)
  and a list of in-trouble instances in each role - instances with
alerts against them - low disk, overload, failed service,
early-detection alerts from hardware
  C: probably out of our remit for now in the general case, but we need
to enable some things here like replacing faulty machines
  D: by looking at trend graphs for roles (not machines), but also by
looking at the hardware in aggregate - breakdown by age of machines,
summary data for tickets filed against instances that were deployed to
a particular machine

C: and D: are (F) category work, but for all but the very last thing,
it seems clear how to approach this from a roles perspective.

I've tried to approach this using /nodes/ as the starting point, and
after two terrible drafts 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-10 Thread Robert Collins
On 11 December 2013 05:42, Jaromir Coufal jcou...@redhat.com wrote:
 On 2013/09/12 23:38, Tzu-Mainn Chen wrote:
 The disagreement comes from whether we need manual node assignment or not.
 I would argue that we
 need to step back and take a look at the real use case: heterogeneous
 nodes.  If there are literally
 no characteristics that differentiate nodes A and B, then why do we care
 which gets used for what?  Why
 do we need to manually assign one?


 Ideally, we don't. But with this approach we would take out the possibility
 to change something or decide something from the user.

So, I think this is where the confusion is. Using the nova scheduler
doesn't prevent change or control. It just ensures the change and
control happen in the right place: the Nova scheduler has had years of
work, of features and facilities being added to support HPC, HA and
other such use cases. It should have everything we need [1], without
going down to manual placement. For clarity: manual placement is when
any of the user, Tuskar, or Heat query Ironic, select a node, and then
use a scheduler hint to bypass the scheduler.
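
To make the distinction concrete, here is a minimal Python sketch of
scheduler-driven placement versus manual placement. It is illustrative only:
`Node`, `schedule`, and `place_manually` are hypothetical names invented for
this example, and the filters are plain predicates, not Nova's actual filter
API or its scheduler hints.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a bare-metal node record."""
    name: str
    cpus: int
    ram_gb: int
    tags: frozenset = field(default_factory=frozenset)


# A filter is just a predicate over nodes (illustrative, not Nova's API).
Filter = Callable[[Node], bool]


def schedule(nodes: list[Node], filters: list[Filter]) -> Optional[Node]:
    """Scheduler-driven placement: first node that passes every filter."""
    for node in nodes:
        if all(f(node) for f in filters):
            return node
    return None


def place_manually(nodes: list[Node], forced_name: str) -> Optional[Node]:
    """Manual placement: the caller names the node; no filter is consulted."""
    return next((n for n in nodes if n.name == forced_name), None)


nodes = [
    Node("node-a", cpus=8, ram_gb=32),
    Node("node-b", cpus=32, ram_gb=256, tags=frozenset({"gpu"})),
]

# Control expressed as rules: "a GPU node with at least 16 CPUs".
gpu_pick = schedule(nodes, [lambda n: "gpu" in n.tags, lambda n: n.cpus >= 16])

# Control expressed as a bypass: "put it on node-a, whatever the rules say".
forced_pick = place_manually(nodes, "node-a")
```

In the first call the operator still decides what lands where, but through
rules the scheduler evaluates; in the second the scheduler is skipped entirely.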

 The 'easiest' way is to support bigger companies with huge deployments,
 tailored infrastructure, everything connected properly.

 But there are tons of companies/users who are running on old heterogeneous
 hardware. Very likely even more than the number of companies having already
 mentioned large deployments. And giving them only the way of 'setting up
 rules' in order to get the service on the node - this type of user is not
 gonna use our deployment system.

That's speculation. We don't know if they will or will not because we
haven't given them a working system to test.

Let's break the concern into two halves:
A) Users who could have their needs met, but won't use TripleO because
meeting their needs in this way is too hard/complex/painful.

B) Users who have a need we cannot meet with the current approach.

For category B users, their needs might be specific HA things - like
the oft discussed failure domains angle, where we need to split up HA
clusters across power bars, aircon, switches etc. Clearly long term we
want to support them, and the undercloud Nova scheduler is entirely
capable of being informed about this, and we can evolve to a holistic
statement over time. Let's get a concrete list of the cases we can
think of today that won't be well supported initially, and we can
figure out where to do the work to support them properly.
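
As a rough illustration of the failure-domain case, the following Python
sketch spreads an HA cluster across power bars by round-robin. The function
name, the (node, domain) tuples, and the `pdu-*` labels are assumptions made
for this example; it does not show how the Nova scheduler would actually be
informed about failure domains.

```python
from collections import defaultdict


def spread_across_domains(nodes, count):
    """Pick `count` nodes, taking one per failure domain per round.

    `nodes` is a list of (node_name, domain) pairs. Raises ValueError when
    there are fewer than `count` nodes in total.
    """
    by_domain = defaultdict(list)
    for name, domain in nodes:
        by_domain[domain].append(name)

    chosen = []
    while len(chosen) < count:
        progressed = False
        # Visit domains in a stable order so the result is deterministic.
        for domain in sorted(by_domain):
            if len(chosen) == count:
                break
            if by_domain[domain]:
                chosen.append((by_domain[domain].pop(0), domain))
                progressed = True
        if not progressed:
            raise ValueError("not enough nodes to place the cluster")
    return chosen


# Hypothetical inventory: two nodes share pdu-1, the others are alone.
inventory = [("n1", "pdu-1"), ("n2", "pdu-1"), ("n3", "pdu-2"), ("n4", "pdu-3")]
cluster = spread_across_domains(inventory, 3)
```

With three members and three power bars, each member lands on a different
bar, so losing one bar takes out at most one member.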

For category A users, I think that we should get concrete examples,
and evolve our design (architecture and UX) to make meeting those
needs pleasant.

What we shouldn't do is plan complex work without concrete examples
that people actually need. Jay's example of some shiny new compute
servers with special parts that need to be carved out was a great one
- we can put that in category A, and figure out if it's easy enough,
or obvious enough - and think about whether we document it or make it
a guided workflow or $whatever.

 Somebody might argue - why do we care? If user doesn't like TripleO
 paradigm, he shouldn't use the UI and should use another tool. But the UI is
 not only about TripleO. Yes, it is underlying concept, but we are working on
 future *official* OpenStack deployment tool. We should care to enable people
 to deploy OpenStack - large/small scale, homo/heterogeneous hardware,
 typical or a bit more specific use-cases.

The difficulty I'm having is that the discussion seems to assume that
'heterogeneous implies manual', but I don't agree that that
implication is necessary!

 As an underlying paradigm of how to install cloud - awesome idea, awesome
 concept, it works. But user doesn't care about how it is being deployed for
 him. He cares about getting what he wants/needs. And we shouldn't go that
 far that we violently force him to treat his infrastructure as cloud. I
 believe that possibility to change/control - if needed - is very important
 and we should care.

I propose that we make concrete use cases: 'Fred cannot use TripleO
without manual assignment because XYZ'. Then we can assess how
important XYZ is to our early adopters and go from there.

 And what is key for us is to *enable* users - not to prevent them from using
 our deployment tool, because it doesn't work for their requirements.

Totally agreed :)

 If we can agree on that, then I think it would be sufficient to say that
 we want a mechanism to allow
 UI users to deal with heterogeneous nodes, and that mechanism must use
 nova-scheduler.  In my mind,
 that's what resource classes and node profiles are intended for.


 Not arguing on this point. Though that mechanism should support also cases,
 where user specifies a role for a node / removes node from a role. The rest
 of nodes which I don't care about should be handled by nova-scheduler.

Why! What is a use case for removing a role from a node while leaving
that node in service? Lets be specific, always, when we're using
categories of use 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Jay Dobies



On 12/06/2013 09:39 PM, Tzu-Mainn Chen wrote:

Thanks for the comments and questions!  I fully expect that this list of 
requirements
will need to be fleshed out, refined, and heavily modified, so the more the 
merrier.

Comments inline:



*** Requirements are assumed to be targeted for Icehouse, unless marked
otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse

* NODES


Note that everything in this section should be Ironic API calls.


* Creation
   * Manual registration
  * hardware specs from Ironic based on mac address (M)


Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory
stats


  * IP auto populated from Neutron (F)


Do you mean IPMI IP ? I'd say IPMI address managed by Neutron here.


   * Auto-discovery during undercloud install process (M)
* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)


Why is this under 'nodes'? I challenge the idea that it should be
there. We will need to surface some stuff about nodes, but the
underlying idea is to take a cloud approach here - so we're monitoring
services, that happen to be on nodes. There is room to monitor nodes,
as an undercloud feature set, but lets be very very specific about
what is sitting at what layer.


That's a fair point.  At the same time, the UI does want to monitor both
services and the nodes that the services are running on, correct?  I would
think that a user would want this.

Would it be better to explicitly split this up into two separate requirements?


That was my understanding as well, that Tuskar would not only care about 
the services of the undercloud but the health of the actual hardware on 
which it's running. As I write that I think you're correct, two separate 
requirements feels much more explicit in how that's different from 
elsewhere in OpenStack.



* Management node (where triple-o is installed)


This should be plural :) - TripleO isn't a single service to be
installed - We've got Tuskar, Ironic, Nova, Glance, Keystone, Neutron,
etc.


I misspoke here - this should be where the undercloud is installed.  My
current understanding is that our initial release will only support the 
undercloud
being installed onto a single node, but my understanding could very well be 
flawed.


* created as part of undercloud install process
* can create additional management nodes (F)
 * Resource nodes


 ^ nodes is again confusing layers - nodes are
what things are deployed to, but they aren't the entry point


 * searchable by status, name, cpu, memory, and all attributes from
 ironic
 * can be allocated as one of four node types


Not by users though. We need to stop thinking of this as 'what we do
to nodes' - Nova/Ironic operate on nodes, we operate on Heat
templates.


Right, I didn't mean to imply that users would be doing this allocation.  But 
once Nova
does this allocation, the UI does want to be aware of how the allocation is 
done, right?
That's what this requirement meant.


 * compute
 * controller
 * object storage
 * block storage
 * Resource class - allows for further categorization of a node type
 * each node type specifies a single default resource class
 * allow multiple resource classes per node type (M)


What's a node type?


Compute/controller/object storage/block storage.  Is another term besides node 
type
more accurate?




 * optional node profile for a resource class (M)
 * acts as filter for nodes that can be allocated to that
 class (M)


I'm not clear on this - you can list the nodes that have had a
particular thing deployed on them; we probably can get a good answer
to being able to see what nodes a particular flavor can deploy to, but
we don't want to be second guessing the scheduler..


Correct; the goal here is to provide a way through the UI to send additional 
filtering
requirements that will eventually be passed into the scheduler, allowing the 
scheduler
to apply additional filters.


 * nodes can be viewed by node types
 * additional group by status, hardware specification


*Instances* - e.g. hypervisors, storage, block storage etc.


 * controller node type


Again, need to get away from node type here.


* each controller node will run all openstack services
   * allow each node to run specified service (F)
* breakdown by workload (percentage of cpu used per node) (M)
 * Unallocated nodes


This implies an 'allocation' step, that we don't have - how about
'Idle nodes' or something.


Is it imprecise to say that nodes are allocated by the scheduler?  Would 
something like
'active/idle' be better?


 * Archived nodes (F)
 * Will be 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread mar...@redhat.com
On 09/12/13 18:01, Jay Dobies wrote:
 I believe we are still 'fighting' here with two approaches and I believe
 we need both. We can't only provide a way 'give us resources we will do
 a magic'. Yes this is preferred way - especially for large deployments,
 but we also need a fallback so that user can say - no, this node doesn't
 belong to the class, I don't want it there - unassign. Or I need to have
 this node there - assign.
 
 +1 to this. I think there are still a significant amount of admins out
 there that are really opposed to magic and want that fine-grained
 control. Even if they don't use it that frequently, in my experience
 they want to know it's there in the event they need it (and will often
 dream up a case that they'll need it).

+1 to the responses to the 'automagic' vs 'manual' discussion. The
latter is in fact only really possible in small deployments. But that's
not to say it is not a valid use case. Perhaps we need to split it
altogether into two use cases.

At least we should have a level of agreement here and register
blueprints for both: for Icehouse the auto selection of which services
go onto which nodes (i.e. allocation of services to nodes is entirely
transparent). For post Icehouse allow manual allocation of services to
nodes. This last bit may also coincide with any work being done in
Ironic/Nova scheduler which will make this allocation prettier than the
current force_nodes situation.


 
 I'm absolutely for pushing the magic approach as the preferred use. And
 in large deployments that's where people are going to see the biggest
 gain. The fine-grained approach can even be pushed off as a future
 feature. But I wouldn't be surprised to see people asking for it and I'd
 like to at least be able to say it's been talked about.
 
 - As an infrastructure administrator, Anna wants to be able to view
 the history of nodes that have been in a deployment.
 Why? This is super generic and could mean anything.
 I believe this has something to do with 'archived nodes'. But correct me
 if I am wrong.

 -- Jarda




Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Tzu-Mainn Chen
  - As an infrastructure administrator, Anna wants to be able to unallocate a
  node from a deployment.
 
  Why? What's her motivation? One plausible one for me is 'a machine
  needs to be serviced so Anna wants to remove it from the deployment to
  avoid causing user visible downtime.'  So let's say that: Anna needs to
  be able to take machines out of service so they can be maintained or
  disposed of.
 

 Node being serviced is a different user story for me.

 I believe we are still 'fighting' here with two approaches and I believe we
 need both. We can't only provide a way 'give us resources we will do a
 magic'. Yes this is preferred way - especially for large deployments, but we
 also need a fallback so that user can say - no, this node doesn't belong to
 the class, I don't want it there - unassign. Or I need to have this node
 there - assign.
Just for clarification - the wireframes don't cover individual nodes being 
manually assigned, do they? I thought the concession to manual control was 
entirely through resource classes and node profiles, which are still parameters 
to be passed through to the nova-scheduler filter. To me, that's very different 
from manual assignment. 

Mainn 


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Robert Collins
On 9 December 2013 23:56, Jaromir Coufal jcou...@redhat.com wrote:

 On 2013/07/12 01:59, Robert Collins wrote:

* Creation
   * Manual registration
  * hardware specs from Ironic based on mac address (M)

 Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory
 stats

 For registration it is just Management MAC address which is needed right? Or
 does Ironic need also IP? I think that MAC address might be enough, we can
 display IP in details of node later on.

Ironic needs all the details I listed today. Management MAC is not
currently used at all, but would be needed in future when we tackle
IPMI IP managed by Neutron.

* Auto-discovery during undercloud install process (M)
* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)

 Why is this under 'nodes'? I challenge the idea that it should be
 there. We will need to surface some stuff about nodes, but the
 underlying idea is to take a cloud approach here - so we're monitoring
 services, that happen to be on nodes. There is room to monitor nodes,
 as an undercloud feature set, but lets be very very specific about
 what is sitting at what layer.

 We need both - we need to track services but also state of nodes (CPU, RAM,
 Network bandwidth, etc). So in node detail you should be able to track both.

Those are instance characteristics, not node characteristics. An
instance is software running on a Node, and the amount of CPU/RAM/NIC
utilisation is specific to that software while it's on that Node, not
to future or past instances running on that Node.

* created as part of undercloud install process
* can create additional management nodes (F)
 * Resource nodes

 ^ nodes is again confusing layers - nodes are
 what things are deployed to, but they aren't the entry point

 Can you, please be a bit more specific here? I don't understand this note.

By the way, can you get your email client to insert '>' before the text
you are replying to rather than HTML | marks? Hard to tell what I
wrote and what you did :).

By that note I meant, that Nodes are not resources, Resource instances
run on Nodes. Nodes are the generic pool of hardware we can deploy
things onto.

 * searchable by status, name, cpu, memory, and all attributes from
 ironic
 * can be allocated as one of four node types

 Not by users though. We need to stop thinking of this as 'what we do
 to nodes' - Nova/Ironic operate on nodes, we operate on Heat
 templates.

 Discussed in other threads, but I still believe (and I am not alone) that we
 need to allow 'force nodes'.

I'll respond in the other thread :).

 * Unallocated nodes

 This implies an 'allocation' step, that we don't have - how about
 'Idle nodes' or something.

 It can be auto-allocation. I don't see problem with 'unallocated' term.

Ok, it's not a biggy. I do think it will frame things poorly and lead
to an expectation about how TripleO works that doesn't match how it
does, but we can change it later if I'm right, and if I'm wrong, well
it won't be the first time :).

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Tzu-Mainn Chen
 * created as part of undercloud install process
 * can create additional management nodes (F)
  * Resource nodes
 
  ^ nodes is again confusing layers - nodes are
  what things are deployed to, but they aren't the entry point
 
  Can you, please be a bit more specific here? I don't understand this note.
 
 By the way, can you get your email client to insert '>' before the text
 you are replying to rather than HTML | marks? Hard to tell what I
 wrote and what you did :).
 
 By that note I meant, that Nodes are not resources, Resource instances
 run on Nodes. Nodes are the generic pool of hardware we can deploy
 things onto.

I don't think resource nodes is intended to imply that nodes are resources; 
rather, it's supposed to
indicate that it's a node where a resource instance runs.  It's supposed to 
separate it from management node
and unallocated node.

  * Unallocated nodes
 
  This implies an 'allocation' step, that we don't have - how about
  'Idle nodes' or something.
 
  It can be auto-allocation. I don't see problem with 'unallocated' term.
 
 Ok, it's not a biggy. I do think it will frame things poorly and lead
 to an expectation about how TripleO works that doesn't match how it
 does, but we can change it later if I'm right, and if I'm wrong, well
 it won't be the first time :).
 

I'm interested in what the distinction you're making here is.  I'd rather get 
things
defined correctly the first time, and it's very possible that I'm missing a 
fundamental
definition here.


Mainn



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Robert Collins
On 10 December 2013 09:55, Tzu-Mainn Chen tzuma...@redhat.com wrote:
 * created as part of undercloud install process

 By that note I meant, that Nodes are not resources, Resource instances
 run on Nodes. Nodes are the generic pool of hardware we can deploy
 things onto.

 I don't think resource nodes is intended to imply that nodes are resources; 
 rather, it's supposed to
 indicate that it's a node where a resource instance runs.  It's supposed to 
 separate it from management node
 and unallocated node.

So the question is are we looking at /nodes/ that have a /current
role/, or are we looking at /roles/ that have some /current nodes/.

My contention is that the role is the interesting thing, and the nodes
is the incidental thing. That is, as a sysadmin, my hierarchy of
concerns is something like:
 A: are all services running
 B: are any of them in a degraded state where I need to take prompt
action to prevent a service outage [might mean many things: - software
update/disk space criticals/a machine failed and we need to scale the
cluster back up/too much load]
 C: are there any planned changes I need to make [new software deploy,
feature request from user, replacing a faulty machine]
 D: are there long term issues sneaking up on me [capacity planning,
machine obsolescence]

If we take /nodes/ as the interesting thing, and what they are doing
right now as the incidental thing, it's much harder to map that onto
the sysadmin concerns. If we start with /roles/ then we can answer:
 A: by showing the list of roles and the summary stats (how many
machines, service status aggregate), role level alerts (e.g. nova-api
is not responding)
 B: by showing the list of roles and more detailed stats (overall
load, response times of services, tickets against services)
 and a list of in-trouble instances in each role - instances with
alerts against them - low disk, overload, failed service,
early-detection alerts from hardware
 C: probably out of our remit for now in the general case, but we need
to enable some things here like replacing faulty machines
 D: by looking at trend graphs for roles (not machines), but also by
looking at the hardware in aggregate - breakdown by age of machines,
summary data for tickets filed against instances that were deployed to
a particular machine
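
The role-first view for A and B above can be sketched as a simple
aggregation over instances. This is only an illustration: the `Instance`
record, its status values, and the shape of the summary are hypothetical and
not taken from Tuskar or any monitoring API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record of a role deployed on a node."""
    role: str
    node: str
    status: str          # assumed values: "ok", "degraded", "failed"
    alerts: tuple = ()   # e.g. ("low disk",)


def summarize_roles(instances):
    """Group instances by role: machine count, status tally, trouble list."""
    summary = {}
    for inst in instances:
        entry = summary.setdefault(
            inst.role, {"machines": 0, "status": Counter(), "in_trouble": []}
        )
        entry["machines"] += 1
        entry["status"][inst.status] += 1
        if inst.alerts:
            # Nodes show up only as the place where a role has a problem.
            entry["in_trouble"].append((inst.node, inst.alerts))
    return summary


instances = [
    Instance("overcloud control plane", "node-1", "ok"),
    Instance("overcloud compute", "node-2", "ok"),
    Instance("overcloud compute", "node-3", "degraded", ("low disk",)),
]
report = summarize_roles(instances)
```

The top-level keys are roles; a node appears only inside a role's
`in_trouble` list, which mirrors the "role is interesting, node is
incidental" framing.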

C: and D: are (F) category work, but for all but the very last thing,
it seems clear how to approach this from a roles perspective.

I've tried to approach this using /nodes/ as the starting point, and
after two terrible drafts I've deleted the section. I'd love it if
someone could show me how it would work:)

  * Unallocated nodes
 
  This implies an 'allocation' step, that we don't have - how about
  'Idle nodes' or something.
 
  It can be auto-allocation. I don't see problem with 'unallocated' term.

 Ok, it's not a biggy. I do think it will frame things poorly and lead
 to an expectation about how TripleO works that doesn't match how it
 does, but we can change it later if I'm right, and if I'm wrong, well
 it won't be the first time :).


 I'm interested in what the distinction you're making here is.  I'd rather get 
 things
 defined correctly the first time, and it's very possible that I'm missing a 
 fundamental
 definition here.

So we have:
 - node - a physical general purpose machine capable of running in
many roles. Some nodes may have hardware layout that is particularly
useful for a given role.
 - role - a specific workload we want to map onto one or more nodes.
Examples include 'undercloud control plane', 'overcloud control
plane', 'overcloud storage', 'overcloud compute' etc.
 - instance - A role deployed on a node - this is where work actually happens.
 - scheduling - the process of deciding which role is deployed on which node.

The way TripleO works is that we define a Heat template that lays out
policy: '5 instances of overcloud control plane please', '20
hypervisors' etc. Heat passes that to Nova, which pulls the image for
the role out of Glance, picks a node, and deploys the image to the
node.

Note in particular the order: Heat -> Nova -> Scheduler -> Node chosen.

The user action is not 'allocate a Node to "overcloud control plane"',
it is 'size the control plane through heat'.

So when we talk about 'unallocated Nodes', the implication is that
users 'allocate Nodes', but they don't: they size roles, and after
doing all that there may be some Nodes that are - yes - unallocated,
or have nothing scheduled to them. So... I'm not debating that we
should have a list of free hardware - we totally should - I'm debating
how we frame it. 'Available Nodes' or 'Undeployed machines' or
whatever. I just want to get away from talking about something
([manual] allocation) that we don't offer.
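
A small Python sketch of that framing, under loud assumptions: `size_roles`
is a hypothetical helper, the policy dict stands in for what a Heat template
would express, and the "scheduler" is a trivial first-free picker rather
than anything Nova does.

```python
def size_roles(policy, nodes):
    """Deploy `policy` (role -> desired instance count) onto `nodes`.

    Returns (instances, free_nodes). The user supplies only sizes; which
    node each instance lands on is decided here, not by the caller.
    """
    free = list(nodes)
    instances = []
    for role, count in policy.items():
        for _ in range(count):
            if not free:
                raise RuntimeError(f"no free node left for role {role!r}")
            # Stand-in for scheduling: take the next free node.
            instances.append((role, free.pop(0)))
    return instances, free


# Hypothetical sizing and inventory.
policy = {"overcloud control plane": 2, "overcloud compute": 3}
nodes = [f"node-{i}" for i in range(7)]

instances, available = size_roles(policy, nodes)
```

The list of free hardware (`available`) falls out as whatever the sizing
left over; nothing in the input ever assigned a particular node to a role.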

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Jay Dobies




So the question is are we looking at /nodes/ that have a /current
role/, or are we looking at /roles/ that have some /current nodes/.

My contention is that the role is the interesting thing, and the nodes
is the incidental thing. That is, as a sysadmin, my hierarchy of
concerns is something like:
  A: are all services running
  B: are any of them in a degraded state where I need to take prompt
action to prevent a service outage [might mean many things: - software
update/disk space criticals/a machine failed and we need to scale the
cluster back up/too much load]
  C: are there any planned changes I need to make [new software deploy,
feature request from user, replacing a faulty machine]
  D: are there long term issues sneaking up on me [capacity planning,
machine obsolescence]

If we take /nodes/ as the interesting thing, and what they are doing
right now as the incidental thing, it's much harder to map that onto
the sysadmin concerns. If we start with /roles/ then we can answer:
  A: by showing the list of roles and the summary stats (how many
machines, service status aggregate), role level alerts (e.g. nova-api
is not responding)
  B: by showing the list of roles and more detailed stats (overall
load, response times of services, tickets against services)
  and a list of in-trouble instances in each role - instances with
alerts against them - low disk, overload, failed service,
early-detection alerts from hardware
  C: probably out of our remit for now in the general case, but we need
to enable some things here like replacing faulty machines
  D: by looking at trend graphs for roles (not machines), but also by
looking at the hardware in aggregate - breakdown by age of machines,
summary data for tickets filed against instances that were deployed to
a particular machine

C: and D: are (F) category work, but for all but the very last thing,
it seems clear how to approach this from a roles perspective.

I've tried to approach this using /nodes/ as the starting point, and
after two terrible drafts I've deleted the section. I'd love it if
someone could show me how it would work:)


 * Unallocated nodes

This implies an 'allocation' step, that we don't have - how about
'Idle nodes' or something.

It can be auto-allocation. I don't see problem with 'unallocated' term.


Ok, it's not a biggy. I do think it will frame things poorly and lead
to an expectation about how TripleO works that doesn't match how it
does, but we can change it later if I'm right, and if I'm wrong, well
it won't be the first time :).



I'm interested in what the distinction you're making here is.  I'd rather get 
things
defined correctly the first time, and it's very possible that I'm missing a 
fundamental
definition here.


So we have:
  - node - a physical general purpose machine capable of running in
many roles. Some nodes may have hardware layout that is particularly
useful for a given role.
  - role - a specific workload we want to map onto one or more nodes.
Examples include 'undercloud control plane', 'overcloud control
plane', 'overcloud storage', 'overcloud compute' etc.
  - instance - A role deployed on a node - this is where work actually happens.
  - scheduling - the process of deciding which role is deployed on which node.
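
The four terms can be written down as a small data model. This is only a
sketch of the glossary; the class and field names are assumptions made
for illustration and do not mirror any Tuskar, Nova or Ironic type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A physical general purpose machine."""
    name: str
    hardware: frozenset = frozenset()   # e.g. frozenset({"ssd"})


@dataclass(frozen=True)
class Role:
    """A specific workload to map onto one or more nodes."""
    name: str


@dataclass(frozen=True)
class Instance:
    """A role deployed on a node - where work actually happens."""
    role: Role
    node: Node


# Scheduling is whatever decides which (role, node) pairs become instances.
storage = Role("overcloud storage")
node = Node("node-1", frozenset({"ssd"}))
instance = Instance(role=storage, node=node)
```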


This glossary is really handy to make sure we're all speaking the same language.



The way TripleO works is that we define a Heat template that lays out
policy: 5 instances of 'overcloud control plane please', '20
hypervisors' etc. Heat passes that to Nova, which pulls the image for
the role out of Glance, picks a node, and deploys the image to the
node.

Note in particular the order: Heat -> Nova -> Scheduler -> Node chosen.

The user action is not 'allocate a Node to "overcloud control plane"',
it is 'size the control plane through Heat'.

So when we talk about 'unallocated Nodes', the implication is that
users 'allocate Nodes', but they don't: they size roles, and after
doing all that there may be some Nodes that are - yes - unallocated,


I'm not sure if I should ask this here or to your point above, but what
about multi-role nodes? Is there any piece in here that says "The policy
wants 5 instances but I can fit two of them on this existing
underutilized node and three of them on unallocated nodes", or since it's
all at the image level do you get just what's in the image, and that's the
finest level of granularity?



or have nothing scheduled to them. So... I'm not debating that we
should have a list of free hardware - we totally should - I'm debating
how we frame it. 'Available Nodes' or 'Undeployed machines' or
whatever. I just want to get away from talking about something
([manual] allocation) that we don't offer.


My only concern here is that we're not talking about cloud users, we're
talking about admins adminning (we'll pretend it's a word, come with me)
a cloud. To a cloud user, "give me some power so I can do some stuff" is
a safe use case if I trust the cloud I'm running on. I trust

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Robert Collins
On 10 December 2013 10:57, Jay Dobies jason.dob...@redhat.com wrote:


 So we have:
   - node - a physical general purpose machine capable of running in
 many roles. Some nodes may have hardware layout that is particularly
 useful for a given role.
   - role - a specific workload we want to map onto one or more nodes.
 Examples include 'undercloud control plane', 'overcloud control
 plane', 'overcloud storage', 'overcloud compute' etc.
   - instance - A role deployed on a node - this is where work actually
 happens.
   - scheduling - the process of deciding which role is deployed on which
 node.


 This glossary is really handy to make sure we're all speaking the same
 language.


 The way TripleO works is that we defined a Heat template that lays out
 policy: 5 instances of 'overcloud control plane please', '20
 hypervisors' etc. Heat passes that to Nova, which pulls the image for
 the role out of Glance, picks a node, and deploys the image to the
 node.

 Note in particular the order: Heat - Nova - Scheduler - Node chosen.

 The user action is not 'allocate a Node to 'overcloud control plane',
 it is 'size the control plane through heat'.
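
The ordering can be sketched as follows: the template states only how many
instances of each role are wanted, a stand-in scheduler picks the nodes,
and the free list is whatever is left over afterwards. deploy() and the
"first free node" choice are assumptions for illustration; the real
decision is made by the Nova scheduler.

```python
def deploy(template, nodes):
    """template: {role: count}; nodes: list of node names.

    Returns (instances, free), where instances is a list of (role, node).
    """
    free = list(nodes)
    instances = []
    for role, count in template.items():
        for _ in range(count):
            if not free:
                raise RuntimeError(f"not enough nodes for role {role!r}")
            node = free.pop(0)   # stand-in for the scheduler's choice
            instances.append((role, node))
    return instances, free


# The user sizes roles; they never name a node.
template = {"overcloud control plane": 1, "overcloud compute": 2}
instances, free_nodes = deploy(template, ["n1", "n2", "n3", "n4"])
```

Nothing in the input mentions a node by name, so the leftover list is a
result of sizing rather than the output of an allocation step.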

 So when we talk about 'unallocated Nodes', the implication is that
 users 'allocate Nodes', but they don't: they size roles, and after
 doing all that there may be some Nodes that are - yes - unallocated,


 I'm not sure if I should ask this here or to your point above, but what
 about multi-role nodes? Is there any piece in here that says The policy
 wants 5 instances but I can fit two of them on this existing underutilized
 node and three of them on unallocated nodes or since it's all at the image
 level you get just what's in the image and that's the finest-level of
 granularity?

The way we handle that today is to create a composite role that says
'overcloud-compute+cinder storage', for instance - because image is
the level of granularity. If/when we get automatic container
subdivision - see the other really interesting long-term thread - we
could subdivide, but I'd still do that using image as the level of
granularity, it's just that we'd have the host image + the container
images.

 or have nothing scheduled to them. So... I'm not debating that we
 should have a list of free hardware - we totally should - I'm debating
 how we frame it. 'Available Nodes' or 'Undeployed machines' or
 whatever. I just want to get away from talking about something
 ([manual] allocation) that we don't offer.


 My only concern here is that we're not talking about cloud users, we're
 talking about admins adminning (we'll pretend it's a word, come with me) a
 cloud. To a cloud user, give me some power so I can do some stuff is a
 safe use case if I trust the cloud I'm running on. I trust that the cloud
 provider has taken the proper steps to ensure that my CPU isn't in New York
 and my storage in Tokyo.

Sure :)

 To the admin setting up an overcloud, they are the ones providing that trust
 to eventual cloud users. That's where I feel like more visibility and
 control are going to be desired/appreciated.

 I admit what I just said isn't at all concrete. Might even be flat out
 wrong. I was never an admin, I've just worked on sys management software
 long enough to have the opinion that their levels of OCD are legendary. I
 can't shake this feeling that someone is going to slap some fancy new
 jacked-up piece of hardware onto the network and have a specific purpose
 they are going to want to use it for. But maybe that's antiquated thinking
 on my part.

I think concrete use cases are the only way we'll get light at the end
of the tunnel.

So let's say someone puts a new bit of fancy kit onto their network and
wants it for e.g. GPU VM instances only. That's a reasonable desire.

The basic stuff we're talking about so far is just about saying each
role can run on some set of undercloud flavors. If that new bit of kit
has the same coarse metadata as other kit, Nova can't tell it apart.
So the way to solve the problem is:
 - a) teach Ironic about the specialness of the node (e.g. a tag 'GPU')
 - b) teach Nova that there is a flavor that maps to the presence of
that specialness, and
 - c) teach Nova that other flavors may not map to that specialness

then in Tuskar whatever Nova configuration is needed to use that GPU
is a special role ('GPU compute' for instance) and only that role
would be given that flavor to use. That special config is probably
being in a host aggregate, with an overcloud flavor that specifies
that aggregate, which means at the TripleO level we need to put the
aggregate in the config metadata for that role, and the admin does a
one-time setup in the Nova Horizon UI to configure their GPU compute
flavor.

This isn't 'manual allocation' to me - it's surfacing the capabilities
from the bottom ('has GPU') and the constraints from the top ('needs
GPU') and letting Nova and Heat sort it out.
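
A rough sketch of that matching, assuming nodes carry a set of tags and
each flavor lists tags it requires and tags it must not land on. The
dictionaries and eligible_nodes() are hypothetical and are not the Nova
or Ironic API; they only illustrate capabilities meeting constraints.

```python
def eligible_nodes(flavor, nodes):
    """Nodes whose tags satisfy the flavor's required and forbidden tags."""
    return [
        name
        for name, tags in nodes.items()
        if flavor["requires"] <= tags and not (flavor["forbids"] & tags)
    ]


# a) the node's specialness is recorded as a tag
nodes = {"n1": set(), "n2": {"GPU"}, "n3": set()}

# b) one flavor maps to the tag, c) the other flavor may not use it
flavors = {
    "gpu-compute": {"requires": {"GPU"}, "forbids": set()},
    "baremetal": {"requires": set(), "forbids": {"GPU"}},
}

# only the special role is given the special flavor
role_flavors = {"GPU compute": "gpu-compute", "overcloud compute": "baremetal"}

gpu_candidates = eligible_nodes(flavors[role_flavors["GPU compute"]], nodes)
plain_candidates = eligible_nodes(flavors[role_flavors["overcloud compute"]], nodes)
```

The admin never points a role at n2 directly; n2 is simply the only node
that satisfies the GPU role's flavor.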

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Tzu-Mainn Chen
Thanks for the explanation!

I'm going to claim that the thread revolves around two main areas of
disagreement.  Then I'm going to propose a way through:

a) Manual Node Assignment

I think that everyone is agreed that automated node assignment through
nova-scheduler is by far the most ideal case; there's no disagreement there.

The disagreement comes from whether we need manual node assignment or not.  I
would argue that we need to step back and take a look at the real use case:
heterogeneous nodes.  If there are literally no characteristics that
differentiate nodes A and B, then why do we care which gets used for what?  Why
do we need to manually assign one?

If we can agree on that, then I think it would be sufficient to say that we
want a mechanism to allow UI users to deal with heterogeneous nodes, and that
mechanism must use nova-scheduler.  In my mind, that's what resource classes
and node profiles are intended for.

One possible objection might be: nova scheduler doesn't have the appropriate
filter that we need to separate out two nodes.  In that case, I would say that
needs to be taken up with nova developers.


b) Terminology

It feels a bit like some of the disagreement comes from people using different
words for the same thing.  For example, the wireframes already detail a UI
where Robert's roles come first, but I think that message was confused because
I mentioned node types in the requirements.

So could we come to some agreement on what the most exact terminology would be?
I've listed some examples below, but I'm sure there are more.

node type | role
management node | ?
resource node | ?
unallocated | available | undeployed
create a node distribution | size the deployment
resource classes | ?
node profiles | ?

Mainn

- Original Message -
 On 10 December 2013 09:55, Tzu-Mainn Chen tzuma...@redhat.com wrote:
  * created as part of undercloud install process
 
  By that note I meant, that Nodes are not resources, Resource instances
  run on Nodes. Nodes are the generic pool of hardware we can deploy
  things onto.
 
  I don't think resource nodes is intended to imply that nodes are
  resources; rather, it's supposed to
  indicate that it's a node where a resource instance runs.  It's supposed to
  separate it from management node
  and unallocated node.
 
 So the question is are we looking at /nodes/ that have a /current
 role/, or are we looking at /roles/ that have some /current nodes/.
 
 My contention is that the role is the interesting thing, and the nodes
 is the incidental thing. That is, as a sysadmin, my hierarchy of
 concerns is something like:
  A: are all services running
  B: are any of them in a degraded state where I need to take prompt
 action to prevent a service outage [might mean many things: - software
 update/disk space criticals/a machine failed and we need to scale the
 cluster back up/too much load]
  C: are there any planned changes I need to make [new software deploy,
 feature request from user, replacing a faulty machine]
  D: are there long term issues sneaking up on me [capacity planning,
 machine obsolescence]
 
 If we take /nodes/ as the interesting thing, and what they are doing
 right now as the incidental thing, it's much harder to map that onto
 the sysadmin concerns. If we start with /roles/ then we can answer:
  A: by showing the list of roles and the summary stats (how many
 machines, service status aggregate), role level alerts (e.g. nova-api
 is not responding)
  B: by showing the list of roles and more detailed stats (overall
 load, response times of services, tickets against services
  and a list of in trouble instances in each role - instances with
 alerts against them - low disk, overload, failed service,
 early-detection alerts from hardware
  C: probably out of our remit for now in the general case, but we need
 to enable some things here like replacing faulty machines
  D: by looking at trend graphs for roles (not machines), but also by
 looking at the hardware in aggregate - breakdown by age of machines,
 summary data for tickets filed against instances that were deployed to
 a particular machine
 
 C: and D: are (F) category work, but for all but the very last thing,
 it seems clear how to approach this from a roles perspective.
 
 I've tried to approach this using /nodes/ as the starting point, and
 after two terrible drafts I've deleted the section. I'd love it if
 someone could show me how it would work:)
 
   * Unallocated nodes
  
   This implies an 'allocation' step, that we don't have - how about
   'Idle nodes' or something.
  
   It can be auto-allocation. I don't see a problem with the 'unallocated' term.
 
  Ok, it's not a biggy. I do think it will frame things poorly and lead
  to an expectation about how TripleO works that doesn't match how it
  does, but we can change it later if I'm right, and if I'm wrong, well
  it won't be the first time :).
 
 
  I'm interested in what the distinction you're making 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Jaromir Coufal


On 2013/06/12 21:26, Tzu-Mainn Chen wrote:

* can be allocated as one of four node types

It's pretty clear by the current verbiage but I'm going to ask anyway:
one and only one?

Yep, that's right!

Confirming. One and only one.


My gut reaction is that we want to bite this off sooner rather than
later. This will have data model and API implications that, even if we
don't commit to it for Icehouse, should still be in our minds during it,
so it might make sense to make it a first class thing to just nail down now.

That is entirely correct, which is one reason it's on the list of
requirements.  The forthcoming API design will have to account for it.  Not
recreating the entire data model between releases is a key goal :)
Well yeah, that's why we should try to think longer-term, and why the
wireframes also cover a bit more than might land in Icehouse: so that
we are aware of the future direction and don't have to completely
rebuild the underlying models later on.



  * optional node profile for a resource class (M)
  * acts as filter for nodes that can be allocated to that
  class (M)

To my understanding, once this is in Icehouse, we'll have to support
upgrades. If this filtering is pushed off, could we get into a situation
where an allocation created in Icehouse would no longer be valid in
Icehouse+1 once these filters are in place? If so, we might want to make
it more of a priority to get them in place earlier and not eat the
headache of addressing these sorts of integrity issues later.
Hm, can you be a bit more specific about how the allocation created in I 
might no longer be valid in I+1?



That's true.  The problem is that to my understanding, the filters we'd
need in nova-scheduler are not yet fully in place.
I think at the moment there are 'extra params' which we might use to
some extent. But yes, AFAIK a part needed for filtered scheduling is
still missing in nova.


I also think that this is an issue that we'll need to address no matter what.
Even once filters exist, if a user applies a filter *after* nodes are allocated,
we'll need to do something clever if the already-allocated nodes don't meet the
filter criteria.
Well, here is the thing. Once nodes are allocated, you can get a warning
that those nodes in the resource class no longer fulfil the criteria
(if the criteria were changed), but that's all. It will be up to the user
to decide whether to keep them in or unallocate them. The profiles are
important when a decision 'which node can get in' is being made.
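
A sketch of that behaviour, assuming a profile is a set of minimum
hardware values: the profile gates new nodes, while nodes already in the
class are only reported, never removed. The field names and both helper
functions are hypothetical.

```python
def admits(profile, node):
    """True if the node meets the profile's minimums (hypothetical fields)."""
    return (node["cpus"] >= profile["min_cpus"]
            and node["ram_gb"] >= profile["min_ram_gb"])


def profile_warnings(profile, allocated):
    """Names of already-allocated nodes that no longer match the profile."""
    return [node["name"] for node in allocated if not admits(profile, node)]


allocated = [
    {"name": "n1", "cpus": 8, "ram_gb": 32},
    {"name": "n2", "cpus": 16, "ram_gb": 64},
]
candidate = {"name": "n3", "cpus": 8, "ram_gb": 32}

tightened = {"min_cpus": 16, "min_ram_gb": 64}     # profile changed later
warnings = profile_warnings(tightened, allocated)  # n1 is flagged, not removed
can_join = admits(tightened, candidate)            # new nodes are filtered
```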



  * nodes can be viewed by node types
  * additional group by status, hardware specification
  * controller node type
 * each controller node will run all openstack services
* allow each node to run specified service (F)
 * breakdown by workload (percentage of cpu used per node) (M)
  * Unallocated nodes

Is there more still being fleshed out here? Things like:
   * Listing unallocated nodes
   * Unallocating a previously allocated node (does this make it a
vanilla resource or does it retain the resource type? is this the only
way to change a node's resource type?)
If we use a policy-based approach then yes, this is correct: first
unallocate a node and then increase the number of resources in the other class.


But I believe that we need to keep control over the infrastructure and not
rely only on policies. So I hope we can get to something like
'reallocate'/'allocate manually' which will force a node to be part of a
specific class.



   * Unregistering nodes from Tuskar's inventory (I put this under
unallocated under the assumption that the workflow will be an explicit
unallocate before unregister; I'm not sure if this is the same as
archive below).

Ah, you're entirely right.  I'll add these to the list.


  * Archived nodes (F)

Can you elaborate a bit more on what this is?

To be honest, I'm a bit fuzzy about this myself; Jarda mentioned that there was
an OpenStack service in the process of being planned that would handle this
requirement.  Jarda, can you detail a bit?
So this is based on historical data. At the moment, there is no
service which would keep this type of data (might be a new project?).
Since Tuskar will be not only deploying but also monitoring your
deployment, it is important to have historical data available. If a user
removes some nodes from the infrastructure, they would lose all the data and
we would not be able to generate graphs. That's why archived nodes =
nodes which were registered in past but are no longer available.
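
The idea can be sketched as an inventory that moves a node's samples to an
archive on unregistration instead of dropping them. The Inventory class
and its methods are hypothetical; as noted above, no service keeping this
kind of data exists yet.

```python
class Inventory:
    """Toy inventory that keeps history for nodes that have been removed."""

    def __init__(self):
        self.active = {}     # node name -> list of utilization samples
        self.archived = {}   # same shape, for nodes no longer available

    def register(self, name):
        self.active[name] = []

    def record(self, name, sample):
        self.active[name].append(sample)

    def unregister(self, name):
        # Archive instead of delete, so graphs can still be generated.
        self.archived[name] = self.active.pop(name)

    def history(self, name):
        if name in self.active:
            return self.active[name]
        return self.archived.get(name, [])


inventory = Inventory()
inventory.register("n1")
inventory.record("n1", 0.42)
inventory.record("n1", 0.57)
inventory.unregister("n1")
```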


-- Jarda


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Jaromir Coufal


On 2013/06/12 22:55, Matt Wagner wrote:

- As an infrastructure administrator, Anna wants to review the
distribution of the nodes that she has assigned before kicking off
the Deploy task.

What does she expect to see here on the review screen that she didn't
see on the previous screens, if anything? Is this just a summation, or
is she expecting to see things like which node will get which role? (I'd
argue for the former; I don't know that we can predict the latter.)
At the beginning, just summation. Later (when we have nova-scheduler 
reservation) we can get the real distribution of which node is taking 
which role.



- As an infrastructure administrator, Anna wants to monitor the
deployment process of all of the nodes that she has assigned.

I think there's an implied "...through the UI" here, versus tailing log
files to watch state. Does she just expect to see states like "Pending",
"Deploying", or "Finished", versus, say, having the full logs shown in
the UI? (I'd vote 'yes'.)
For the simplified view - yes, only state changes and a progress bar.
However log should be available.
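
A minimal sketch of such a simplified view, using the three states named
in the question. The State enum and progress_bar() are illustrative
assumptions, not an existing Tuskar UI component.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    DEPLOYING = "Deploying"
    FINISHED = "Finished"


def progress_bar(states, width=10):
    """Render the finished/total ratio as a fixed-width text bar."""
    done = sum(1 for s in states.values() if s is State.FINISHED)
    filled = width * done // len(states) if states else 0
    return "[" + "#" * filled + "." * (width - filled) + f"] {done}/{len(states)}"


node_states = {
    "n1": State.FINISHED,
    "n2": State.DEPLOYING,
    "n3": State.PENDING,
    "n4": State.FINISHED,
}
bar = progress_bar(node_states)
```

The full log would sit behind this view rather than replace it.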



- As an infrastructure administrator, Anna needs to be able to
troubleshoot any errors that may occur during the deployment of nodes
process.

I'm not sure that the ...through the UI implication I mentioned above
extends here. (IMHO) I assume that if things fail, Anna might be okay
with us showing a message that $foo failed on $bar, and she should try
looking in /var/log/$baz for full details. Does that seem fair? (At
least early on.)
As said above, for simplified views, it is ok to say $foo failed on 
$bar, but she should be able to track the problem - logs section in the UI.



- As an infrastructure administrator, Anna wants to be able to view
the history of nodes that have been in a deployment.

Why does she want to view history of past nodes?

Note that I'm not arguing against this; it's just not abundantly clear
to me what she'll be using this information for. Does she want a history
to check off an Audit log checkbox, or will she be looking to extract
certain data from this history?

Short answer is Graphs - history of utilization of the class etc.

-- Jarda


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Jaromir Coufal


On 2013/07/12 02:20, Robert Collins wrote:

- As an infrastructure administrator, Anna needs to assign a role to each of
the necessary nodes in her OpenStack deployment. The nodes could be either
controller, compute, networking, or storage resources depending on the needs
of this deployment.

Definitely not: she needs to deliver a running cloud. Manually saying
'machine X is a compute node' is confusing an implementation with a
need. She needs to know that her cloud will have enough capacity to
meet her users needs; she needs to know that it will be resilient
against a wide set of failures (and this might be a dial with
different clouds having different uptime guarantees); she may need to
ensure that some specific hardware configuration is used for storage,
as a performance optimisation. None of those needs imply assigning
roles to machines.
Yes, in an ideal world and in large deployments. But there might be cases when
Anna will need to say: deploy storage to this specific node. I don't
dispute that we want a policy-based approach, but we also need to cover
manual control (forcing a node to take some role).



- As an infrastructure administrator, Anna wants to monitor the deployment 
process of all of the nodes that she has assigned.

I don't think she wants to do that. I think she wants to be told if
there is a problem that needs her intervention to solve - e.g. bad
IPMI details for a node, or a node not responding when asked to boot
via PXE.
I think by this user story Liz wanted to capture that Anna wants to see
whether the deployment process is still in progress or has
finished/failed, etc. Which I agree with. I don't think that she will
sit and watch what is happening.



- As an infrastructure administrator, Anna wants to be able to unallocate a 
node from a deployment.
Why? What's her motivation? One plausible one for me is 'a machine
needs to be serviced so Anna wants to remove it from the deployment to
avoid causing user visible downtime.'  So let's say that: Anna needs to
be able to take machines out of service so they can be maintained or
disposed of.

Node being serviced is a different user story for me.

I believe we are still 'fighting' here with two approaches and I believe
we need both. We can't only provide a way of 'give us resources and we will
do the magic'. Yes, this is the preferred way - especially for large
deployments - but we also need a fallback so that the user can say: no, this
node doesn't belong to the class, I don't want it there - unassign. Or: I need
to have this node there - assign.



- As an infrastructure administrator, Anna wants to be able to view the history 
of nodes that have been in a deployment.

Why? This is super generic and could mean anything.
I believe this has something to do with 'archived nodes'. But correct me 
if I am wrong.


-- Jarda


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Jaromir Coufal


On 2013/07/12 01:59, Robert Collins wrote:


* Creation
   * Manual registration
  * hardware specs from Ironic based on mac address (M)

Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory stats
For registration, only the management MAC address is needed,
right? Or does Ironic also need the IP? I think the MAC address might be
enough; we can display the IP in the node details later on.



  * IP auto populated from Neutron (F)

Do you mean IPMI IP ? I'd say IPMI address managed by Neutron here.

+1


   * Auto-discovery during undercloud install process (M)
* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)

Why is this under 'nodes'? I challenge the idea that it should be
there. We will need to surface some stuff about nodes, but the
underlying idea is to take a cloud approach here - so we're monitoring
services, that happen to be on nodes. There is room to monitor nodes,
as an undercloud feature set, but let's be very very specific about
what is sitting at what layer.
We need both - we need to track services but also state of nodes (CPU, 
RAM, Network bandwidth, etc). So in node detail you should be able to 
track both.



* Management node (where triple-o is installed)

This should be plural :) - TripleO isn't a single service to be
installed - We've got Tuskar, Ironic, Nova, Glance, Keystone, Neutron,
etc.


* created as part of undercloud install process
* can create additional management nodes (F)
 * Resource nodes

 ^ nodes is again confusing layers - nodes are
what things are deployed to, but they aren't the entry point

Can you please be a bit more specific here? I don't understand this note.




 * searchable by status, name, cpu, memory, and all attributes from ironic
 * can be allocated as one of four node types

Not by users though. We need to stop thinking of this as 'what we do
to nodes' - Nova/Ironic operate on nodes, we operate on Heat
templates.
Discussed in other threads, but I still believe (and I am not alone) 
that we need to allow 'force nodes'.



 * Unallocated nodes
This implies an 'allocation' step, that we don't have - how about
'Idle nodes' or something.

It can be auto-allocation. I don't see a problem with the 'unallocated' term.


   * defaulted, with no option to change
  * allow modification (F)
* review distribution map (F)
* notification when a deployment is ready to go or whenever something changes

Is this an (M) ?
Might be M but with higher priority. I see it in the middle. But if we 
have to decide, it can be M.

-- Jarda


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Imre Farkas

On 12/09/2013 11:56 AM, Jaromir Coufal wrote:

On 2013/07/12 01:59, Robert Collins wrote:

* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)

Why is this under 'nodes'? I challenge the idea that it should be
there. We will need to surface some stuff about nodes, but the
underlying idea is to take a cloud approach here - so we're monitoring
services, that happen to be on nodes. There is room to monitor nodes,
as an undercloud feature set, but let's be very very specific about
what is sitting at what layer.

We need both - we need to track services but also state of nodes (CPU,
RAM, Network bandwidth, etc). So in node detail you should be able to
track both.


I agree. Monitoring services and monitoring nodes are both valid 
features for Tuskar. I think splitting it into two separate requirements 
as Mainn suggested would make a lot of sense.



 * searchable by status, name, cpu, memory, and all attributes from ironic
 * can be allocated as one of four node types

Not by users though. We need to stop thinking of this as 'what we do
to nodes' - Nova/Ironic operate on nodes, we operate on Heat
templates.

Discussed in other threads, but I still believe (and I am not alone)
that we need to allow 'force nodes'.


Yeah, having both approaches would be nice to have. Instead of using the 
existing 'force nodes' implementation, wouldn't it be better/cleaner to 
implement support for it in Nova and Heat?


Imre



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread James Slagle
Mainn,

Thanks for pulling this together.

 * NODES
* Management node (where triple-o is installed)
* created as part of undercloud install process

I think getting the undercloud installed/deployed should be a
requirement for Icehouse.  I'm not sure if you meant that or were
assuming that it would already be done :).  I'd like to see a simpler
process than building the seed vm, starting it, deploying undercloud,
etc.  But, that's something we can work to define if others agree as
well.

* can create additional management nodes (F)

By this, do you mean using the undercloud to scale itself?  e.g.,
using nova on the undercloud to launch an additional undercloud
compute node, etc.  I like that concept, and don't see any reason why
that wouldn't be technically possible.

 * DEPLOYMENT ACTION
* Heat template generated on the fly
   * hardcoded images
  * allow image selection (F)

So, I think this may be what Robert was getting at, but I think this
one should be M or possibly even committed to Icehouse.  I think it's
very likely we're going to need to update which image is used to do
the deployment, e.g., if you build a new image to pick up a security
update.

IIRC, the image is just referenced by name in the template.  So,
maybe the process is just:

* build the new image
* rename/delete the old image
* upload the new image with the required name (overcloud-compute,
overcloud-control)
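
Since the template refers to the image by name, the swap can be modelled
as re-pointing a name in a registry. The sketch below uses an in-memory
dict as a stand-in for Glance; replace_image(), the "-old" suffix and the
image ids are all hypothetical, and no real Glance call is shown.

```python
def replace_image(registry, name, new_image_id):
    """Point `name` at a new image, keeping the old one under `<name>-old`."""
    old = registry.pop(name, None)
    if old is not None:
        registry[f"{name}-old"] = old   # the rename step; deleting also works
    registry[name] = new_image_id       # upload under the required name
    return old


registry = {"overcloud-compute": "img-001", "overcloud-control": "img-002"}
previous = replace_image(registry, "overcloud-compute", "img-003")
```

The template itself is untouched; it keeps asking for 'overcloud-compute'
and gets whichever image currently holds that name.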

However, a nicer image selection process would be good to have.


-- 
-- James Slagle
--



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread James Slagle
On Fri, Dec 6, 2013 at 4:55 PM, Matt Wagner matt.wag...@redhat.com wrote:
 - As an infrastructure administrator, Anna expects that the
 management node for the deployment services is already up and running
 and the status of this node is shown in the UI.

 The 'management node' here is the undercloud node that Anna is
 interacting with, as I understand it. (Someone correct me if I'm wrong.)
 So it's not a bad idea to show its status, but I guess the mere fact
 that she's using it will indicate that it's operational.

That's how I read it as well, which assumes that you're using the
undercloud to manage itself.

FWIW, based on the OpenStack personas I think that Anna would be the
one doing the undercloud setup.  So, maybe this use case should be:

- As an infrastructure administrator, Anna wants to install the
undercloud so she can use the UI.

That piece is going to be a pretty big part of the entire deployment
process, so I think having a use case for it makes sense.

Nice work on the use cases Liz, thanks for pulling them together.

-- 
-- James Slagle
--



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Liz Blanchard

On Dec 9, 2013, at 4:29 AM, Jaromir Coufal jcou...@redhat.com wrote:

 
 On 2013/06/12 22:55, Matt Wagner wrote:
 - As an infrastructure administrator, Anna wants to review the
 distribution of the nodes that she has assigned before kicking off
 the Deploy task.
 What does she expect to see here on the review screen that she didn't
 see on the previous screens, if anything? Is this just a summation, or
 is she expecting to see things like which node will get which role? (I'd
 argue for the former; I don't know that we can predict the latter.)
 At the beginning, just summation. Later (when we have nova-scheduler 
 reservation) we can get the real distribution of which node is taking which 
 role.

Yes, the idea is that Anna wants to see some representation of what the 
distribution of nodes would be (how many would be assigned to each profile) 
before kicking off the deploy action.

 
 - As an infrastructure administrator, Anna wants to monitor the
 deployment process of all of the nodes that she has assigned.
 I think there's an implied ...through the UI here, versus tailing log
 files to watch state. Does she just expect to see states like Pending,
 Deploying, or Finished, versus, say, having the full logs shown in
 the UI? (I'd vote 'yes'.)
 For simplified view - yes, only change of states and progress bar. However 
 log should be available.

I'd vote 'yes' as well. These are definitely design decisions we should be 
making based on what we know of our end user. Although some use cases like 
troubleshooting might point towards using logs, this one definitely seems like 
a UI addition. I'll update the use case to be more specific. [1]

 
 - As an infrastructure administrator, Anna needs to be able to
 troubleshoot any errors that may occur during the deployment of nodes
 process.
 I'm not sure that the ...through the UI implication I mentioned above
 extends here. (IMHO) I assume that if things fail, Anna might be okay
 with us showing a message that $foo failed on $bar, and she should try
 looking in /var/log/$baz for full details. Does that seem fair? (At
 least early on.)
 As said above, for simplified views, it is ok to say $foo failed on $bar, but 
 she should be able to track the problem - logs section in the UI.

Yes, this is meant to be through the UI. I've updated the use case. [1]

 
 - As an infrastructure administrator, Anna wants to be able to view
 the history of nodes that have been in a deployment.
 Why does she want to view history of past nodes?
 
 Note that I'm not arguing against this; it's just not abundantly clear
 to me what she'll be using this information for. Does she want a history
 to check off an Audit log checkbox, or will she be looking to extract
 certain data from this history?
 Short answer is Graphs - history of utilization of the class etc.

I've updated this one to be more specific about the reasons why historic nodes 
are important to Anna. [1]

Thanks for all of the feedback,
Liz

[1] https://wiki.openstack.org/wiki/TripleO/Tuskar/IcehouseUserStories

 
 -- Jarda


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Liz Blanchard

On Dec 9, 2013, at 4:57 AM, Jaromir Coufal jcou...@redhat.com wrote:

 
 On 2013/07/12 02:20, Robert Collins wrote:
 - As an infrastructure administrator, Anna needs to assign a role to each 
 of the necessary nodes in her OpenStack deployment. The nodes could be 
 either controller, compute, networking, or storage resources depending on 
 the needs of this deployment.
 Definitely not: she needs to deliver a running cloud. Manually saying
 'machine X is a compute node' is confusing an implementation with a
 need. She needs to know that her cloud will have enough capacity to
 meet her users' needs; she needs to know that it will be resilient
 against a wide set of failures (and this might be a dial with
 different clouds having different uptime guarantees); she may need to
 ensure that some specific hardware configuration is used for storage,
 as a performance optimisation. None of those needs imply assigning
 roles to machines.
 Yes, in ideal world and large deployments. But there might be cases when Anna 
 will need to say - deploy storage to this specific node. Not arguing that we 
 want to have policy based approach, but we need to cover also manual control 
 (forcing node to take some role).

Perhaps the use case is that Anna would want to define the different capacities 
that her cloud deployment will need? You are both right, though: we don't want to 
force the user to manually select which nodes will run which services, but we 
should allow it for cases in which it's needed. I've updated the use case as an 
attempt to clear this up. [1]

 
 - As an infrastructure administrator, Anna wants to monitor the deployment 
 process of all of the nodes that she has assigned.
 I don't think she wants to do that. I think she wants to be told if
 there is a problem that needs her intervention to solve - e.g. bad
 IPMI details for a node, or a node not responding when asked to boot
 via PXE.
 I think by this user story Liz wanted to capture that Anna wants to see if 
 the deployment process is still in progress or if it has 
 finished/failed, etc. Which I agree with. I don't think that she will sit and 
 watch what is happening.

Yes, definitely. I've updated this use case to reflect reality in that Anna 
would not sit there and actively monitor, but rather she would want to 
ultimately make sure that there weren't any errors during the deployment 
process. [1]

  
 - As an infrastructure administrator, Anna wants to be able to unallocate a 
 node from a deployment.
 Why? What's her motivation? One plausible one for me is 'a machine
 needs to be serviced so Anna wants to remove it from the deployment to
 avoid causing user visible downtime.'  So let's say that: Anna needs to
 be able to take machines out of service so they can be maintained or
 disposed of.
 Node being serviced is a different user story for me.
 
 I believe we are still 'fighting' here with two approaches and I believe we 
 need both. We can't only provide a way 'give us resources we will do a 
 magic'. Yes this is preferred way - especially for large deployments, but we 
 also need a fallback so that user can say - no, this node doesn't belong to 
 the class, I don't want it there - unassign. Or I need to have this node 
 there - assign.

This is a great question, Robert. I think the reason you bring up for Anna 
wanting to remove a node is actually more of a "Disable node" action. This way 
she could potentially bring it back up after the maintenance is done. I will 
add some more details to this use case to try to clarify. [1]

 
 - As an infrastructure administrator, Anna wants to be able to view the 
 history of nodes that have been in a deployment.
 Why? This is super generic and could mean anything.
 I believe this has something to do with 'archived nodes'. But correct me if I 
 am wrong.

I was assuming it would be in case the user wants to go back to view the history 
of a certain node. Potentially the user could bring an archived node back 
online? Although maybe at this point it would just be rediscovered?

Thanks,
Liz

[1] https://wiki.openstack.org/wiki/TripleO/Tuskar/IcehouseUserStories

 
 -- Jarda


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread mar...@redhat.com
On 06/12/13 04:31, Tzu-Mainn Chen wrote:
 Hey all,
 
 I've attempted to spin out the requirements behind Jarda's excellent 
 wireframes 
 (http://lists.openstack.org/pipermail/openstack-dev/2013-December/020944.html).
 Hopefully this can add some perspective on both the wireframes and the needed 
 changes to the tuskar-api.
 
 All comments are welcome!
 
 Thanks,
 Tzu-Mainn Chen
 
 
 
 *** Requirements are assumed to be targeted for Icehouse, unless marked 
 otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse
 
 * NODES
* Creation
   * Manual registration
  * hardware specs from Ironic based on mac address (M)
  * IP auto populated from Neutron (F)
   * Auto-discovery during undercloud install process (M)
* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)
* Management node (where triple-o is installed)
* created as part of undercloud install process
* can create additional management nodes (F)
 * Resource nodes
 * searchable by status, name, cpu, memory, and all attributes from 
 ironic
 * can be allocated as one of four node types
 * compute
 * controller
 * object storage
 * block storage
 * Resource class - allows for further categorization of a node type
 * each node type specifies a single default resource class
 * allow multiple resource classes per node type (M)
 * optional node profile for a resource class (M)
 * acts as filter for nodes that can be allocated to that 
 class (M)
 * nodes can be viewed by node types
 * additional group by status, hardware specification
 * controller node type
* each controller node will run all openstack services
   * allow each node to run specified service (F)
* breakdown by workload (percentage of cpu used per node) (M)
 * Unallocated nodes
 * Archived nodes (F)
 * Will be separate openstack service (F)
 
 * DEPLOYMENT
* multiple deployments allowed (F)
  * initially just one
* deployment specifies a node distribution across node types
   * node distribution can be updated after creation
* deployment configuration, used for initial creation only
   * defaulted, with no option to change
  * allow modification (F)
* review distribution map (F)
* notification when a deployment is ready to go or whenever something 
 changes
 
 * DEPLOYMENT ACTION
* Heat template generated on the fly
   * hardcoded images
  * allow image selection (F)
   * pre-created template fragments for each node type
   * node type distribution affects generated template

sorry am a bit late to the discussion - fyi:

There are two sides to these previous points: 1) a temporary solution using
merge.py from tuskar and the tripleo-heat-templates repo (Icehouse,
imo), and 2) doing it 'properly' with the merge functionality pushed into
heat (F, imo).

For 1), various bits are in play, fyi if interested:

/#/c/56947/ (Make merge.py invokable), /#/c/58823/ (Make merge.py
installable) and /#/c/52045/ (WIP: sketch of what using merge.py looks
like for tuskar); this last one needs updating and thought. Also
/#/c/58229/ and /#/c/57210/, which need some more thought.



* nova scheduler allocates nodes
   * filters based on resource class and node profile information (M)
* Deployment action can create or update
* status indicator to determine overall state of deployment
   * status indicator for nodes as well
   * status includes 'time left' (F)
 
 * NETWORKS (F)
 * IMAGES (F)
 * LOGS (F)
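
As a rough illustration of the "Heat template generated on the fly" items in the list above, the sketch below combines one pre-created fragment per node type according to a node distribution. It is not how merge.py works; the function name, fragment contents, image names, and resource naming scheme are all hypothetical:

```python
import copy


def build_template(fragments, distribution):
    """Combine per-node-type fragments into one template dictionary.

    `fragments` maps a node type to a resource definition; `distribution`
    maps a node type to the number of nodes wanted. Both are assumptions.
    """
    template = {"resources": {}}
    for node_type, count in sorted(distribution.items()):
        for index in range(count):
            # Copy so that later edits to one resource do not leak into others.
            template["resources"][f"{node_type}{index}"] = copy.deepcopy(
                fragments[node_type]
            )
    return template


fragments = {
    "compute": {"type": "OS::Nova::Server",
                "properties": {"image": "compute-image"}},
    "controller": {"type": "OS::Nova::Server",
                   "properties": {"image": "controller-image"}},
}
template = build_template(fragments, {"controller": 1, "compute": 2})
print(sorted(template["resources"]))
```

Changing the distribution and regenerating is the only way the output changes here, which mirrors the "node type distribution affects generated template" item.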
 


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Liz Blanchard

On Dec 9, 2013, at 8:58 AM, James Slagle james.sla...@gmail.com wrote:

 On Fri, Dec 6, 2013 at 4:55 PM, Matt Wagner matt.wag...@redhat.com wrote:
 - As an infrastructure administrator, Anna expects that the
 management node for the deployment services is already up and running
 and the status of this node is shown in the UI.
 
 The 'management node' here is the undercloud node that Anna is
 interacting with, as I understand it. (Someone correct me if I'm wrong.)
 So it's not a bad idea to show its status, but I guess the mere fact
 that she's using it will indicate that it's operational.
 
 That's how I read it as well, which assumes that you're using the
 undercloud to manage itself.
 
 FWIW, based on the OpenStack personas I think that Anna would be the
 one doing the undercloud setup.  So, maybe this use case should be:
 
 - As an infrastructure administrator, Anna wants to install the
 undercloud so she can use the UI.
 
 That piece is going to be a pretty big part of the entire deployment
 process, so I think having a use case for it makes sense.

+1. I've added this as the very first use case.

 
 Nice work on the use cases Liz, thanks for pulling them together.

Thanks to all for the great discussion on these use cases. The 
questions/comments that they've generated are exactly what I was hoping for. I 
will continue to make updates and refine these[1] based on discussions. Of 
course, feel free to add to/change these yourself as well.

Liz

[1] https://wiki.openstack.org/wiki/TripleO/Tuskar/IcehouseUserStories

 
 -- 
 -- James Slagle
 --
 


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Liz Blanchard

On Dec 6, 2013, at 8:20 PM, Robert Collins robe...@robertcollins.net wrote:

 On 7 December 2013 09:31, Liz Blanchard lsure...@redhat.com wrote:
 This list is great, thanks very much for taking the time to write this up! I 
 think a big part of the User Experience design is to take a step back and 
 understand the requirements from an end user's point of view…what would they 
 want to accomplish by using this UI? This might influence the design in 
 certain ways, so I've taken a cut at a set of user stories for the Icehouse 
 timeframe based on these requirements that I hope will be useful during 
 discussions.
 
 Based on the OpenStack Personas[1], I think that Anna would be the main 
 consumer of the TripleO UI, but please let me know if you think otherwise.
 
 - As an infrastructure administrator, Anna needs to deploy or update a set 
 of resources that will run OpenStack (This isn't a very specific use case, 
 but more of the larger end goal of Anna coming into the UI.)
 - As an infrastructure administrator, Anna expects that the management node 
 for the deployment services is already up and running and the status of this 
 node is shown in the UI.
 - As an infrastructure administrator, Anna wants to be able to quickly see 
 the set of unallocated nodes that she could use for her deployment of 
 OpenStack. Ideally, she would not have to manually tell the system about 
 these nodes. If she needs to manually register nodes for whatever reason, 
 Anna would only want to have to define the essential data needed to register 
 these nodes.
 
 I want to challenge this one. There are two concerns conflated. A)
 seeing available resources for scaling up her cloud. B) minimising
 effort to enroll additional resources. B) is a no-brainer. For A)
 though, as phrased, we're talking about seeing a set of individual
 items: but actually, wouldn't aggregated capacity be more useful,
 with optional drill down - '400 cores, 2TB RAM, 1PB of disk'?

Good point. I will update this to read that the user wants to see the available 
capacity and have the option to drill in further. [1]
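
A minimal sketch of the aggregated view with drill-down, assuming each unallocated node exposes core, RAM, and disk figures. The `Node` fields and units are hypothetical, not Ironic attribute names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cores: int
    ram_gb: int
    disk_gb: int


def aggregate_capacity(nodes):
    """Sum the capacity of all nodes for the top-level view."""
    return {
        "nodes": len(nodes),
        "cores": sum(n.cores for n in nodes),
        "ram_gb": sum(n.ram_gb for n in nodes),
        "disk_gb": sum(n.disk_gb for n in nodes),
    }


unallocated = [
    Node("node-1", cores=16, ram_gb=64, disk_gb=2000),
    Node("node-2", cores=32, ram_gb=128, disk_gb=4000),
]
totals = aggregate_capacity(unallocated)
print(totals)
# Drill-down is then just the per-node list behind the totals.
for node in unallocated:
    print(node)
```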

 
 - As an infrastructure administrator, Anna needs to assign a role to each of 
 the necessary nodes in her OpenStack deployment. The nodes could be either 
 controller, compute, networking, or storage resources depending on the needs 
 of this deployment.
 
 Definitely not: she needs to deliver a running cloud. Manually saying
 'machine X is a compute node' is confusing an implementation with a
 need. She needs to know that her cloud will have enough capacity to
 meet her users' needs; she needs to know that it will be resilient
 against a wide set of failures (and this might be a dial with
 different clouds having different uptime guarantees); she may need to
 ensure that some specific hardware configuration is used for storage,
 as a performance optimisation. None of those needs imply assigning
 roles to machines.
 
 - As an infrastructure administrator, Anna wants to review the distribution 
 of the nodes that she has assigned before kicking off the Deploy task.
 
 If by distribution you mean the top level stats (15 control nodes, 200
 hypervisors, etc) - then I agree. If you mean 'node X will be a
 hypervisor' - I thoroughly disagree. What does that do for her?

We are in agreement, I'd expect the former. I've updated the use case to be 
more specific. [1] 

 
 - As an infrastructure administrator, Anna wants to monitor the deployment 
 process of all of the nodes that she has assigned.
 
 I don't think she wants to do that. I think she wants to be told if
 there is a problem that needs her intervention to solve - e.g. bad
 IPMI details for a node, or a node not responding when asked to boot
 via PXE.
 
 - As an infrastructure administrator, Anna needs to be able to troubleshoot 
 any errors that may occur during the deployment of nodes process.
 
 Definitely.
 
 - As an infrastructure administrator, Anna wants to monitor the availability 
 and status of each node in her deployment.
 
 Yes, with the caveat that I think instance is the key thing here for
 now; there is a lifecycle aspect where being able to say 'machine X is
 having persistent network issues' is very important, as a long term
 thing we should totally aim at that.
 
 - As an infrastructure administrator, Anna wants to be able to unallocate a 
 node from a deployment.
 
 Why? What's her motivation? One plausible one for me is 'a machine
 needs to be serviced so Anna wants to remove it from the deployment to
 avoid causing user visible downtime.'  So let's say that: Anna needs to
 be able to take machines out of service so they can be maintained or
 disposed of.
 
 - As an infrastructure administrator, Anna wants to be able to view the 
 history of nodes that have been in a deployment.
 
 Why? This is super generic and could mean anything.
 
 - As an infrastructure administrator, Anna needs to be notified of any 
 important changes to nodes that are in the OpenStack 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-09 Thread Jay Dobies

I believe we are still 'fighting' here with two approaches and I believe
we need both. We can't only provide a way 'give us resources we will do
a magic'. Yes this is preferred way - especially for large deployments,
but we also need a fallback so that user can say - no, this node doesn't
belong to the class, I don't want it there - unassign. Or I need to have
this node there - assign.


+1 to this. I think there are still a significant number of admins out 
there that are really opposed to magic and want that fine-grained 
control. Even if they don't use it that frequently, in my experience 
they want to know it's there in the event they need it (and will often 
dream up a case that they'll need it).


I'm absolutely for pushing the magic approach as the preferred use. And 
in large deployments that's where people are going to see the biggest 
gain. The fine-grained approach can even be pushed off as a future 
feature. But I wouldn't be surprised to see people asking for it and I'd 
like to at least be able to say it's been talked about.



- As an infrastructure administrator, Anna wants to be able to view the history 
of nodes that have been in a deployment.

Why? This is super generic and could mean anything.

I believe this has something to do with 'archived nodes'. But correct me
if I am wrong.

-- Jarda




Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Jay Dobies
Disclaimer: I'm very new to the project, so apologies if some of my 
questions have been already answered or flat out don't make sense.


As I proofread, some of my comments may drift a bit past basic 
requirements, so feel free to tell me to take certain questions out of 
this thread into specific discussion threads if I'm getting too detailed.





*** Requirements are assumed to be targeted for Icehouse, unless marked 
otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse

* NODES
* Creation
   * Manual registration
  * hardware specs from Ironic based on mac address (M)
  * IP auto populated from Neutron (F)
   * Auto-discovery during undercloud install process (M)
* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)
* Management node (where triple-o is installed)
* created as part of undercloud install process
* can create additional management nodes (F)
 * Resource nodes
 * searchable by status, name, cpu, memory, and all attributes from 
ironic
 * can be allocated as one of four node types


It's pretty clear by the current verbiage but I'm going to ask anyway: 
one and only one?



 * compute
 * controller
 * object storage
 * block storage
 * Resource class - allows for further categorization of a node type
 * each node type specifies a single default resource class
 * allow multiple resource classes per node type (M)


My gut reaction is that we want to bite this off sooner rather than 
later. This will have data model and API implications that, even if we 
don't commit to it for Icehouse, should still be in our minds during it, 
so it might make sense to make it a first class thing to just nail down now.



 * optional node profile for a resource class (M)
 * acts as filter for nodes that can be allocated to that class 
(M)


To my understanding, once this is in Icehouse, we'll have to support 
upgrades. If this filtering is pushed off, could we get into a situation 
where an allocation created in Icehouse would no longer be valid in 
Icehouse+1 once these filters are in place? If so, we might want to make 
it more of a priority to get them in place earlier and not eat the 
headache of addressing these sorts of integrity issues later.



 * nodes can be viewed by node types
 * additional group by status, hardware specification
 * controller node type
* each controller node will run all openstack services
   * allow each node to run specified service (F)
* breakdown by workload (percentage of cpu used per node) (M)
 * Unallocated nodes


Is there more still being fleshed out here? Things like:
 * Listing unallocated nodes
 * Unallocating a previously allocated node (does this make it a 
vanilla resource or does it retain the resource type? is this the only 
way to change a node's resource type?)
 * Unregistering nodes from Tuskar's inventory (I put this under 
unallocated under the assumption that the workflow will be an explicit 
unallocate before unregister; I'm not sure if this is the same as 
archive below).



 * Archived nodes (F)


Can you elaborate a bit more on what this is?


 * Will be separate openstack service (F)

* DEPLOYMENT
* multiple deployments allowed (F)
  * initially just one
* deployment specifies a node distribution across node types
   * node distribution can be updated after creation
* deployment configuration, used for initial creation only
   * defaulted, with no option to change
  * allow modification (F)
* review distribution map (F)
* notification when a deployment is ready to go or whenever something 
changes

* DEPLOYMENT ACTION
* Heat template generated on the fly
   * hardcoded images
  * allow image selection (F)
   * pre-created template fragments for each node type
   * node type distribution affects generated template
* nova scheduler allocates nodes
   * filters based on resource class and node profile information (M)
* Deployment action can create or update
* status indicator to determine overall state of deployment
   * status indicator for nodes as well
   * status includes 'time left' (F)

* NETWORKS (F)
* IMAGES (F)
* LOGS (F)



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Tzu-Mainn Chen
Thanks for the comments!  Responses inline:

 Disclaimer: I'm very new to the project, so apologies if some of my
 questions have been already answered or flat out don't make sense.
 
 As I proofread, some of my comments may drift a bit past basic
 requirements, so feel free to tell me to take certain questions out of
 this thread into specific discussion threads if I'm getting too detailed.
 
  
 
  *** Requirements are assumed to be targeted for Icehouse, unless marked
  otherwise:
  (M) - Maybe Icehouse, dependency on other in-development features
  (F) - Future requirement, after Icehouse
 
  * NODES
  * Creation
 * Manual registration
* hardware specs from Ironic based on mac address (M)
* IP auto populated from Neutron (F)
 * Auto-discovery during undercloud install process (M)
  * Monitoring
  * assignment, availability, status
  * capacity, historical statistics (M)
  * Management node (where triple-o is installed)
  * created as part of undercloud install process
  * can create additional management nodes (F)
   * Resource nodes
   * searchable by status, name, cpu, memory, and all attributes from
   ironic
   * can be allocated as one of four node types
 
 It's pretty clear by the current verbiage but I'm going to ask anyway:
 one and only one?

Yep, that's right!

   * compute
   * controller
   * object storage
   * block storage
   * Resource class - allows for further categorization of a node
   type
   * each node type specifies a single default resource class
   * allow multiple resource classes per node type (M)
 
 My gut reaction is that we want to bite this off sooner rather than
 later. This will have data model and API implications that, even if we
 don't commit to it for Icehouse, should still be in our minds during it,
 so it might make sense to make it a first class thing to just nail down now.

That is entirely correct, which is one reason it's on the list of requirements.
The forthcoming API design will have to account for it.  Not recreating the
entire data model between releases is a key goal :)
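
One way to picture a data model that starts with a single default resource class per node type but can grow to several without a redesign is sketched below. The class and field names are hypothetical and do not reflect the actual tuskar-api design:

```python
from dataclasses import dataclass, field

# The four node types from the requirements list.
NODE_TYPES = ("compute", "controller", "object storage", "block storage")


@dataclass
class NodeType:
    name: str
    default_class: str = "default"
    resource_classes: list = field(default_factory=list)

    def __post_init__(self):
        if self.name not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.name}")
        # Every node type always carries its default resource class.
        if self.default_class not in self.resource_classes:
            self.resource_classes.insert(0, self.default_class)

    def add_resource_class(self, name):
        """Add a further resource class (the (M) requirement)."""
        if name not in self.resource_classes:
            self.resource_classes.append(name)


compute = NodeType("compute")
compute.add_resource_class("high-memory")
print(compute.resource_classes)
```

Storing the classes as a list from the start means the Icehouse case is simply a list of length one.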


   * optional node profile for a resource class (M)
   * acts as filter for nodes that can be allocated to that
   class (M)
 
 To my understanding, once this is in Icehouse, we'll have to support
 upgrades. If this filtering is pushed off, could we get into a situation
 where an allocation created in Icehouse would no longer be valid in
 Icehouse+1 once these filters are in place? If so, we might want to make
 it more of a priority to get them in place earlier and not eat the
 headache of addressing these sorts of integrity issues later.

That's true.  The problem is that to my understanding, the filters we'd
need in nova-scheduler are not yet fully in place.

I also think that this is an issue that we'll need to address no matter what.
Even once filters exist, if a user applies a filter *after* nodes are allocated,
we'll need to do something clever if the already-allocated nodes don't meet the
filter criteria.
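
A minimal sketch of the check this implies: given a node profile applied after allocation, find the already-allocated nodes that no longer qualify. The profile fields and thresholds are assumptions, and this is plain Python rather than a nova-scheduler filter:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpus: int
    memory_mb: int


@dataclass(frozen=True)
class NodeProfile:
    min_cpus: int
    min_memory_mb: int

    def matches(self, node):
        """True if the node meets every minimum in the profile."""
        return (node.cpus >= self.min_cpus
                and node.memory_mb >= self.min_memory_mb)


def find_violations(allocated, profile):
    """Return allocated nodes that do not meet the profile."""
    return [n for n in allocated if not profile.matches(n)]


allocated = [Node("node-1", 8, 16384), Node("node-2", 4, 8192)]
profile = NodeProfile(min_cpus=8, min_memory_mb=16384)
for node in find_violations(allocated, profile):
    print(f"{node.name} no longer matches the profile")
```

What to do with the violating nodes (warn, reject the filter, or reallocate) is the open question; the sketch only identifies them.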

   * nodes can be viewed by node types
   * additional group by status, hardware specification
   * controller node type
  * each controller node will run all openstack services
 * allow each node to run specified service (F)
  * breakdown by workload (percentage of cpu used per node) (M)
   * Unallocated nodes
 
 Is there more still being fleshed out here? Things like:
   * Listing unallocated nodes
   * Unallocating a previously allocated node (does this make it a
 vanilla resource or does it retain the resource type? is this the only
 way to change a node's resource type?)
   * Unregistering nodes from Tuskar's inventory (I put this under
 unallocated under the assumption that the workflow will be an explicit
 unallocate before unregister; I'm not sure if this is the same as
 archive below).

Ah, you're entirely right.  I'll add these to the list.
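
The workflow in the question (list unallocated nodes, unallocate, then unregister) could be sketched as below. It assumes that unallocating returns a node to a vanilla, typeless state, which is one of the open questions rather than a decision; the `Inventory` class is hypothetical:

```python
class Inventory:
    """Tracks registered nodes and the node type each is allocated to."""

    def __init__(self):
        self._nodes = {}  # node name -> node type, or None when unallocated

    def register(self, name):
        if name in self._nodes:
            raise ValueError(f"{name} is already registered")
        self._nodes[name] = None

    def allocate(self, name, node_type):
        if self._nodes[name] is not None:
            raise ValueError(f"{name} is already allocated")
        self._nodes[name] = node_type

    def unallocate(self, name):
        if name not in self._nodes:
            raise KeyError(name)
        # Assumption: the node becomes a vanilla resource again.
        self._nodes[name] = None

    def unallocated(self):
        return sorted(n for n, t in self._nodes.items() if t is None)

    def unregister(self, name):
        # Explicit unallocate is required before unregistering.
        if self._nodes[name] is not None:
            raise ValueError(f"unallocate {name} before unregistering it")
        del self._nodes[name]


inventory = Inventory()
inventory.register("node-1")
inventory.register("node-2")
inventory.allocate("node-1", "compute")
print(inventory.unallocated())
```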

   * Archived nodes (F)
 
 Can you elaborate a bit more on what this is?

To be honest, I'm a bit fuzzy about this myself; Jarda mentioned that there was
an OpenStack service in the process of being planned that would handle this
requirement.  Jarda, can you detail a bit?

Thanks again for the comments!


Mainn



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Liz Blanchard

On Dec 5, 2013, at 9:31 PM, Tzu-Mainn Chen tzuma...@redhat.com wrote:

 Hey all,
 
 I've attempted to spin out the requirements behind Jarda's excellent 
 wireframes 
 (http://lists.openstack.org/pipermail/openstack-dev/2013-December/020944.html).
 Hopefully this can add some perspective on both the wireframes and the needed 
 changes to the tuskar-api.

This list is great, thanks very much for taking the time to write this up! I 
think a big part of the User Experience design is to take a step back and 
understand the requirements from an end user's point of view…what would they 
want to accomplish by using this UI? This might influence the design in certain 
ways, so I've taken a cut at a set of user stories for the Icehouse timeframe 
based on these requirements that I hope will be useful during discussions.

Based on the OpenStack Personas[1], I think that Anna would be the main 
consumer of the TripleO UI, but please let me know if you think otherwise.

- As an infrastructure administrator, Anna needs to deploy or update a set of 
resources that will run OpenStack (This isn't a very specific use case, but 
more of the larger end goal of Anna coming into the UI.)
- As an infrastructure administrator, Anna expects that the management node for 
the deployment services is already up and running and the status of this node 
is shown in the UI.
- As an infrastructure administrator, Anna wants to be able to quickly see the 
set of unallocated nodes that she could use for her deployment of OpenStack. 
Ideally, she would not have to manually tell the system about these nodes. If 
she needs to manually register nodes for whatever reason, Anna would only want 
to have to define the essential data needed to register these nodes.
- As an infrastructure administrator, Anna needs to assign a role to each of 
the necessary nodes in her OpenStack deployment. The nodes could be either 
controller, compute, networking, or storage resources depending on the needs of 
this deployment.
- As an infrastructure administrator, Anna wants to review the distribution of 
the nodes that she has assigned before kicking off the Deploy task.
- As an infrastructure administrator, Anna wants to monitor the deployment 
process of all of the nodes that she has assigned.
- As an infrastructure administrator, Anna needs to be able to troubleshoot any 
errors that may occur during the deployment of nodes process.
- As an infrastructure administrator, Anna wants to monitor the availability 
and status of each node in her deployment.
- As an infrastructure administrator, Anna wants to be able to unallocate a 
node from a deployment.
- As an infrastructure administrator, Anna wants to be able to view the history 
of nodes that have been in a deployment.
- As an infrastructure administrator, Anna needs to be notified of any 
important changes to nodes that are in the OpenStack deployment. She does not 
want to be spammed with non-important notifications.

Please feel free to comment, change, or add to this list.

[1]https://docs.google.com/document/d/16rkiXWxxgzGT47_Wc6hzIPzO2-s2JWAPEKD0gP2mt7E/edit?pli=1#

Thanks,
Liz

 
 All comments are welcome!
 
 Thanks,
 Tzu-Mainn Chen
 
 
 
 *** Requirements are assumed to be targeted for Icehouse, unless marked 
 otherwise:
   (M) - Maybe Icehouse, dependency on other in-development features
   (F) - Future requirement, after Icehouse
 
 * NODES
   * Creation
  * Manual registration
 * hardware specs from Ironic based on mac address (M)
 * IP auto populated from Neutron (F)
  * Auto-discovery during undercloud install process (M)
   * Monitoring
   * assignment, availability, status
   * capacity, historical statistics (M)
   * Management node (where triple-o is installed)
   * created as part of undercloud install process
   * can create additional management nodes (F)
* Resource nodes
* searchable by status, name, cpu, memory, and all attributes from 
 ironic
* can be allocated as one of four node types
* compute
* controller
* object storage
* block storage
* Resource class - allows for further categorization of a node type
* each node type specifies a single default resource class
* allow multiple resource classes per node type (M)
* optional node profile for a resource class (M)
* acts as filter for nodes that can be allocated to that class 
 (M)
* nodes can be viewed by node types
* additional group by status, hardware specification
* controller node type
   * each controller node will run all openstack services
  * allow each node to run specified service (F)
   * breakdown by workload (percentage of cpu used per node) (M)
* Unallocated nodes
* Archived nodes (F)
* Will be separate openstack service (F)
 
 * DEPLOYMENT
 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Tzu-Mainn Chen
That looks really good, thanks for putting that together!

I'm going to put together a wiki page that consolidates the various Tuskar
planning documents - requirements, user stories, wireframes, etc - so it's
easier to see the whole planning picture.

Mainn

- Original Message -
 
 On Dec 5, 2013, at 9:31 PM, Tzu-Mainn Chen tzuma...@redhat.com wrote:
 
  Hey all,
  
  I've attempted to spin out the requirements behind Jarda's excellent
  wireframes
  (http://lists.openstack.org/pipermail/openstack-dev/2013-December/020944.html).
  Hopefully this can add some perspective on both the wireframes and the
  needed changes to the tuskar-api.
 
 This list is great, thanks very much for taking the time to write this up! I
 think a big part of the User Experience design is to take a step back and
 understand the requirements from an end user's point of view…what would they
 want to accomplish by using this UI? This might influence the design in
 certain ways, so I've taken a cut at a set of user stories for the Icehouse
 timeframe based on these requirements that I hope will be useful during
 discussions.
 
 Based on the OpenStack Personas[1], I think that Anna would be the main
 consumer of the TripleO UI, but please let me know if you think otherwise.
 
 - As an infrastructure administrator, Anna needs to deploy or update a set of
 resources that will run OpenStack (This isn't a very specific use case, but
 more of the larger end goal of Anna coming into the UI.)
 - As an infrastructure administrator, Anna expects that the management node
 for the deployment services is already up and running and the status of this
 node is shown in the UI.
 - As an infrastructure administrator, Anna wants to be able to quickly see
 the set of unallocated nodes that she could use for her deployment of
 OpenStack. Ideally, she would not have to manually tell the system about
 these nodes. If she needs to manually register nodes for whatever reason,
 Anna would only want to have to define the essential data needed to register
 these nodes.
 - As an infrastructure administrator, Anna needs to assign a role to each of
 the necessary nodes in her OpenStack deployment. The nodes could be either
 controller, compute, networking, or storage resources depending on the needs
 of this deployment.
 - As an infrastructure administrator, Anna wants to review the distribution
 of the nodes that she has assigned before kicking off the Deploy task.
 - As an infrastructure administrator, Anna wants to monitor the deployment
 process of all of the nodes that she has assigned.
 - As an infrastructure administrator, Anna needs to be able to troubleshoot
 any errors that may occur during the deployment of nodes process.
 - As an infrastructure administrator, Anna wants to monitor the availability
 and status of each node in her deployment.
 - As an infrastructure administrator, Anna wants to be able to unallocate a
 node from a deployment.
 - As an infrastructure administrator, Anna wants to be able to view the
 history of nodes that have been in a deployment.
 - As an infrastructure administrator, Anna needs to be notified of any
 important changes to nodes that are in the OpenStack deployment. She does
 not want to be spammed with non-important notifications.
 
 Please feel free to comment, change, or add to this list.
 
 [1]https://docs.google.com/document/d/16rkiXWxxgzGT47_Wc6hzIPzO2-s2JWAPEKD0gP2mt7E/edit?pli=1#
 
 Thanks,
 Liz
 
  
  All comments are welcome!
  
  Thanks,
  Tzu-Mainn Chen
  
  
  
  *** Requirements are assumed to be targeted for Icehouse, unless marked
  otherwise:
(M) - Maybe Icehouse, dependency on other in-development features
(F) - Future requirement, after Icehouse
  
  * NODES
* Creation
   * Manual registration
  * hardware specs from Ironic based on mac address (M)
  * IP auto populated from Neutron (F)
   * Auto-discovery during undercloud install process (M)
* Monitoring
* assignment, availability, status
* capacity, historical statistics (M)
* Management node (where triple-o is installed)
* created as part of undercloud install process
* can create additional management nodes (F)
 * Resource nodes
 * searchable by status, name, cpu, memory, and all attributes from
 ironic
 * can be allocated as one of four node types
 * compute
 * controller
 * object storage
 * block storage
 * Resource class - allows for further categorization of a node type
 * each node type specifies a single default resource class
 * allow multiple resource classes per node type (M)
 * optional node profile for a resource class (M)
 * acts as filter for nodes that can be allocated to that
 class (M)
 * nodes can be viewed by node types
 * additional 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Matt Wagner
Thanks, Liz! Seeing things this way is really helpful.

(I actually feel like wireframes -> requirements -> user stories is
exactly the opposite of how this normally goes, but hitting all of the
steps either way makes things much clearer.)

I've raised some questions below. I think many of them aren't aimed at
you per se, but are more general things that seeing the user stories has
helped me realize we could clarify.


On Fri Dec  6 15:31:36 2013, Liz Blanchard wrote:

 - As an infrastructure administrator, Anna expects that the
 management node for the deployment services is already up and running
 and the status of this node is shown in the UI.

The 'management node' here is the undercloud node that Anna is
interacting with, as I understand it. (Someone correct me if I'm wrong.)
So it's not a bad idea to show its status, but I guess the mere fact
that she's using it will indicate that it's operational.


 - As an infrastructure administrator, Anna wants to review the
 distribution of the nodes that she has assigned before kicking off
 the Deploy task.

What does she expect to see here on the review screen that she didn't
see on the previous screens, if anything? Is this just a summation, or
is she expecting to see things like which node will get which role? (I'd
argue for the former; I don't know that we can predict the latter.)


 - As an infrastructure administrator, Anna wants to monitor the
 deployment process of all of the nodes that she has assigned.

I think there's an implied "...through the UI" here, versus tailing log
files to watch state. Does she just expect to see states like "Pending",
"Deploying", or "Finished", versus, say, having the full logs shown in
the UI? (I'd vote 'yes'.)


 - As an infrastructure administrator, Anna needs to be able to
 troubleshoot any errors that may occur during the deployment of nodes
 process.

I'm not sure that the "...through the UI" implication I mentioned above
extends here. (IMHO) I assume that if things fail, Anna might be okay
with us showing a message that "$foo failed on $bar", and she should try
looking in /var/log/$baz for full details. Does that seem fair? (At
least early on.)


 - As an infrastructure administrator, Anna wants to be able to view
 the history of nodes that have been in a deployment.

Why does she want to view history of past nodes?

Note that I'm not arguing against this; it's just not abundantly clear
to me what she'll be using this information for. Does she want a history
to check off an "Audit log" checkbox, or will she be looking to extract
certain data from this history?

Thanks again for creating these user stories, Liz!

-- 
Matt Wagner
Software Engineer, Red Hat





Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Tzu-Mainn Chen
The relevant wiki page is here:

https://wiki.openstack.org/wiki/TripleO/Tuskar#Icehouse_Planning


- Original Message -
 That looks really good, thanks for putting that together!
 
 I'm going to put together a wiki page that consolidates the various Tuskar
 planning documents - requirements, user stories, wireframes, etc - so it's
 easier to see the whole planning picture.
 
 Mainn
 
 - Original Message -
  
  On Dec 5, 2013, at 9:31 PM, Tzu-Mainn Chen tzuma...@redhat.com wrote:
  
   Hey all,
   
   I've attempted to spin out the requirements behind Jarda's excellent
   wireframes
   (http://lists.openstack.org/pipermail/openstack-dev/2013-December/020944.html).
   Hopefully this can add some perspective on both the wireframes and the
   needed changes to the tuskar-api.
  
  This list is great, thanks very much for taking the time to write this up!
  I think a big part of the User Experience design is to take a step back and
  understand the requirements from an end user's point of view…what would
  they want to accomplish by using this UI? This might influence the design in
  certain ways, so I've taken a cut at a set of user stories for the Icehouse
  timeframe based on these requirements that I hope will be useful during
  discussions.
  
  Based on the OpenStack Personas[1], I think that Anna would be the main
  consumer of the TripleO UI, but please let me know if you think otherwise.
  
  - As an infrastructure administrator, Anna needs to deploy or update a set
  of resources that will run OpenStack (This isn't a very specific use case,
  but more of the larger end goal of Anna coming into the UI.)
  - As an infrastructure administrator, Anna expects that the management node
  for the deployment services is already up and running and the status of
  this node is shown in the UI.
  - As an infrastructure administrator, Anna wants to be able to quickly see
  the set of unallocated nodes that she could use for her deployment of
  OpenStack. Ideally, she would not have to manually tell the system about
  these nodes. If she needs to manually register nodes for whatever reason,
  Anna would only want to have to define the essential data needed to
  register these nodes.
  - As an infrastructure administrator, Anna needs to assign a role to each
  of the necessary nodes in her OpenStack deployment. The nodes could be
  either controller, compute, networking, or storage resources depending on
  the needs of this deployment.
  - As an infrastructure administrator, Anna wants to review the distribution
  of the nodes that she has assigned before kicking off the Deploy task.
  - As an infrastructure administrator, Anna wants to monitor the deployment
  process of all of the nodes that she has assigned.
  - As an infrastructure administrator, Anna needs to be able to troubleshoot
  any errors that may occur during the deployment of nodes process.
  - As an infrastructure administrator, Anna wants to monitor the
  availability and status of each node in her deployment.
  - As an infrastructure administrator, Anna wants to be able to unallocate a
  node from a deployment.
  - As an infrastructure administrator, Anna wants to be able to view the
  history of nodes that have been in a deployment.
  - As an infrastructure administrator, Anna needs to be notified of any
  important changes to nodes that are in the OpenStack deployment. She does
  not want to be spammed with non-important notifications.
  
  Please feel free to comment, change, or add to this list.
  
  [1]https://docs.google.com/document/d/16rkiXWxxgzGT47_Wc6hzIPzO2-s2JWAPEKD0gP2mt7E/edit?pli=1#
  
  Thanks,
  Liz
  
   
   All comments are welcome!
   
   Thanks,
   Tzu-Mainn Chen
   
   
   
   *** Requirements are assumed to be targeted for Icehouse, unless marked
   otherwise:
 (M) - Maybe Icehouse, dependency on other in-development features
 (F) - Future requirement, after Icehouse
   
   * NODES
 * Creation
* Manual registration
   * hardware specs from Ironic based on mac address (M)
   * IP auto populated from Neutron (F)
* Auto-discovery during undercloud install process (M)
 * Monitoring
 * assignment, availability, status
 * capacity, historical statistics (M)
 * Management node (where triple-o is installed)
 * created as part of undercloud install process
 * can create additional management nodes (F)
  * Resource nodes
  * searchable by status, name, cpu, memory, and all attributes from
  ironic
  * can be allocated as one of four node types
  * compute
  * controller
  * object storage
  * block storage
  * Resource class - allows for further categorization of a node
  type
  * each node type specifies a single default resource class
  * allow multiple resource classes 

Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Robert Collins
On 7 December 2013 09:31, Liz Blanchard lsure...@redhat.com wrote:
 This list is great, thanks very much for taking the time to write this up! I 
 think a big part of the User Experience design is to take a step back and 
 understand the requirements from an end user's point of view…what would they 
 want to accomplish by using this UI? This might influence the design in 
 certain ways, so I've taken a cut at a set of user stories for the Icehouse 
 timeframe based on these requirements that I hope will be useful during 
 discussions.

 Based on the OpenStack Personas[1], I think that Anna would be the main 
 consumer of the TripleO UI, but please let me know if you think otherwise.

 - As an infrastructure administrator, Anna needs to deploy or update a set of 
 resources that will run OpenStack (This isn't a very specific use case, but 
 more of the larger end goal of Anna coming into the UI.)
 - As an infrastructure administrator, Anna expects that the management node 
 for the deployment services is already up and running and the status of this 
 node is shown in the UI.
 - As an infrastructure administrator, Anna wants to be able to quickly see 
 the set of unallocated nodes that she could use for her deployment of 
 OpenStack. Ideally, she would not have to manually tell the system about 
 these nodes. If she needs to manually register nodes for whatever reason, 
 Anna would only want to have to define the essential data needed to register 
 these nodes.

I want to challenge this one. There are two concerns conflated: A)
seeing available resources for scaling up her cloud, and B) minimising
effort to enroll additional resources. B) is a no-brainer. For A)
though, as phrased, we're talking about seeing a set of individual
items; wouldn't aggregated capacity be more useful, with optional
drill-down - '400 cores, 2TB RAM, 1PB of disk'?
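
To make the aggregated view with drill-down concrete, here is a minimal
sketch. It assumes hypothetical node records with cores, ram_gb and
disk_tb fields; none of these names come from Ironic or Tuskar.

```python
# Hypothetical node records; field names are illustrative, not Ironic's schema.
NODES = [
    {"name": "node-1", "status": "idle", "cores": 16, "ram_gb": 64, "disk_tb": 4},
    {"name": "node-2", "status": "idle", "cores": 16, "ram_gb": 64, "disk_tb": 4},
    {"name": "node-3", "status": "active", "cores": 32, "ram_gb": 128, "disk_tb": 8},
]

CAPACITY_FIELDS = ("cores", "ram_gb", "disk_tb")


def aggregate_capacity(nodes):
    """Sum capacity over the given nodes: the top-level view."""
    totals = {field: 0 for field in CAPACITY_FIELDS}
    for node in nodes:
        for field in CAPACITY_FIELDS:
            totals[field] += node[field]
    totals["nodes"] = len(nodes)
    return totals


def drill_down(nodes, status):
    """Optional drill-down: the individual nodes behind an aggregate."""
    return [node for node in nodes if node["status"] == status]


if __name__ == "__main__":
    # Aggregate first; list individual nodes only when asked.
    idle = drill_down(NODES, "idle")
    print(aggregate_capacity(idle))
    print([node["name"] for node in idle])
```

The point of the split is that the default view never enumerates machines;
the per-node list is only computed when someone asks for it.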

 - As an infrastructure administrator, Anna needs to assign a role to each of 
 the necessary nodes in her OpenStack deployment. The nodes could be either 
 controller, compute, networking, or storage resources depending on the needs 
 of this deployment.

Definitely not: she needs to deliver a running cloud. Manually saying
'machine X is a compute node' is confusing an implementation with a
need. She needs to know that her cloud will have enough capacity to
meet her users' needs; she needs to know that it will be resilient
against a wide set of failures (and this might be a dial with
different clouds having different uptime guarantees); she may need to
ensure that some specific hardware configuration is used for storage,
as a performance optimisation. None of those needs imply assigning
roles to machines.

 - As an infrastructure administrator, Anna wants to review the distribution 
 of the nodes that she has assigned before kicking off the Deploy task.

If by distribution you mean the top level stats (15 control nodes, 200
hypervisors, etc) - then I agree. If you mean 'node X will be a
hypervisor' - I thoroughly disagree. What does that do for her?
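
A rough sketch of the 'top level stats' reading of distribution, assuming
a hypothetical plan that maps instance names to roles (the names and roles
below are made up for illustration):

```python
from collections import Counter

# Hypothetical planned deployment: instance name -> role.
PLAN = {
    "instance-a": "control",
    "instance-b": "compute",
    "instance-c": "compute",
}


def summarize(plan):
    """Count instances per role, ignoring which machine gets which role."""
    return Counter(plan.values())


def format_summary(counts):
    """Render the per-role counts as one review line, sorted by role name."""
    return ", ".join(f"{count} {role}" for role, count in sorted(counts.items()))


if __name__ == "__main__":
    print(format_summary(summarize(PLAN)))
```

The review screen would then show only the counts, not a per-machine
mapping.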

 - As an infrastructure administrator, Anna wants to monitor the deployment 
 process of all of the nodes that she has assigned.

I don't think she wants to do that. I think she wants to be told if
there is a problem that needs her intervention to solve - e.g. bad
IPMI details for a node, or a node not responding when asked to boot
via PXE.

 - As an infrastructure administrator, Anna needs to be able to troubleshoot 
 any errors that may occur during the deployment of nodes process.

Definitely.

 - As an infrastructure administrator, Anna wants to monitor the availability 
 and status of each node in her deployment.

Yes, with the caveat that I think instance is the key thing here for
now; there is a lifecycle aspect where being able to say 'machine X is
having persistent network issues' is very important, and as a long-term
thing we should totally aim at that.

 - As an infrastructure administrator, Anna wants to be able to unallocate a 
 node from a deployment.

Why? What's her motivation? One plausible one for me is 'a machine
needs to be serviced so Anna wants to remove it from the deployment to
avoid causing user-visible downtime.'  So let's say that: Anna needs to
be able to take machines out of service so they can be maintained or
disposed of.

 - As an infrastructure administrator, Anna wants to be able to view the 
 history of nodes that have been in a deployment.

Why? This is super generic and could mean anything.

 - As an infrastructure administrator, Anna needs to be notified of any 
 important changes to nodes that are in the OpenStack deployment. She does not 
 want to be spammed with non-important notifications.

What sort of changes do you mean here?



Thanks for putting this together, I love Personas as a way to make
designs concrete and connected to user needs.

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud


Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Robert Collins
On 7 December 2013 10:55, Matt Wagner matt.wag...@redhat.com wrote:

 The 'management node' here is the undercloud node that Anna is
 interacting with, as I understand it. (Someone correct me if I'm wrong.)
 So it's not a bad idea to show its status, but I guess the mere fact
 that she's using it will indicate that it's operational.

There are potentially many such nodes, and Anna will be interacting
with some of them; I don't think we can make too many assumptions
about what the UI working implies.

 - As an infrastructure administrator, Anna needs to be able to
 troubleshoot any errors that may occur during the deployment of nodes
 process.

 I'm not sure that the "...through the UI" implication I mentioned above
 extends here. (IMHO) I assume that if things fail, Anna might be okay
 with us showing a message that "$foo failed on $bar", and she should try
 looking in /var/log/$baz for full details. Does that seem fair? (At
 least early on.)

I don't think we necessarily need to do anything here other than make
sure the system is a) well documented and b) Anna has all the normal
sysadmin access to the infrastructure. Her needs can be met by us
getting out of the way gracefully; at least in the short term.


-Rob



Re: [openstack-dev] [TripleO][Tuskar] Icehouse Requirements

2013-12-06 Thread Tzu-Mainn Chen
Thanks for the comments and questions!  I fully expect that this list of
requirements will need to be fleshed out, refined, and heavily modified, so
the more the merrier.

Comments inline:

 
  *** Requirements are assumed to be targeted for Icehouse, unless marked
  otherwise:
 (M) - Maybe Icehouse, dependency on other in-development features
 (F) - Future requirement, after Icehouse
 
  * NODES
 
 Note that everything in this section should be Ironic API calls.
 
 * Creation
* Manual registration
   * hardware specs from Ironic based on mac address (M)
 
 Ironic today will want IPMI address + MAC for each NIC + disk/cpu/memory
 stats
 
   * IP auto populated from Neutron (F)
 
 Do you mean IPMI IP? I'd say IPMI address managed by Neutron here.
 
* Auto-discovery during undercloud install process (M)
 * Monitoring
 * assignment, availability, status
 * capacity, historical statistics (M)
 
 Why is this under 'nodes'? I challenge the idea that it should be
 there. We will need to surface some stuff about nodes, but the
 underlying idea is to take a cloud approach here - so we're monitoring
 services, that happen to be on nodes. There is room to monitor nodes,
 as an undercloud feature set, but let's be very very specific about
 what is sitting at what layer.

That's a fair point.  At the same time, the UI does want to monitor both
services and the nodes that the services are running on, correct?  I would
think that a user would want this.

Would it be better to explicitly split this up into two separate requirements?

 * Management node (where triple-o is installed)
 
 This should be plural :) - TripleO isn't a single service to be
 installed - We've got Tuskar, Ironic, Nova, Glance, Keystone, Neutron,
 etc.

I misspoke here - this should be where the undercloud is installed.  My
current understanding is that our initial release will only support the
undercloud being installed onto a single node, but my understanding could
very well be flawed.

 * created as part of undercloud install process
 * can create additional management nodes (F)
  * Resource nodes
 
 ^ nodes is again confusing layers - nodes are
 what things are deployed to, but they aren't the entry point
 
  * searchable by status, name, cpu, memory, and all attributes from
  ironic
  * can be allocated as one of four node types
 
 Not by users though. We need to stop thinking of this as 'what we do
 to nodes' - Nova/Ironic operate on nodes, we operate on Heat
 templates.

Right, I didn't mean to imply that users would be doing this allocation.  But
once Nova does this allocation, the UI does want to be aware of how the
allocation is done, right?  That's what this requirement meant.

  * compute
  * controller
  * object storage
  * block storage
  * Resource class - allows for further categorization of a node type
  * each node type specifies a single default resource class
  * allow multiple resource classes per node type (M)
 
 What's a node type?

Compute/controller/object storage/block storage.  Is another term besides
node type more accurate?

 
  * optional node profile for a resource class (M)
  * acts as filter for nodes that can be allocated to that
  class (M)
 
 I'm not clear on this - you can list the nodes that have had a
 particular thing deployed on them; we probably can get a good answer
 to being able to see what nodes a particular flavor can deploy to, but
 we don't want to be second-guessing the scheduler.

Correct; the goal here is to provide a way through the UI to send additional
filtering requirements that will eventually be passed into the scheduler,
allowing the scheduler to apply additional filters.
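
As a rough illustration of a node profile acting as a filter, here is a
minimal sketch.  NodeProfile, its fields, and the node dictionaries are all
hypothetical; in practice the criteria would be handed to the scheduler
rather than applied by the UI itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical minimum hardware requirements for a resource class."""
    min_cores: int = 0
    min_ram_gb: int = 0
    min_disk_gb: int = 0

    def matches(self, node):
        """Return True if the node meets every minimum in this profile."""
        return (
            node["cores"] >= self.min_cores
            and node["ram_gb"] >= self.min_ram_gb
            and node["disk_gb"] >= self.min_disk_gb
        )


def candidate_nodes(nodes, profile):
    """Nodes that the profile would allow; the actual placement decision
    still belongs to the scheduler."""
    return [node for node in nodes if profile.matches(node)]


# Hypothetical inventory; field names are illustrative only.
INVENTORY = [
    {"name": "small", "cores": 4, "ram_gb": 16, "disk_gb": 500},
    {"name": "large", "cores": 32, "ram_gb": 256, "disk_gb": 4000},
]

if __name__ == "__main__":
    storage_profile = NodeProfile(min_disk_gb=2000)
    print([n["name"] for n in candidate_nodes(INVENTORY, storage_profile)])
```

An empty profile matches every node, which corresponds to the profile being
optional for a resource class.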

  * nodes can be viewed by node types
  * additional group by status, hardware specification
 
 *Instances* - e.g. hypervisors, storage, block storage etc.
 
  * controller node type
 
 Again, need to get away from node type here.
 
 * each controller node will run all openstack services
* allow each node to run specified service (F)
 * breakdown by workload (percentage of cpu used per node) (M)
  * Unallocated nodes
 
 This implies an 'allocation' step, that we don't have - how about
 'Idle nodes' or something.

Is it imprecise to say that nodes are allocated by the scheduler?  Would
something like 'active/idle' be better?

  * Archived nodes (F)
  * Will be separate openstack service (F)
 
  * DEPLOYMENT
 * multiple deployments allowed (F)
   * initially just one
 * deployment specifies a node distribution across node types
 
 I can't parse this. Deployments specify how many instances to deploy
 in what roles (e.g. 2 control, 2 storage, 4 block storage,