Re: [openstack-dev] [TripleO] Summit session wrapup

2013-12-02 Thread Matt Wagner
On Sun Dec  1 00:27:30 2013, Tzu-Mainn Chen wrote:

 I think it's far more important that we list out requirements and
 create a design document that people agree upon first.  Otherwise, we
 run the risk of focusing on feature X for release 1 without ensuring
 that our architecture supports feature Y for release 2.

+1 to this.

I think that lifeless'
https://etherpad.openstack.org/p/tripleo-feature-map pad might be a good
way to get moving in that direction.


 The point of disagreement here - which actually seems quite minor to
 me - is how far we want to go in defining heterogeneity.  Are existing
 node attributes such as cpu and memory enough?  Or do we need to go
 further?  To take examples from this thread, some additional
 possibilities include: rack, network connectivity, etc.  Presumably,
 such attributes will be user-defined and managed within TripleO itself.

I took the point of disagreement more about the allowance of manual
control. Should a user be able to override the list of what gets
provisioned, where?

And I don't think you always want heterogeneity. For example, if we
treat 'rack' as one of those attributes, a system administrator might
specifically want things to NOT share a rack, e.g. for redundancy.

That said, I suspect that many of us (myself included) have never
designed a data center, so I worry that some of our examples might be a
bit contrived. Not necessarily just for this conversation, but I think
it'd be handy to have real-world stories here. I'm sure no two are
identical, but it'd help make sure we're focused on real-world scenarios.



 If that understanding is correct, it seems to me that the requirements
 are broadly in agreement, and that TripleO-defined node attributes
 is a feature that can easily be slotted into this sort of
 architecture.  Whether it needs to come first... should be a
 different discussion (my gut feel is that it shouldn't come first, as
 it depends on everything else working, but maybe I'm wrong).

So to me, that question -- what should come first? -- is exactly what
started this discussion. It didn't start out as a question of whether we
should allow users to override the schedule, but as a question of where
we should start building. Should we start off just letting Nova
scheduler do all the hard work for us and let overrides maybe come in
later? Or should we start off requiring that everything is manual and
later transition to using Nova? (I don't have a strong opinion either
way, but I hope we land one way or the other soon.)

-- 
Matt Wagner
Software Engineer, Red Hat





Re: [openstack-dev] [TripleO] Summit session wrapup

2013-12-02 Thread Jordan OMara

On 01/12/13 00:27 -0500, Tzu-Mainn Chen wrote:
I think we may all be approaching the planning of this project in the wrong way, because of confusions such as: 


Well, I think there is one small misunderstanding. I've never said that
the manual way should be the primary workflow for us. I agree that we
should lean toward as much automation and smartness as possible. But at
the same time, I am saying that we need a manual fallback for the user to
change that smart decision.



The primary way would be to let TripleO decide where the stuff goes. I
think we agree here.
That's a pretty fundamental requirement that both sides seem to agree upon - but that agreement got lost in the discussions of what feature should come in which release, etc. That seems backwards to me. 

I think it's far more important that we list out requirements and create a design document that people agree upon first. Otherwise, we run the risk of focusing on feature X for release 1 without ensuring that our architecture supports feature Y for release 2. 

To make this example more specific: it seems clear that everyone agrees that the current Tuskar design (where nodes must be assigned to racks, which are then used as the primary means of manipulation) is not quite correct. Instead, we'd like to introduce a philosophy where we assume that users don't want to deal with homogeneous nodes individually, instead letting TripleO make decisions for them. 



I agree; getting buy-in on a design document up front is going to
save us future anguish.

Regarding this - I think we may want to clarify what the purpose of our releases are at the moment. Personally, I don't think our current planning is about several individual product releases that we expect to be production-ready and usable by the world; I think it's about milestone releases which build towards a more complete product. 

From that perspective, if I were a prospective user, I would be less concerned with each release containing exactly what I need. Instead, what I would want most out of the project is: 

a) frequent stable releases (so I can be comfortable with the pace of development and the quality of code) 
b) design documentation and wireframes (so I can be comfortable that the architecture will support features I need) 
c) a roadmap (so I have an idea when my requirements will be met) 



+1
--
Jordan O'Mara jomara at redhat.com
Red Hat Engineering, Raleigh 




Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-30 Thread Tzu-Mainn Chen
I think we may all be approaching the planning of this project in the wrong 
way, because of confusions such as: 

 Well, I think there is one small misunderstanding. I've never said that
 the manual way should be the primary workflow for us. I agree that we
 should lean toward as much automation and smartness as possible. But at
 the same time, I am saying that we need a manual fallback for the user to
 change that smart decision.

 The primary way would be to let TripleO decide where the stuff goes. I
 think we agree here.
That's a pretty fundamental requirement that both sides seem to agree upon - 
but that agreement got lost in the discussions of what feature should come in 
which release, etc. That seems backwards to me. 

I think it's far more important that we list out requirements and create a 
design document that people agree upon first. Otherwise, we run the risk of 
focusing on feature X for release 1 without ensuring that our architecture 
supports feature Y for release 2. 

To make this example more specific: it seems clear that everyone agrees that 
the current Tuskar design (where nodes must be assigned to racks, which are 
then used as the primary means of manipulation) is not quite correct. Instead, 
we'd like to introduce a philosophy where we assume that users don't want to 
deal with homogeneous nodes individually, instead letting TripleO make 
decisions for them. 

When we have a bunch of heterogeneous nodes, we want to be able to break them 
up into several homogeneous groups, and assign different capabilities to each. 
But again, within each individual homogeneous group, we don't want users 
dealing with each individual node; instead, we want TripleO to take care of 
business. 

The point of disagreement here - which actually seems quite minor to me - is 
how far we want to go in defining heterogeneity. Are existing node attributes 
such as cpu and memory enough? Or do we need to go further? To take examples 
from this thread, some additional possibilities include: rack, network 
connectivity, etc. Presumably, such attributes will be user-defined and managed 
within TripleO itself. 

If that understanding is correct, it seems to me that the requirements are 
broadly in agreement, and that TripleO-defined node attributes is a feature 
that can easily be slotted into this sort of architecture. Whether it needs to 
come first... should be a different discussion (my gut feel is that it 
shouldn't come first, as it depends on everything else working, but maybe I'm 
wrong). 

In any case, if we can a) detail requirements without talking about releases 
and b) create a design architecture, I think that it'll be far easier to come 
up with a set of milestones that make developmental sense. 

  Folk that want to manually install openstack on a couple of machines
  can already do so: we don't change the game for them by replacing a
  manual system with a manual system. My vision is that we should
  deliver something significantly better!

 We should! And we can. But I think we shouldn't deliver something that will
 discourage people from using TripleO. Especially at the beginning - see,
 user, we are taking our first steps here; the distribution is not perfect
 and not what you wanted, but you can make the change you need. You don't
 have to go away and come back in 6 months when we try to be smarter and
 address your case.

Regarding this - I think we may want to clarify what the purpose of our 
releases are at the moment. Personally, I don't think our current planning is 
about several individual product releases that we expect to be production-ready 
and usable by the world; I think it's about milestone releases which build 
towards a more complete product. 

From that perspective, if I were a prospective user, I would be less concerned 
with each release containing exactly what I need. Instead, what I would want 
most out of the project is: 

a) frequent stable releases (so I can be comfortable with the pace of 
development and the quality of code) 
b) design documentation and wireframes (so I can be comfortable that the 
architecture will support features I need) 
c) a roadmap (so I have an idea when my requirements will be met) 

Mainn 


Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-28 Thread Jaromir Coufal


On 2013/28/11 06:41, Robert Collins wrote:

Certainly. Do we have Personas for those people? (And have we done any
validation of them?)
We have a short paragraph for each. But they are not verified by any 
survey, so we don't have a very solid basis in this area right now, and I 
believe we are all working from assumptions at the moment.



This may be where we disagree indeed :). Wearing my sysadmin hat (a
little dusty, but never really goes away :P) - I can tell you I spent
a lot of time worrying about what went on what machine. But it was
never actually what I was paid to do.

What I was paid to do was to deliver infrastructure and services to
the business. Everything that we could automate, that we could
describe with policy and still get robust, reliable results - we did.
It's how one runs many hundred machines with an ops team of 2.

Planning around failure domains, for example, is tedious work; it's
needed at a purchasing level - you need to decide if you're buying
three datacentres or one datacentre with internal redundancy, but once
that's decided, the actual mechanics of ensuring that each HA service is
spread across the (three datacentres) or (three separate zones in the
one DC) is not interesting. So - I'm sure that many sysadmins do
manually assign work to machines to ensure a good result for
performance or HA concerns, but that's out of necessity, not desire.
Well, I think there is one small misunderstanding. I've never said that 
the manual way should be the primary workflow for us. I agree that we 
should lean toward as much automation and smartness as possible. But at 
the same time, I am saying that we need a manual fallback for the user to 
change that smart decision.


The primary way would be to let TripleO decide where the stuff goes. I 
think we agree here.


But I, as a sysadmin, want to see the distribution of stuff before I 
deploy. And if there is some failure in the automation logic, I need to 
have the possibility to change it - not from scratch, but by adjusting the 
suggested distribution. There should always be a way to do that manually. 
Let's imagine that TripleO, by some mistake or intentionally, distributes 
nodes across my datacenter wrongly (wrong for me, not necessarily for 
somebody else). What would I do? Would I let TripleO deploy it anyway? No. 
I would not use TripleO. But if there is something I need to change and I 
have a way to do that, I will stay with TripleO, because it allows me to 
satisfy all my needs.


We can be smart, but we can't be the smartest and foresee every user's 
reasons.



Why does that layout make you happy? What is it about that setup where
things will work better for you? Note that in the absence of a
sophisticated scheduler you'll have some volumes with redundancy of 3
end up all in one rack: you won't get rack-can-fail safety on the
delivered cloud workloads (I mention this as one attempt to understand
why knowing there is a control node / 3 storage /rest compute in each
rack makes you happy).
It doesn't have to make me happy, but somebody else might have strong 
reasons for that (or for any other setup which we didn't cover). We don't 
have to know them, but why can't we allow him to do this?


One more time, I want to stress this: I am not fighting against having a 
sophisticated scheduler, I am fighting for allowing the user to control 
things if he wants or needs to.



I think having that degree of control is failure. Our CloudOS team has
considerable experience now in deploying clouds using a high-touch
system like you describe - and they are utterly convinced that it
doesn't scale. Even at 20 nodes it is super tedious, and beyond that
it's ridiculous.
Right. And are they convinced that an automated tool will do the best job 
for them? Do they trust it so strongly that they would deploy their whole 
datacenter without checking that the distribution is correct? Would they 
say - OK, I said I want 50 compute, 10 block storage, 3 control; as long 
as it works, I don't care, be smart, do it for me?


It all depends on the GUI design. If we design it well enough that we 
allow the user to do quick bulk actions, even manual distribution can be 
easy - even for 100 nodes... or more.

(But I don't suggest we do that all manually.)


Flexibility comes with a cost. Right now we have a large audience
interested in what we have, but we're delivering two separate things:
we have a functional sysadminny interface with command line scripts
and heat templates, and we have a GUI where we can offer a better
interface which the tuskar folk are building up. I agree that
homogeneous hardware isn't a viable long term constraint. But if we
insist on fixing that issue first, we sacrifice our ability to learn
about the usefulness of a simple, straightforward interface. We'll be
doing a bunch of work - regardless of implementation - to deal with
heterogeneity, when we could be bringing Swift and Cinder up to
production readiness - which IMO will get many more folk onboard for
adoption.
I agree that 

Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-28 Thread Ladislav Smola

Hello,

just a few notes from me:

https://etherpad.openstack.org/p/tripleo-feature-map sounds like a great 
idea; we should go through the entries one by one, maybe in a meeting.
We should agree on what is doable for Icehouse without violating the 
OpenStack way in some very ugly fashion. So do we want to be OpenStack on 
OpenStack, or Almost OpenStack on OpenStack? Or what is the goal here?

So let's take a simple example: flat network, 2 racks (32 nodes), 2 
controller nodes, 2 neutron nodes, 14 nova compute, 14 storage.


I. The manual way, using Heat and the scheduler, could be assigning every 
group of nodes to a special flavor by hand. Then the nova scheduler will 
take care of it.
1. How hard will it be to implement 'assigning specific nodes to a 
flavor'? (probably adding a condition on MAC address?)
Or do you have some other idea of how to do this in an almost clean 
way, without reimplementing the nova scheduler? (though this is probably 
messing with the scheduler) A rough sketch of one possibility follows 
below.
2. How will this be implementable in the UI? Just assigning nodes to 
flavors and uploading a Heat template?
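
For what it's worth, a rough sketch of one way to read I.1 - leaning on 
the exact-match pairing between baremetal flavors and registered node 
properties rather than on MAC addresses. This is hedged: all names, 
numbers and credentials below are invented, and it assumes the undercloud 
baremetal scheduler really does pair flavor specs exactly with node 
properties.

    # Sketch: one flavor per role. If each group of nodes is registered
    # with slightly different properties (here, disk size), exact-match
    # scheduling can only pair each flavor with its intended group.
    from novaclient.v1_1 import client

    nova = client.Client('admin', 'password', 'admin',
                         'http://undercloud:5000/v2.0')

    roles = {
        'bm.control': {'ram': 32768, 'vcpus': 8, 'disk': 400},
        'bm.compute': {'ram': 65536, 'vcpus': 16, 'disk': 800},
        'bm.storage': {'ram': 32768, 'vcpus': 8, 'disk': 2000},
    }
    for name, spec in roles.items():
        nova.flavors.create(name, spec['ram'], spec['vcpus'], spec['disk'],
                            flavorid='auto')

The Heat template would then reference bm.control etc. as the flavor for 
each resource; whether that counts as 'almost clean' is exactly question 1.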


II. Having homogeneous hardware, everything will be one flavor, and the 
nova scheduler will decide where to put what when you tell heat e.g. 'I 
want to spawn 2 controller images'.
1. How hard is it to set policies, like wanting to spread those nodes 
over all racks? (one possible sketch below)
2. How will this be implementable in the UI? It is basically building a 
complex Heat template, right? So just uploading a Heat template?
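
On II.1, one hedged idea (untested; hostnames and rack names invented): 
model each rack as an undercloud availability zone via host aggregates, 
and express 'spread over all racks' by booting one instance per zone.

    # Sketch: an aggregate-backed availability zone per rack, then one
    # controller per zone. The loop, not the scheduler, does the spreading.
    from novaclient.v1_1 import client

    nova = client.Client('admin', 'password', 'admin',
                         'http://undercloud:5000/v2.0')

    racks = {'rack1': ['node-01', 'node-02'], 'rack2': ['node-17', 'node-18']}
    for rack, hosts in racks.items():
        agg = nova.aggregates.create(rack, rack)  # aggregate name, AZ name
        for host in hosts:
            nova.aggregates.add_host(agg, host)

    image = nova.images.find(name='controller-image')
    flavor = nova.flavors.find(name='baremetal')
    for i, rack in enumerate(sorted(racks)):
        nova.servers.create('controller-%d' % i, image, flavor,
                            availability_zone=rack)

That is still a manual policy rather than a scheduler policy, which is 
part of why question 1 is open.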


III. Having more flavors
1. We will be able to set in Heat something like: I want a Nova compute 
node on compute_flavor (amazon c1, c3) with high priority, or on 
all_purpose_flavor (amazon m1) with normal priority. How hard is that?

2. How will this be implementable in the UI? Just uploading a Heat template?

IV. The TripleO way


1. From the OOO name I infer that we want to use OpenStack, which means 
using Heat, the Nova scheduler, etc.
From my point of view, having a Heat template for deploying e.g. a 
Wordpress installation seems the same to me as having a Heat template 
to deploy OpenStack - it's just much more complex. Is this a valid 
assumption? If you think it's not, please explain why.



Radical idea: we could ask (e.g. on -operators) for a few potential 
users who'd be willing to let us interview them.

Yes please!!!

Talking to jcoufal: being able to edit a Heat template in the UI and being 
able to assign baremetal nodes to flavors (later connected to a template 
catalog) could be all we need. Also, later, visualizing what will happen 
when you actually 'stack create' the template, so we don't go in blindly, 
would be very much needed.


Kind regards,
Ladislav


On 11/28/2013 06:41 AM, Robert Collins wrote:

Hey, I realise I've done a sort of point-by-point thing below - sorry.
Let me say that I'm glad you're focused on what will help users, and
their needs - I am too. Hopefully we can figure out why we have
different opinions about what things are key, and/or how we can get
data to better understand our potential users.


On 28 November 2013 02:39, Jaromir Coufal jcou...@redhat.com wrote:


The important point here is that we agree on starting with the very basics
and growing from there. Which is great.

The whole deployment workflow (not just the UI) is all about user experience,
which is built on top of TripleO's approach. Here I see two important
factors:
- There are users who have certain needs and expectations.

Certainly. Do we have Personas for those people? (And have we done any
validation of them?)


- There is the underlying concept of TripleO, which we are using to
implement features that satisfy those needs.

mmm, so the technical aspect of TripleO is about setting up a virtuous
circle: where improvements in deploying cluster software via OpenStack
make deploying OpenStack better, and those of us working on deploying
OpenStack will make deploying cluster software via OpenStack better in
general, as part of solving 'deploying OpenStack' in a nice way.


We are circling around and trying to approach the problem from the wrong
end - which is the implementation point of view (how to avoid our own
scheduling).

Let's try to get out of the box and start by thinking about our audience
first - what they expect, what they need. Then we go back, put our
implementation thinking hats on, and find out how we are going to re-use
OpenStack components to achieve our goals. In the end we have a detailed plan.

Certainly, +1.


=== Users ===

I would like to start with our targeted audience first - without milestones,
without implementation details.

I think here is the main point where I disagree and which leads to different
approaches. I don't think that the user of TripleO cares only about deploying
infrastructure without any knowledge of where things go. That is the overcloud
user's approach - 'I want a VM and I don't care where it runs'. Those are
self-service users / cloud users. I know we are OpenStack on OpenStack, but
we shouldn't go so far as to expect the same behavior from undercloud users.
I can tell you various examples of why the 

Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-28 Thread Jaromir Coufal

Hi Mark,

thanks for your insight, I mostly agree. Just a few points below.

On 2013/27/11 21:54, Mark McLoughlin wrote:

Hi Jarda,

...

Yes, I buy this. And I think it's the point worth dwelling on.

It would be quite a bit of work to substantiate the point with hard data
- e.g. doing user testing of mockups with and without placement control
- so we have to at least try to build some consensus without that.
I agree here. It will be a lot of work. I'd love to have that, but 
creating distinct designs, finding users for real testing, and testing 
with them will consume a big amount of time, and in this agile approach 
we can't afford it.


I believe that our goals are not very far apart and that we can get to 
consensus without that.


There was a small confusion which I tried to clarify in my answer to 
Rob's response.



We could do some work on a more detailed description of the persona and
their basic goals. This would clear up whether we're designing for the
case where one persona owns the undercloud and there's another overcloud
operator persona.
Yes, we need to have this written down. Or at least get to consensus, if 
we can quickly get there, and then document it. Whatever works and 
doesn't block us.



We could also look at other tools targeted to similar use cases and see
what they do.
I looked, and they all do it in a very manual way (or at least those 
which I have seen, from Mirantis, Huawei, etc.) - and there is some 
reason for this. As I wrote in my answer to Robert, we can do much more, 
we can be smart, but we can't think that we are the smartest.



But yeah - my instinct is that all of that would show that we'd be
fighting an uphill battle to persuade our users that this type of magic
is what they want.

That's exactly my point. Thanks for saying that.

We want to help them and feed them a ready-to-deploy solution. But they 
need to have the feeling that they have things under control (maybe just 
checking the solution and/or being allowed to change it).



...

=== Implementation ===

 The above-mentioned approach shouldn't lead to reimplementing the
 scheduler. We can still use nova-scheduler, but we can take advantage of
 extra params (like a unique identifier), so that we specify more
 concretely what goes where.

It's hard to see how what you describe doesn't ultimately mean we
completely by pass the Nova scheduler. Yes, if you request placement on
a specific node, it does still go through the scheduler ... but it
doesn't do any actual scheduling.

Maybe we should separate the discussion/design around control nodes and
resource (i.e. compute/storage) nodes. Mostly because there should be a
large ratio of the latter to the former, so you'd expect it to be less
likely for such fine grained control over resource nodes to be useful.

e.g. maybe adding more compute nodes doesn't involve the user doing any
placement, and we just let the nova scheduler choose from the available
nodes which are suitable for compute workloads.
Yes, controller nodes will need to get better treatment, but I think not 
in our first steps. I believe that for now we are fine going with a 
generic controller node which runs all the controller services.


What I think would be great to have is to let nova-scheduler do its job 
as a dry-run, show the resulting distribution, and just confirm it (or 
make some change there).


-- Jarda


Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-28 Thread Jaromir Coufal

On 2013/27/11 16:37, James Slagle wrote:

On Wed, Nov 27, 2013 at 8:39 AM, Jaromir Coufal jcou...@redhat.com wrote:


V0: basic slick installer - flexibility and control first
- enable the user to auto-discover (or manually register) nodes
- let the user decide which node is going to be a controller, and which is
going to be compute or storage
- associate images with these nodes
- deploy


I think you've made some good points about the user experience helping drive the
design of what Tuskar is targeting.  I think the conversation around
how to design
letting the user pick what to deploy where should continue.  I wonder
though, would
it be possible to not have that in a V0?

Basically make your V0 above even smaller (eliminating the middle 2
bullets), and just
letting nova figure it out, the same as what happens now when we run
heat stack-create  from the CLI.

I see 2 possible reasons for trying this:
- Gets us to something people can try even sooner
- It may turn out we want this option in the long run ... a 'figure it
all out for me' type of approach, so it wouldn't be wasted effort.

Hey James,

well, as long as we end up with the possibility of having control over it 
in the Icehouse release, I am fine with that.

(The 'control' part I tried to explain more closely in my response to 
Robert's e-mail.)

As for the milestone approach:
I just think that the more basic and traditional way for the user is to 
do stuff manually, and that's where I think we can start. That's the 
user's point of view.
From the implementation point of view, there is already some magic in 
OpenStack, so it might be easier to start with that already-existing 
magic, then add manual support, and then enhance the magic into a much 
smarter approach.


In the end, most of the audience will see the result in the Icehouse 
release, so whether we start one way or the other - whatever works. I 
just want to make sure that we are going to deliver a usable solution.


-- Jarda


Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-28 Thread Jiří Stránský

Hi all,

just a few thoughts (subjective opinions) regarding the whole debate:

* I think that having a 'manually pick images for machines' approach 
would make TripleO more usable in the beginning. I think it will take a 
good deal of time to get our smart solution working with the admin 
rather than against him [1], and the possibility of manual override is a 
good safety catch.


E.g. one question that I wonder about - how would our smart flavor-based 
approach solve this situation: I have homogeneous nodes on which I want 
to deploy Cinder and Swift. Half of those nodes have better connectivity 
to the internet than the other half. I want Swift on the ones with 
better internet connectivity. How would I ensure such a deployment with 
a flavor-based approach? Could we use e.g. host aggregates defined on 
the undercloud for this (a rough sketch below)? I think it will take 
time before our smart solution can understand such and similar conditions.
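
To make the host-aggregate idea concrete - hedged, since I haven't tried 
this on an undercloud, and it assumes the AggregateInstanceExtraSpecsFilter 
is enabled in the undercloud nova-scheduler (the hostnames and the 
'fast_internet' key are invented):

    # Sketch: tag the well-connected half of the nodes via an aggregate,
    # and make the Swift flavor demand that tag.
    from novaclient.v1_1 import client

    nova = client.Client('admin', 'password', 'admin',
                         'http://undercloud:5000/v2.0')

    agg = nova.aggregates.create('fast-internet', None)
    for host in ('node-01', 'node-02', 'node-03'):
        nova.aggregates.add_host(agg, host)
    nova.aggregates.set_metadata(agg, {'fast_internet': 'true'})

    # The Swift flavor requires the tag; the Cinder flavor doesn't mention
    # it, so Cinder instances can land anywhere.
    swift = nova.flavors.create('bm.swift', 32768, 8, 2000, flavorid='auto')
    swift.set_keys({'fast_internet': 'true'})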


* On the other hand, I think relying on Nova to pick hosts feels like the 
more TripleO-spirited solution to me. It means using OpenStack to deploy 
OpenStack.


So I can't really lean towards one solution or the other. Maybe it's 
most important to make *something*, gather some feedback, and tweak what 
needs tweaking.



Cheers

Jirka


[1] http://i.technet.microsoft.com/dynimg/IC284957.jpg



Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-27 Thread Jaromir Coufal


On 2013/27/11 00:00, Robert Collins wrote:

On 26 November 2013 07:41, Jaromir Coufal jcou...@redhat.com wrote:

Hey Rob,

can we add 'Slick Overcloud deployment through the UI' to the list? There
was no session about that, but we discussed it afterwords and agreed that it
is high priority for Icehouse as well.

I just want to keep it on the list, so we are aware of that.

Certainly. Please add a blueprint for that and I'll mark it up appropriately.

I will do.


Related to that we had a long chat in IRC that I was to follow up here, so - ...

Tuskar is refocusing on getting the basics really right - slick basic
install, and then work up. At the same time, just about every nova
person I've spoken to (a /huge/ sample of three, but meh :)) has
expressed horror that Tuskar is doing its own scheduling, and
confusion about the need to manage flavors in such detail.
So the discussion on IRC was about getting back to basics - a clean
core design, so that we aren't left with technical debt that we need
to eliminate in order to move forward - which the scheduler stuff
would be.

So: my question/proposal was this: let's set a couple of MVPs.

0: slick install homogeneous nodes:
  - ask about nodes and register them with nova baremetal / Ironic (can
use those APIs directly)
  - apply some very simple heuristics to turn that into a cloud:
- 1 machine - all in one
- 2 machines - separate hypervisor and the rest
- 3 machines - two hypervisors and the rest
- 4 machines - two hypervisors, HA the rest
- 5 + scale out hypervisors
  - so total forms needed = 1 gather hw details
  - internals: heat template with one machine flavor used

1: add support for heterogeneous nodes:
  - for each service (storage compute etc) supply a list of flavors
we're willing to have that run on
  - pass that into the heat template
  - teach heat to deal with flavor specific resource exhaustion by
asking for a different flavor (or perhaps have nova accept multiple
flavors and 'choose one that works'): details to be discussed with
heat // nova at the right time.

2: add support for anti-affinity for HA setups:
  - here we get into the question about short term deliverables vs long
term desire, but at least we'll have a polished installer already.

-Rob


The important point here is that we agree on starting with the very 
basics and growing from there. Which is great.


The whole deployment workflow (not just the UI) is all about user 
experience, which is built on top of TripleO's approach. Here I see two 
important factors:

- There are *users* who have certain *needs and expectations*.
- There is the underlying *concept of TripleO*, which we are using to 
*implement* features that satisfy those needs.


We are circling around and trying to approach the problem from the wrong 
end - which is the implementation point of view (how to avoid our own 
scheduling).


Let's try to get out of the box and start by thinking about our audience 
first - what they expect, what they need. Then we go back, put our 
implementation thinking hats on, and find out how we are going to re-use 
OpenStack components to achieve our goals. In the end we have a detailed plan.



=== Users ===

I would like to start with our targeted audience first - without 
milestones, without implementation details.


I think here is the main point where I disagree and which leads to 
different approaches. I don't think that the user of TripleO cares *only* 
about deploying infrastructure without any knowledge of where things 
go. That is the overcloud user's approach - 'I want a VM and I don't care 
where it runs'. Those are self-service users / cloud users. I know we 
are OpenStack on OpenStack, but we shouldn't go so far as to expect the 
same behavior from undercloud users. I can tell you various examples of 
why the operator will care about where the image goes and what runs on 
a specific node.


/One quick example:/
I have three racks of homogeneous hardware and I want to design it in 
such a way that I have one control node in each, 3 storage nodes, and 
the rest compute. With that smart deployment, I'll never know what my 
rack contains in the end. But if I have control over things, I can say 
that this node is a controller, those three are storage, and those are 
compute - I am happy from the very beginning.


Our targeted audience are sysadmins, operators. They hate 'magic'. They 
want to have control over the things they are doing. If we put in front 
of them a workflow where they click one button and get a cloud 
installed, they will be horrified.


That's why I am very sure and convinced that we need to give the user 
the ability to control things - which node has which role. We can be 
smart, suggest, and advise, but not hide this functionality from the 
user. Otherwise, I am afraid that we can fail.


Furthermore, if we put lots of restrictions (like homogeneous hardware) 
in front of users from the very beginning, we are discouraging people 
from using TripleO-UI. We are 

Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-27 Thread James Slagle
On Wed, Nov 27, 2013 at 8:39 AM, Jaromir Coufal jcou...@redhat.com wrote:

 V0: basic slick installer - flexibility and control first
 - enable the user to auto-discover (or manually register) nodes
 - let the user decide which node is going to be a controller, and which is
 going to be compute or storage
 - associate images with these nodes
 - deploy


I think you've made some good points about the user experience helping drive the
design of what Tuskar is targeting.  I think the conversation around
how to design
letting the user pick what to deploy where should continue.  I wonder
though, would
it be possible to not have that in a V0?

Basically make your V0 above even smaller (eliminating the middle 2
bullets), and just
letting nova figure it out, the same as what happens now when we run
heat stack-create  from the CLI.

I see 2 possible reasons for trying this:
- Gets us to something people can try even sooner
- It may turn out we want this option in the long run ... a 'figure it
all out for me' type of approach, so it wouldn't be wasted effort.


-- 
-- James Slagle
--



Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-27 Thread Mark McLoughlin
Hi Jarda,

On Wed, 2013-11-27 at 14:39 +0100, Jaromir Coufal wrote:

 I think here is the main point where I disagree and which leads to 
 different approaches. I don't think, that user of TripleO cares *only* 
 about deploying infrastructure without any knowledge where the things 
 go. This is overcloud user's approach - 'I want VM and I don't care 
 where it runs'. Those are self-service users / cloud users. I know we 
 are OpenStack on OpenStack, but we shouldn't go that far that we expect 
 same behavior from undercloud users.

Nice, I think you're getting really close to identifying the conflicting
assumptions/viewpoints here.

What OpenStack - and cloud, in general - does is provide a nice
self-service abstraction between the owners of the underlying resources
and the end-user.

We take an awful lot of placement control away from the
self-service user in order to allow the operator to provide a usable,
large-scale, multi-tenant service.

The difference with TripleO is that we assume the undercloud operator
and the undercloud user are one and the same. At least, that's what I
assume we're designing for. I don't think we're designing for a
situation where there is an undercloud operator serving the needs of
multiple overcloud operators and it's important for the undercloud
operator to have ultimate control over placement.

That's hardly the end of the story here, but it is one useful
distinction that could justify why this case might be different from the
usual application-deployment-on-IaaS case.

 I can tell you various examples of 
 why the operator will care about where the image goes and what runs on 
 a specific node.
 
 /One quick example:/
 I have three racks of homogeneous hardware and I want to design it in 
 such a way that I have one control node in each, 3 storage nodes, and 
 the rest compute. With that smart deployment, I'll never know what my 
 rack contains in the end. But if I have control over things, I can say 
 that this node is a controller, those three are storage, and those are 
 compute - I am happy from the very beginning.

It is valid to ask why this knowledge is important to the user in this
case and why it makes them happy. Challenging such assumptions can lead
to design breakthroughs, I'm sure you agree.

e.g. before AWS came along, you could imagine someone trying to shoot
down the entire premise of IaaS with similar arguments.

Or the whole 'they'd have asked for a faster horse' thing.

 Our targeted audience are sysadmins, operators. They hate 'magic'. They 
 want to have control over the things they are doing. If we put in front 
 of them a workflow where they click one button and get a cloud 
 installed, they will be horrified.

 That's why I am very sure and convinced that we need to give the user 
 the ability to control things - which node has which role. We can be 
 smart, suggest, and advise, but not hide this functionality from the 
 user. Otherwise, I am afraid that we can fail.
 
 Furthermore, if we put lots of restrictions (like homogeneous hardware) 
 in front of users from the very beginning, we are discouraging people 
 from using TripleO-UI. We are a young project trying to hit as broad an 
 audience as possible. If we take a flexible enough approach to get a 
 large audience interested and solve their problems, we will get more 
 feedback, early adopters, more contributors, etc.

 First, let's help the cloud operator who has some nodes and wants to 
 deploy OpenStack on them. He wants to have control over which node is a 
 controller and which node is compute or storage. Then we can get smarter 
 and guide.

Yes, I buy this. And I think it's the point worth dwelling on.

It would be quite a bit of work to substantiate the point with hard data
- e.g. doing user testing of mockups with and without placement control
- so we have to at least try to build some consensus without that.

We could do some work on a more detailed description of the persona and
their basic goals. This would clear up whether we're designing for the
case where one persona owns the undercloud and there's another overcloud
operator persona.

We could also look at other tools targeted to similar use cases and see
what they do.

But yeah - my instinct is that all of that would show that we'd be
fighting an uphill battle to persuade our users that this type of magic
is what they want.

...
 === Implementation ===
 
 The above-mentioned approach shouldn't lead to reimplementing the 
 scheduler. We can still use nova-scheduler, but we can take advantage of 
 extra params (like a unique identifier), so that we specify more 
 concretely what goes where.

It's hard to see how what you describe doesn't ultimately mean we
completely by pass the Nova scheduler. Yes, if you request placement on
a specific node, it does still go through the scheduler ... but it
doesn't do any actual scheduling.
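
To illustrate with a concrete (invented) example: with admin rights, nova 
already accepts a forced host via the zone:host form of availability zone. 
The request does pass through nova-scheduler, but no real scheduling 
happens - a sketch, not a recommendation:

    from novaclient.v1_1 import client

    nova = client.Client('admin', 'password', 'admin',
                         'http://undercloud:5000/v2.0')
    # Pin one controller to a named node; the scheduler is bypassed in
    # all but name.
    nova.servers.create('controller-0',
                        nova.images.find(name='controller-image'),
                        nova.flavors.find(name='baremetal'),
                        availability_zone='nova:undercloud-node-07')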

Maybe we should separate the discussion/design around control nodes and
resource (i.e. compute/storage) 

Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-26 Thread Robert Collins
On 26 November 2013 07:41, Jaromir Coufal jcou...@redhat.com wrote:
 Hey Rob,

 can we add 'Slick Overcloud deployment through the UI' to the list? There
 was no session about that, but we discussed it afterwards and agreed that it
 is high priority for Icehouse as well.

 I just want to keep it on the list, so we are aware of that.

Certainly. Please add a blueprint for that and I'll mark it up appropriately.

Related to that we had a long chat in IRC that I was to follow up here, so - ...

Tuskar is refocusing on getting the basics really right - slick basic
install, and then work up. At the same time, just about every nova
person I've spoken to (a /huge/ sample of three, but meh :)) has
expressed horror that Tuskar is doing its own scheduling, and
confusion about the need to manage flavors in such detail.

So the discussion on IRC was about getting back to basics - a clean
core design, so that we aren't left with technical debt that we need
to eliminate in order to move forward - which the scheduler stuff
would be.

So: my question/proposal was this: let's set a couple of MVPs.

0: slick install homogeneous nodes:
 - ask about nodes and register them with nova baremetal / Ironic (can
use those APIs directly)
 - apply some very simple heuristics to turn that into a cloud:
   - 1 machine - all in one
   - 2 machines - separate hypervisor and the rest
   - 3 machines - two hypervisors and the rest
   - 4 machines - two hypervisors, HA the rest
   - 5 + scale out hypervisors
 - so total forms needed = 1 gather hw details
 - internals: heat template with one machine flavor used
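
(A throwaway sketch to make the MVP-0 heuristic concrete - a pure
function, no OpenStack calls, informal role names:)

    def assign_roles(nodes):
        n = len(nodes)
        if n == 1:
            return {'all-in-one': nodes}
        if n == 2:
            return {'hypervisor': nodes[:1], 'everything-else': nodes[1:]}
        if n == 3:
            return {'hypervisor': nodes[:2], 'everything-else': nodes[2:]}
        if n == 4:
            return {'hypervisor': nodes[:2], 'ha-everything-else': nodes[2:]}
        # 5+: keep a two-node HA control plane, scale out hypervisors
        return {'hypervisor': nodes[:-2], 'ha-everything-else': nodes[-2:]}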

1: add support for heterogeneous nodes:
 - for each service (storage compute etc) supply a list of flavors
we're willing to have that run on
 - pass that into the heat template
 - teach heat to deal with flavor specific resource exhaustion by
asking for a different flavor (or perhaps have nova accept multiple
flavors and 'choose one that works'): details to be discussed with
heat // nova at the right time.
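
(A hedged sketch of the MVP-1 hand-off: the UI passes an ordered list of
acceptable flavors per service into a hypothetical overcloud template as
plain parameters - how heat/nova would consume them is exactly the detail
to discuss with those teams. Endpoint, token and parameter names are all
invented:)

    from heatclient.v1.client import Client

    heat = Client('http://undercloud:8004/v1/TENANT_ID', token='TOKEN')
    heat.stacks.create(
        stack_name='overcloud',
        template=open('overcloud.yaml').read(),
        parameters={
            'ComputeFlavors': 'bm.fast,bm.general',  # try bm.fast first
            'StorageFlavors': 'bm.bigdisk',
        })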

2: add support for anti-affinity for HA setups:
 - here we get into the question about short term deliverables vs long
term desire, but at least we'll have a polished installer already.
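
(For MVP 2 there may already be a building block: the 'group' scheduler
hint plus nova's GroupAntiAffinityFilter keeps members of a group on
distinct hosts. A hedged sketch - it assumes that filter is enabled in
the undercloud nova.conf, and all names are invented:)

    from novaclient.v1_1 import client

    nova = client.Client('admin', 'password', 'admin',
                         'http://undercloud:5000/v2.0')
    image = nova.images.find(name='controller-image')
    flavor = nova.flavors.find(name='baremetal')
    for i in range(3):
        # Each member of the 'controllers' group should land on a
        # different node.
        nova.servers.create('controller-%d' % i, image, flavor,
                            scheduler_hints={'group': 'controllers'})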

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [TripleO] Summit session wrapup

2013-11-25 Thread Jaromir Coufal

Hey Rob,

can we add 'Slick Overcloud deployment through the UI' to the list? 
There was no session about that, but we discussed it afterwards and 
agreed that it is high priority for Icehouse as well.


I just want to keep it on the list, so we are aware of that.

Thanks
-- Jarda

On 2013/25/11 02:17, Robert Collins wrote:

I've now gone through and done the post summit cleanup of blueprints
and migration of design docs into blueprints as appropriate.

We had 50-odd blueprints, many of which were really not effective
blueprints - they described single work items with little coordination
need, were not changelog items, etc. I've marked those obsolete.
Blueprints are not a discussion forum - they are a place that [some]
discussions can be captured, but anything initially filed there will
take some time before folk notice it - and the lack of a discussion
mechanism makes it very hard to reach consensus there. Could
TripleO-interested folk please raise things here, on the dev list initially,
and we'll move it to lower latency // higher bandwidth environments as
needed?

From the summit we had the following outcomes:
https://etherpad.openstack.org/p/icehouse-deployment-hardware-autodiscovery
- needs to be done in ironic

https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-modelling-infrastructure-sla-services
- needs more discussion to tease concerns out - in particular I want
us to get to
a problem statement that Nova core folk understand :)

https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-ha-production-configuration
- this is ready for folk to act on at any point

https://blueprints.launchpad.net/tripleo/+spec/tripleo-tuskar-deployment-scaling-topologies
- this is ready for folk to act on - but it's fairly shallow, since
most of the answer was 'discuss with heat' :)

https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-scaling-design
- this is ready for folk to act on; the main thing was gathering a
bunch of data so we can make good decisions from here on out

The stable branches decision has been documented in the wiki - all done.

Cheers,
Rob


[openstack-dev] [TripleO] Summit session wrapup

2013-11-24 Thread Robert Collins
I've now gone through and done the post summit cleanup of blueprints
and migration of design docs into blueprints as appropriate.

We had 50-odd blueprints, many of which were really not effective
blueprints - they described single work items with little coordination
need, were not changelog items, etc. I've marked those obsolete.
Blueprints are not a discussion forum - they are a place that [some]
discussions can be captured, but anything initially filed there will
take some time before folk notice it - and the lack of a discussion
mechanism makes it very hard to reach consensus there. Could
TripleO-interested folk please raise things here, on the dev list initially,
and we'll move it to lower latency // higher bandwidth environments as
needed?

From the summit we had the following outcomes:
https://etherpad.openstack.org/p/icehouse-deployment-hardware-autodiscovery
- needs to be done in ironic

https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-modelling-infrastructure-sla-services
- needs more discussion to tease concerns out - in particular I want
us to get to
a problem statement that Nova core folk understand :)

https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-ha-production-configuration
- this is ready for folk to act on at any point

https://blueprints.launchpad.net/tripleo/+spec/tripleo-tuskar-deployment-scaling-topologies
- this is ready for folk to act on - but it's fairly shallow, since
most of the answer was 'discuss with heat' :)

https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-scaling-design
- this is ready for folk to act on; the main thing was gathering a
bunch of data so we can make good decisions from here on out

The stable branches decision has been documented in the wiki - all done.

Cheers,
Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
