Re: [openstack-dev] [kolla] Integrating kollacli as python-kollaclient

2015-10-22 Thread Jastrzebski, Michal
Hello,

I have looked at this code and it seems pretty solid. I'm not sure it will 
be ready for governance as-is; there are a few things I think have to be 
addressed first (for example, I'm not sure whether ansible can be a dependency due to 
its license). Having said that, I'd be happy to see it in our codebase, as it 
will help kolla's user experience a lot.

So +1 from me, thanks guys for it!

Let's discuss what has to be done to get it into OpenStack ASAP, and then the next 
steps. I'd like to volunteer to help you with that, guys.

Regards,
Michał

> -Original Message-
> From: Paul Bourke [mailto:paul.bou...@oracle.com]
> Sent: Thursday, October 22, 2015 10:42 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [kolla] Integrating kollacli as 
> python-kollaclient
> 
> Having used the cli for some time I can vouch for it being very useful, and
> usable. The guys have done a nice job of giving it an "OpenStack feel",
> mimicking the patterns used in openstackclient and the like.
> 
> It should give users a more polished introduction to Kolla.
> 
> +1
> 
> -Paul
> 
> On 21/10/15 23:20, Steven Dake (stdake) wrote:
> > Hello Folks,
> >
> >
> > Oracle has developed a CLI tool for managing OpenStack Kolla clusters.
> > Several months ago at our midcycle, the topic was brought up and I
> > suggested we go ahead and get started on the work.  We clearly didn't
> > spend enough time discussing how it should be integrated into the code
> > base or developed, or even what its features should be, and that is my error.
> >
> >
> > What ended up happening is sort of a code dump, which is not ideal,
> > but I can only work so many 20 hour days ;)  I didn't believe our
> > community had the bandwidth to deal with integrating a CLI directly
> > into the tree while we were focused on our major objective of
> > implementing Ansible deployment of OpenStack in Docker containers.
> > Possibly the wrong call, but it is what it is, and it is my error, not 
> > Oracle's.
> >
> >
> > The code can be cloned from:
> >
> > git clone git://oss.oracle.com/git/openstack-kollacli.git
> >
> >
> > The code as-is is very high quality but will likely need to go through
> > a lot of refactoring to ReST-ify it.  There are two major authors of
> > the code, Borne Mace and Steve Noyes.
> >
> >
> > I'd like a majority vote from the core team as to whether we should
> > add this repository to our list of governed repositories in the
> > OpenStack Kolla governance repository here:
> >
> >
> > https://github.com/openstack/governance/blob/master/reference/projects.yaml#L1509
> >
> >
> > Consider this email a +1 vote from me.
> >
> >
> > A completely separate email thread and decision will be made by the
> > community about core team membership changes to handle maintenance of
> > the code.  Assuming this code is voted into Kolla's governance, I plan
> > to propose Borne as a core reviewer, which will be open to core team
> > vote as a separate act, requiring 3 +1 votes and no vetoes within a 1 week
> > period.  We will address that assuming the majority vote for the code
> > merge wins.  Steve can follow the normal processes for joining the
> > core team if he wishes (reviewing patches) - clearly his code
> > contributions are there.  Borne already does some reviews, and
> > although he isn't a top reviewer, he does have some contribution in
> > this area making it into the top 10 for the Liberty cycle.
> >
> >
> >
> > Kolla CLI Features:
> >
> >   * dynamic ansible inventory manipulation via the host, group and
> > service commands
> >   * ssh key push via the host setup command
> >   * ssh key validation via the host check command
> >   * ansible deployment via the deploy command
> >   * property viewing and modification with the property list, set and
> > clear commands
> >   * cleanup of docker containers on a single, multiple or all hosts via
> > the host destroy command
> >   * debug data collection via the dump command
> >   * configuration of openstack passwords via the password command
> >   * Lines of python = 2700
> >   * Lines of test case code = 1800
> >   * ~ 200 commits
> >
> >
> >
> >
> >
> 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> 


Re: [openstack-dev] [kolla] Backport policy for Liberty

2015-10-09 Thread Jastrzebski, Michal
Hello,

Since we have little actual logic, and ansible itself is pretty pluggable by 
its very nature, backporting should be quite easy and should not affect existing 
deployments much. We will make sure that it is safe to have stable/liberty 
code and that it keeps working at all times. I agree with Sam that we need careful 
CI for that, and it will be our first priority.

I would very much like to invite operators to our session regarding this 
policy, as they will be the most affected party and we want to make sure that they 
take part in the decision.

Regards,
Michał

From: Sam Yaple [mailto:sam...@yaple.net]
Sent: Friday, October 9, 2015 4:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [kolla] Backport policy for Liberty

On Thu, Oct 8, 2015 at 2:47 PM, Steven Dake (stdake) wrote:
Kolla operators and developers,

The general consensus of the Core Reviewer team for Kolla is that we should 
embrace a liberal backport policy for the Liberty release.  An example of 
liberal -> we add a new server service to Ansible, and we backport the 
feature to liberty.  This breaks with the typical OpenStack backport 
policy.  It also creates a whole bunch more work and has the potential to introduce 
regressions in the Liberty release.

Given these realities I want to put on hold any liberal backporting until after 
Summit.  I will schedule a fishbowl session for a backport policy discussion 
where we will decide as a community what type of backport policy we want.  The 
delivery required before we introduce any liberal backporting policy then 
should be a description of that backport policy discussion at Summit distilled 
into a RST file in our git repository.

If you have any questions, comments, or concerns, please chime in on the thread.

Regards
-steve


I am in favor of a very liberal backport policy. We have the potential to have 
very little code difference between the N, N-1, and N-2 releases while still 
deploying the different versions of OpenStack. However, I recognize it is a big 
undertaking to backport all things, not to mention the testing involved.

I would like to see two things before we truly embrace a liberal policy. The 
first is better testing: a true gate that does upgrades and potentially 
multinode (at least from a network perspective). The second is a bot or 
automation of some kind to automatically propose non-conflicting patches to the 
stable branches if they include the 'backport: xyz' tag in the commit message. 
Cores would still need to confirm these changes with the normal review process 
and could easily abandon them, but that would remove a lot of the overhead of 
performing the actual backport.
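The tag-scanning part of such a bot could be very small; here is a sketch, assuming a "backport: <release...>" line in the commit message (the exact tag format was never settled in this thread, so the regex and branch naming are illustrative assumptions, not an existing tool):

```python
import re

# Hypothetical helper: decide which stable branches a commit should be
# proposed to, based on a "backport: xyz" tag in its commit message.
BACKPORT_RE = re.compile(r'^backport:\s*(?P<releases>.+)$', re.MULTILINE)

def backport_targets(commit_message):
    """Return the stable branches named by a 'backport:' tag, if any."""
    match = BACKPORT_RE.search(commit_message)
    if not match:
        return []
    releases = match.group('releases').split()
    return ['stable/%s' % release for release in releases]

message = """Fix neutron container bootstrap

backport: liberty kilo
"""
print(backport_targets(message))   # ['stable/liberty', 'stable/kilo']
print(backport_targets('no tag'))  # []
```

A real bot would then cherry-pick the change onto each returned branch and push it for normal review, abandoning on conflict.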
Since Kolla simply deploys OpenStack, it is a lot closer to a client or a 
library than it is to Nova or Neutron. And given its mission, maybe it should 
break from the "typical OpenStack backports policy" so we can give a consistent 
deployment experience across all stable and supported versions of OpenStack at 
any given time.
Those are my thoughts on the matter at least. I look forward to some 
conversations about this in Tokyo.
Sam Yaple



Re: [openstack-dev] [kolla] proposing Michal Jastrzebski (inc0) for core reviewer

2015-09-30 Thread Jastrzebski, Michal
Thanks everyone!

I really appreciate this, and I hope to help make kolla an even better project 
than it is right now (and right now it's pretty cool ;)). We have a great 
community, very diverse and very dedicated. It's a pleasure to work with all of 
you - let's keep up the great work in the following releases :)

Thank you again,
Michał

> -Original Message-
> From: Steven Dake (stdake) [mailto:std...@cisco.com]
> Sent: Wednesday, September 30, 2015 8:05 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [kolla] proposing Michal Jastrzebski (inc0) for 
> core
> reviewer
> 
> Michal,
> 
> The vote was unanimous.  Welcome to the Kolla Core Reviewer team.  I have
> added you to the appropriate gerrit group.
> 
> Regards
> -steve
> 
> 
> From: Steven Dake
> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
> Date: Tuesday, September 29, 2015 at 3:20 PM
> To: "OpenStack Development Mailing List (not for usage questions)"
> Subject: [openstack-dev] [kolla] proposing Michal Jastrzebski (inc0) for core
> reviewer
> 
> 
> 
>   Hi folks,
> 
>   I am proposing Michal for core reviewer.  Consider my proposal as a
> +1 vote.  Michal has done a fantastic job with rsyslog, has done a nice job
> overall contributing to the project for the last cycle, and has really 
> improved his
> review quality and participation over the last several months.
> 
>   Our process requires 3 +1 votes, with no veto (-1) votes.  If you're
> uncertain, it is best to abstain :)  I will leave the voting open for 1 week, 
> until Tuesday October 6th, or until there is a unanimous decision or a veto.
> 
>   Regards
>   -steve




Re: [openstack-dev] [TripleO] package based overcloud upgrades

2015-06-25 Thread Jastrzebski, Michal
Hello guys,

As for TripleO+Upgrades, Kolla is one of the ways to help with that. Keep in mind 
that package upgrades also mean dependency upgrades, and that can break things 
unless we upgrade the whole stack at once. No-downtime upgrades basically mean 
that we need to decouple the upgrade process into several steps to ensure per-node 
(and maybe later, with Kolla, per-service/per-node) upgrades. I think we could 
achieve that with some clever dependency tinkering in heat (start upgrading 
resource X only if resource Y is already upgraded), or clever usage of 
breakpoints. Maybe introduce something like an in-progress quota on a resource group 
- during an upgrade only 1 of 5 resources in the group can be played with; after one 
finishes, this lock is lifted and the next resource can start upgrading. I think a lock 
like this might make sense for other stacks as well, if we want to minimize the 
impact of a stack upgrade. Thoughts?
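The in-progress quota described above is essentially a counting semaphore over the resource group. A toy model of the idea, with illustrative names (this is not Heat API, just a sketch of the semantics):

```python
import threading

class RollingUpgradeQuota:
    """Toy model of an in-progress quota: at most `limit` members of a
    resource group may be upgrading at any one time."""

    def __init__(self, limit):
        self._slots = threading.Semaphore(limit)

    def upgrade(self, node, do_upgrade):
        # Blocks while `limit` upgrades are already in flight.
        with self._slots:
            do_upgrade(node)

quota = RollingUpgradeQuota(limit=1)   # only 1 of N nodes at a time
upgraded = []
threads = [threading.Thread(target=quota.upgrade, args=(n, upgraded.append))
           for n in ['node1', 'node2', 'node3']]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(upgraded))  # ['node1', 'node2', 'node3']
```

In Heat terms the semaphore would live in the template engine, gating how many members of a resource group may be in an UPDATE_IN_PROGRESS state at once.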

Regards,
Michał

 -Original Message-
 From: Dan Prince [mailto:dpri...@redhat.com]
 Sent: Friday, June 26, 2015 12:11 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [TripleO] package based overcloud upgrades
 
 On Thu, 2015-06-25 at 12:01 -0700, Dan Smith wrote:
  Hi Dan,
 
   I put together a quick etherpad to help outline the remaining
   patches left to support package based upgrades within the TripleO
   heat
   templates:
  
   https://etherpad.openstack.org/p/tripleo-package-upgrades
  
   The etherpad includes a brief overview of the upgrade approach, a
   list of patches related to upgrades, and then instructions on how
   you can go about testing upgrades with devtest today.
 
  From the looks of this, this is mostly about how you deploy new
  overcloud packages and kick services to start using them, in a sort of
  stop the cloud, upgrade everything, start the cloud again sort of
  way.
  Is that right? Maybe I'm missing some high-level magic that is
  happening in the heat templates?
 
  Nova has been doing a lot of work to avoid the first step of stop the
  (whole) cloud, and the inertia seems to be spreading to other
  projects.
  What you describe in the etherpad seems very puppet-focused, which
  seems (to me) to be a little too naive to orchestrate a rolling
  upgrade operation where things have to happen in a specific order, but
  where you don't have to fully turn anything off in the process.
 
  Is this evaluation correct? If so, is this the first phase of an
  upgrade approach, or the only goal for the moment? Any thoughts on how
  we can get to something more flexible?
 
 
 Exactly. Step at a time. It is really just a mechanism to deploy
 package based upgrades via Heat... which certainly doesn't solve all of
 the upgrade issues (or maybe even the complicated ones) but it is a
 step forwards. And it is cool to be able to do all of this with just
 Heat.
 
 I would say the jury is still out on how we tackle some of the workflow
 issues around full version (major) upgrades while also ensuring minimal
 downtime.
 
 Dan (dprince)
 
 
  Thanks!
 
  --Dan
 
 
 



[openstack-dev] [Heat][Oslo] Versioned objects compatibility mode

2015-06-22 Thread Jastrzebski, Michal
Hello,

I wanted to start discussion about versioned objects backporting for 
conductor-less projects.
In Vancouver we discussed compatibility mode, which works like that:

1. We define one version for every object we use; this means adding a base 
object, for example:

class HeatObject:
    VERSION = 1.5

All objects inherit from this one; each time we change an object, we bump up this 
variable.

2. We introduce a new config option in heat.conf:

compatibility_mode = 1.4

3. Implement a mechanism which automatically backports each outgoing message 
to 1.4 as long as this option is set.

The upgrade flow looks like this:
1. We have all nodes using version 1.4
2. We incrementally roll out the new version with 1.4 compatibility mode
3. When all nodes are up-to-date, we incrementally turn off 
compatibility mode

This solution has one rather big disadvantage: 2 restarts.
This can be mitigated by adding a call to reread the config without a restart. 
Oslo.config has this capability, but we need to add a call to run it.
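To make the backporting step concrete, here is a minimal sketch of the idea. The class and field names are illustrative only; the real machinery for this lives in oslo.versionedobjects (obj_make_compatible and friends), not in this toy:

```python
class HeatObject:
    """Toy versioned object: fields a 1.4 peer doesn't know are dropped
    when backporting an outgoing message."""
    VERSION = '1.5'
    # Which fields exist at each version ('tags' was added in 1.5).
    FIELDS_BY_VERSION = {'1.4': ['name', 'status'],
                         '1.5': ['name', 'status', 'tags']}

    def __init__(self, **fields):
        self.fields = fields

    def obj_to_primitive(self, target_version=None):
        """Serialize, optionally backported to `target_version`."""
        version = target_version or self.VERSION
        allowed = self.FIELDS_BY_VERSION[version]
        data = {k: v for k, v in self.fields.items() if k in allowed}
        return {'version': version, 'data': data}

# With compatibility_mode = 1.4, every outgoing message is backported:
obj = HeatObject(name='stack1', status='COMPLETE', tags=['prod'])
print(obj.obj_to_primitive('1.4'))
# {'version': '1.4', 'data': {'name': 'stack1', 'status': 'COMPLETE'}}
```

The mechanism in step 3 would simply apply this backport to every RPC payload while the compatibility option is set, and skip it once the option is removed.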

Thoughts?

Regards,
Michał Jastrzębski



Re: [openstack-dev] [oslo][heat] Versioned objects backporting

2015-05-04 Thread Jastrzebski, Michal
On 5/4/2015 8:21 AM, Angus Salkeld wrote: On Thu, Apr 30, 2015 at 9:25 
PM, Jastrzebski, Michal 
 michal.jastrzeb...@intel.com wrote:
 
 Hello,
 
 After discussions, we've spotted possible gap in versioned objects:
 backporting of too-new versions in RPC.
 Nova does that by conductor, but not every service has something
 like that. I want to propose another approach:
 
 1. Milestone pinning - we need to make single reference to versions
 of various objects - for example heat in version 15.1 will mean
 stack in version 1.1 and resource in version 1.5.
 2. Compatibility mode - this will add flag to service
 --compatibility=15.1, that will mean that every outgoing RPC
 communication will be backported before sending to object versions
 bound to this milestone.
 
 With this 2 things landed we'll achieve rolling upgrade like that:
 1. We have N nodes in version V
 2. We take down 1 node and upgrade code to version V+1
 3. Run code in ver V+1 with --compatibility=V
 4. Repeat 2 and 3 until every node will have version V+1
 5. Restart each service without compatibility flag
 
 This approach has one big disadvantage - 2 restarts required, but
 should solve problem of backporting of too-new versions.
 Any ideas? Alternatives?
 
 
 AFAIK if nova gets a message that is too new, it just forwards it on 
 (and a newer server will handle it).
 
 With that this *should* work, shouldn't it?
 1. rolling upgrade of heat-engine

That will be the hard part. When we have only one engine of a given version, we 
lose HA. Also, since we never know where a given task lands, we might end up with 
one task bouncing from old version to old version, making the call take indefinitely 
long. Of course with each upgraded engine we lessen the chance of that happening, but 
I think we should aim for the lowest possible downtime. That being said, it might 
be a good idea to solve this problem not-too-cleanly, but quickly.

 2. db sync
 3. rolling upgrade of heat-api
 
 -Angus
 
 
 Regards,


Re: [openstack-dev] [oslo][heat] Versioned objects backporting

2015-05-04 Thread Jastrzebski, Michal
On 5/4/2015 11:50 AM, Angus Salkeld wrote: On Mon, May 4, 2015 at 6:33 
PM, Jastrzebski, Michal 
 michal.jastrzeb...@intel.com wrote:
 
  On 5/4/2015 8:21 AM, Angus Salkeld wrote: On Thu, Apr 30,
  2015 at 9:25 PM, Jastrzebski, Michal
  michal.jastrzeb...@intel.com wrote:
  
   Hello,
  
   After discussions, we've spotted possible gap in versioned
 objects:
   backporting of too-new versions in RPC.
   Nova does that by conductor, but not every service has something
   like that. I want to propose another approach:
  
   1. Milestone pinning - we need to make single reference to
 versions
   of various objects - for example heat in version 15.1 will mean
   stack in version 1.1 and resource in version 1.5.
   2. Compatibility mode - this will add flag to service
   --compatibility=15.1, that will mean that every outgoing RPC
   communication will be backported before sending to object
 versions
   bound to this milestone.
  
   With this 2 things landed we'll achieve rolling upgrade like
 that:
   1. We have N nodes in version V
   2. We take down 1 node and upgrade code to version V+1
   3. Run code in ver V+1 with --compatibility=V
   4. Repeat 2 and 3 until every node will have version V+1
   5. Restart each service without compatibility flag
  
   This approach has one big disadvantage - 2 restarts required, but
   should solve problem of backporting of too-new versions.
   Any ideas? Alternatives?
  
  
   AFAIK if nova gets a message that is too new, it just forwards it on
   (and a newer server will handle it).
  
   With that this *should* work, shouldn't it?
   1. rolling upgrade of heat-engine
 
 That will be hard part. When we'll have only one engine from given
 version, we lose HA. Also, since we never know where given task
 lands, we might end up with one task bouncing from old version to
 old version, making call indefinitely long. Ofc with each upgraded
 engine we'll lessen change for that to happen, but I think we should
 aim for lowest possible downtime. That being said, that might be
 good idea to solve this problem not-too-clean, but quickly.
 
 
 I don't think losing HA in the time it takes some heat-engines to stop, 
 install new software and restart the heat-engines is a big deal (IMHO).
 
 -Angus

We will also lose the guarantee that this RPC call will complete in any given 
time. It can bounce from incompatible node to incompatible node until there are 
no incompatible nodes left. Especially if there are no other tasks on the queue: 
when a service returns it to the queue and takes a call right afterwards, there is a 
good chance that it will take this particular one, and we'll get a loop there.

 
 
  2. db sync
  3. rolling upgrade of heat-api
 
  -Angus
 
 
  Regards,
 
 
 
 



[openstack-dev] [oslo][heat] Versioned objects backporting

2015-04-30 Thread Jastrzebski, Michal
Hello,

After discussions, we've spotted a possible gap in versioned objects: backporting 
of too-new versions in RPC.
Nova handles that with its conductor, but not every service has something like that. I 
want to propose another approach:

1. Milestone pinning - we need to make a single reference to the versions of various 
objects - for example, heat in version 15.1 will mean stack in version 1.1 and 
resource in version 1.5.
2. Compatibility mode - this adds a flag to the service, --compatibility=15.1, 
which means that every outgoing RPC communication will be backported to the 
object versions bound to this milestone before sending.
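The milestone pin in point 1 could be as simple as a lookup table from a release version to per-object versions. A sketch using the numbers from the example above (the table contents and function name are invented for illustration):

```python
# Hypothetical milestone pin: one release version maps to the version of
# every RPC object, e.g. heat 15.1 means stack 1.1 and resource 1.5.
MILESTONES = {
    '15.1': {'stack': '1.1', 'resource': '1.5'},
    '15.2': {'stack': '1.2', 'resource': '1.5'},
}

def pinned_version(compatibility, obj_name):
    """Version that an outgoing `obj_name` must be backported to when
    running with --compatibility=<milestone>."""
    return MILESTONES[compatibility][obj_name]

print(pinned_version('15.1', 'stack'))     # 1.1
print(pinned_version('15.1', 'resource'))  # 1.5
```

The compatibility flag in point 2 then just drives this lookup for every outgoing message.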

With these 2 things landed we'll achieve a rolling upgrade like this:
1. We have N nodes in version V
2. We take down 1 node and upgrade its code to version V+1
3. Run the code in version V+1 with --compatibility=V
4. Repeat 2 and 3 until every node is at version V+1
5. Restart each service without the compatibility flag

This approach has one big disadvantage - 2 restarts required - but it should solve 
the problem of backporting too-new versions.
Any ideas? Alternatives?

Regards,
Michał



Re: [openstack-dev] [Heat] Rework auto-scaling support in Heat

2014-11-28 Thread Jastrzebski, Michal


 -Original Message-
 From: Qiming Teng [mailto:teng...@linux.vnet.ibm.com]
 Sent: Friday, November 28, 2014 8:33 AM
 To: openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [Heat] Rework auto-scaling support in Heat
 
 Dear all,
 
 Auto-Scaling is an important feature supported by Heat and needed by many
 users we talked to.  There are two flavors of AutoScalingGroup resources in
 Heat today: the AWS-based one and the Heat native one.  As more requests
 coming in, the team has proposed to separate auto-scaling support into a
 separate service so that people who are interested in it can jump onto it.  At
 the same time, Heat engine (especially the resource type code) will be
 drastically simplified.  The separated AS service could move forward more
 rapidly and efficiently.
 
 This work was proposed a while ago with the following wiki and blueprints
 (mostly approved during Havana cycle), but the progress is slow.  A group of
 developers now volunteer to take over this work and move it forward.
 
 wiki: https://wiki.openstack.org/wiki/Heat/AutoScaling
 BPs:
  - https://blueprints.launchpad.net/heat/+spec/as-lib-db
  - https://blueprints.launchpad.net/heat/+spec/as-lib
  - https://blueprints.launchpad.net/heat/+spec/as-engine-db
  - https://blueprints.launchpad.net/heat/+spec/as-engine
  - https://blueprints.launchpad.net/heat/+spec/autoscaling-api
  - https://blueprints.launchpad.net/heat/+spec/autoscaling-api-client
  - https://blueprints.launchpad.net/heat/+spec/as-api-group-resource
  - https://blueprints.launchpad.net/heat/+spec/as-api-policy-resource
  - https://blueprints.launchpad.net/heat/+spec/as-api-webhook-trigger-
 resource
  - https://blueprints.launchpad.net/heat/+spec/autoscaling-api-resources
 
 Once this whole thing lands, Heat engine will talk to the AS engine in terms 
 of
 ResourceGroup, ScalingPolicy, Webhooks.  Heat engine won't care how auto-
 scaling is implemented although the AS engine may in turn ask Heat to
 create/update stacks for scaling's purpose.  In theory, AS engine can
 create/destroy resources by directly invoking other OpenStack services.  This
 new AutoScaling service may eventually have its own DB, engine, API, api-
 client.  We can definitely aim high while work hard on real code.
 
 After reviewing the BPs/Wiki and some communication, we get two options
 to push forward this.  I'm writing this to solicit ideas and comments from the
 community.
 
 Option A: Top-Down Quick Split
 --

Do you want to drop support for AS from heat altogether? Many people would 
disagree with dropping AS (even dropping HARestarter is a problem). We don't really 
want to support duplicate systems, so having 2 autoscaling engines would be 
wrong.
That being said, I can see a big gap which heat (or services around it) could fill - 
intelligent orchestration. By that I mean autohealing, auto-redeploying, 
autoscaling and pretty much auto-whatever. Clouds are fluid; we could provide a 
framework for that. Heat would be a great tool for that because it has the context of 
the whole stack, and in fact all we'd do would be a stack update.

 This means we will follow a roadmap shown below, which is not 100%
 accurate yet and very rough:
 
   1) Get the separated REST service in place and working
   2) Switch Heat resources to use the new REST service
 
 Pros:
   - Separate code base means faster review/commit cycle
   - Less code churn in Heat
 Cons:
   - A new service need to be installed/configured/launched
   - Need commitments from dedicated, experienced developers from very
 beginning
 
 Option B: Bottom-Up Slow Growth
 ---

Personally I'd be an advocate of fixing what we have instead of making a new thing. 
Maybe we should make it a separate process (as long as we try to keep a 
consistent API)? Maybe add a place for new logic (autohealing?), but still keep 
it inside heat.
One thing - we'll need to make concurrent updates really robust if we want to 
make the whole thing automatic (I'm talking about convergence).

 The roadmap is more conservative, with many (yes, many) incremental
 patches to migrate things carefully.
 
   1) Separate some of the autoscaling logic into libraries in Heat
   2) Augment heat-engine with new AS RPCs
   3) Switch AS related resource types to use the new RPCs
   4) Add new REST service that also talks to the same RPC
  (create new GIT repo, API endpoint and client lib...)
 
 Pros:
   - Less risk breaking user lands with each revision well tested
   - More smooth transition for users in terms of upgrades
 
 Cons:
   - A lot of churn within Heat code base, which means long review cycles
   - Still need commitments from cores to supervise the whole process
 
 There could be option C, D... but the two above are what we came up with
 during the discussion.
 
 Another important thing we talked about is about the open discussion on
 this.  OpenStack Wiki seems a good place to document settled designs but
 not for 

Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

2014-11-27 Thread Jastrzebski, Michal
On 11/27/2014 5:15 AM, Angus Salkeld wrote: On Thu, Nov 27, 2014 
at 12:20 PM, Zane Bitter zbit...@redhat.com wrote:
 
 A bunch of us have spent the last few weeks working independently on
 proof of concept designs for the convergence architecture. I think
 those efforts have now reached a sufficient level of maturity that
 we should start working together on synthesising them into a plan
 that everyone can forge ahead with. As a starting point I'm going to
 summarise my take on the three efforts; hopefully the authors of the
 other two will weigh in to give us their perspective.
 
 
 Zane's Proposal
 ===
 
 
 https://github.com/zaneb/heat-convergence-prototype/tree/distributed-graph
 
 I implemented this as a simulator of the algorithm rather than using
 the Heat codebase itself in order to be able to iterate rapidly on
 the design, and indeed I have changed my mind many, many times in
 the process of implementing it. Its notable departure from a
 realistic simulation is that it runs only one operation at a time -
 essentially giving up the ability to detect race conditions in
 exchange for a completely deterministic test framework. You just
 have to imagine where the locks need to be. Incidentally, the test
 framework is designed so that it can easily be ported to the actual
 Heat code base as functional tests so that the same scenarios could
 be used without modification, allowing us to have confidence that
 the eventual implementation is a faithful replication of the
 simulation (which can be rapidly experimented on, adjusted and
 tested when we inevitably run into implementation issues).
 
 This is a complete implementation of Phase 1 (i.e. using existing
 resource plugins), including update-during-update, resource
 clean-up, replace on update and rollback; with tests.
 
 Some of the design goals which were successfully incorporated:
 - Minimise changes to Heat (it's essentially a distributed version
 of the existing algorithm), and in particular to the database
 - Work with the existing plugin API
 - Limit total DB access for Resource/Stack to O(n) in the number of
 resources
 - Limit overall DB access to O(m) in the number of edges
 - Limit lock contention to only those operations actually contending
 (i.e. no global locks)
 - Each worker task deals with only one resource
 - Only read resource attributes once
 
 Open questions:
 - What do we do when we encounter a resource that is in progress
 from a previous update while doing a subsequent update? Obviously we
 don't want to interrupt it, as it will likely be left in an unknown
 state. Making a replacement is one obvious answer, but in many cases
 there could be serious down-sides to that. How long should we wait
 before trying it? What if it's still in progress because the engine
 processing the resource already died?
 
 
 Also, how do we implement resource level timeouts in general?
 
 
 Michał's Proposal
 =
 
 https://github.com/inc0/heat-convergence-prototype/tree/iterative
 
 Note that a version modified by me to use the same test scenario
 format (but not the same scenarios) is here:
 
 
 https://github.com/zaneb/heat-convergence-prototype/tree/iterative-adapted
 
 This is based on my simulation framework after a fashion, but with
 everything implemented synchronously and a lot of handwaving about
 how the actual implementation could be distributed. The central
 premise is that at each step of the algorithm, the entire graph is
 examined for tasks that can be performed next, and those are then
 started. Once all are complete (it's synchronous, remember), the
 next step is run. Keen observers will be asking how we know when it
 is time to run the next step in a distributed version of this
 algorithm, where it will be run and what to do about resources that
 are in an intermediate state at that time. All of these questions
 remain unanswered.
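For concreteness, the synchronous step-at-a-time scheme described above can be modelled in a few lines (a toy model, not the prototype's actual code): every step re-scans the whole graph for nodes whose dependencies are all complete, runs them, and only then moves to the next step. Distributing the step boundary is the hand-waved part.

```python
def run_in_steps(deps, do_resource):
    """deps: dict mapping resource -> set of resources it depends on."""
    done = set()
    while len(done) < len(deps):
        # One "step": everything runnable right now.
        runnable = [r for r, d in deps.items()
                    if r not in done and d <= done]
        if not runnable:
            raise RuntimeError("dependency cycle")
        for r in runnable:          # the prototype runs these synchronously
            do_resource(r)
        done.update(runnable)
    return done
```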

Q1. When is time to run next step?
There is really no fixed next or previous step, only the step that can be
run *right now*. In practice a step becomes runnable when the current step
finishes, because then and only then are all of its requirements met and
we can proceed.

Q2. I can see 3 main processes there:

* Converger (I'd assume that's the current engine):
The process which parses the graph and schedules next steps; it will be
run after a change in reality is detected.

* Observer:
The process which keeps reality_db and the actual resources aligned.
It's mostly for phase 2. This one 

Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

2014-11-13 Thread Jastrzebski, Michal
Guys, I don't think we want to get into this cluster-management mud. You
say let's make an observer... and what if the observer dies? Do we add an
observer for the observer? And then there is split brain: I'm the
observer and I've lost my connection to a worker. Should I restart the
worker? Or maybe I'm the one who lost connection to the rest of the
world? Should I resume the task and risk duplicate workload?

And then there is another problem. If a timeout is caused by the workers
running out of resources, then restarting the whole workload after the
timeout will stretch those resources even further, which in turn causes
more timeouts (...) - a great way to kill the whole setup.

So we get to horizontal scalability. Or a total lack of it. Any stack
that is too complicated for a single engine to process will be impossible
to process at all. We should find a way to distribute workloads in an
active-active, stateless (as much as possible) manner.

Regards,
Michał inc0 Jastrzębski   

 -Original Message-
 From: Murugan, Visnusaran [mailto:visnusaran.muru...@hp.com]
 Sent: Thursday, November 13, 2014 2:59 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops
 
 Zane,
 
 We do follow shardy's suggestion of having worker/observer as eventlet in
 heat-engine. No new process. The timer will be executed under an engine's
 worker.
 
 Questions:
 1. heat-engine processing a resource action fails (process killed)
 2. heat-engine processing a timeout for a stack fails (process killed)
 
 In the above mentioned cases, I thought celery tasks would come to our
 rescue.
 
 Convergence-poc implementation can recover from error and retry if there is
 a notification available.
 
 
 -Vishnu
 
 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Thursday, November 13, 2014 7:05 PM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops
 
 On 13/11/14 06:52, Angus Salkeld wrote:
  On Thu, Nov 13, 2014 at 6:29 PM, Murugan, Visnusaran
  visnusaran.muru...@hp.com wrote:
 
  Hi all,
 
 
  Convergence-POC distributes stack operations by sending resource
  actions over RPC for any heat-engine to execute. Entire stack
  lifecycle will be controlled by worker/observer notifications. This
  distributed model has its own advantages and disadvantages.
 
 
  Any stack operation has a timeout and a single engine will be
  responsible for it. If that engine goes down, timeout is lost along
  with it. So a traditional way is for other engines to recreate
  timeout from scratch. Also a missed resource action notification
   will be detected only when the stack operation timeout happens.
 
 
  To overcome this, we will need the following capability:
 
   1. Resource timeout (can be used for retry)
 
   We will shortly have a worker job; can't we have a job that just
   sleeps, started in parallel with the job that is doing the work?
   It gets to the end of the sleep and runs a check.
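The sleeping-job idea could be sketched roughly as follows: a watchdog started in parallel with the work, which only acts if the work hasn't finished in time. This uses stdlib threading purely to keep the sketch self-contained; the real thing would presumably be an eventlet greenthread or a separate worker job.

```python
import threading

def run_with_timeout(work, on_timeout, timeout):
    """Run work(); if it hasn't finished after `timeout` seconds,
    invoke on_timeout() from the parallel watchdog."""
    finished = threading.Event()

    def watchdog():
        if not finished.wait(timeout):   # timer expired before completion
            on_timeout()

    t = threading.Thread(target=watchdog)
    t.start()
    try:
        work()
    finally:
        finished.set()                   # cancel the watchdog
        t.join()
```

Zane's objection below still applies: the watchdog here lives in the same process as the work, so both die together.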
 
 What if that worker dies too? There's no guarantee that it'd even be a
 different worker. In fact, there's not even a guarantee that we'd have
 multiple workers.
 
 BTW Steve Hardy's suggestion, which I have more or less come around to, is
 that the engines themselves should be the workers in convergence, to save
 operators deploying two types of processes. (The observers will still be a
 separate process though, in phase 2.)
 
  
 
   2. Recover from engine failure (loss of stack timeout, resource
   action notification)
 
 
  My suggestion above could catch failures as long as it was run in a
  different process.
 
  -Angus
 
 
  Suggestion:
 
   1. Use a task queue like celery to host timeouts for both stack and
   resource.
 
   2. Poll the database for engine failures and restart timers /
   retrigger resource retries (IMHO: this would be the traditional
   approach, and it weighs heavy)
 
   3. Migrate heat to use TaskFlow. (Too many code changes)
 
 
   I am not suggesting we use TaskFlow. Using celery would require very
   minimal code change (decorate the appropriate functions).
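As a rough model of the celery suggestion: alongside each stack/resource action, enqueue a companion timeout task with a countdown; if the action completes first, the timeout becomes a no-op. Celery itself needs a broker, so this sketch simulates the countdown-task pattern with stdlib timers; with celery the schedule call would presumably be something like `check_timeout.apply_async((stack_id,), countdown=timeout)`.

```python
import threading

class TimeoutQueue:
    def __init__(self):
        self.completed = set()
        self.timed_out = []
        self._timers = []

    def start(self, task_id, timeout):
        # Stand-in for apply_async(..., countdown=timeout): schedule the
        # timeout check to fire after `timeout` seconds.
        t = threading.Timer(timeout, self._check, args=(task_id,))
        self._timers.append(t)
        t.start()

    def finish(self, task_id):
        # The worker reports completion; the pending timeout becomes a no-op.
        self.completed.add(task_id)

    def _check(self, task_id):
        # With celery this could run on *any* node consuming the queue,
        # which is what makes the timeout survive a single engine's death.
        if task_id not in self.completed:
            self.timed_out.append(task_id)

    def join(self):
        for t in self._timers:
            t.join()
```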
 
 
  Your thoughts.
 
 
  -Vishnu
 
  IRC: ckmvishnu
 
 
   ___
   OpenStack-dev mailing list
   OpenStack-dev@lists.openstack.org
   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 
 
 
 
 

Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

2014-11-13 Thread Jastrzebski, Michal
By observer I mean the process which will actually notify about stack
timeouts. Maybe it was a poor choice of words. Anyway, something will
need to check which stacks have timed out, and that's a new single point
of failure.

 -Original Message-
 From: Zane Bitter [mailto:zbit...@redhat.com]
 Sent: Thursday, November 13, 2014 3:49 PM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops
 
 On 13/11/14 09:31, Jastrzebski, Michal wrote:
  Guys, I don't think we want to get into this cluster management mud.
  You say let's make observer...and what if observer dies? Do we do
  observer to observer? And then there is split brain. I'm observer, I've lost
 connection to worker. Should I restart a worker?
  Maybe I'm one who lost connection to the rest of the world? Should I
  resume task and risk duplicate workload?
 
 I think you're misinterpreting what we mean by observer. See
 https://wiki.openstack.org/wiki/Heat/ConvergenceDesign
 
 - ZB
 



Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

2014-11-13 Thread Jastrzebski, Michal


 -Original Message-
 From: Clint Byrum [mailto:cl...@fewbar.com]
 Sent: Thursday, November 13, 2014 8:00 PM
 To: openstack-dev
 Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops
 
 Excerpts from Zane Bitter's message of 2014-11-13 09:55:43 -0800:
  On 13/11/14 09:58, Clint Byrum wrote:
   Excerpts from Zane Bitter's message of 2014-11-13 05:54:03 -0800:
   On 13/11/14 03:29, Murugan, Visnusaran wrote:
   Hi all,
  
   Convergence-POC distributes stack operations by sending resource
   actions over RPC for any heat-engine to execute. Entire stack
   lifecycle will be controlled by worker/observer notifications.
   This distributed model has its own advantages and disadvantages.
  
   Any stack operation has a timeout and a single engine will be
   responsible for it. If that engine goes down, timeout is lost
   along with it. So a traditional way is for other engines to
   recreate timeout from scratch. Also a missed resource action
   notification will be detected only when stack operation timeout
 happens.
  
   To overcome this, we will need the following capability:
  
   1.Resource timeout (can be used for retry)
  
   I don't believe this is strictly needed for phase 1 (essentially we
   don't have it now, so nothing gets worse).
  
  
   We do have a stack timeout, and it stands to reason that we won't
   have a single box with a timeout greenthread after this, so a
   strategy is needed.
 
  Right, that was 2, but I was talking specifically about the resource
  retry. I think we agree on both points.
 
   For phase 2, yes, we'll want it. One thing we haven't discussed
   much is that if we used Zaqar for this then the observer could
   claim a message but not acknowledge it until it had processed it,
   so we could have guaranteed delivery.
  
  
   Frankly, if oslo.messaging doesn't support reliable delivery then we
   need to add it.
 
  That is straight-up impossible with AMQP. Either you ack the message
  and risk losing it if the worker dies before processing is complete,
  or you don't ack the message until it's processed and you become a
  blocker for every other worker trying to pull jobs off the queue. It
  works fine when you have only one worker; otherwise not so much. This
  is the crux of the whole "why isn't Zaqar just Rabbit" debate.
 
 
 I'm not sure we have the same understanding of AMQP, so hopefully we can
 clarify here. This stackoverflow answer echoes my understanding:
 
  http://stackoverflow.com/questions/17841843/rabbitmq-does-one-consumer-block-the-other-consumers-of-the-same-queue
 
 Not ack'ing just means they might get retransmitted if we never ack. It
 doesn't block other consumers. And as the link above quotes from the
 AMQP spec, when there are multiple consumers, FIFO is not guaranteed.
 Other consumers get other messages.
 
 So just add the ability for a consumer to read, work, ack to oslo.messaging,
 and this is mostly handled via AMQP. Of course that also likely means no
 zeromq for Heat without accepting that messages may be lost if workers die.
 
 Basically we need to add something that is not RPC but instead jobqueue
 that mimics this:
 
  http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/rpc/dispatcher.py#n131
 
 I've always been suspicious of this bit of code, as it basically means that if
 anything fails between that call, and the one below it, we have lost contact,
 but as long as clients are written to re-send when there is a lack of reply,
 there shouldn't be a problem. But, for a job queue, there is no reply, and so
 the worker would dispatch, and then acknowledge after the dispatched call
 had returned (including having completed the step where new messages are
 added to the queue for any newly-possible children).
 
 Just to be clear, I believe what Zaqar adds is the ability to peek at a 
 specific
 message ID and not affect it in the queue, which is entirely different than
 ACK'ing the ones you've already received in your session.
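The claim/work/ack semantics under discussion can be modelled with a toy in-process queue (illustrative only; with real AMQP via e.g. pika, the claim is a consume with automatic acknowledgement disabled, and the ack is `channel.basic_ack(delivery_tag)`). The point the thread is making: an unacked message is redelivered rather than lost, and holding one unacked does not block other consumers from taking other messages.

```python
import queue

class JobQueue:
    def __init__(self):
        self._ready = queue.Queue()   # messages available for delivery
        self._unacked = {}            # delivery_tag -> message body
        self._next_tag = 0

    def publish(self, body):
        self._ready.put(body)

    def claim(self):
        # Each claim hands out a *different* message with its own delivery
        # tag, so consumers don't block each other.
        body = self._ready.get_nowait()
        self._next_tag += 1
        self._unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        # Ack only after the work (and any child-message publishing) is done.
        del self._unacked[tag]

    def requeue_unacked(self):
        # What the broker does when a consumer's connection dies: unacked
        # messages go back on the queue instead of being lost.
        for body in self._unacked.values():
            self._ready.put(body)
        self._unacked.clear()
```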
 
  Most stuff in OpenStack gets around this by doing synchronous calls
  across oslo.messaging, where there is an end-to-end ack. We don't want
  that here though. We'll probably have to make do with having ways to
  recover after a failure (kick off another update with the same data is
  always an option). The hard part is that if something dies we don't
  really want to wait until the stack timeout to start recovering.
 
 
 I fully agree. Josh's point about using a coordination service like Zookeeper 
 to
 maintain liveness is an interesting one here. If we just make sure that all 
 the
 workers that have claimed work off the queue are alive, that should be
 sufficient to prevent a hanging stack situation like you describe above.
 
   Zaqar should have nothing to do with this and is, IMO, a poor choice
   at this stage, though I like the idea of using it in the future so
   that we can make Heat more of an outside-the-cloud app.
 
  I'm inclined to agree that it would 

Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

2014-11-13 Thread Jastrzebski, Michal


 -Original Message-
 From: Joshua Harlow [mailto:harlo...@outlook.com]
 Sent: Thursday, November 13, 2014 10:50 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops
 
 On Nov 13, 2014, at 10:59 AM, Clint Byrum cl...@fewbar.com wrote:
 
  Excerpts from Zane Bitter's message of 2014-11-13 09:55:43 -0800:
  On 13/11/14 09:58, Clint Byrum wrote:
  Excerpts from Zane Bitter's message of 2014-11-13 05:54:03 -0800:
  On 13/11/14 03:29, Murugan, Visnusaran wrote:
  Hi all,
 
  Convergence-POC distributes stack operations by sending resource
  actions over RPC for any heat-engine to execute. Entire stack
  lifecycle will be controlled by worker/observer notifications.
  This distributed model has its own advantages and disadvantages.
 
  Any stack operation has a timeout and a single engine will be
  responsible for it. If that engine goes down, timeout is lost
  along with it. So a traditional way is for other engines to
  recreate timeout from scratch. Also a missed resource action
  notification will be detected only when stack operation timeout
 happens.
 
  To overcome this, we will need the following capability:
 
  1.Resource timeout (can be used for retry)
 
  I don't believe this is strictly needed for phase 1 (essentially we
  don't have it now, so nothing gets worse).
 
 
  We do have a stack timeout, and it stands to reason that we won't
  have a single box with a timeout greenthread after this, so a
  strategy is needed.
 
  Right, that was 2, but I was talking specifically about the resource
  retry. I think we agree on both points.
 
  For phase 2, yes, we'll want it. One thing we haven't discussed
  much is that if we used Zaqar for this then the observer could
  claim a message but not acknowledge it until it had processed it,
  so we could have guaranteed delivery.
 
 
  Frankly, if oslo.messaging doesn't support reliable delivery then we
  need to add it.
 
  That is straight-up impossible with AMQP. Either you ack the message
  and risk losing it if the worker dies before processing is complete,
  or you don't ack the message until it's processed and you become a
  blocker for every other worker trying to pull jobs off the queue. It
   works fine when you have only one worker; otherwise not so much. This
   is the crux of the whole "why isn't Zaqar just Rabbit" debate.
 
 
  I'm not sure we have the same understanding of AMQP, so hopefully we
  can clarify here. This stackoverflow answer echoes my understanding:
 
   http://stackoverflow.com/questions/17841843/rabbitmq-does-one-consumer-block-the-other-consumers-of-the-same-queue
 
  Not ack'ing just means they might get retransmitted if we never ack.
  It doesn't block other consumers. And as the link above quotes from
  the AMQP spec, when there are multiple consumers, FIFO is not
 guaranteed.
  Other consumers get other messages.
 
  So just add the ability for a consumer to read, work, ack to
  oslo.messaging, and this is mostly handled via AMQP. Of course that
  also likely means no zeromq for Heat without accepting that messages
  may be lost if workers die.
 
  Basically we need to add something that is not RPC but instead
  jobqueue that mimics this:
 
   http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/rpc/dispatcher.py#n131
 
  I've always been suspicious of this bit of code, as it basically means
  that if anything fails between that call, and the one below it, we
  have lost contact, but as long as clients are written to re-send when
  there is a lack of reply, there shouldn't be a problem. But, for a job
  queue, there is no reply, and so the worker would dispatch, and then
  acknowledge after the dispatched call had returned (including having
  completed the step where new messages are added to the queue for any
  newly-possible children).
 
  Just to be clear, I believe what Zaqar adds is the ability to peek at
  a specific message ID and not affect it in the queue, which is
  entirely different than ACK'ing the ones you've already received in your
 session.
 
  Most stuff in OpenStack gets around this by doing synchronous calls
  across oslo.messaging, where there is an end-to-end ack. We don't
  want that here though. We'll probably have to make do with having
  ways to recover after a failure (kick off another update with the
  same data is always an option). The hard part is that if something
  dies we don't really want to wait until the stack timeout to start
 recovering.
 
 
  I fully agree. Josh's point about using a coordination service like
  Zookeeper to maintain liveness is an interesting one here. If we just
  make sure that all the workers that have claimed work off the queue
  are alive, that should be sufficient to prevent a hanging stack
  situation like you describe above.
 
  Zaqar should have nothing to do with this and is, IMO, a poor choice
  at this stage, though I like the idea of 

Re: [openstack-dev] [Nova] Automatic evacuate

2014-10-17 Thread Jastrzebski, Michal


 -Original Message-
 From: Florian Haas [mailto:flor...@hastexo.com]
 Sent: Thursday, October 16, 2014 10:53 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Nova] Automatic evacuate
 
 On Thu, Oct 16, 2014 at 9:25 AM, Jastrzebski, Michal
 michal.jastrzeb...@intel.com wrote:
  In my opinion flavor defining is a bit hacky. Sure, it will provide
  the functionality fairly quickly, but it will also strip us of the
  flexibility Heat would give. Healing can be done in several ways:
  simple destroy-and-create (the basic convergence workflow so far),
  evacuate with or without shared storage, even rebuilding the vm, and
  probably a few more once we put more thought into it.
 
 But then you'd also need to monitor the availability of *individual*
 guests, and down you go the rabbit hole.
 
 So suppose you're monitoring a guest with a simple ping. And it stops
 responding to that ping.

I was referring more to monitoring the host (not the guest), and
certainly not by ping. I was thinking of the current zookeeper
servicegroup implementation; we might want to use corosync and write a
servicegroup plugin for that. There are several choices here, and each
really requires testing before we make any decision.

There is also the fencing case, which we agree is important, and I think
nova should be able to do that (since it does the evacuation, it should
also do the fencing). But working fencing really needs working host
health monitoring, so I suggest we take baby steps here and solve one
issue at a time. And that would be host monitoring.
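To make the baby step concrete, a host-health monitor of the kind being discussed reduces to tracking heartbeats and declaring a host down once enough are missed, after which fencing/evacuation could be triggered. The names and thresholds below are made up for the sketch; a real servicegroup driver would persist this state and feed it to the scheduler.

```python
import time

class HostMonitor:
    def __init__(self, heartbeat_interval=10, missed_allowed=3):
        self.interval = heartbeat_interval
        self.missed_allowed = missed_allowed
        self.last_seen = {}               # host -> timestamp of last heartbeat

    def heartbeat(self, host, now=None):
        # Called whenever a host (or its libvirt, etc.) reports in.
        self.last_seen[host] = time.time() if now is None else now

    def is_up(self, host, now=None):
        # A host is considered down after missing `missed_allowed`
        # consecutive heartbeat intervals.
        now = time.time() if now is None else now
        seen = self.last_seen.get(host)
        if seen is None:
            return False
        return (now - seen) < self.interval * self.missed_allowed
```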

 (1) Has it died?
 (2) Is it just too busy to respond to the ping?
 (3) Has its guest network stack died?
 (4) Has its host vif died?
 (5) Has the L2 agent on the compute host died?
 (6) Has its host network stack died?
 (7) Has the compute host died?
 
 Suppose further it's using shared storage (running off an RBD volume or
 using an iSCSI volume, or whatever). Now you have almost as many recovery
 options as possible causes for the failure, and some of those recovery
 options will potentially destroy your guest's data.
 
 No matter how you twist and turn the problem, you need strongly consistent
 distributed VM state plus fencing. In other words, you need a full blown HA
 stack.
 
  I'd rather use nova for low-level tasks and maybe low-level monitoring
  (imho nova should do that using servicegroup). But I'd use something
  more configurable, like heat, for the actual task triggering. That
  would give us a framework rather than a mechanism. Later we might want
  to apply HA to networks or volumes; then we'll have the mechanism
  ready and only the monitoring hook and healing will need to be
  implemented.
 
  We can use scheduler hints to place a resource on an HA-compatible
  host (whichever health action we'd like to use); this will be a bit
  more complicated, but will also give us more flexibility.
 
 I apologize in advance for my bluntness, but this all sounds to me like you're
 vastly underrating the problem of reliable guest state detection and
 recovery. :)

Guest health in my opinion is just a bit out of scope here. If we have a
robust way of detecting host health, we can pretty much assume that if
the host dies, the guests follow. There are ways to detect guest health
(the libvirt watchdog, ceilometer, the ping you mentioned), but that
should be done somewhere else. And certainly not by evacuation.

 
  I agree that we all should meet in Paris and discuss this so we can
  join forces. This is one of the bigger gaps to be filled, imho.
 
 Pretty much every user I've worked with in the last 2 years agrees.
 Granted, my view may be skewed as HA is typically what customers approach
 us for in the first place, but yes, this definitely needs a globally 
 understood
 and supported solution.
 
 Cheers,
 Florian
 



Re: [openstack-dev] [Nova] Automatic evacuate

2014-10-16 Thread Jastrzebski, Michal


 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: Thursday, October 16, 2014 5:04 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Nova] Automatic evacuate
 
 On 10/15/2014 05:07 PM, Florian Haas wrote:
  On Wed, Oct 15, 2014 at 10:03 PM, Russell Bryant rbry...@redhat.com
 wrote:
  Am I making sense?
 
  Yep, the downside is just that you need to provide a new set of
  flavors for ha vs non-ha.  A benefit though is that it's a way to
  support it today without *any* changes to OpenStack.
 
  Users are already very used to defining new flavors. Nova itself
  wouldn't even need to define those; if the vendor's deployment tools
  defined them it would be just fine.
 
 Yes, I know Nova wouldn't need to define it.  I was saying I didn't like that 
 it
 was required at all.
 
  This seems like the kind of thing we should also figure out how to
  offer on a per-guest basis without needing a new set of flavors.
  That's why I also listed the server tagging functionality as another 
  possible
 solution.
 
  This still doesn't do away with the requirement to reliably detect
  node failure, and to fence misbehaving nodes. Detecting that a node
  has failed, and fencing it if unsure, is a prerequisite for any
  recovery action. So you need Corosync/Pacemaker anyway.
 
 Obviously, yes.  My post covered all of that directly ... the tagging bit was 
 just
 additional input into the recovery operation.
 
  Note also that when using an approach where you have physically
  clustered nodes, but you are also running non-HA VMs on those, then
  the user must understand that the following applies:
 
  (1) If your guest is marked HA, then it will automatically recover on
  node failure, but
  (2) if your guest is *not* marked HA, then it will go down with the
  node not only if it fails, but also if it is fenced.
 
  So a non-HA guest on an HA node group actually has a slightly
  *greater* chance of going down than a non-HA guest on a non-HA host.
  (And let's not get into "don't use fencing then"; we all know why
  that's a bad idea.)
 
  Which is why I think it makes sense to just distinguish between
  HA-capable and non-HA-capable hosts, and have the user decide whether
  they want HA or non-HA guests simply by assigning them to the
  appropriate host aggregates.
 
 Very good point.  I hadn't considered that.
 
 --
 Russell Bryant
 

In my opinion flavor defining is a bit hacky. Sure, it will provide the
functionality fairly quickly, but it will also strip us of the
flexibility Heat would give. Healing can be done in several ways: simple
destroy-and-create (the basic convergence workflow so far), evacuate
with or without shared storage, even rebuilding the vm, and probably a
few more once we put more thought into it.

I'd rather use nova for low-level tasks and maybe low-level monitoring
(imho nova should do that using servicegroup). But I'd use something
more configurable, like heat, for the actual task triggering. That would
give us a framework rather than a mechanism. Later we might want to
apply HA to networks or volumes; then we'll have the mechanism ready and
only the monitoring hook and healing will need to be implemented.
We can use scheduler hints to place a resource on an HA-compatible host
(whichever health action we'd like to use); this will be a bit more
complicated, but will also give us more flexibility.

I agree that we all should meet in Paris and discuss this so we can join
forces. This is one of the bigger gaps to be filled, imho.




Re: [openstack-dev] [Nova] Automatic evacuate

2014-10-15 Thread Jastrzebski, Michal
I tend to agree that this shouldn't be placed in nova. As it happens I'm
working on the very same thing (hello Russell :)). My current candidate
is heat. Convergence will, in my opinion, be a great place to do it
(https://review.openstack.org/#/c/95907/). It's still in the planning
stage, but we'll talk about it more in Paris. I even have a working demo
of automatic evacuation :) (come to the intel booth in Paris if you'd
like to see it).

Thing is, nova currently isn't ready for that. For example:
https://bugs.launchpad.net/nova/+bug/1379292
We are working on a bp to enable nova to check actual host health, not
only nova service health (bp coming soon, but in short it enables the
zookeeper servicegroup api to monitor, for example, libvirt or something
else which, if down, means the vms are dead).
That won't replace actual fencing, but it's something, and even if we
would like to have fencing in nova, it's a prerequisite.

Maybe it's worth a design session? I've seen this or a similar idea in
several places already, and demand for it is strong.

Regards,
Michał

 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: Tuesday, October 14, 2014 8:55 PM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Nova] Automatic evacuate
 
 On 10/14/2014 01:01 PM, Jay Pipes wrote:
  2) Looking forward, there is a lot of demand for doing this on a per
  instance basis.  We should decide on a best practice for allowing end
  users to indicate whether they would like their VMs automatically
  rescued by the infrastructure, or just left down in the case of a
  failure.  It could be as simple as a special tag set on an instance [2].
 
  Please note that server instance tagging (thanks for the shout-out,
  BTW) is intended for only user-defined tags, not system-defined
  metadata which is what this sounds like...
 
 I was envisioning the tag being set by the end user to say please keep my
 VM running until I say otherwise, or something like auto-recover
 for short.
 
 So, it's specified by the end user, but potentially acted upon by the system
 (as you say below).
 
  Of course, one might implement some external polling/monitoring system
  using server instance tags, which might do a nova list --tag $TAG
  --host $FAILING_HOST, and initiate a migrate for each returned server
 instance...
 
 Yeah, that's what I was thinking.  Whatever system you use to react to a
 failing host could use the tag as part of the criteria to figure out which
 instances to evacuate and which to leave as dead.
 
 --
 Russell Bryant
 






Re: [openstack-dev] [heat][nova] VM restarting on host, failure in convergence

2014-09-19 Thread Jastrzebski, Michal
   All,
  
   Currently OpenStack does not have a built-in HA mechanism for tenant
   instances which could restore virtual machines in case of a host
   failure. Openstack assumes every app is designed for failure and can
   handle instance failure and will self-remediate, but that is rarely
   the case for the very large Enterprise application ecosystem.
   Many existing enterprise applications are stateful, and assume that
   the physical infrastructure is always on.
  
 
  There is a fundamental debate that OpenStack's vendors need to work out
  here. Existing applications are well served by existing virtualization
  platforms. Turning OpenStack into a work-alike to oVirt is not the end
  goal here. It's a happy accident that traditional apps can sometimes be
  bent onto the cloud without much modification.
 
  The thing that clouds do is they give development teams a _limited_
  infrastructure that lets IT do what they're good at (keep the
  infrastructure up) and lets development teams do what they're good at 
(run
  their app). By putting HA into the _app_, and not the _infrastructure_,
  the dev teams get agility and scalability. No more waiting weeks for
   allocating specialized servers with hardware fencing setups and fibre
  channel controllers to house a shared disk system so the super reliable
  virtualization can hide HA from the user.
 
  Spin up vms. Spin up volumes.  Run some replication between regions,
  and be resilient.

I don't argue that's the way to go. But reality is somewhat different.
In a world of early design failures, low budgets and deadlines, some
good practices may be omitted early on and be hard to implement later.

The cloud, from a technical point of view, can still help such apps, and
I think openstack should approach that part of the market as well.

  So, as long as it is understood that whatever is being proposed should
  be an application centric feature, and not an infrastructure centric
  feature, this argument remains interesting in the cloud context.
  Otherwise, it is just an invitation for OpenStack to open up direct
  competition with behemoths like vCenter.
 
   Even the OpenStack controller services themselves do not gracefully
   handle failure.
  
 
  Which ones?

Heat has issues, horizon has issues, and neutron l3 only works in an
active-passive setup.

   When these applications were virtualized, they were virtualized on
   platforms that enabled very high SLAs for each virtual machine,
   allowing the application to not be rewritten as the IT team moved them
   from physical to virtual. Now while these apps cannot benefit from
   methods like automatic scaleout, the application owners will greatly
   benefit from the self-service capabilities they will recieve as they
   utilize the OpenStack control plane.
  
 
  These apps were virtualized for IT's benefit. But the application authors
  and users are now stuck in high-cost virtualization. The cloud is best
  utilized when IT can control that cost and shift the burden of uptime
  to the users by offering them more overall capacity and flexibility with
  the caveat that the individual resources will not be as reliable.
 
  So what I'm most interested in is helping authors change their apps to
  be resilient on their own, not in putting more burden on IT.

This can be very costly, and is therefore not always possible.
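For illustration, "resilient on their own" can be as simple as the client failing over between regions itself rather than relying on the infrastructure to keep one instance alive. A minimal sketch, with invented names (`RegionEndpoint`, `resilient_call` are not real APIs):

```python
class RegionEndpoint:
    """Stand-in for a service endpoint in one region (hypothetical)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError("region %s is down" % self.name)
        return "%s served %s" % (self.name, request)


def resilient_call(request, endpoints):
    """Try each region in turn; the app, not the infra, absorbs the failure."""
    last_error = None
    for ep in endpoints:
        try:
            return ep.handle(request)
        except ConnectionError as exc:
            last_error = exc  # remember the failure, fail over to next region
    raise last_error


regions = [RegionEndpoint("region-a", healthy=False),
           RegionEndpoint("region-b")]
print(resilient_call("GET /status", regions))  # prints: region-b served GET /status
```

The cost argument above applies, of course: retrofitting every code path of a stateful enterprise app with this pattern is far more work than these few lines suggest.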

   I'd like to suggest to expand heat convergence mechanism to enable
   self-remediation of virtual machines and other heat resources.
  
 
  Convergence is still nascent. I don't know if I'd pile on to what might
  take another 12 - 18 months to get done anyway. We're just now figuring
  out how to get started where we thought we might already be 1/3 of the
  way through. Just something to consider.

We don't need convergence to be complete before starting work on this.
However long it might take, the sooner we start, the sooner we deliver.


Thanks,
Michał

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat][nova] VM restarting on host failure in convergence

2014-09-19 Thread Jastrzebski, Michal
   In short, what we'll need from nova is to have 100% reliable
   host-health monitor and equally reliable rebuild/evacuate mechanism
   with fencing and scheduler. In heat we need scalable and reliable
   event listener and engine to decide which action to perform in given
   situation.
 
  Unfortunately, I don't think Nova can provide this alone.  Nova only
  knows about whether or not the nova-compute daemon is currently
  communicating with the rest of the system.  Even if the nova-compute
  daemon drops out, the compute node may still be running all instances
  just fine.  We certainly don't want to impact those running workloads
  unless absolutely necessary.

But on the other hand, if a host is really down, nova might want to know
that, if only to change the instances' status to ERROR. I don't think a
situation where an instance is down due to host failure and nova
doesn't know about it is good for anyone.

  I understand that you're suggesting that we enhance Nova to be able to
  provide that level of knowledge and control.  I actually don't think
  Nova should have this knowledge of its underlying infrastructure.
 
  I would put the host monitoring infrastructure (to determine if a host
  is down) and fencing capability as out of scope for Nova and as a part
  of the supporting infrastructure.  Assuming those pieces can properly
  detect that a host is down and fence it, then all that's needed from
  Nova is the evacuate capability, which is already there.  There may be
  some enhancements that could be done to it, but surely it's quite close.

Why do you think nova shouldn't have information about the underlying
infrastructure? Since the servicegroup API is plugin-based, we could
develop a new plugin to enhance the reliability of nova's host
information without any impact on the current code. I'm a bit concerned
about the dependency injection we'd have to make. I'd love to be in a
situation where people get some level of SLA in heat out of the box
(maybe not the best they can get), without a bigger investment in
infrastructure configuration.

  There's also the part where a notification needs to go out saying that
  the instance has failed.  Some thing (which could be Heat in the case of
  this proposal) can react to that, either directly or via ceilometer, for
  example.  There is an API today to hard reset the state of an instance
  to ERROR.  After a host is fenced, you could use this API to mark all
  instances on that host as dead.  I'm not sure if there's an easy way to
  do that for all instances on a host today.  That's likely an enhancement
  we could make to python-novaclient, similar to the evacuate all
  instances on a host enhancement that was done in novaclient.

Why wouldn't nova do that itself? In my opinion, nova should know the
real status of its instances at all times.

Thanks,
Michał


[openstack-dev] [heat][nova] VM restarting on host failure in convergence

2014-09-17 Thread Jastrzebski, Michal
All,

Currently OpenStack does not have a built-in HA mechanism for tenant
instances which could restore virtual machines in case of a host
failure. OpenStack assumes every app is designed for failure, can
handle instance failure and will self-remediate, but that is rarely
the case for the very large enterprise application ecosystem.
Many existing enterprise applications are stateful, and assume that
the physical infrastructure is always on.

Even the OpenStack controller services themselves do not gracefully
handle failure.

When these applications were virtualized, they were virtualized on
platforms that enabled very high SLAs for each virtual machine,
allowing the application to not be rewritten as the IT team moved them
from physical to virtual. Now while these apps cannot benefit from
methods like automatic scaleout, the application owners will greatly
benefit from the self-service capabilities they will receive as they
utilize the OpenStack control plane.

I'd like to suggest to expand heat convergence mechanism to enable
self-remediation of virtual machines and other heat resources.

convergence specs: https://review.openstack.org/#/c/95907/

Basic flow would look like this:

1. Nova detects host failure and posts notification
The nova servicegroup API implements a host health monitor. We will
use it as the notification source when a host goes down. AFAIK there
are some issues with that, and we might need to fix them. We need a
host-health notification source with low latency and good
reliability (when we get a host-down notification, we must be 100%
sure that it's actually down).
2. Nova sends notifications about affected resources
Nova generates a list of affected resources (VMs, for example) and
notifies that they are down.
3. Convergence listens for resource-health notifications
It schedules a rebuild of the affected resources, for example the
VMs on a given host.
4. We introduce different, configurable methods for resource rescue
A client might want to cover different resources with different
levels of SLA. For example, an HTTP edge server may be fault
tolerant, and all we want is to simply recreate it on a different
node and add it back to the LBaaS pool to regain capacity, while a
DB server has to be evacuated.
5. We call nova evacuate if the server is configured to use it
By evacuate I mean nova evacuate --on-shared-storage, so
in fact we'll boot up the same VM (from its existing disk), keeping
addresses, data and so on. This will allow pet servers to minimize
the downtime caused by a host failure.
We might stumble upon the fencing problem in this case. Nova already
has some form of safeguard implemented (it deletes evacuated
instances when the host comes back up). We might want to add a more
reliable form of fencing (storage locking?) to nova in the future.
6. Heat makes sure that all the needed configuration is applied
Volumes attached, processes running and so on.
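The per-resource rescue dispatch in steps 3-5 above can be sketched in a few lines. Everything here is illustrative: the policy names, the `on_host_down` handler and the action callbacks are invented for the sketch, not real nova or heat APIs:

```python
# Step 4: per-resource rescue policy, chosen by the client per SLA level.
RESCUE_POLICIES = {
    'edge-http': 'recreate',   # fault tolerant: rebuild, rejoin LBaaS pool
    'db-master': 'evacuate',   # pet server: nova evacuate on shared storage
}


def on_host_down(host, instances_by_host, actions):
    """Steps 3-5: react to a host-down notification.

    `instances_by_host` maps a host to its (instance, role) pairs;
    `actions` maps a policy name to a callable performing the rescue.
    Returns the (instance, policy) pairs that were acted on.
    """
    performed = []
    for instance, role in instances_by_host.get(host, []):
        policy = RESCUE_POLICIES.get(role, 'recreate')  # safe default
        actions[policy](instance)  # e.g. trigger rebuild or evacuate
        performed.append((instance, policy))
    return performed
```

In a real deployment the `actions` callables would be convergence tasks calling nova; the point of the sketch is only that the dispatch itself is simple once the reliable host-down notification from step 1 exists.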

In short, what we'll need from nova is a 100% reliable host-health
monitor and an equally reliable rebuild/evacuate mechanism with
fencing and scheduling. In heat we need a scalable and reliable
event listener and an engine to decide which action to perform in a
given situation.

Regards,
Michał inc0 Jastrzębski