Re: [openstack-dev] [Nova] Automatic evacuate

2014-10-22 Thread David Vossel


- Original Message -
 On 10/21/2014 07:53 PM, David Vossel wrote:
 
  - Original Message -
  -Original Message-
  From: Russell Bryant [mailto:rbry...@redhat.com]
  Sent: October 21, 2014 15:07
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [Nova] Automatic evacuate
 
  On 10/21/2014 06:44 AM, Balázs Gibizer wrote:
  Hi,
 
  Sorry for the top posting but it was hard to fit my complete view
  inline.
 
  I'm also thinking about a possible solution for automatic server
  evacuation. I see two separate sub-problems here:
  1) compute node monitoring and fencing, 2) automatic server evacuation

  Compute node monitoring is currently implemented in the servicegroup
  module of nova. As far as I understand, pacemaker is the proposed
  solution in this thread to solve both monitoring and fencing, but we
  tried it and found out that pacemaker_remote on baremetal does not work
  together with fencing (yet), see [1]. So if we need fencing, then either
  we go for normal pacemaker instead of pacemaker_remote (but that
  solution doesn't scale), or we configure and call stonith directly when
  pacemaker detects the compute node failure.
  I didn't get the same conclusion from the link you reference.  It says:
 
  That is not to say however that fencing of a baremetal node works any
  differently than that of a normal cluster-node. The Pacemaker policy
  engine understands how to fence baremetal remote-nodes. As long as a
  fencing device exists, the cluster is capable of ensuring baremetal
  nodes are fenced in the exact same way as normal cluster-nodes are
  fenced.
 
  So, it sounds like the core pacemaker cluster can fence the node to me.
I CC'd David Vossel, a pacemaker developer, to see if he can help
clarify.
  It seems there is a contradiction between chapter 1.5 and 7.2 in [1],
  as 7.2 states: "There are some complications involved with understanding
  a bare-metal node's state that virtual nodes don't have. Once this logic
  is complete, pacemaker will be able to integrate bare-metal nodes in the
  same way virtual remote-nodes currently are. Some special considerations
  for fencing will need to be addressed."
  Let's wait for David's statement on this.
  Hey, that's me!
 
  I can definitely clear all this up.
 
  First off, this document is out of sync with the current state upstream.
  We're already past Pacemaker v1.1.12 upstream. Section 7.2 of the
  document being referenced is still talking about future v1.1.11 features.
 
  I'll make it simple. If the document references anything that needs to
  be done in the future, it's already done. Pacemaker remote is feature
  complete at this point. I've accomplished everything I originally set
  out to do. I see one change though. In 7.1 I talk about wanting
  pacemaker to be able to manage resources in containers. I mention
  something about libvirt sandbox. I scrapped whatever I was doing there.
  Pacemaker now has docker support.
  https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/docker
 
  I've known this document is out of date. It's on my giant list of things
  to do. Sorry for any confusion.
 
  As far as pacemaker remote and fencing goes, remote-nodes are fenced the
  exact same way as cluster-nodes. The only consideration that needs to be
  made is that the cluster-nodes (nodes running the full pacemaker+corosync
  stack) are the only nodes allowed to initiate fencing. All you have to do
  is make sure the fencing devices you want to use to fence remote-nodes
  are accessible to the cluster-nodes. From there you are good to go.
 
  Let me know if there's anything else I can clear up. Pacemaker remote was
  designed to be the solution for the exact scenario you all are discussing
  here. Compute nodes and pacemaker remote are made for one another :D
 
  If anyone is interested in prototyping pacemaker remote for this compute
  node use case, make sure to include me. I have done quite a bit of
  research into how to maximize pacemaker's ability to scale horizontally.
  As part of that research I've made a few changes that are directly
  related to all of this that are not yet in an official pacemaker release.
  Come to me for the latest rpms and you'll have a less painful experience
  setting all this up :)
 
  -- Vossel
 
 
 Hi Vossel,
 
 Could you send us a link to the source RPMs please? We have tested on
 CentOS 7; it might need a recompile.

Yes, centos 7.0 isn't going to have the rpms you need to test this.

There are a couple of things you can do.

1. I put the rhel7 related rpms I test with in this repo.
http://davidvossel.com/repo/os/el7/

*disclaimer* I only maintain this repo for myself. I'm not committed to keeping
it active or up-to-date. It just happens to be updated right now for my own use.
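
For anyone who wants to try those packages, pointing yum at the repo just takes
a small repo file. This is only a sketch; the file name, repo id, and package
list below are assumptions, so adapt them as needed:

  # /etc/yum.repos.d/vossel-test.repo  (hypothetical file name)
  [vossel-test]
  name=David Vossel's pacemaker test packages
  baseurl=http://davidvossel.com/repo/os/el7/
  enabled=1
  gpgcheck=0

  # then, for example:
  # yum install pacemaker pacemaker-remote resource-agents pcs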

That will give you test rpms for the pacemaker version I'm currently using plus
the latest libqb. If you're going to do any

Re: [openstack-dev] [Nova] Automatic evacuate

2014-10-21 Thread David Vossel


- Original Message -
  -Original Message-
  From: Russell Bryant [mailto:rbry...@redhat.com]
  Sent: October 21, 2014 15:07
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [Nova] Automatic evacuate
  
  On 10/21/2014 06:44 AM, Balázs Gibizer wrote:
   Hi,
  
   Sorry for the top posting but it was hard to fit my complete view inline.
  
   I'm also thinking about a possible solution for automatic server
   evacuation. I see two separate sub-problems here:
   1) compute node monitoring and fencing, 2) automatic server evacuation
   
   Compute node monitoring is currently implemented in the servicegroup
   module of nova. As far as I understand, pacemaker is the proposed
   solution in this thread to solve both monitoring and fencing, but we
   tried it and found out that pacemaker_remote on baremetal does not work
   together with fencing (yet), see [1]. So if we need fencing, then either
   we go for normal pacemaker instead of pacemaker_remote (but that
   solution doesn't scale), or we configure and call stonith directly when
   pacemaker detects the compute node failure.
  
  I didn't get the same conclusion from the link you reference.  It says:
  
  That is not to say however that fencing of a baremetal node works any
  differently than that of a normal cluster-node. The Pacemaker policy engine
  understands how to fence baremetal remote-nodes. As long as a fencing
  device exists, the cluster is capable of ensuring baremetal nodes are
  fenced in the exact same way as normal cluster-nodes are fenced.
  
  So, it sounds like the core pacemaker cluster can fence the node to me.
   I CC'd David Vossel, a pacemaker developer, to see if he can help clarify.
 
 It seems there is a contradiction between chapter 1.5 and 7.2 in [1],
 as 7.2 states: "There are some complications involved with understanding
 a bare-metal node's state that virtual nodes don't have. Once this logic
 is complete, pacemaker will be able to integrate bare-metal nodes in the
 same way virtual remote-nodes currently are. Some special considerations
 for fencing will need to be addressed."
 Let's wait for David's statement on this.

Hey, that's me!

I can definitely clear all this up.

First off, this document is out of sync with the current state upstream. We're
already past Pacemaker v1.1.12 upstream. Section 7.2 of the document being
referenced is still talking about future v1.1.11 features.

I'll make it simple. If the document references anything that needs to be done
in the future, it's already done.  Pacemaker remote is feature complete at this
point. I've accomplished everything I originally set out to do. I see one change
though. In 7.1 I talk about wanting pacemaker to be able to manage resources in
containers. I mention something about libvirt sandbox. I scrapped whatever I was
doing there. Pacemaker now has docker support.
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/docker

I've known this document is out of date. It's on my giant list of things to do.
Sorry for any confusion.

As far as pacemaker remote and fencing goes, remote-nodes are fenced the exact
same way as cluster-nodes. The only consideration that needs to be made is that
the cluster-nodes (nodes running the full pacemaker+corosync stack) are the only
nodes allowed to initiate fencing. All you have to do is make sure the fencing
devices you want to use to fence remote-nodes are accessible to the
cluster-nodes. From there you are good to go.
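
As a rough illustration of that setup (a sketch only, not from the thread; the
host names, addresses, and fence agent parameters are placeholders to adapt),
adding a baremetal remote-node and a fence device that the cluster-nodes can
reach looks roughly like this with pcs:

  # on the compute node: install and start the remote agent
  # (it also needs the shared /etc/pacemaker/authkey and TCP port 3121 open)
  yum install pacemaker-remote resource-agents
  systemctl enable pacemaker_remote
  systemctl start pacemaker_remote

  # on a cluster-node: integrate the compute node as a baremetal remote-node
  pcs resource create compute-1 ocf:pacemaker:remote server=compute-1.example.com

  # fencing device for that remote-node, reachable from the cluster-nodes
  pcs stonith create fence-compute-1 fence_ipmilan \
      pcmk_host_list=compute-1 ipaddr=10.0.0.11 login=admin passwd=secret lanplus=1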

Let me know if there's anything else I can clear up. Pacemaker remote was
designed to be the solution for the exact scenario you all are discussing here.
Compute nodes and pacemaker remote are made for one another :D

If anyone is interested in prototyping pacemaker remote for this compute node
use case, make sure to include me. I have done quite a bit of research into how
to maximize pacemaker's ability to scale horizontally. As part of that research
I've made a few changes that are directly related to all of this that are not
yet in an official pacemaker release. Come to me for the latest rpms and you'll
have a less painful experience setting all this up :)

-- Vossel
 
 Cheers,
 Gibi
 
  
  --
  Russell Bryant
  


Re: [openstack-dev] [Nova] Automatic evacuate

2014-10-21 Thread David Vossel


- Original Message -
 On Thu, Oct 16, 2014 at 7:48 PM, Jay Pipes jaypi...@gmail.com wrote:
  While one of us (Jay or me) speaking for the other and saying we agree
  is a distributed consensus problem that dwarfs the complexity of
  Paxos
 
 
  You've always had a way with words, Florian :)
 
 I knew you'd like that one. :)
 
 , *I* for my part do think that an external toolset (i.e. one
  that lives outside the Nova codebase) is the better approach versus
  duplicating the functionality of said toolset in Nova.
 
  I just believe that the toolset that should be used here is
  Corosync/Pacemaker and not Ceilometer/Heat. And I believe the former
  approach leads to *much* fewer necessary code changes *in* Nova than
  the latter.
 
 
  I agree with you that Corosync/Pacemaker is the tool of choice for
  monitoring/heartbeat functionality, and is my choice for compute-node-level
  HA monitoring. For guest-level HA monitoring, I would say use
  Heat/Ceilometer. For container-level HA monitoring, it looks like fleet or
  something like Kubernetes would be a good option.
 
 Here's why I think that's a bad idea: none of these support the
 concept of being subordinate to another cluster.
 
 Again, suppose a VM stops responding. Then
 Heat/Ceilometer/Kubernetes/fleet would need to know whether the node
 hosting the VM is down or not. Only if the node is up or recovered
 (which Pacemaker would be responsible for) would the VM HA facility be
 able to kick in. Effectively you have two views of the cluster
 membership, and that sort of thing always gets messy. In the HA space
 we're always facing the same issues when a replication facility
 (Galera, GlusterFS, DRBD, whatever) has a different view of the
 cluster membership than the cluster manager itself — which *always*
 happens for a few seconds on any failover, recovery, or fencing event.
 
 Russell's suggestion, by having remote Pacemaker instances on the
 compute nodes tie in with a Pacemaker cluster on the control nodes,
 does away with that discrepancy.
 
  I'm curious to see how the combination of compute-node-level HA and
  container-level HA tools will work together in some of the proposed
  deployment architectures (bare metal + docker containers w/ OpenStack and
  infrastructure services run in a Kubernetes pod or CoreOS fleet).
 
 I have absolutely nothing against an OpenStack cluster using
 *exclusively* Kubernetes or fleet for HA management, once those have
 reached sufficient maturity.

It's not about reaching sufficient maturity for these two projects. They are
on the wrong path to achieving proper HA. Kubernetes and fleet (I'll throw geard
into the mix as well) do a great job at distributed management of containers.
The difference is that instead of integrating with a proper HA stack (like Nova
is), kubernetes and fleet are attempting their own HA. In doing this, they've
unknowingly blown the scope of their respective projects way beyond what they
originally set out to do.

Here's the problem. HA is both very misunderstood and deceptively difficult to
achieve. System-wide deterministic failover behavior is not a matter of
monitoring and restarting failed containers. For kubernetes and fleet to
succeed, they will need to integrate with a proper HA stack like pacemaker.

Below are some presentation slides on how I envision pacemaker interacting with
container orchestration tools.

https://github.com/davidvossel/phd/blob/master/doc/presentations/HA_Container_Overview_David_Vossel.pdf?raw=true

-- Vossel

 But just about every significant
 OpenStack distro out there has settled on Corosync/Pacemaker for the
 time being. Let's not shove another cluster manager down their throats
 for little to no real benefit.
 
 Cheers,
 Florian
 


Re: [openstack-dev] [kolla] on Dockerfile patterns

2014-10-15 Thread David Vossel


- Original Message -
 I'm not arguing that everything should be managed by one systemd, I'm just
 saying, for certain types of containers, a single docker container with
 systemd in it might be preferable to trying to slice it unnaturally into
 several containers.
 
 Systemd has invested a lot of time/effort to be able to relaunch failed
 services, support spawning and maintaining unix sockets and services across
 them, etc, that you'd have to push out of and across docker containers. All
 of that can be done, but why reinvent the wheel? Like you said, pacemaker
 can be made to make it all work, but I have yet to see a way to deploy
 pacemaker services anywhere near as easy as systemd+yum makes it. (Thanks be
 to redhat. :)
 
 The answer seems to be, it's not dockerish. That's ok. I just wanted to
 understand the issue for what it is. If there is a really good reason for
 not wanting to do it, or if it's just not the way things are done. I've
 had kind of the opposite feeling regarding docker containers. Docker used
 to do very bad things when killing the container: nasty if you wanted your
 database not to get corrupted. Killing pid 1 is a bit sketchy, and then
 forcing the container down after 10 seconds was particularly bad. Having
 something like systemd in place allows the database to be notified, then
 shut down properly. Sure, you can script up enough shell to make this
 work, but you have to write some difficult code, over and over again...
 Docker has gotten better more recently, but it still makes me a bit
 nervous using it for stateful things.
 
 As for recovery, systemd can do the recovery too. I'd argue at this point in
 time, I'd expect systemd recovery to probably work better than some custom

yes, systemd can do recovery and that is part of the problem. From my
perspective there should be one resource management system. Whether that be
pacemaker, kubernetes, or some other distributed system, it doesn't matter. If
you are mixing systemd with these other external distributed
orchestration/management tools you have containers that are silently
failing/recovering without the management layer having any clue.

What we want is centralized recovery: there's one tool responsible for
detecting and invoking recovery. Everything else in the system is designed to
make that possible.

If we want to put a process in the container to manage multiple services, we'd
need the ability to escalate failures to the distributed management tool.
Systemd could work if it was given the ability to act more as a watchdog after
starting services than to invoke recovery itself. If systemd could be
configured to die (or potentially gracefully clean up the container's resources
before dying) whenever a failure is detected, then systemd might make sense.
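
A minimal sketch of that escalation idea, using a hand-rolled pid-1 wrapper
instead of systemd (the service names are placeholders, and signal forwarding
and zombie reaping are ignored for brevity): start the services, then exit as
soon as any of them dies, so the container itself dies and the external manager
sees the failure.

  #!/bin/sh
  # hypothetical pid-1 wrapper: the container lives only as long as
  # every service it is supposed to run is still alive
  service-a --foreground &
  A=$!
  service-b --foreground &
  B=$!
  while kill -0 "$A" 2>/dev/null && kill -0 "$B" 2>/dev/null; do
      sleep 5
  done
  # pid 1 exiting non-zero stops the container; the external
  # orchestrator/resource manager sees that and drives recovery
  exit 1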

I'm approaching this from a system management point of view. Running systemd in
your one-off container that you're managing manually does not have the same
drawbacks. I don't have a vendetta against systemd or anything, I just think
it's a step backwards to put systemd in containers. I see little value in
having containers become lightweight virtual machines. Containers have much
more to offer.

-- Vossel



 shell scripts when it comes to doing the right thing recovering at bring-up.
 The other thing is, recovery is not just about pid 1 going away. Often it
 sticks around and other badness is going on. It's a way to know things are
 bad, but you can't necessarily rely on it to know the container's healthy.
 You need more robust checks for that.
 
 Thanks,
 Kevin
 
 
 From: David Vossel [dvos...@redhat.com]
 Sent: Tuesday, October 14, 2014 4:52 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [kolla] on Dockerfile patterns
 
 - Original Message -
  Ok, why are you so down on running systemd in a container?
 
 It goes against the grain.
 
 From a distributed systems view, we gain quite a bit of control by maintaining
 one service per container. Containers can be re-organised and re-purposed
 dynamically. If we have systemd trying to manage an entire stack of resources
 within a container, we lose this control.
 
 From my perspective a containerized application stack needs to be managed
 externally by whatever is orchestrating the containers to begin with. When we
 take a step back and look at how we actually want to deploy containers,
 systemd doesn't make much sense. It actually limits us in the long run.
 
 Also... recovery. Using systemd to manage a stack of resources within a single
 container makes it difficult for whatever is externally enforcing the
 availability of that container to detect the health of the container. As it is
 now, the actual service is pid 1 of a container. If that service dies, the
 container dies. If systemd is pid 1, there can be all kinds of chaos occurring
 within the container, but the external distributed orchestration system won't
 have a clue (unless it invokes some custom health monitoring tools within the
 container itself, which will likely be the case someday.)

Re: [openstack-dev] [kolla] on Dockerfile patterns

2014-10-15 Thread David Vossel


- Original Message -
 On Tue, 2014-10-14 at 19:52 -0400, David Vossel wrote:
  
  - Original Message -
   Ok, why are you so down on running systemd in a container?
  
  It goes against the grain.
  
   From a distributed systems view, we gain quite a bit of control by
   maintaining one service per container. Containers can be re-organised and
   re-purposed dynamically. If we have systemd trying to manage an entire stack
   of resources within a container, we lose this control.
   
   From my perspective a containerized application stack needs to be managed
   externally by whatever is orchestrating the containers to begin with. When we
   take a step back and look at how we actually want to deploy containers,
   systemd doesn't make much sense. It actually limits us in the long run.
   
   Also... recovery. Using systemd to manage a stack of resources within a
   single container makes it difficult for whatever is externally enforcing the
   availability of that container to detect the health of the container. As it
   is now, the actual service is pid 1 of a container. If that service dies, the
   container dies. If systemd is pid 1, there can be all kinds of chaos
   occurring within the container, but the external distributed orchestration
   system won't have a clue (unless it invokes some custom health monitoring
   tools within the container itself, which will likely be the case someday.)
 
 I don't really think this is a good argument.  If you're using docker,
 docker is the management and orchestration system for the containers.

no, docker is a local tool for pulling images and launching containers.
Docker is not the distributed resource manager in charge of overseeing
what machines launch what containers and how those containers are linked
together.

 There's no dogmatic answer to the question "should you run init in the
 container".

an init daemon might make sense to put in some containers where we have
a tightly coupled resource stack. There could be a use case where it would
make more sense to put these resources in a single container.

I don't think systemd is a good solution for the init daemon though. Systemd
attempts to handle recovery itself as if it has the entire view of the 
system. With containers, the system view exists outside of the containers.
If we put an internal init daemon within the containers, that daemon needs
to escalate internal failures. The easiest way to do this is to
have init die if it encounters a resource failure (init is pid 1, pid 1 exiting
causes container to exit, container exiting gets the attention of whatever
is managing the containers)

 The reason for not running init inside a container managed by docker is
 that you want the template to be thin for ease of orchestration and
 transfer, so you want to share as much as possible with the host.  The
 more junk you put into the container, the fatter and less agile it
 becomes, so you should probably share the init system with the host in
 this paradigm.

I don't think the local init system and containers should have anything
to do with one another.  I said this in a previous reply, I'm approaching
this problem from a distributed management perspective. The host's
init daemon only has a local view of the world. 

 
 Conversely, containers can be used to virtualize full operating systems.
 This isn't the standard way of doing docker, but LXC and OpenVZ by
 default do containers this way.  For this type of container, because you
 have a full OS running inside the container, you have to also have
 systemd (assuming it's the init system) running within the container.

sure, if you want to do this, use systemd. I don't understand the use case
where this makes any sense though. For me this falls in the "yeah you can do
it, but why?" category.

-- Vossel

 
 James
 
 
 


Re: [openstack-dev] [kolla] on Dockerfile patterns

2014-10-15 Thread David Vossel


- Original Message -
 Excerpts from Vishvananda Ishaya's message of 2014-10-15 07:52:34 -0700:
  
  On Oct 14, 2014, at 1:12 PM, Clint Byrum cl...@fewbar.com wrote:
  
   Excerpts from Lars Kellogg-Stedman's message of 2014-10-14 12:50:48
   -0700:
   On Tue, Oct 14, 2014 at 03:25:56PM -0400, Jay Pipes wrote:
    I think the above strategy is spot on. Unfortunately, that's not how
    the Docker ecosystem works.
    
    I'm not sure I agree here, but again nobody is forcing you to use this
    tool.
    
    operating system that the image is built for. I see you didn't respond
    to my point that in your openstack-containers environment, you end up
    with Debian *and* Fedora images, since you use the official MySQL
    dockerhub image. And therefore you will end up needing to know sysadmin
    specifics (such as how network interfaces are set up) on multiple
    operating system distributions.
   
   I missed that part, but ideally you don't *care* about the
   distribution in use.  All you care about is the application.  Your
   container environment (docker itself, or maybe a higher level
   abstraction) sets up networking for you, and away you go.
   
   If you have to perform system administration tasks inside your
   containers, my general feeling is that something is wrong.
   
   
   Speaking as a curmudgeon ops guy from back in the day.. the reason
   I choose the OS I do is precisely because it helps me _when something
   is wrong_. And the best way an OS can help me is to provide excellent
   debugging tools, and otherwise move out of the way.
   
   When something _is_ wrong and I want to attach GDB to mysqld in said
   container, I could build a new container with debugging tools installed,
   but that may lose the very system state that I'm debugging. So I need to
   run things inside the container like apt-get or yum to install GDB.. and
   at some point you start to realize that having a whole OS is actually a
   good thing even if it means needing to think about a few more things up
   front, such as which OS will I use? and what tools do I need installed
   in my containers?
   
   What I mean to say is, just grabbing off the shelf has unstated
   consequences.
  
   If this is how people are going to use and think about containers, I would
   submit they are a huge waste of time. The performance value they offer is
   dramatically outweighed by the flexibility and existing tooling that exists
   for virtual machines. As I state in my blog post[1], if we really want to
   get value from containers, we must convert to the single application per
   container view. This means having standard ways of doing the above either on
   the host machine or in a debugging container that is as easy as (or easier
   than) the workflow you mention. There are not good ways to do this yet, and
   the community hand-waves it away, saying things like "well, you could...".
   "You could" isn't good enough. The result is that a lot of people that are
   using containers today are doing fat containers with a full OS.
  
 
 I think we really agree.
 
 What the container universe hasn't worked out is all the stuff that the
 distros have worked out for a long time now: consistency.

I agree we need consistency. I have an idea. What if we developed an entrypoint
script standard...

Something like LSB init scripts except tailored towards the container use case.
The primary difference would be that the 'start' action of this new standard
wouldn't fork. Instead 'start' would be pid 1. The 'status' could be checked
externally by calling the exact same entry point script to invoke the 'status'
function.

This standard would lock us into the 'one service per container' concept
while giving us the ability to standardize on how the container is launched
and monitored.
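
To make that concrete, here is a sketch of what such an entrypoint contract
could look like (purely illustrative; 'myserviced' and 'myserviced-ctl' are
placeholder names, and the action set just mirrors the idea above):

  #!/bin/sh
  # /entrypoint.sh -- hypothetical single-service container entrypoint
  case "$1" in
      start)
          # exec so the service itself becomes pid 1; 'start' never forks
          exec /usr/sbin/myserviced --foreground
          ;;
      status)
          # lightweight internal health check; exit 0 means healthy
          myserviced-ctl ping >/dev/null 2>&1
          ;;
      stop)
          # graceful shutdown hook for whatever is stopping the container
          myserviced-ctl shutdown
          ;;
      *)
          echo "usage: $0 {start|status|stop}" >&2
          exit 2
          ;;
  esac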

If we all conformed to something like this, docker could even extend the
standard so health checks could be performed using the docker cli tool:

  docker status <container id>

Internally docker would just be doing an nsenter into the container and calling
the internal status function in our init script standard.

We already have 'docker start <container>' and 'docker stop <container>'. Being
able to generically call something like 'docker status <container>' and have
that translate into some service-specific command on the inside of the
container would be kind of neat.
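
Until docker grows something like that, the same check can be approximated from
the host today (a sketch; it assumes the hypothetical entrypoint convention
above and a util-linux nsenter):

  # hypothetical approximation of 'docker status <container>'
  PID=$(docker inspect --format '{{.State.Pid}}' <container>)
  nsenter --target "$PID" --mount --uts --ipc --net --pid /entrypoint.sh status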

Tools like kubernetes could use this functionality to poll a container's health
and be able to detect issues occurring within the container that don't
necessarily involve the container's service failing.

Does anyone else have any interest in this? I have quite a bit of init script
type standard experience. It would be trivial for me to define something like
this for us to begin discussing.

-- Vossel

 I think it would be a good idea for containers' filesystem contents to
 be a whole distro. What's at question in this thread is what should be
 running. If we can just chroot into the container's FS 

Re: [openstack-dev] [kolla] on Dockerfile patterns

2014-10-14 Thread David Vossel


- Original Message -
 Same thing works with cloud init too...
 
 
 I've been waiting on systemd working inside a container for a while. it seems
 to work now.

oh no...

 The idea being it's hard to write a shell script to get everything up and
 running with all the interactions that may need to happen. The init system's
 already designed for that. Take a nova-compute docker container for example:
 you probably need nova-compute, libvirt, neutron-openvswitch-agent, and the
 ceilometer-agent all baked in. Writing a shell script to get it all started
 and shut down properly would be really ugly.

 You could split it up into 4 containers and try and ensure they are
 coscheduled and all the pieces are able to talk to each other, but why?
 Putting them all in one container with systemd starting the subprocesses is
 much easier and shouldn't have many drawbacks. The components' code is
 designed and tested assuming the pieces are all together.

What you need is a dependency model that is enforced outside of the containers.
Something that manages the order containers are started/stopped/recovered in.
This allows you to isolate your containers with 1 service per container, yet
still express that container with service A needs to start before container
with service B.

Pacemaker does this easily. There's even a docker resource-agent for Pacemaker
now.
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/docker
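
As a sketch of that idea (the image names are placeholders, and the agent's
parameter names are taken from the resource-agent linked above, so double-check
them against your version), expressing "container with service A starts before
container with service B" could look like:

  pcs resource create svc-a ocf:heartbeat:docker image=myorg/svc-a run_opts="--net=host"
  pcs resource create svc-b ocf:heartbeat:docker image=myorg/svc-b run_opts="--net=host"
  # the cluster, not the containers, enforces the dependency/order
  pcs constraint order start svc-a then start svc-b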

-- Vossel

ps. don't run systemd in a container... If you think you should, talk to me 
first.

 
 You can even add a ssh server in there easily too and then ansible in to do
 whatever other stuff you want to do to the container like add other
 monitoring and such
 
 Ansible or puppet or whatever should work better in this arrangement too
 since existing code assumes you can just systemctl start foo;
 
 Kevin
 
 From: Lars Kellogg-Stedman [l...@redhat.com]
 Sent: Tuesday, October 14, 2014 12:10 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [kolla] on Dockerfile patterns
 
 On Tue, Oct 14, 2014 at 02:45:30PM -0400, Jay Pipes wrote:
  With Docker, you are limited to the operating system of whatever the image
  uses.
 
 See, that's the part I disagree with.  What I was saying about ansible
 and puppet in my email is that I think the right thing to do is take
 advantage of those tools:
 
   FROM ubuntu
 
   RUN apt-get install ansible
   COPY my_ansible_config.yaml /my_ansible_config.yaml
   RUN ansible /my_ansible_config.yaml
 
 Or:
 
   FROM Fedora
 
   RUN yum install ansible
   COPY my_ansible_config.yaml /my_ansible_config.yaml
   RUN ansible /my_ansible_config.yaml
 
 Put the minimal instructions in your dockerfile to bootstrap your
 preferred configuration management tool. This is exactly what you
 would do when booting, say, a Nova instance into an openstack
 environment: you can provide a shell script to cloud-init that would
 install whatever packages are required to run your config management
 tool, and then run that tool.
 
 Once you have bootstrapped your cm environment you can take advantage
 of all those distribution-agnostic cm tools.
 
 In other words, using docker is no more limiting than using a vm or
 bare hardware that has been installed with your distribution of
 choice.
 
  [1] Is there an official MySQL docker image? I found 553 Dockerhub
  repositories for MySQL images...
 
 Yes, it's called mysql.  It is in fact one of the official images
 highlighted on https://registry.hub.docker.com/.
 
  I have looked into using Puppet as part of both the build and runtime
  configuration process, but I haven't spent much time on it yet.
 
  Oh, I don't think Puppet is any better than Ansible for these things.
 
 I think it's pretty clear that I was not suggesting it was better than
 ansible.  That is hardly relevant to this discussion.  I was only
 saying that is what *I* have looked at, and I was agreeing that *any*
 configuration management system is probably better than writing shell
 scripts.
 
  How would I go about essentially transferring the ownership of the RPC
  exchanges that the original nova-conductor container managed to the new
  nova-conductor container? Would it be as simple as shutting down the old
  container and starting up the new nova-conductor container using things
  like
  --link rabbitmq:rabbitmq in the startup docker line?
 
 I think that you would not necessarily rely on --link for this sort of
 thing.  Under kubernetes, you would use a service definition, in
 which kubernetes maintains a proxy that directs traffic to the
 appropriate place as containers are created and destroyed.
 
 Outside of kubernetes, you would use some other service discovery
 mechanism; there are many available (etcd, consul, serf, etc).
 
 But this isn't particularly a docker problem.  This is the same
 problem you would face running the same software on top of a cloud
 

Re: [openstack-dev] [kolla] on Dockerfile patterns

2014-10-14 Thread David Vossel


- Original Message -
 Ok, why are you so down on running systemd in a container?

It goes against the grain.

From a distributed systems view, we gain quite a bit of control by maintaining
one service per container. Containers can be re-organised and re-purposed
dynamically. If we have systemd trying to manage an entire stack of resources
within a container, we lose this control.

From my perspective a containerized application stack needs to be managed
externally by whatever is orchestrating the containers to begin with. When we
take a step back and look at how we actually want to deploy containers, systemd
doesn't make much sense. It actually limits us in the long run.

Also... recovery. Using systemd to manage a stack of resources within a single
container makes it difficult for whatever is externally enforcing the
availability of that container to detect the health of the container. As it is
now, the actual service is pid 1 of a container. If that service dies, the
container dies. If systemd is pid 1, there can be all kinds of chaos occurring
within the container, but the external distributed orchestration system won't
have a clue (unless it invokes some custom health monitoring tools within the
container itself, which will likely be the case someday.)

-- Vossel


 Pacemaker works, but it's kind of a pain to set up compared to just yum
 installing a few packages and setting init to systemd. There are some benefits
 for sure, but if you have to force all the docker components onto the same
 physical machine anyway, why bother with the extra complexity?

 Thanks,
 Kevin
 
 
 From: David Vossel [dvos...@redhat.com]
 Sent: Tuesday, October 14, 2014 3:14 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [kolla] on Dockerfile patterns
 
 - Original Message -
  Same thing works with cloud init too...
 
 
  I've been waiting on systemd working inside a container for a while. it
  seems
  to work now.
 
 oh no...
 
   The idea being it's hard to write a shell script to get everything up and
   running with all the interactions that may need to happen. The init system's
   already designed for that. Take a nova-compute docker container for example:
   you probably need nova-compute, libvirt, neutron-openvswitch-agent, and the
   ceilometer-agent all baked in. Writing a shell script to get it all started
   and shut down properly would be really ugly.
 
  You could split it up into 4 containers and try and ensure they are
  coscheduled and all the pieces are able to talk to each other, but why?
  Putting them all in one container with systemd starting the subprocesses is
   much easier and shouldn't have many drawbacks. The components' code is
   designed and tested assuming the pieces are all together.
 
  What you need is a dependency model that is enforced outside of the
  containers. Something that manages the order containers are
  started/stopped/recovered in. This allows you to isolate your containers with
  1 service per container, yet still express that container with service A
  needs to start before container with service B.
  
  Pacemaker does this easily. There's even a docker resource-agent for
  Pacemaker now.
  https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/docker
 
 -- Vossel
 
 ps. don't run systemd in a container... If you think you should, talk to me
 first.
 
 
  You can even add a ssh server in there easily too and then ansible in to do
  whatever other stuff you want to do to the container like add other
  monitoring and such
 
  Ansible or puppet or whatever should work better in this arrangement too
  since existing code assumes you can just systemctl start foo;
 
  Kevin
  
  From: Lars Kellogg-Stedman [l...@redhat.com]
  Sent: Tuesday, October 14, 2014 12:10 PM
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [kolla] on Dockerfile patterns
 
  On Tue, Oct 14, 2014 at 02:45:30PM -0400, Jay Pipes wrote:
    With Docker, you are limited to the operating system of whatever the
    image uses.
 
  See, that's the part I disagree with.  What I was saying about ansible
  and puppet in my email is that I think the right thing to do is take
  advantage of those tools:
 
FROM ubuntu
 
RUN apt-get install ansible
COPY my_ansible_config.yaml /my_ansible_config.yaml
RUN ansible /my_ansible_config.yaml
 
  Or:
 
FROM Fedora
 
RUN yum install ansible
COPY my_ansible_config.yaml /my_ansible_config.yaml
RUN ansible /my_ansible_config.yaml
 
  Put the minimal instructions in your dockerfile to bootstrap your
  preferred configuration management tool. This is exactly what you
  would do when booting, say, a Nova instance into an openstack
  environment: you can provide a shell script to cloud-init that would
  install whatever packages are required to run