Re: [lxc-devel] call to setup_dev_symlinks with lxc.autodev

2014-05-15 Thread William Dauchy
On Fri, Apr 25, 2014 at 5:53 PM, Michael H. Warfield m...@wittsend.com wrote:
 Bingo!  I guess my conjecture about it being a quirk in the kernel VFS
 must be pretty close.

 Ok...  I'll submit a formal patch shortly.

any news about this?

Thanks,
-- 
William
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [PATCH v2] add support for qcow2

2014-05-15 Thread Dwight Engen
On Wed, 14 May 2014 19:59:11 +
Serge Hallyn serge.hal...@ubuntu.com wrote:

 Quoting Dwight Engen (dwight.en...@oracle.com):
  On Mon, 12 May 2014 18:02:28 +
  Serge Hallyn serge.hal...@ubuntu.com wrote:
  
   qcow2 backing stores can be attached to a nbd block device using
   qemu-nbd.  This user-space process (pair) stays around for the
   duration of the device attachment.  Obviously we want it to go
   away when the container shuts down, but not before the filesystems
   have been cleanly unmounted.
   
   The device attachment is done from the task which will become the
   container monitor before the container setup+init task is spawned.
   That task starts in a new pid namespace to ensure that the
   qemu-nbd process will be killed if need be.  It sets its parent
   death signal to sighup, and, on receiving sighup, attempts to do
   a clean qemu-device detach, then exits.  This should ensure that
   the device is detached if the qemu monitor crashes or exits.
   
   It may be worth adding a delay before the qemu-nbd is detached,
   but my brief tests haven't seen any data corruption.
   
   Only the parts required for running a qcow2-backed container are
   implemented here.  Create, destroy, and clone are not.  The first
   use of this that I imagine is for people to use downloaded
   qcow2-backed images (like ubuntu cloud images, or anything
   previously used with qemu).  I imagine people will want to
   create/clone/destroy out of band using qemu-img, but if I'm wrong
   about that we can implement the rest later.
   
   Because attach_block_device() is done before the bdev is
   initialized, and bdev_init needs to know the nbd index so that it
   can mount the filesystem, we now need to pass the lxc_conf.
   
   file_exists() is moved to utils.c so we can use it from bdev.c
   
   The nbd attach/detach should lay the groundwork for trivial
   implementation of qed and raw images.
   
   changelog (may 12): qcow: fix idx check at detach
  
  Hey Serge, I had to check the code for how to use this so maybe we
  should document somewhere what the rootfs line needs to look like
  (i.e. lxc.rootfs = qcow2:/path/to/diskimg:1).
  
  Also, I used this against a .vdi image just fine, so maybe we
  should be more generic than just qcow2 and call it qemu? Not sure
  if qemu-nbd supports all the same image formats as qemu-img.
 
 so,
 
 nbd:/file[:partition]
 
 ?

Sounds good to me.
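
For reference, under that convention the rootfs line in a container config
would look something like the following (paths and partition number are
illustrative only):

  # attach the image with qemu-nbd and mount partition 1 as the rootfs
  lxc.rootfs = nbd:/var/lib/lxc/c1/disk.qcow2:1
  # or, for an image with no partition table, mount the nbd device itself
  lxc.rootfs = nbd:/var/lib/lxc/c1/disk.img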
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-15 Thread Michael H. Warfield
On Wed, 2014-05-14 at 21:00 -0700, Greg Kroah-Hartman wrote:
 On Wed, May 14, 2014 at 10:15:27PM -0500, Seth Forshee wrote:
  On Wed, May 14, 2014 at 10:17:31PM -0400, Michael H. Warfield wrote:
 Using devtmpfs is one possible
 solution, and it would have the added benefit of making container 
 setup
 simpler. But simply letting containers mount devtmpfs isn't sufficient
 since the container may need to see a different, more limited set of
 devices, and because different environments making modifications to
 the filesystem could lead to conflicts.
 
 This series solves these problems by assigning devices to user
 namespaces. Each device has an owner namespace which specifies which
 devtmpfs mount the device should appear in, as well as allowing privileged
 operations on the device from that namespace. This defaults to
 init_user_ns. There's also an ns_global flag to indicate a device 
 should
 appear in all devtmpfs mounts.
   
I'd strongly argue that this isn't even a problem at all.  And, as I
said at the Plumbers conference last year, adding namespaces to devices
isn't going to happen, sorry.  Please don't continue down this path.
   
   I was just mentioning that to Serge just a week or so ago reminding him
   of what you told all of us face to face back then.  We were having a
   discussion over loop devices into containers and this topic came up.
  
  It was the loop device use case that got me started down this path in
  the first place, so I don't personally have any interest in physical
  devices right now (though I was sure others would).

 Why do you want to give access to a loop device to a container?
 Shouldn't you set up the loop devices before creating the container and
 then pass those mount points into the container?  I thought that was how
 things worked today, or am I missing something?

Ah, you keep feeding me easy ones.  I need raw access to loop devices
and loop-control because I'm using containers to build NST (Network
Security Toolkit) distribution iso images (one container is x86_64 while
the other is i686).  Each requires 2 loop devices.  You can't set up the
loop devices in advance since the containers will be creating the images
and building them.  NST tinkers with the base build engine
configuration, so I really DON'T want it running on a hard iron host. 
There may be other cases where I need other specialized containers for
building distros.  I'm also looking at custom builds of Kali (another
security distribution).

 Giving the ability for a container to create a loop device at all is a
 horrid idea, as you have pointed out, lots of information leakage could
 easily happen.

It does but only slightly.  I noticed that losetup will list all the
devices regardless of the container it's run in or the container where
they were set up.  But that seems to be largely cosmetic.  You can't do
anything with the loop device in the other container.  You can't
disconnect it, read it, or mount it (I've tested it).  In the former case,
losetup returns with no error but does nothing.  In the latter case, you
get a busy error.
Not clean, not pretty, but no damage.  Since loop-control is working on
the global pool of loop devices, it's impossible to know what device to
move to what container when the container runs losetup.

For me, this isn't a serious problem, since it only involves 2
specialized containers out of over 4 dozen containers I have running
across 3 sites.  And those two containers are under my explicit and
exclusive control.  None of the others need it.  I can get away with
adding extra loop devices and adding them to the containers and let
losetup deal with allocation and contention.
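
(For the record, passing extra loop devices through to a container in this
way amounts to a few config entries per container, roughly as in the sketch
below; the container name and hook script are made up, loop is block major 7
and /dev/loop-control is char 10:237.)

  # loop block devices (major 7) and /dev/loop-control (char 10:237)
  lxc.cgroup.devices.allow = b 7:* rwm
  lxc.cgroup.devices.allow = c 10:237 rwm
  # with lxc.autodev = 1, an autodev hook can create the /dev/loop* nodes
  lxc.hook.autodev = /var/lib/lxc/nst-builder/mknod-loop.sh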

Serge mentioned something to me about a loopdevfs (?) thing that someone
else is working on.  That would seem to be a better solution in this
particular case but I don't know much about it or where it's at.

Mind you, I heard your arguments at LinuxPlumbers regarding pushing user
space policies into the kernel, and basically I agree with you: this
should be handled in host system user space, and that seems reasonable.
I'm just pointing out real world cases I have in operation right now,
and that I have solutions for them in host user space, even if some of
them may not be aesthetically pretty.

 greg k-h

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  m...@wittsend.com
   /\/\|=mhw=|\/\/  | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9  | An optimist believes we live in the best of all
 PGP Key: 0x674627FF| possible worlds.  A pessimist is sure of it!



signature.asc
Description: This is a digitally signed message part
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-15 Thread Greg Kroah-Hartman
On Thu, May 15, 2014 at 09:42:17AM -0400, Michael H. Warfield wrote:
 On Wed, 2014-05-14 at 21:00 -0700, Greg Kroah-Hartman wrote:
  On Wed, May 14, 2014 at 10:15:27PM -0500, Seth Forshee wrote:
   On Wed, May 14, 2014 at 10:17:31PM -0400, Michael H. Warfield wrote:
  Using devtmpfs is one possible
  solution, and it would have the added benefit of making container 
  setup
  simpler. But simply letting containers mount devtmpfs isn't 
  sufficient
  since the container may need to see a different, more limited set of
  devices, and because different environments making modifications to
  the filesystem could lead to conflicts.
  
  This series solves these problems by assigning devices to user
  namespaces. Each device has an owner namespace which specifies 
  which
  devtmpfs mount the device should appear in, as well as allowing
  privileged
  operations on the device from that namespace. This defaults to
  init_user_ns. There's also an ns_global flag to indicate a device 
  should
  appear in all devtmpfs mounts.

 I'd strongly argue that this isn't even a problem at all.  And, as I
 said at the Plumbers conference last year, adding namespaces to 
 devices
 isn't going to happen, sorry.  Please don't continue down this path.

I was just mentioning that to Serge just a week or so ago reminding him
of what you told all of us face to face back then.  We were having a
discussion over loop devices into containers and this topic came up.
   
   It was the loop device use case that got me started down this path in
   the first place, so I don't personally have any interest in physical
   devices right now (though I was sure others would).
 
  Why do you want to give access to a loop device to a container?
  Shouldn't you set up the loop devices before creating the container and
  then pass those mount points into the container?  I thought that was how
  things worked today, or am I missing something?
 
 Ah, you keep feeding me easy ones.  I need raw access to loop devices
 and loop-control because I'm using containers to build NST (Network
 Security Toolkit) distribution iso images (one container is x86_64 while
 the other is i686).  Each requires 2 loop devices.  You can't set up the
 loop devices in advance since the containers will be creating the images
 and building them.  NST tinkers with the base build engine
 configuration, so I really DON'T want it running on a hard iron host. 
 There may be other cases where I need other specialized containers for
 building distros.  I'm also looking at custom builds of Kali (another
 security distribution).

Then don't use a container to build such a thing, or fix the build
scripts to not do that :)

That is not a normal use case for a container at all.  Containers are
not for everything, use a virtual machine for some tasks (like this
one).

 Serge mentioned something to me about a loopdevfs (?) thing that someone
 else is working on.  That would seem to be a better solution in this
 particular case but I don't know much about it or where it's at.

Ok, let's see those patches then.

thanks,

greg k-h
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] call to setup_dev_symlinks with lxc.autodev

2014-05-15 Thread William Dauchy
On Thu, May 15, 2014 at 3:45 PM, Michael H. Warfield m...@wittsend.com wrote:
 The patch was submitted and committed to git head shortly after I
 submitted it.  It's there now but there hasn't been a point release
 since.  1.0.4 is not out yet and I have no idea if this patch made the
 cut for 1.0.4 or not.  That release seems to have taken a week or two
 longer than expected, so I hope it will be included.

ok thanks for the info

-- 
William
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


[lxc-devel] [PATCH 1/2] add support for nbd (v3)

2014-05-15 Thread Serge Hallyn
backing stores supported by qemu-nbd can be attached to a nbd block
device using qemu-nbd.  This user-space process (pair) stays around for
the duration of the device attachment.  Obviously we want it to go away
when the container shuts down, but not before the filesystems have been
cleanly unmounted.

The device attachment is done from the task which will become the
container monitor before the container setup+init task is spawned.
That task starts in a new pid namespace to ensure that the qemu-nbd
process will be killed if need be.  It sets its parent death signal
to sighup, and, on receiving sighup, attempts to do a clean
qemu-device detach, then exits.  This should ensure that the
device is detached if the qemu monitor crashes or exits.

It may be worth adding a delay before the qemu-nbd is detached, but
my brief tests haven't seen any data corruption.

Only the parts required for running a nbd-backed container are
implemented here.  Create, destroy, and clone are not.  The first
use of this that I imagine is for people to use downloaded nbd-backed
images (like ubuntu cloud images, or anything previously used with
qemu).  I imagine people will want to create/clone/destroy out of
band using qemu-img, but if I'm wrong about that we can implement
the rest later.

Because attach_block_device() is done before the bdev is initialized,
and bdev_init needs to know the nbd index so that it can mount the
filesystem, we now need to pass the lxc_conf.

file_exists() is moved to utils.c so we can use it from bdev.c

The nbd attach/detach should lay the groundwork for trivial implementation
of qed and raw images.

changelog (may 12): fix idx check at detach
changelog (may 15): generalize qcow2 to nbd

Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com
---
 src/lxc/bdev.c | 293 -
 src/lxc/bdev.h |  17 ++-
 src/lxc/conf.c |   3 +-
 src/lxc/conf.h |   1 +
 src/lxc/lxccontainer.c |  19 +---
 src/lxc/start.c|  11 +-
 src/lxc/utils.c|   7 ++
 src/lxc/utils.h|   1 +
 8 files changed, 329 insertions(+), 23 deletions(-)

diff --git a/src/lxc/bdev.c b/src/lxc/bdev.c
index 20e9fb3..e22d83d 100644
--- a/src/lxc/bdev.c
+++ b/src/lxc/bdev.c
@@ -41,6 +41,7 @@
 #include <libgen.h>
 #include <linux/loop.h>
 #include <dirent.h>
+#include <sys/prctl.h>
 
 #include "lxc.h"
 #include "config.h"
@@ -2410,6 +2411,287 @@ static const struct bdev_ops aufs_ops = {
.can_snapshot = true,
 };
 
+//
+// nbd dev ops
+//
+
+static int nbd_detect(const char *path)
+{
+	if (strncmp(path, "nbd:", 4) == 0)
+		return 1;
+	return 0;
+}
+
+struct nbd_attach_data {
+	const char *nbd;
+	const char *path;
+};
+
+static void nbd_detach(const char *path)
+{
+	int ret;
+	pid_t pid = fork();
+
+	if (pid < 0) {
+		SYSERROR("Error forking to detach nbd");
+		return;
+	}
+	if (pid) {
+		ret = wait_for_pid(pid);
+		if (ret < 0)
+			ERROR("nbd disconnect returned an error");
+		return;
+	}
+	execlp("qemu-nbd", "qemu-nbd", "-d", path, NULL);
+	SYSERROR("Error executing qemu-nbd");
+	exit(1);
+}
+
+static int do_attach_nbd(void *d)
+{
+	struct nbd_attach_data *data = d;
+	const char *nbd, *path;
+	pid_t pid;
+	sigset_t mask;
+	int sfd;
+	ssize_t s;
+	struct signalfd_siginfo fdsi;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGHUP);
+	sigaddset(&mask, SIGCHLD);
+
+	nbd = data->nbd;
+	path = data->path;
+
+	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1) {
+		SYSERROR("Error blocking signals for nbd watcher");
+		exit(1);
+	}
+
+	sfd = signalfd(-1, &mask, 0);
+	if (sfd == -1) {
+		SYSERROR("Error opening signalfd for nbd task");
+		exit(1);
+	}
+
+	if (prctl(PR_SET_PDEATHSIG, SIGHUP, 0, 0, 0) < 0)
+		SYSERROR("Error setting parent death signal for nbd watcher");
+
+	pid = fork();
+	if (pid) {
+		for (;;) {
+			s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
+			if (s != sizeof(struct signalfd_siginfo))
+				SYSERROR("Error reading from signalfd");
+
+			if (fdsi.ssi_signo == SIGHUP) {
+				/* container has exited */
+				nbd_detach(nbd);
+				exit(0);
+			} else if (fdsi.ssi_signo == SIGCHLD) {
+				int status;
+				while (waitpid(-1, &status, WNOHANG) > 0);
+			}
+		}
+	}
+
+	close(sfd);
+	if (sigprocmask(SIG_UNBLOCK, &mask, NULL) == -1)
+		WARN("Warning: unblocking signals for nbd watcher");
+
+	execlp("qemu-nbd", "qemu-nbd", "-c",

[lxc-devel] [PATCH 2/2] lxc.container.conf: document the type: lxc.rootfs conventions

2014-05-15 Thread Serge Hallyn
Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com
---
 doc/lxc.container.conf.sgml.in | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/doc/lxc.container.conf.sgml.in b/doc/lxc.container.conf.sgml.in
index 6e96889..39de1cc 100644
--- a/doc/lxc.container.conf.sgml.in
+++ b/doc/lxc.container.conf.sgml.in
@@ -876,6 +876,20 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
          specified, the container shares its root file system
          with the host.
        </para>
+       <para>
+         For directory or simple block-device backed containers,
+         a pathname can be used.  If the rootfs is backed by a nbd
+         device, then <filename>nbd:file:1</filename> specifies that
+         <filename>file</filename> should be attached to a nbd device,
+         and partition 1 should be mounted as the rootfs.
+         <filename>nbd:file</filename> specifies that the nbd device
+         itself should be mounted.  <filename>overlayfs:/lower:/upper</filename>
+         specifies that the rootfs should be an overlay with <filename>/upper</filename>
+         being mounted read-write over a read-only mount of <filename>/lower</filename>.
+         <filename>aufs:/lower:/upper</filename> does the same using aufs in place
+         of overlayfs. <filename>loop:/file</filename> tells lxc to attach
+         <filename>/file</filename> to a loop device and mount the loop device.
+       </para>
        </listitem>
      </varlistentry>
 
-- 
1.9.1

___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [PATCH 1/2] add support for nbd (v3)

2014-05-15 Thread Dwight Engen
On Thu, 15 May 2014 14:33:18 +
Serge Hallyn serge.hal...@ubuntu.com wrote:

 backing stores supported by qemu-nbd can be attached to a nbd block
 device using qemu-nbd.  This user-space process (pair) stays around
 for the duration of the device attachment.  Obviously we want it to
 go away when the container shuts down, but not before the filesystems
 have been cleanly unmounted.
 
 The device attachment is done from the task which will become the
 container monitor before the container setup+init task is spawned.
 That task starts in a new pid namespace to ensure that the qemu-nbd
 process will be killed if need be.  It sets its parent death signal
 to sighup, and, on receiving sighup, attempts to do a clean
 qemu-device detach, then exits.  This should ensure that the
 device is detached if the qemu monitor crashes or exits.
 
 It may be worth adding a delay before the qemu-nbd is detached, but
 my brief tests haven't seen any data corruption.
 
 Only the parts required for running a nbd-backed container are
 implemented here.  Create, destroy, and clone are not.  The first
 use of this that I imagine is for people to use downloaded nbd-backed
 images (like ubuntu cloud images, or anything previously used with
 qemu).  I imagine people will want to create/clone/destroy out of
 band using qemu-img, but if I'm wrong about that we can implement
 the rest later.
 
 Because attach_block_device() is done before the bdev is initialized,
 and bdev_init needs to know the nbd index so that it can mount the
 filesystem, we now need to pass the lxc_conf.
 
 file_exists() is moved to utils.c so we can use it from bdev.c
 
 The nbd attach/detach should lay the groundwork for trivial
 implementation of qed and raw images.
 
 changelog (may 12): fix idx check at detach
 changelog (may 15): generalize qcow2 to nbd
 
 Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com

Acked-by: Dwight Engen dwight.en...@oracle.com

 ---
  src/lxc/bdev.c | 293 -
  src/lxc/bdev.h |  17 ++-
  src/lxc/conf.c |   3 +-
  src/lxc/conf.h |   1 +
  src/lxc/lxccontainer.c |  19 +---
  src/lxc/start.c|  11 +-
  src/lxc/utils.c|   7 ++
  src/lxc/utils.h|   1 +
  8 files changed, 329 insertions(+), 23 deletions(-)
 
 diff --git a/src/lxc/bdev.c b/src/lxc/bdev.c
 index 20e9fb3..e22d83d 100644
 --- a/src/lxc/bdev.c
 +++ b/src/lxc/bdev.c
 @@ -41,6 +41,7 @@
  #include <libgen.h>
  #include <linux/loop.h>
  #include <dirent.h>
 +#include <sys/prctl.h>
  
  #include "lxc.h"
  #include "config.h"
 @@ -2410,6 +2411,287 @@ static const struct bdev_ops aufs_ops = {
  	.can_snapshot = true,
  };
  
 +//
 +// nbd dev ops
 +//
 +
 +static int nbd_detect(const char *path)
 +{
 +	if (strncmp(path, "nbd:", 4) == 0)
 +		return 1;
 +	return 0;
 +}
 +
 +struct nbd_attach_data {
 +	const char *nbd;
 +	const char *path;
 +};
 +
 +static void nbd_detach(const char *path)
 +{
 +	int ret;
 +	pid_t pid = fork();
 +
 +	if (pid < 0) {
 +		SYSERROR("Error forking to detach nbd");
 +		return;
 +	}
 +	if (pid) {
 +		ret = wait_for_pid(pid);
 +		if (ret < 0)
 +			ERROR("nbd disconnect returned an error");
 +		return;
 +	}
 +	execlp("qemu-nbd", "qemu-nbd", "-d", path, NULL);
 +	SYSERROR("Error executing qemu-nbd");
 +	exit(1);
 +}
 +
 +static int do_attach_nbd(void *d)
 +{
 +	struct nbd_attach_data *data = d;
 +	const char *nbd, *path;
 +	pid_t pid;
 +	sigset_t mask;
 +	int sfd;
 +	ssize_t s;
 +	struct signalfd_siginfo fdsi;
 +
 +	sigemptyset(&mask);
 +	sigaddset(&mask, SIGHUP);
 +	sigaddset(&mask, SIGCHLD);
 +
 +	nbd = data->nbd;
 +	path = data->path;
 +
 +	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1) {
 +		SYSERROR("Error blocking signals for nbd watcher");
 +		exit(1);
 +	}
 +
 +	sfd = signalfd(-1, &mask, 0);
 +	if (sfd == -1) {
 +		SYSERROR("Error opening signalfd for nbd task");
 +		exit(1);
 +	}
 +
 +	if (prctl(PR_SET_PDEATHSIG, SIGHUP, 0, 0, 0) < 0)
 +		SYSERROR("Error setting parent death signal for nbd watcher");
 +
 +	pid = fork();
 +	if (pid) {
 +		for (;;) {
 +			s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
 +			if (s != sizeof(struct signalfd_siginfo))
 +				SYSERROR("Error reading from signalfd");
 +
 +			if (fdsi.ssi_signo == SIGHUP) {
 +				/* container has exited */
 +				nbd_detach(nbd);
 +				exit(0);
 +			} else if (fdsi.ssi_signo == SIGCHLD) {
 +				int status;
 +				while (waitpid(-1, &status, WNOHANG) > 0);
 +			}
 +		}
 +	}
 +
 +	close(sfd);
 +

Re: [lxc-devel] [PATCH 2/2] lxc.container.conf: document the type: lxc.rootfs conventions

2014-05-15 Thread Michael H. Warfield
On Thu, 2014-05-15 at 14:33 +, Serge Hallyn wrote:
 Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com
 ---
  doc/lxc.container.conf.sgml.in | 14 ++
  1 file changed, 14 insertions(+)

 diff --git a/doc/lxc.container.conf.sgml.in b/doc/lxc.container.conf.sgml.in
 index 6e96889..39de1cc 100644
 --- a/doc/lxc.container.conf.sgml.in
 +++ b/doc/lxc.container.conf.sgml.in
 @@ -876,6 +876,20 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
           specified, the container shares its root file system
           with the host.
         </para>
 +       <para>
 +         For directory or simple block-device backed containers,
 +         a pathname can be used.  If the rootfs is backed by a nbd
 +         device, then <filename>nbd:file:1</filename> specifies that
 +         <filename>file</filename> should be attached to a nbd device,
 +         and partition 1 should be mounted as the rootfs.
 +         <filename>nbd:file</filename> specifies that the nbd device
 +         itself should be mounted.  <filename>overlayfs:/lower:/upper</filename>
 +         specifies that the rootfs should be an overlay with <filename>/upper</filename>
 +         being mounted read-write over a read-only mount of <filename>/lower</filename>.
 +         <filename>aufs:/lower:/upper</filename> does the same using aufs in place
 +         of overlayfs. <filename>loop:/file</filename> tells lxc to attach
 +         <filename>/file</filename> to a loop device and mount the loop device.
 +       </para>
         </listitem>
       </varlistentry>
  
 -- 
 1.9.1

I may be off base here but, does this relate to that exchange on the
-users list a couple of weeks ago about the Fedora template and an lvm
backing store?  Or is that one of these simple block-device backed
things?  The OP never got back with us and I haven't tried testing it
myself.

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  m...@wittsend.com
   /\/\|=mhw=|\/\/  | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9  | An optimist believes we live in the best of all
 PGP Key: 0x674627FF| possible worlds.  A pessimist is sure of it!



signature.asc
Description: This is a digitally signed message part
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


[lxc-devel] [PATCH 3/2] nbd: exit cleanly if nbd fails to attach

2014-05-15 Thread Serge Hallyn
Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com
---
 src/lxc/bdev.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/lxc/bdev.c b/src/lxc/bdev.c
index e22d83d..1d9a25a 100644
--- a/src/lxc/bdev.c
+++ b/src/lxc/bdev.c
@@ -2491,7 +2491,15 @@ static int do_attach_nbd(void *d)
 				exit(0);
 			} else if (fdsi.ssi_signo == SIGCHLD) {
 				int status;
-				while (waitpid(-1, &status, WNOHANG) > 0);
+				/* If qemu-nbd fails, or is killed by a signal,
+				 * then exit */
+				while (waitpid(-1, &status, WNOHANG) > 0) {
+					if ((WIFEXITED(status) && WEXITSTATUS(status) != 0) ||
+					    WIFSIGNALED(status)) {
+						nbd_detach(nbd);
+						exit(1);
+					}
+				}
 			}
 		}
 	}
-- 
1.9.1

___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [PATCH 2/2] lxc.container.conf: document the type: lxc.rootfs conventions

2014-05-15 Thread Dwight Engen
On Thu, 15 May 2014 14:33:47 +
Serge Hallyn serge.hal...@ubuntu.com wrote:

 Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com

Acked-by: Dwight Engen dwight.en...@oracle.com

 ---
  doc/lxc.container.conf.sgml.in | 14 ++
  1 file changed, 14 insertions(+)
 
 diff --git a/doc/lxc.container.conf.sgml.in
 b/doc/lxc.container.conf.sgml.in index 6e96889..39de1cc 100644
 --- a/doc/lxc.container.conf.sgml.in
 +++ b/doc/lxc.container.conf.sgml.in
 @@ -876,6 +876,20 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
           specified, the container shares its root file system
           with the host.
         </para>
 +       <para>
 +         For directory or simple block-device backed containers,
 +         a pathname can be used.  If the rootfs is backed by a nbd
 +         device, then <filename>nbd:file:1</filename> specifies that
 +         <filename>file</filename> should be attached to a nbd device,
 +         and partition 1 should be mounted as the rootfs.
 +         <filename>nbd:file</filename> specifies that the nbd device
 +         itself should be mounted.  <filename>overlayfs:/lower:/upper</filename>
 +         specifies that the rootfs should be an overlay with <filename>/upper</filename>
 +         being mounted read-write over a read-only mount of <filename>/lower</filename>.
 +         <filename>aufs:/lower:/upper</filename> does the same using aufs in place
 +         of overlayfs. <filename>loop:/file</filename> tells lxc to attach
 +         <filename>/file</filename> to a loop device and mount the loop device.
 +       </para>
         </listitem>
       </varlistentry>
  

___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-15 Thread Richard Weinberger
On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman
gre...@linuxfoundation.org wrote:
 Then don't use a container to build such a thing, or fix the build
 scripts to not do that :)

I second this.
To me it looks like some folks are trying to (ab)use Linux containers
for purposes where KVM would be a much better fit.
Please don't put more complexity into containers.  They are already
horribly complex and error prone.

-- 
Thanks,
//richard
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-15 Thread Serge Hallyn
Quoting Richard Weinberger (richard.weinber...@gmail.com):
 On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman
 gre...@linuxfoundation.org wrote:
  Then don't use a container to build such a thing, or fix the build
  scripts to not do that :)
 
 I second this.
 To me it looks like some folks are trying to (ab)use Linux containers
 for purposes where KVM would be a much better fit.
 Please don't put more complexity into containers.  They are already
 horribly complex and error prone.

I, naturally, disagree :)  The only use case which is inherently not
valid for containers is running a kernel.  Practically speaking there
are other things which likely will never be possible, but if someone
offers a way to do something in containers, "you can't do that in
containers" is not an apropos response.

"That abstraction is wrong" is certainly valid, as when vpids were
originally proposed and rejected, resulting in the development of
pid namespaces.  "We have to work out (x) first" can be valid (and
I can think of examples here), assuming it's not just trying to hide
behind a catch-22/chicken-egg problem.

Finally, saying containers are complex and error prone is conflating
several large suites of userspace code and many kernel features which
support them.  Being more precise would, if the argument is valid,
lend it a lot more weight.
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-15 Thread Richard Weinberger
Am 15.05.2014 21:50, schrieb Serge Hallyn:
 Quoting Richard Weinberger (richard.weinber...@gmail.com):
 On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman
 gre...@linuxfoundation.org wrote:
 Then don't use a container to build such a thing, or fix the build
 scripts to not do that :)

 I second this.
 To me it looks like some folks are trying to (ab)use Linux containers
 for purposes where KVM would be a much better fit.
 Please don't put more complexity into containers.  They are already
 horribly complex and error prone.
 
 I, naturally, disagree :)  The only use case which is inherently not
 valid for containers is running a kernel.  Practically speaking there
 are other things which likely will never be possible, but if someone
 offers a way to do something in containers, "you can't do that in
 containers" is not an apropos response.
 
 "That abstraction is wrong" is certainly valid, as when vpids were
 originally proposed and rejected, resulting in the development of
 pid namespaces.  "We have to work out (x) first" can be valid (and
 I can think of examples here), assuming it's not just trying to hide
 behind a catch-22/chicken-egg problem.
 
 Finally, saying containers are complex and error prone is conflating
 several large suites of userspace code and many kernel features which
 support them.  Being more precise would, if the argument is valid,
 lend it a lot more weight.

We (my company) have been using Linux containers in production since 2011;
first LXC, now libvirt-lxc.
To understand the internals better I also wrote my own userspace to
create/start containers. There are so many things which can hurt you badly.
With user namespaces we expose a really big attack surface to regular users,
e.g. suddenly a user is allowed to mount filesystems.
Ask Andy; he has already found lots of nasty things...
I agree that user namespaces are the way to go; all the papering over
security issues with LSMs is much worse.
But we have to make sure that we don't add too many features too fast.

That said, I like containers a lot because they are cheap, but as they are
lightweight, the isolation they provide is also lightweight.
IMHO containers are not a cheap replacement for KVM.

Thanks,
//richard
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-15 Thread Serge E. Hallyn
Quoting Richard Weinberger (rich...@nod.at):
 Am 15.05.2014 21:50, schrieb Serge Hallyn:
  Quoting Richard Weinberger (richard.weinber...@gmail.com):
  On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman
  gre...@linuxfoundation.org wrote:
  Then don't use a container to build such a thing, or fix the build
  scripts to not do that :)
 
  I second this.
  To me it looks like some folks are trying to (ab)use Linux containers
  for purposes where KVM would be a much better fit.
  Please don't put more complexity into containers.  They are already
  horribly complex and error prone.
  
  I, naturally, disagree :)  The only use case which is inherently not
  valid for containers is running a kernel.  Practically speaking there
  are other things which likely will never be possible, but if someone
  offers a way to do something in containers, "you can't do that in
  containers" is not an apropos response.
  
  "That abstraction is wrong" is certainly valid, as when vpids were
  originally proposed and rejected, resulting in the development of
  pid namespaces.  "We have to work out (x) first" can be valid (and
  I can think of examples here), assuming it's not just trying to hide
  behind a catch-22/chicken-egg problem.
  
  Finally, saying containers are complex and error prone is conflating
  several large suites of userspace code and many kernel features which
  support them.  Being more precise would, if the argument is valid,
  lend it a lot more weight.
 
 We (my company) have been using Linux containers in production since 2011;
 first LXC, now libvirt-lxc.
 To understand the internals better I also wrote my own userspace to
 create/start containers. There are so many things which can hurt you badly.
 With user namespaces we expose a really big attack surface to regular users,
 e.g. suddenly a user is allowed to mount filesystems.

That is currently not the case.  They can mount some virtual filesystems
and do bind mounts, but cannot mount most real filesystems.  This keeps
us protected (for now) from potentially unsafe superblock readers in the
kernel.

 Ask Andy; he has already found lots of nasty things...

Yes, of course, and there may be more to come...

 I agree that user namespaces are the way to go; all the papering over
 security issues with LSMs is much worse.
 But we have to make sure that we don't add too many features too fast.

Agreed.  Like I said, 'we have to work (x) out first' could be valid,
including 'we should wait (a year?) for user ns issues to fall out
before relaxing any of the current user ns constraints'.

On the other hand, not exercising the new code may only mean that
existing flaws stick around longer, undetected (by most).

 That said, I like containers a lot because they are cheap, but as they are
 lightweight, the isolation they provide is also lightweight.
 IMHO containers are not a cheap replacement for KVM.

The building blocks for containers can also be used for entirely
new, simpler use cases - i.e. perhaps a new fakeroot alternative based
on user namespace mappings.  Which is why this is not a use case for
containers is not the right way to push back, whether or not the
feature ends up being appropriate.
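
That fakeroot idea can be made concrete in a few dozen lines of C.  The
sketch below is illustrative only (it is not from any posted patch set) and
assumes a kernel that permits unprivileged user namespaces: it unshares a
user namespace, maps the caller's uid/gid to 0, and execs a shell that then
sees itself as root inside the namespace.

  /* userns_fakeroot.c -- an illustrative sketch, not part of any posted series.
   * Build: cc -o userns_fakeroot userns_fakeroot.c ; run it and `id` inside
   * the spawned shell reports uid=0, mapped onto the invoking user.
   */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  /* write a string into a proc file; files that may be absent on older
   * kernels (e.g. /proc/self/setgroups before 3.19) are not required */
  static void write_file(const char *path, const char *buf, int required)
  {
      int fd = open(path, O_WRONLY);

      if (fd < 0) {
          if (!required)
              return;
          perror(path);
          exit(1);
      }
      if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
          perror(path);
          exit(1);
      }
      close(fd);
  }

  int main(void)
  {
      char map[64];
      unsigned uid = getuid(), gid = getgid();
      char *const shell[] = { "/bin/sh", NULL };

      /* create a new user namespace; no privilege needed */
      if (unshare(CLONE_NEWUSER) < 0) {
          perror("unshare(CLONE_NEWUSER)");
          return 1;
      }

      /* newer kernels require setgroups to be denied before gid_map is written */
      write_file("/proc/self/setgroups", "deny", 0);

      snprintf(map, sizeof(map), "0 %u 1", uid);
      write_file("/proc/self/uid_map", map, 1);
      snprintf(map, sizeof(map), "0 %u 1", gid);
      write_file("/proc/self/gid_map", map, 1);

      /* exec a shell that now believes it is root within the namespace */
      execv("/bin/sh", shell);
      perror("execv");
      return 1;
  }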

-serge
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


[lxc-devel] [PATCH] Refactoring lxc-autostart boot process and group handling.

2014-05-15 Thread Michael H. Warfield
Ok...

Here it is: the roll-up of Dwight's earlier patch, my previous 3
patches plus, now, the rework to lxc-autostart to support multiple
invocations of the -g option, allow ordinal inclusion of the NULL group,
and process containers first in order of the groups specified on the
command line and their membership in those groups, subject to all other
constraints (lxc.start.auto) and orderings (lxc.start.order).

I have also noticed that host shutdown time can be exorbitantly
long.  I added parameters to set a shutdown time of -t 5 seconds to
speed that process up (may still not be fast enough).  Problem here is
that while startup returns quickly while a container initializes and we
need a delay, shutdown is largely serial and not parallelized at all.
We may need to address that in some way moving forward.

Here's the grand patch:

-- 
Refactoring lxc-autostart boot process and group handling.

This is a rollup of 4 earlier patches patching the systemd
init to use the sysvinit script, adding an onboot group to the
boot set, updating upstart to include the onboot group, and adding
documentation for the special boot groups.

This also adds new functionality to lxc-autostart.

*) The -g / --groups option may now be given multiple times and is cumulative.
This may be mixed freely with the previous comma-separated
group list convention.  Groups are processed in the
order they first appear in the aggregated group list.

*) The NULL group may be specified in the group list using either a
leading comma, a trailing comma, or an embedded comma.

*) Booting proceeds in the order of the groups specified on the command line,
then ordered by lxc.start.order and name collating sequence.

*) Default host bootup is now specified as -g onboot, meaning that first
the onboot group is booted and then any remaining enabled
containers in the NULL group are booted (see the example invocations below).
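
Illustrative invocations under the rules above (the group names are made up;
the trailing comma names the NULL group per the convention described):

  # default host bootup: the onboot group first, then enabled containers
  # in the NULL group
  lxc-autostart -g onboot,

  # -g may be repeated and mixed with comma-separated lists; groups are
  # processed in the order they first appear
  lxc-autostart -g dns -g web,db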

From the previous 4 individual patches:

Reported-by: CDR vene...@gmail.com
Signed-off-by: Dwight Engen dwight.en...@oracle.com

- reuse the sysvinit script to ensure that if the lxc is configured to use
  a bridge setup by libvirt, the bridge will be available before starting
  the container

- made the sysvinit script check for the existence of ifconfig, and fall
  back to ip link list if available

- made the lxc service also dependent on the network.target

- autoconfized the paths in the service file and sysvinit script

- v2: rename script lxc-autostart to lxc-autostart-helper to avoid confusion

From: Michael H. Warfield m...@wittsend.com

- This adds a non-null group (onboot) to the sysvinit startup script
for autobooting containers.  This allows for containers which are
in other groups to be included in the autoboot process.

This script is used by both the sysvinit systems and the systemd
systems.

From: Michael H. Warfield m...@wittsend.com

- Add the feature to the Upstart init script to boot the onboot
group dependent on the start.auto = 1 flag.  This brings the
Upstart behavior into congruence with the sysvinit script
for SysV Init and Systemd.

From: Michael H. Warfield m...@wittsend.com

Added sections to lxc-autostart and lxc.container.config to document
the behavior of the LXC service at host system boot time with regards
to the lxc.group and lxc.start.auto parameters.

Signed-off-by: Michael H. Warfield m...@wittsend.com
---
 .gitignore |   3 +
 config/init/systemd/Makefile.am|  14 +-
 config/init/systemd/lxc.service|  17 --
 config/init/systemd/lxc.service.in |  17 ++
 config/init/sysvinit/lxc   |  66 
 config/init/sysvinit/lxc.in|  86 ++
 config/init/upstart/lxc.conf   |   8 +-
 configure.ac   |   2 +
 doc/lxc-autostart.sgml.in  |  30 
 doc/lxc.container.conf.sgml.in |  23 +++
 lxc.spec.in|   1 +
 src/lxc/lxc_autostart.c| 331 +++--
 12 files changed, 427 insertions(+), 171 deletions(-)
 delete mode 100644 config/init/systemd/lxc.service
 create mode 100644 config/init/systemd/lxc.service.in
 delete mode 100755 config/init/sysvinit/lxc
 create mode 100644 config/init/sysvinit/lxc.in

diff --git a/.gitignore b/.gitignore
index 8145f81..2b478cd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -111,6 +111,9 @@ config/missing
 config/libtool.m4
 config/lt*.m4
 config/bash/lxc
+config/init/systemd/lxc-autostart-helper
+config/init/systemd/lxc.service
+config/init/sysvinit/lxc
 
 doc/*.1
 doc/*.5
diff --git a/config/init/systemd/Makefile.am b/config/init/systemd/Makefile.am
index de5ee50..fc374c5 100644
--- a/config/init/systemd/Makefile.am
+++ b/config/init/systemd/Makefile.am
@@ -5,7 +5,17 @@ EXTRA_DIST = \
 if INIT_SCRIPT_SYSTEMD
 SYSTEMD_UNIT_DIR = $(prefix)/lib/systemd/system
 
-install-systemd: lxc.service lxc-devsetup
+lxc-autostart-helper: ../sysvinit/lxc.in $(top_builddir)/config.status
+   $(AM_V_GEN)sed 

Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-15 Thread Greg Kroah-Hartman
On Thu, May 15, 2014 at 05:42:54PM +, Serge Hallyn wrote:
 What exactly defines 'normal use case for a container'?

Well, I'd say "acting like a virtual machine" is a good start :)

 Not too long ago much of what we can now do with network namespaces
 was not a normal container use case.  Neither you can't do it now
 nor I don't use it like that should be grounds for a pre-emptive
 nack.  It will horribly break security assumptions certainly would
 be.

I agree, and maybe we will get there over time, but this patch is not
the way to do that.

 That's not to say there might not be good reasons why this in particular
 is not appropriate, but ISTM if things are going to be nacked without
 consideration of the patchset itself, we ought to be having a ksummit
 session to come to a consensus [ or receive a decree, presumably by you :)
 but after we have a chance to make our case ] on what things are going to
 be un/acceptable.

I already stood up and publicly said this last year at Plumbers; why
is anything now different?

And this patchset is proof of why it's not a good idea.  You really
didn't do anything with all of the namespace stuff, except change loop.
That's the only thing that cares, so, just do it there, like I said to
do so, last August.

And you are ignoring the notifications to userspace and how namespaces
here would deal with that.

   Serge mentioned something to me about a loopdevfs (?) thing that someone
   else is working on.  That would seem to be a better solution in this
   particular case but I don't know much about it or where it's at.
  
  Ok, let's see those patches then.
 
 I think Seth has a git tree ready, but not sure which branch he'd want
 us to look at.
 
 Splitting a namespaced devtmpfs from loopdevfs discussion might be
 sensible.  However, in defense of a namespaced devtmpfs I'd say
 that for userspace to, at every container startup, bind-mount in
 devices from the global devtmpfs into a private tmpfs (for systemd's
 sake it can't just be on the container rootfs) seems like something
 worth avoiding.

I think having to pick and choose what device nodes you want in a
container is a good thing.  Besides, you would have to do the same thing
in the kernel anyway; what's wrong with userspace making the decision
here, especially as it knows exactly what it wants to do much more so
than the kernel ever can.
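
As a concrete illustration of that userspace-side choice, with lxc it is
already just a few per-container config lines; a sketch only, with an
arbitrary device selection:

  # /dev/null (1:3), /dev/random (1:8) and /dev/net/tun (10:200) only
  lxc.cgroup.devices.allow = c 1:3 rwm
  lxc.cgroup.devices.allow = c 1:8 rwm
  lxc.cgroup.devices.allow = c 10:200 rwm
  lxc.mount.entry = /dev/net/tun dev/net/tun none bind,optional,create=file 0 0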

 PS - Apparently both Parallels and Michael independently
 project devices which are hot-plugged on the host into containers.
 That also seems like something worth talking about (best practices,
 shortcomings, use cases not met by it, any ways that the kernel can
 help out) at ksummit/linuxcon.

I was told that containers would never want devices hotplugged into
them.  In what use case is this happening / needed?

thanks,

greg k-h
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-15 Thread Greg Kroah-Hartman
On Fri, May 16, 2014 at 01:49:59AM +, Serge Hallyn wrote:
  I think having to pick and choose what device nodes you want in a
  container is a good thing.  Besides, you would have to do the same thing
  in the kernel anyway; what's wrong with userspace making the decision
  here, especially as it knows exactly what it wants to do much more so
  than the kernel ever can.
 
 For 'real' devices that sounds sensible.  The thing about loop devices
 is that we simply want to allow a container to say give me a loop
 device to use and have it receive a unique loop device (or 3), without
 having to pre-assign them.  I think that would be cleaner to do using
 a pseudofs and loop-control device, rather than having to have a
 daemon in userspace on the host farming those out in response to
 some, I don't know, dbus request?

I agree that loop devices would be nice to have in a container, and that
the existing loop interface doesn't really lend itself to that.  So
create a new type of thing that acts like a loop device in a container.
But don't try to mess with the whole driver core just for a single type
of device.

greg k-h
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel