Re: [lxc-devel] read-only container root

2010-02-16 Michael Tokarev
Daniel Lezcano wrote:
 Michael Tokarev wrote:
 lxc-start: No such file or directory - failed to mount a new instance of '/dev/pts'
 I'm experimenting with a read-only root fs in the container.
 So far it does not work.

 First of all, when trying to start a container in a read-only root
 lxc-start complains:
   lxc-start: Read-only file system - can't make temporary mountpoint

 This is in conf.c:setup_rootfs_pivot_root() function.  That function
 uses optional parameter lxc.pivotdir, or creates (and later removes)
 a temporary directory for pivot_root.  Obviously there's no way to
 create a directory in a read-only filesystem.

 Why do you need to use a read-only root fs?

There's no _need_, but it's an extension of the principle of least
privilege.  It also helps keep things more orderly, and it guarantees
that nothing bad will happen to the system in case of an unexpected
power failure and the like (in that case, say, /var might still be
badly damaged, but the system will actually boot to the point where
some repair tools are available).

 But lxc.pivotdir does not work either. In the function mentioned above
 it is used with a leading dot (e.g. if I specify lxc.pivotdir=pivot in
 the config file, the pivot_root() syscall is made to .pivot, not to
 pivot), but later on it is used without the dot, and fails...
[]
 It's a bug introduced with the pivot_root feature. Investigation is
 under way.

I tried to debug it too, but realized that the latest git checkout I
have locally is from 22nd Jan, almost a month ago, and I've seen quite
a few changes mentioned on the list since.  So either the changes
haven't been committed, or the git repository has moved somewhere else.
That was actually the third email I planned to write, asking what's up
with the git repo... ;)

[]
 Ok, so your need is to call a script between:
 
 lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755
 
 ...
 lxc.tty = 4
 
 where the script will populate /dev, right?
 
 mmh, not obvious.

Or maybe just call it _instead_ of specifying all the
above (lxc.mount.entry and lxc.tty), leaving only things
such as network device setup (which can't easily be done
from a shell) to lxc-start.

[]
 What about an lxc.script configuration line which calls a script at the
 point where it appears in the configuration file?

That's not possible.  The configuration is an _unordered_ set
of key=value pairs.  lxc-start currently calls its setup functions
in a fixed (hard-coded) order, regardless of the order in which
the config file is written.
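A minimal Python sketch of this point, assuming lxc-style "key = value" lines with "#" comments, and assuming that the values of a repeated key such as lxc.mount.entry keep their file order.  The parser collects values per key, so the interleaving of different keys is simply lost, and a separate sequence decides what runs first.  parse_config, setup_plan and SETUP_ORDER are hypothetical names, and the order shown is illustrative, not the one lxc-start really uses.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative order only; this is NOT the real sequence inside lxc-start.
SETUP_ORDER = ["lxc.utsname", "lxc.network.type", "lxc.mount.entry", "lxc.tty"]


def parse_config(text: str) -> Dict[str, List[str]]:
    """Parse 'key = value' lines into key -> list of values.

    Values of the same key keep their relative order; the position of a
    line relative to lines with OTHER keys is not recorded at all.
    """
    conf: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")  # split at the first '=' only
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        conf[key.strip()].append(value.strip())
    return dict(conf)


def setup_plan(conf: Dict[str, List[str]]) -> List[str]:
    """Keys are acted on in the hard-coded SETUP_ORDER, not in file order."""
    return [key for key in SETUP_ORDER if key in conf]
```

Swapping "lxc.tty = 4" and the lxc.mount.entry line in a file yields the same dictionary and the same plan, which is why a hypothetical lxc.script line could not take its meaning from where it sits in the file.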

The specified script (lxc.script) would also have to be called at some
(arbitrary) pre-determined point in the container setup procedure.
In that case the script can _replace_ some things from the
config file if they're in the wrong order or get in the way.
But it's still not obvious where that point should be: for
example, is it before lxc-start (implicitly!) mounts /dev/pts
or after?  For my example the script should run before /dev/pts
is mounted, but maybe someone will want to run some other program
that uses pseudo-terminals, which obviously must be done after
/dev/pts is mounted (granted, I can't think of such a situation
or program right now).
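One way to pin such a point down, sketched in Python: the built-in stages run in a fixed order, and hooks attach to named positions before or after a given stage, so the position is defined by the code rather than by the config file.  The stage names and the "before:"/"after:" hook keys are hypothetical; they do not correspond to lxc's functions or to its real setup order.

```python
from typing import Callable, Dict, List

# Illustrative stage names only, not lxc's real functions or order.
STAGES = ["mount_entries", "mount_pts", "setup_tty"]


def run_setup(actions: Dict[str, Callable[[], None]],
              hooks: Dict[str, List[Callable[[], None]]]) -> None:
    """Run every built-in stage in STAGES order, firing hooks around each.

    hooks maps keys like "before:mount_pts" or "after:mount_pts" to
    callables; a missing key simply means no hook at that position.
    """
    for stage in STAGES:
        for hook in hooks.get(f"before:{stage}", []):
            hook()
        actions[stage]()
        for hook in hooks.get(f"after:{stage}", []):
            hook()
```

With this layout, populating /dev would register under "before:mount_pts", while a program that needs pseudo-terminals would use "after:mount_pts".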

 The whole mess started when I realized that bind-mounting the host's /dev
 works perfectly _except_ for syslogging: /dev/log does not work with
 multiple containers.  Only the container where syslogd was (re)started
 last works; all the rest get ECONNREFUSED when trying to send any
 message to /dev/log.
   
 /dev/log is an af_unix socket; the network is isolated, and af_unix
 sockets belong to the network namespace.
 Probably /dev/log is unlinked, created again and bound by syslogd.
 As /dev/ is shared between the containers, the last one gets the socket.
 Any process outside of that container trying to access this socket won't
 be able to.

That's what I figured, and it's quite an obvious thing to do, really.

Actually it might be a good idea not to start syslogd in containers
and to inherit the real /dev from the host; this way all logging will
automatically be sent to the central syslog (hopefully :).  But that
only works until the host syslogd is restarted, at which point we're
back to ECONNREFUSED.

Note my other email about mounting new filesystems within containers.
In this context: after restarting syslogd on the host, is it possible
to bind-mount the host's /dev/log onto the container's /dev/log again
(provided they were bind-mounted before)?
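For the bind-mount question, the relevant detail is that a bind mount of a single file refers to the inode that existed when the mount was made.  When the host syslogd closes its socket and re-creates /dev/log, a container that bind-mounted the old file still sees the old, dead socket inode, so sends fail with ECONNREFUSED until /dev/log is bind-mounted again.  The Python sketch below reproduces only the socket side, without any privileges: a hard link stands in for the file bind mount (an analogy, not the same mechanism), and temporary paths stand in for /dev/log.  stale_reference_demo is a hypothetical name.

```python
import errno
import os
import socket
import tempfile


def stale_reference_demo() -> tuple:
    """Return (errno seen via the old inode, data received via the new path)."""
    d = tempfile.mkdtemp()
    log_path = os.path.join(d, "log")      # plays the role of /dev/log
    old_ref = os.path.join(d, "log.old")   # plays the role of the bind mount

    old = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    old.bind(log_path)
    os.link(log_path, old_ref)             # second name for the same inode

    # "Restart" syslogd: close the old socket, unlink and re-create the path.
    old.close()
    os.unlink(log_path)
    new = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    new.bind(log_path)

    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.sendto(b"hello", old_ref)   # old inode: no socket bound to it
        err = 0
    except OSError as e:
        err = e.errno
    client.sendto(b"hello", log_path)      # new inode: reaches the new socket
    data = new.recv(64)

    for s in (client, new):
        s.close()
    for p in (log_path, old_ref):
        os.unlink(p)
    os.rmdir(d)
    return err, data
```

Calling stale_reference_demo() returns errno.ECONNREFUSED for the old reference and the message for the fresh path.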

Thanks!

/mjt

--
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


[lxc-devel] read-only container root

2010-02-15 Michael Tokarev
lxc-start: No such file or directory - failed to mount a new instance of '/dev/pts'
I'm experimenting with a read-only root fs in the container.
So far it does not work.

First of all, when trying to start a container in a read-only root
lxc-start complains:
  lxc-start: Read-only file system - can't make temporary mountpoint

This is in conf.c:setup_rootfs_pivot_root() function.  That function
uses optional parameter lxc.pivotdir, or creates (and later removes)
a temporary directory for pivot_root.  Obviously there's no way to
create a directory in a read-only filesystem.
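A rough Python sketch of that choice, assuming (as described here) that a configured lxc.pivotdir names a directory that must already exist inside the new root, while the fallback creates a temporary one.  Only the fallback writes to the root filesystem, and on a read-only root that mkdir is exactly what fails with EROFS.  choose_pivot_dir and the "lxc_pivot" prefix are hypothetical; this is not lxc's conf.c code, and it does not call pivot_root() itself.

```python
import errno
import os
import tempfile
from typing import Optional, Tuple


def choose_pivot_dir(rootfs: str, pivotdir: Optional[str]) -> Tuple[str, bool]:
    """Return (directory to use as put_old, whether we created it).

    A configured pivot directory is only checked, never created, so the
    root filesystem is not written to.  Without one, a temporary
    directory is made inside the new root; on a read-only filesystem this
    mkdir raises OSError with errno EROFS ("Read-only file system").
    """
    if pivotdir:
        path = os.path.join(rootfs, pivotdir.strip("/"))
        if not os.path.isdir(path):
            raise FileNotFoundError(errno.ENOENT, "pivot directory missing", path)
        return path, False
    return tempfile.mkdtemp(prefix="lxc_pivot", dir=rootfs), True
```

A caller would remove the directory afterwards only when the second element is True, mirroring the create-and-later-remove behaviour described above.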

But lxc.pivotdir does not work either. In the function mentioned above
it is used with a leading dot (e.g. if I specify lxc.pivotdir=pivot in
the config file, the pivot_root() syscall is made to .pivot, not to
pivot), but later on it is used without the dot, and fails:

  lxc-start: No such file or directory - failed to open /pivot/proc/mounts
  lxc-start: No such file or directory - failed to read or parse mount list '/pivot/proc/mounts'
  lxc-start: failed to pivot_root to '/stage/t'

(that's with lxc.pivotdir = pivot in the config file).  After symlinking
pivot to .pivot it still fails:

  lxc-start: Device or resource busy - could not unmount old rootfs
  lxc-start: failed to pivot_root to '/stage/t'
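The first failure comes from spelling the same directory two ways (".pivot" for the syscall, "pivot" afterwards).  A small Python sketch of the obvious remedy: derive every old-root path from one normalized name, so that the pivot_root() argument and the later proc/mounts lookup cannot disagree.  pivot_paths and its keys are hypothetical, and it does not address the separate "could not unmount old rootfs" (EBUSY) error.

```python
import posixpath
from typing import Dict


def pivot_paths(pivotdir: str) -> Dict[str, str]:
    """Build all old-root paths from a single name.

    put_old:    relative to the new root, as passed to pivot_root()
    old_root:   where the old root appears once pivot_root() returns
    old_mounts: the old root's proc/mounts, as seen from the new root
    """
    name = pivotdir.strip("/")
    if not name:
        raise ValueError("pivot directory name is empty")
    old_root = posixpath.join("/", name)
    return {
        "put_old": name,
        "old_root": old_root,
        "old_mounts": posixpath.join(old_root, "proc", "mounts"),
    }
```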


Ok, so far so good.

Next thing is the /dev directory.  I prefer to have it in a tmpfs, for
several reasons (one is that the root is mounted with -o nodev), but that
fails too unless the directory is pre-populated:

  lxc-start: No such file or directory - failed to mount a new instance of '/dev/pts'
  lxc-start: failed to setup the new pts instance

That's when specifying:

   lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755

in the config file.  That creates an empty directory for container's /dev,
which is populated later in the startup script.

A similar thing happens when I pre-create dev/pts: it fails to bind-mount
tty1..tty4.

So far it works by using a wrapper around lxc-start which mounts tmpfs
over dev, fills it with a bunch of standard entries, and executes lxc-start.
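The "bunch of standard entries" is not listed here, so this Python sketch uses a conventional minimal set as an assumption: the standard Linux major/minor numbers for null, zero, full, random, urandom, tty and console; empty placeholder files tty1..tty4 for lxc to bind-mount its ttys onto; a pts directory for the new devpts instance with ptmx as a symlink into it; and the /proc/self/fd symlinks.  It only fills an already-mounted directory (mounting the tmpfs is left to the wrapper), and since creating device nodes needs CAP_MKNOD, that part can be switched off.  populate_dev is a hypothetical name.

```python
import os
import stat

# Conventional minimal /dev (an assumption, not lxc's own list):
# (name, major, minor, permission bits) using standard Linux device numbers.
CHAR_DEVICES = [
    ("null", 1, 3, 0o666), ("zero", 1, 5, 0o666), ("full", 1, 7, 0o666),
    ("random", 1, 8, 0o666), ("urandom", 1, 9, 0o666),
    ("tty", 5, 0, 0o666), ("console", 5, 1, 0o600),
]
DIRS = [("pts", 0o755), ("shm", 0o1777)]
SYMLINKS = [
    ("fd", "/proc/self/fd"), ("stdin", "/proc/self/fd/0"),
    ("stdout", "/proc/self/fd/1"), ("stderr", "/proc/self/fd/2"),
    ("ptmx", "pts/ptmx"),  # points into the new devpts instance
]
TTY_COUNT = 4  # matches lxc.tty = 4


def populate_dev(dev: str, make_nodes: bool = True) -> None:
    """Fill an (already mounted, empty) directory with /dev entries."""
    for name, mode in DIRS:
        path = os.path.join(dev, name)
        os.mkdir(path)
        os.chmod(path, mode)  # mkdir's mode is masked by the umask
    for name, target in SYMLINKS:
        os.symlink(target, os.path.join(dev, name))
    for i in range(1, TTY_COUNT + 1):
        # Empty regular files serve as bind-mount targets for the ttys.
        fd = os.open(os.path.join(dev, f"tty{i}"),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        os.close(fd)
    if make_nodes:  # mknod requires CAP_MKNOD (normally root)
        for name, major, minor, mode in CHAR_DEVICES:
            path = os.path.join(dev, name)
            os.mknod(path, stat.S_IFCHR | mode, os.makedev(major, minor))
            os.chmod(path, mode)  # mknod's mode is masked by the umask too
```

A wrapper would mount a tmpfs on the container's dev directory, call populate_dev on it, and only then exec lxc-start.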

But this is really getting quite ugly.  And the only solution to all this
mess is to allow the setup to be performed by a shell script/command that is
called after forking the (filesystem) namespace but before entering the
container for real, or _instead_ of entering the container, as was
discussed previously.

The whole mess started when I realized that bind-mounting the host's /dev
works perfectly _except_ for syslogging: /dev/log does not work with
multiple containers.  Only the container where syslogd was (re)started last
works; all the rest get ECONNREFUSED when trying to send any message
to /dev/log.

Comments?

Thanks!

/mjt



Re: [lxc-devel] read-only container root

2010-02-15 Daniel Lezcano
Michael Tokarev wrote:
 lxc-start: No such file or directory - failed to mount a new instance of '/dev/pts'
 I'm experimenting with a read-only root fs in the container.
 So far it does not work.

 First of all, when trying to start a container in a read-only root
 lxc-start complains:
   lxc-start: Read-only file system - can't make temporary mountpoint

 This is in conf.c:setup_rootfs_pivot_root() function.  That function
 uses optional parameter lxc.pivotdir, or creates (and later removes)
 a temporary directory for pivot_root.  Obviously there's no way to
 create a directory in a read-only filesystem.
   
Why do you need to use a read-only root fs?

 But lxc.pivotdir does not work either. In the function mentioned above
 it is used with a leading dot (e.g. if I specify lxc.pivotdir=pivot in
 the config file, the pivot_root() syscall is made to .pivot, not to
 pivot), but later on it is used without the dot, and fails:

   lxc-start: No such file or directory - failed to open /pivot/proc/mounts
   lxc-start: No such file or directory - failed to read or parse mount list '/pivot/proc/mounts'
   lxc-start: failed to pivot_root to '/stage/t'

 (that's with lxc.pivotdir = pivot in the config file).  After symlinking
 pivot to .pivot it still fails:

   lxc-start: Device or resource busy - could not unmount old rootfs
   lxc-start: failed to pivot_root to '/stage/t'
   
It's a bug introduced with the pivot_root feature. Investigation is under way.

 Ok, so far so good.

 Next thing is the /dev directory.  I prefer to have it in a tmpfs, for
 several reasons (one is that the root is mounted with -o nodev), but that
 fails too unless the directory is pre-populated:

   lxc-start: No such file or directory - failed to mount a new instance of '/dev/pts'
   lxc-start: failed to setup the new pts instance

 That's when specifying:

    lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755

 in the config file.  That creates an empty directory for container's /dev,
 which is populated later in the startup script.

 A similar thing happens when I pre-create dev/pts: it fails to bind-mount
 tty1..tty4.
   
Ok, so your need is to call a script between:

lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755

...
lxc.tty = 4

where the script will populate /dev, right?

mmh, not obvious.

 So far it works by using a wrapper around lxc-start which mounts tmpfs
 over dev, fills it with a bunch of standard entries, and executes lxc-start.

 But this is really getting quite ugly.  And the only solution to all this
 mess is to allow the setup to be performed by a shell script/command that is
 called after forking the (filesystem) namespace but before entering the
 container for real, or _instead_ of entering the container, as was
 discussed previously.
   

What about an lxc.script configuration line which calls a script at the
point where it appears in the configuration file?

 The whole mess started when I realized that bind-mounting the host's /dev
 works perfectly _except_ for syslogging: /dev/log does not work with
 multiple containers.  Only the container where syslogd was (re)started
 last works; all the rest get ECONNREFUSED when trying to send any
 message to /dev/log.
   
/dev/log is an af_unix socket; the network is isolated, and af_unix
sockets belong to the network namespace.
Probably /dev/log is unlinked, created again and bound by syslogd.
As /dev/ is shared between the containers, the last one gets the socket.
Any process outside of that container trying to access this socket won't
be able to.
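The "last one gets the socket" part can be reproduced without containers.  In this Python sketch, two stand-ins for syslogd each unlink and re-bind the same path (assumed to mimic how syslogd sets up /dev/log at startup), and a datagram sent to that path reaches only the one that bound last.  Everything runs in one network namespace, so it does not show the cross-namespace refusal described above; the temporary path stands in for the shared /dev/log, and rebind and last_binder_wins are hypothetical names.

```python
import os
import socket
import tempfile


def rebind(path: str) -> socket.socket:
    """Unlink whatever is at path, then bind a fresh datagram socket there."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    s.bind(path)
    s.setblocking(False)
    return s


def last_binder_wins() -> list:
    """Return the names of the stand-in daemons that received the message."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, "log")        # stands in for the shared /dev/log
    first = rebind(path)                 # syslogd in container A
    second = rebind(path)                # syslogd in container B, started later
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    client.sendto(b"msg", path)
    got = []
    for name, s in (("first", first), ("second", second)):
        try:
            s.recv(16)
            got.append(name)
        except BlockingIOError:          # nothing queued for this socket
            pass
    for s in (client, first, second):
        s.close()
    os.unlink(path)
    os.rmdir(d)
    return got
```

Here last_binder_wins() reports only "second": the first daemon is still running but is bound to an inode that no longer has a name.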


