Re: [lxc-devel] read-only container root
Daniel Lezcano wrote:
> Michael Tokarev wrote:
>> I'm experimenting with a read-only root fs in the container. So far it does not work.
>> First of all, when trying to start a container in a read-only root, lxc-start complains:
>> lxc-start: Read-only file system - can't make temporary mountpoint
>> This is in the conf.c:setup_rootfs_pivot_root() function. That function uses the optional parameter lxc.pivotdir, or creates (and later removes) a temporary directory for pivot_root. Obviously there's no way to create a directory in a read-only filesystem.
>
> Why do you need to use a read-only root fs?

There's no _need_, but it's an extension of the principle of least privilege. It also helps keep things more accurate, and it guarantees that nothing bad will happen to the system in case of an unexpected power failure and the like (in that case, say, /var might still be badly damaged, but the system will actually boot to the point where some repair tools are available).

>> But lxc.pivotdir does not work either. In the function mentioned above it is used with a leading dot (e.g. if I specify lxc.pivotdir=pivot in the config file, the pivot_root() syscall will be made to .pivot with a leading dot, not to pivot), but later on it is used without that dot, and fails...
>> [...]
>
> It's a bug introduced with the pivot_root feature. Investigation on the way.

I tried to debug it too, but realized that the last git repo I have locally is from 22nd Jan, which is almost a month ago, and I've seen quite a few changes mentioned on the list. So either the changes haven't been committed, or the git repository has moved somewhere else. It was actually the 3rd email I planned to write, asking what's up with the git repo... ;)

> [...]
> Ok, so your need is to call a script between:
> lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755
> ...
> lxc.tty = 4
> where the script will populate /dev, right?
> mmh, not obvious.

Or maybe just call it _instead_ of specifying all the above (lxc.mount.entry and lxc.tty), leaving only things such as network device setup (which can't easily be done from the shell) to lxc-start.

> [...]
> What about the lxc.script configuration line which calls a script at the point it is in the configuration file?

That's not possible. The configuration is an _unordered_ set of key=value pairs. lxc-start currently calls different functions in a pre-defined (programmatic) order, regardless of the order in which the config file is written. The specified script (lxc.script) would also have to be called at some (arbitrary) pre-determined point in the container setup procedure. In that case the script can _replace_ some things from the config file if they're in the wrong order or are getting in the way. But it's still not obvious where that point should be: for example, is it before lxc-start (implicitly!) mounts /dev/pts, or after? For my example the script should run before /dev/pts is mounted, but maybe someone will want to run some other program that uses pseudo-terminals, which obviously should be done after /dev/pts is mounted (granted, I can't think of such a situation/program for now).

>> The whole mess started when I realized that bind-mounting the host's /dev works perfectly _except_ for syslogging: /dev/log does not work with multiple containers. Only the container where syslogd was (re)started last works; all the rest give ECONNREFUSED when trying to send any message to /dev/log.
>
> /dev/log is an af_unix socket, the network is isolated, and af_unix belongs to the network namespace. Probably /dev/log is unlinked, created again and bound by syslogd. So as /dev/ is shared between the containers, the last one gets the socket. Any process outside of the container trying to access this socket won't be able to.

That's what I figured, and it's quite an obvious thing to do, really.
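A minimal Python sketch of the "last syslogd wins" effect described above. It models only the shared-path part: two sockets standing in for two containers' syslogd both unlink and re-bind one "log" path inside a temporary directory (standing in for the shared /dev), all in a single network namespace. The namespace isolation Daniel mentions is not modeled, and the helper names bind_fresh and pending are hypothetical.

```python
import os
import socket
import tempfile


def bind_fresh(path):
    """Unlink any existing socket file at path, then bind a new datagram socket there."""
    if os.path.lexists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    sock.setblocking(False)
    return sock


def pending(sock):
    """Return one queued datagram, or None if nothing has arrived."""
    try:
        return sock.recv(1024)
    except BlockingIOError:
        return None


with tempfile.TemporaryDirectory() as shared_dev:
    dev_log = os.path.join(shared_dev, "log")
    syslogd_a = bind_fresh(dev_log)  # container A's syslogd starts first
    syslogd_b = bind_fresh(dev_log)  # container B's syslogd starts later and replaces the path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(b"message from a client in A", dev_log)
    got_a, got_b = pending(syslogd_a), pending(syslogd_b)
    syslogd_a.close()
    syslogd_b.close()

print("A received:", got_a)
print("B received:", got_b)
```

A's socket stays open, but it is no longer reachable by name, so every client that resolves the shared path ends up talking to B.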
Actually it might be a good idea not to start syslogd in containers and to inherit the real /dev from the host; this way all logging will automatically be sent to the central syslog (hopefully :). But that works only until the host syslogd is restarted, and at that point we're back at ECONNREFUSED.

Note my other email about mounting new filesystems within containers. In this context: after restarting syslogd on the host, is it possible to bind-mount the host's /dev/log onto the container's /dev/log (provided they were bind-mounted before)?

Thanks!

/mjt
--
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace, Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW http://p.sf.net/sfu/solaris-dev2dev
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel
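The restart problem in the message above can be explored with a rough, unprivileged Python model. It assumes that, for this purpose, a bind mount of /dev/log behaves like a hard link (both keep the old socket inode reachable after the host path is unlinked), so os.link stands in for the bind mount and a temporary directory stands in for /dev; start_syslogd, send_log and restart_scenario are hypothetical names. The model shows the pinned name returning ECONNREFUSED once the original socket is closed, and delivery resuming once the name points at the new inode again. It does not answer whether a real bind mount can be redone inside a running container.

```python
import errno
import os
import socket
import tempfile


def start_syslogd(path):
    """(Re)start a toy syslogd: unlink the old socket file and bind a new one."""
    if os.path.lexists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def send_log(path, msg=b"test"):
    """Send one datagram to path; return 0 on success, otherwise the errno value."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        try:
            client.sendto(msg, path)
        except OSError as exc:
            return exc.errno
    return 0


def restart_scenario():
    with tempfile.TemporaryDirectory() as tmp:
        host_log = os.path.join(tmp, "log")                 # models the host's /dev/log
        container_log = os.path.join(tmp, "container-log")  # models the container's /dev/log
        syslogd = start_syslogd(host_log)
        os.link(host_log, container_log)  # stand-in for the bind mount: same inode
        before = send_log(container_log)

        syslogd.close()                    # host syslogd restarts...
        syslogd = start_syslogd(host_log)  # ...and binds a new inode at the host path
        stale = send_log(container_log)    # container name still points at the old inode

        os.unlink(container_log)           # "re-bind-mount": point the name at the new inode
        os.link(host_log, container_log)
        refreshed = send_log(container_log)
        syslogd.close()
    return before, stale, refreshed


before, stale, refreshed = restart_scenario()
print(before, errno.errorcode.get(stale, stale), refreshed)
```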
[lxc-devel] read-only container root
I'm experimenting with a read-only root fs in the container. So far it does not work.

First of all, when trying to start a container in a read-only root, lxc-start complains:

  lxc-start: Read-only file system - can't make temporary mountpoint

This is in the conf.c:setup_rootfs_pivot_root() function. That function uses the optional parameter lxc.pivotdir, or creates (and later removes) a temporary directory for pivot_root. Obviously there's no way to create a directory in a read-only filesystem.

But lxc.pivotdir does not work either. In the function mentioned above it is used with a leading dot (e.g. if I specify lxc.pivotdir=pivot in the config file, the pivot_root() syscall will be made to .pivot with a leading dot, not to pivot), but later on it is used without that dot, and fails:

  lxc-start: No such file or directory - failed to open /pivot/proc/mounts
  lxc-start: No such file or directory - failed to read or parse mount list '/pivot/proc/mounts'
  lxc-start: failed to pivot_root to '/stage/t'

(that's with lxc.pivotdir = pivot in the config file). After symlinking pivot to .pivot it still fails:

  lxc-start: Device or resource busy - could not unmount old rootfs
  lxc-start: failed to pivot_root to '/stage/t'

Ok, so far so good. Next thing is the /dev directory. I prefer to have it in a tmpfs, for several reasons (one is that the root is mounted with -o nodev), but that fails too unless the directory is pre-populated:

  lxc-start: No such file or directory - failed to mount a new instance of '/dev/pts'
  lxc-start: failed to setup the new pts instance

That's when specifying:

  lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755

in the config file. That creates an empty directory for the container's /dev, which is populated later in the startup script. A similar thing happens when I pre-create dev/pts: it fails to bind-mount tty1..tty4.
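The lxc.pivotdir failure reported above comes down to one directory name being spelled two ways: the old root is moved to <rootfs>/.pivot by pivot_root(), but its mount list is later looked up at /pivot/proc/mounts. A small Python sketch of the invariant, assuming the layout from the logs (rootfs /stage/t, lxc.pivotdir = pivot); pivot_paths and consistent are hypothetical helpers, not LXC's conf.c code. Deriving both paths from a single normalized name keeps them in agreement.

```python
import os


def pivot_paths(rootfs, pivotdir):
    """Derive both pivot-related paths from one normalized name.

    put_old:    where pivot_root(rootfs, put_old) moves the old root,
                as seen before the pivot.
    old_mounts: where the old root's mount list is read after the pivot,
                when rootfs has become '/'.
    """
    name = pivotdir.strip("/")
    if not name:
        raise ValueError("pivotdir must name a directory")
    put_old = os.path.join(rootfs, name)
    old_mounts = os.path.join("/", name, "proc", "mounts")
    return put_old, old_mounts


def consistent(rootfs, put_old, old_mounts):
    """True if old_mounts lies under the directory that put_old becomes after the pivot."""
    after_pivot = "/" + os.path.relpath(put_old, rootfs)
    return old_mounts == os.path.join(after_pivot, "proc", "mounts")


print(pivot_paths("/stage/t", "pivot"))
# The combination from the report: pivot to ".pivot", then read "/pivot/...".
print(consistent("/stage/t", "/stage/t/.pivot", "/pivot/proc/mounts"))
```

On a read-only root the put_old directory still has to exist in the image beforehand, since, as noted in the message, it cannot be created at start time.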
So far it works by using a wrapper around lxc-start which mounts tmpfs over dev, fills it with a bunch of standard entries, and executes lxc-start. But this is really getting quite ugly. And the only solution to all this mess is to allow the setup to be performed by a shell script/command which is called after forking the (filesystem) namespace but before entering the container for real, or _instead_ of entering the container. As was discussed previously.

The whole mess started when I realized that bind-mounting the host's /dev works perfectly _except_ for syslogging: /dev/log does not work with multiple containers. Only the container where syslogd was (re)started last works; all the rest give ECONNREFUSED when trying to send any message to /dev/log.

Comments?

Thanks!

/mjt
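The "bunch of standard entries" the wrapper puts into the tmpfs /dev is not listed in the message. As an illustration only, here is a Python sketch of such a populate step, assuming a conventional set of Linux character devices (null 1:3, zero 1:5, full 1:7, random 1:8, urandom 1:9, tty 5:0), the usual /proc/self/fd symlinks, an empty pts directory for the new devpts instance, and a ptmx symlink into it. Mounting the tmpfs and exec'ing lxc-start are left out, and populate_dev is a hypothetical name. Creating device nodes needs CAP_MKNOD, so without privileges the nodes are reported as failures while the directory and symlinks are still created.

```python
import os
import stat
import tempfile

# Conventional Linux (major, minor) numbers; the selection itself is an assumption.
DEV_NODES = {
    "null": (1, 3),
    "zero": (1, 5),
    "full": (1, 7),
    "random": (1, 8),
    "urandom": (1, 9),
    "tty": (5, 0),
}

# "ptmx -> pts/ptmx" assumes a private devpts instance mounted on dev/pts.
DEV_LINKS = {
    "fd": "/proc/self/fd",
    "stdin": "/proc/self/fd/0",
    "stdout": "/proc/self/fd/1",
    "stderr": "/proc/self/fd/2",
    "ptmx": "pts/ptmx",
}


def populate_dev(dev_dir):
    """Fill an empty dev directory; return the names of nodes mknod refused."""
    os.makedirs(os.path.join(dev_dir, "pts"), exist_ok=True)
    for name, target in DEV_LINKS.items():
        os.symlink(target, os.path.join(dev_dir, name))
    failed = []
    for name, (major, minor) in DEV_NODES.items():
        try:
            os.mknod(os.path.join(dev_dir, name),
                     0o666 | stat.S_IFCHR, os.makedev(major, minor))
        except PermissionError:  # no CAP_MKNOD, or device creation denied
            failed.append(name)
    return failed


with tempfile.TemporaryDirectory() as dev:
    print("could not mknod:", populate_dev(dev))
```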
Re: [lxc-devel] read-only container root
Michael Tokarev wrote:
> I'm experimenting with a read-only root fs in the container. So far it does not work.
> First of all, when trying to start a container in a read-only root, lxc-start complains:
> lxc-start: Read-only file system - can't make temporary mountpoint
> This is in the conf.c:setup_rootfs_pivot_root() function. That function uses the optional parameter lxc.pivotdir, or creates (and later removes) a temporary directory for pivot_root. Obviously there's no way to create a directory in a read-only filesystem.

Why do you need to use a read-only root fs?

> But lxc.pivotdir does not work either. In the function mentioned above it is used with a leading dot (e.g. if I specify lxc.pivotdir=pivot in the config file, the pivot_root() syscall will be made to .pivot with a leading dot, not to pivot), but later on it is used without that dot, and fails:
> lxc-start: No such file or directory - failed to open /pivot/proc/mounts
> lxc-start: No such file or directory - failed to read or parse mount list '/pivot/proc/mounts'
> lxc-start: failed to pivot_root to '/stage/t'
> (that's with lxc.pivotdir = pivot in the config file). After symlinking pivot to .pivot it still fails:
> lxc-start: Device or resource busy - could not unmount old rootfs
> lxc-start: failed to pivot_root to '/stage/t'

It's a bug introduced with the pivot_root feature. Investigation on the way.

> Ok, so far so good. Next thing is the /dev directory. I prefer to have it in a tmpfs, for several reasons (one is that the root is mounted with -o nodev), but that fails too unless the directory is pre-populated:
> lxc-start: No such file or directory - failed to mount a new instance of '/dev/pts'
> lxc-start: failed to setup the new pts instance
> That's when specifying:
> lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755
> in the config file. That creates an empty directory for the container's /dev, which is populated later in the startup script.
> A similar thing happens when I pre-create dev/pts: it fails to bind-mount tty1..tty4.

Ok, so your need is to call a script between:

lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755
...
lxc.tty = 4

where the script will populate /dev, right? mmh, not obvious.

> So far it works by using a wrapper around lxc-start which mounts tmpfs over dev, fills it with a bunch of standard entries, and executes lxc-start. But this is really getting quite ugly. And the only solution to all this mess is to allow the setup to be performed by a shell script/command which is called after forking the (filesystem) namespace but before entering the container for real, or _instead_ of entering the container. As was discussed previously.

What about the lxc.script configuration line which calls a script at the point it is in the configuration file?

> The whole mess started when I realized that bind-mounting the host's /dev works perfectly _except_ for syslogging: /dev/log does not work with multiple containers. Only the container where syslogd was (re)started last works; all the rest give ECONNREFUSED when trying to send any message to /dev/log.

/dev/log is an af_unix socket, the network is isolated, and af_unix belongs to the network namespace. Probably /dev/log is unlinked, created again and bound by syslogd. So as /dev/ is shared between the containers, the last one gets the socket. Any process outside of the container trying to access this socket won't be able to.
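Michael's reply at the top of this page objects to the lxc.script idea proposed here: the configuration is an unordered set of key=value pairs that lxc-start applies in an order fixed by the program, so a line's position in the file cannot decide when a script hook runs. A toy Python model of that property follows. SETUP_ORDER is a made-up, hypothetical order that does not reproduce lxc-start's real sequence, lxc.script is only the key proposed in this message, and the script path is hypothetical. Two files containing the same lines in different order yield the same setup plan.

```python
from collections import defaultdict

# Hypothetical fixed order; it only illustrates a "program-defined order"
# and is not the actual sequence used by lxc-start.
SETUP_ORDER = ["lxc.mount.entry", "lxc.tty", "lxc.script"]


def parse_config(text):
    """Collect 'key = value' lines into key -> [values]; the order of keys is lost."""
    config = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split at the first '=' only
        config[key.strip()].append(value.strip())
    return config


def setup_plan(config):
    """List (key, value) steps in SETUP_ORDER, whatever order the file used."""
    return [(key, value) for key in SETUP_ORDER for value in config.get(key, [])]


# /usr/local/sbin/populate-dev is a hypothetical script path.
script_first = """\
lxc.script = /usr/local/sbin/populate-dev
lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755
lxc.tty = 4
"""
script_last = """\
lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755
lxc.tty = 4
lxc.script = /usr/local/sbin/populate-dev
"""

print(setup_plan(parse_config(script_first)) == setup_plan(parse_config(script_last)))
```

Whichever fixed slot the hook is given, some setup (such as Michael's /dev population before the devpts mount) will want a different one, which is the design question the thread leaves open.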