Following the suggestion in the systemd-nspawn manpage, I populated a mini Fedora 19 chroot on a Fedora 19 host:

  # yum -y --releasever=19 --nogpg --installroot=/srv/mycontainer \
        --disablerepo='*' --enablerepo=fedora \
        install systemd passwd yum fedora-release vim-minimal
  # chroot /srv/mycontainer passwd
  # systemd-nspawn -bD /srv/mycontainer

Systemd boots up nicely & presents a login prompt, but it is impossible
to actually log in, PAM always denying the attempts.

Debugging this, there seem to be two issues:

1. pam_loginuid.so tries to write to /proc/self/loginuid but is denied
   by the kernel.

   My kernel has CONFIG_AUDIT_LOGINUID_IMMUTABLE=y, which means once a
   loginuid is set (in this case from my ssh session into the host), it
   can't be changed (e.g. by the 'login' process inside the container).
   From the KConfig comment, this appears to have been a new feature
   built explicitly for systemd based hosts.

   The loginuid appears to be inherited across fork/exec so, AFAICT, the
   only way to avoid this is to spawn the container from something which
   does not already have a loginuid set, e.g. systemd itself or some
   other process not associated with a login session. Not being able to
   spawn containers from a login session on the host is kind of a PITA
   for development / debugging :-(

   Seems we need to find a way to have systemd-nspawn ensure that the
   'init' process inside the container does not have a 'loginuid' set,
   even if the thing starting the container does. On the flipside, it
   seems this would violate the kernel security design for this feature?
   If that were the case, then the pam_loginuid module might need to be
   made a no-op inside containers.

2. The audit_log_acct_message() method, which is called by pretty much
   any PAM module, returns EPERM.

   There is no actual syscall returning EPERM here. The EPERM appears to
   be coming back inside the netlink reply message from the kernel audit
   subsystem.

   Since pretty much every PAM module sends audit messages, this causes
   them all to return fatal errors, failing the login attempt.

   The _pam_audit_writelog() method does have code to ignore EPERM, but
   it only does so if 'getuid() != 0'. The container login process has
   uid == 0, so EPERM is treated as fatal.

   The "easy" (but not necessarily correct) fix is this change:

diff -rup Linux-PAM-1.1.6.orig/libpam/pam_audit.c Linux-PAM-1.1.6.new/libpam/pam_audit.c
--- Linux-PAM-1.1.6.orig/libpam/pam_audit.c 2012-08-15 12:08:43.000000000 +0100
+++ Linux-PAM-1.1.6.new/libpam/pam_audit.c 2013-05-09 10:17:48.679403471 +0100
@@ -46,7 +46,7 @@ _pam_audit_writelog(pam_handle_t *pamh,
   pamh->audit_state |= PAMAUDIT_LOGGED;
 
   if (rc < 0) {
-      if (rc == -EPERM && getuid() != 0)
+      if (rc == -EPERM)
           return 0;
       if (errno != old_errno) {
           old_errno = errno;

   but I'd rather like to understand why the kernel audit netlink layer
   is replying with EPERM in the first place. The container has the
   CAP_AUDIT_WRITE capability.

   Instead of removing the 'getuid() != 0' check, another option would
   be to augment it to also check /proc/1/environ for any 'container'
   env variable.

If I remove the pam_loginuid module and also apply the above audit patch
to PAM, then I can successfully log in to a container launched by
systemd-nspawn. It would obviously be preferable to figure out what
needs to be done to make this work out of the box though.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
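
For reference, the /proc/1/environ idea mentioned above could look
roughly like the following. This is only a sketch, not a tested PAM
patch: the helper names env_has_container_var(), running_in_container()
and audit_eperm_is_ignorable() are made up for illustration and do not
exist in Linux-PAM, and it assumes the container manager exports a
variable named exactly 'container' to PID 1 (systemd-nspawn sets
container=systemd-nspawn).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: scan a NUL-separated environment block (the
 * format of /proc/<pid>/environ) for a variable named "container". */
int env_has_container_var(const char *buf, size_t len)
{
    static const char key[] = "container=";
    const size_t keylen = sizeof(key) - 1;
    size_t pos = 0;

    assert(buf != NULL || len == 0);

    while (pos < len) {
        /* Length of the current entry, bounded by the buffer end. */
        const char *end = memchr(buf + pos, '\0', len - pos);
        size_t entlen = end ? (size_t)(end - (buf + pos)) : len - pos;

        if (entlen >= keylen && memcmp(buf + pos, key, keylen) == 0)
            return 1;

        pos += entlen + 1;  /* skip the entry and its NUL terminator */
    }
    return 0;
}

/* Hypothetical helper: report whether PID 1 has a "container" variable.
 * Returns 0 if the file is unreadable. Simplification: only the first
 * 4096 bytes of the environment are examined. */
int running_in_container(void)
{
    char buf[4096];
    size_t len;
    FILE *fp = fopen("/proc/1/environ", "r");

    if (fp == NULL)
        return 0;
    len = fread(buf, 1, sizeof(buf), fp);
    fclose(fp);
    return env_has_container_var(buf, len);
}

/* Sketch of the augmented test: EPERM from the audit layer is ignored
 * for non-root callers (the existing behaviour) or when running inside
 * a container (the proposed addition). */
int audit_eperm_is_ignorable(int rc, uid_t uid, int in_container)
{
    return rc == -EPERM && (uid != 0 || in_container);
}
```

In _pam_audit_writelog() the existing condition would then become
something like 'audit_eperm_is_ignorable(rc, getuid(),
running_in_container())'. Whether trusting an environment variable is
acceptable for this purpose is a separate question.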