bug#41575: [bug#43371] [PATCH] doc: prevent host/container nscd mismatch
Sure, I'm happy to take a stab at this.

Jason

On Mon, Sep 14, 2020 at 3:28 AM Ludovic Courtès wrote:
> In that case, we can have ‘containerized-operating-system’ provide its
> own NSS configuration with a reduced cache size (or without cache since
> there’s caching happening on the host for host name lookups, for
> instance).
>
> WDYT? Would you like to give it a try?
>
> Thanks,
> Ludo’.
bug#41575: [bug#43371] [PATCH] doc: prevent host/container nscd mismatch
Hello Ludo',

A separate nscd per container also seems like a reasonable option. However, for the sake of machines hosting many long-lived containers, perhaps we should consider reducing the cache size: currently it's 32MB for each name service type, with an expiration of 12-24 hours:

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/base.scm?id=1042d269a723360a02b19a2baafef1e24a3bfc73#n1115

Cheers,

Jason

On Sun, Sep 13, 2020 at 5:05 PM Ludovic Courtès wrote:
> Hi,
>
> e...@beaver-labs.com skribis:
>
> > doc/guix.texi: (Name Service Switch) add a workaround for bug #41575
> > ---
> >  doc/guix.texi | 16 +++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guix.texi b/doc/guix.texi
> > index a6e14ea177..a9472e680e 100644
> > --- a/doc/guix.texi
> > +++ b/doc/guix.texi
> > @@ -1706,6 +1706,20 @@ this binary incompatibility problem because those @code{libnss_*.so}
> >  files are loaded in the @command{nscd} process, not in applications
> >  themselves.
> >
> > +For applications running in containers (@pxref{Invoking guix container}),
> > +however, @code{nscd} may leak information from the host to the container.
> > +If there is a configuration mismatch between the two ---e.g., the host
> > +has no @code{sshd} user while the container needs one--- then it may be
>
> I find the example hard to understand. How about: “applications in
> the container could end up looking up users in the host”?
>
> > +worthwhile to limit which kind of information the host's @code{nscd}
> > +daemon may give to the container by adding the following to
> > +@code{/etc/nscd.conf}.
> > +
> > +@example
> > +enable-cache passwd no
> > +enable-cache group no
> > +enable-cache netgroup no
> > +@end example
>
> Actually, perhaps the better fix is to never use the host’s nscd? We
> could change ‘containerized-operating-system’ accordingly.
>
> That would allow guest OSes to work correctly regardless of the host’s
> nscd config, which seems like an improvement.
>
> Thoughts?
>
> Ludo’.
bug#41575: Container with openssh-service requires sshd user on the host
My pleasure, Edouard. Thanks for the doc update!

Jason

On Sun, Sep 13, 2020 at 6:39 AM wrote:
> Thank you for this thorough investigation and for finding the
> workaround!
>
> I just submitted a patch to the doc based on your email.
>
> Cheers,
>
> Edouard.
bug#41575: Container with openssh-service requires sshd user on the host
In an earlier bug comment [1] I corroborated that nscd was leaking /etc/passwd information from the host OS into the Guix container, and I wondered aloud why the container would use the host OS's nscd if there was a risk of this happening.

I've looked into how Guix configures its own nscd [2], and it turns out that by default it enables lookups only for `hosts` and `services`, not for `passwd`, `group`, or `netgroup`. Presumably, then, this configuration is sufficient for nscd to prevent the glibc compatibility issues described in the manual [3].

After adding the following 3 lines to nscd.conf on my foreign distro (Debian 10) and restarting nscd, my Guix system containers were able to boot successfully while talking to the daemon:

    enable-cache passwd no
    enable-cache group no
    enable-cache netgroup no

So I think the bug here is that the Guix manual page advising the use of nscd on a foreign distro [3] doesn't explain which types of service lookups are safe to enable in the daemon. If Guix is used only to build and run binaries, then perhaps it could use nscd for all lookups, but this is evidently not the case for Guix system containers.

Cheers,

Jason

[1] https://www.mail-archive.com/bug-guix@gnu.org/msg19915.html
[2] https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/base.scm?h=version-1.1.0#n1238
[3] https://guix.gnu.org/manual/en/html_node/Application-Setup.html
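Jason's workaround amounts to making sure that `passwd`, `group`, and `netgroup` are not served by the host's nscd. As a quick way to audit a foreign distro's configuration, here is a minimal Python sketch, assuming that only the `enable-cache <database> <yes|no>` directive of nscd.conf matters for this question; `cache_settings` and `possibly_leaky` are hypothetical helper names, not part of Guix or glibc. It deliberately flags a database that has no `enable-cache` line at all, so the answer does not depend on nscd's built-in default.

```python
from __future__ import annotations

# Databases that, per this thread, the host's nscd should not answer
# when Guix system containers share its socket.
RISKY_DATABASES = ("passwd", "group", "netgroup")


def cache_settings(conf_text: str) -> dict[str, bool]:
    """Map each database named in an 'enable-cache <db> <yes|no>' line
    to True for 'yes' and False for any other value."""
    settings: dict[str, bool] = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()  # tabs and spaces both separate fields
        if len(fields) == 3 and fields[0] == "enable-cache":
            _, database, value = fields
            settings[database] = value.lower() == "yes"
    return settings


def possibly_leaky(conf_text: str) -> list[str]:
    """Return the risky databases that are NOT explicitly disabled.

    A database with no enable-cache line is reported as well, so the
    result does not rely on nscd's default for unlisted databases."""
    settings = cache_settings(conf_text)
    return [db for db in RISKY_DATABASES if settings.get(db, True)]


workaround = """
enable-cache passwd no
enable-cache group no
enable-cache netgroup no
enable-cache hosts yes
"""
print(possibly_leaky(workaround))  # → []
```

On a real host you would pass it `open("/etc/nscd.conf").read()`; as in the message above, nscd needs a restart after the file is edited.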
bug#43039: Vanilla GUIX 1.1.0 reconfigure fails on nss-certs
An older bug (https://issues.guix.info/issue/37662) discusses a similar issue, but for a foreign distro with Guix installed (not a native Guix distribution). That bug mentions two things:

- make sure that either ‘glibc-utf8-locales’ or ‘glibc-locales’ is installed (as root);
- make sure that the Guix daemon is configured to use a UTF-8 locale, so that it can handle the UTF-8-encoded file names in the nss-certs package.

I'm not sure whether these issues apply to a native Guix distribution. What I do know is that when I encountered the error myself (running Guix on Debian 10), I needed one additional thing: the environment of the user installing the package had to include a UTF-8 locale. After switching this environment from LANG=C to LANG=en_US.utf8, the package installed without issue.
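Since the fix here came down to which locale a process actually sees, a small check can help. This is a minimal Python sketch, assuming the usual POSIX precedence for the character-type category (`LC_ALL`, then `LC_CTYPE`, then `LANG`, with empty values treated as unset) and judging UTF-8 purely from the codeset part of the locale name; `effective_ctype_locale` and `is_utf8_locale` are hypothetical helpers. It inspects only the environment you pass in, so checking the Guix daemon would require the daemon's own environment, not your shell's.

```python
import os


def effective_ctype_locale(env) -> str:
    # LC_ALL overrides LC_CTYPE, which overrides LANG; empty values count as unset.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def is_utf8_locale(env) -> bool:
    # Heuristic on the name only, e.g. "en_US.utf8" or "de_DE.UTF-8@euro";
    # it does not check that the locale data is actually installed.
    name = effective_ctype_locale(env)
    codeset = name.split("@", 1)[0].partition(".")[2]
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    print(effective_ctype_locale(os.environ), is_utf8_locale(os.environ))
```

With `LANG=C` this reports False, and after switching to `LANG=en_US.utf8` it reports True, unless `LC_ALL` or `LC_CTYPE` is set to something else and overrides it.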
bug#41575: Container with openssh-service requires sshd user on the host
I've observed this error under similar circumstances: launching a `guix system container` script with network sharing enabled, on a foreign distro (Debian 10) with nscd running.

Using `strace -f /gnu/store/...-run-container`, we can observe the container's lookup of user accounts via the foreign distro's nscd socket:

    [pid 16582] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
    [pid 16582] connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
    [pid 16582] sendto(11, "\2\0\0\0\0\0\0\0\t\0\0\0postgres\0", 21, MSG_NOSIGNAL, NULL, 0) = 21
    [pid 16582] poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN}])
    [pid 16582] read(11, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"..., 36) = 36
    [pid 16582] close(11) = 0

Since the user ("postgres") is indeed missing on the foreign distro, the lookup fails. In this case, disabling nscd on the foreign distro allowed the container script to run without error.

Based on comments in https://issues.guix.info/issue/28128, I see that it was a deliberate choice to bind-mount the foreign distro's nscd socket inside the container (instead of starting a separate containerized nscd instance). But I'm having trouble seeing why it's acceptable to leak state from the foreign distro's user space into the container. Is there something I'm missing?

Cheers,

Jason
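The bytes in the strace output can be read against glibc's nscd client protocol. This is a minimal decoding sketch, assuming the header layouts from glibc's `nscd/nscd-client.h` (`request_header` is three 32-bit ints: version, request type, key length, followed by the key; `pw_response_header` begins with a version and a `found` field) and a little-endian host such as x86-64, since nscd uses native byte order. The helper names are hypothetical. Decoded this way, the request is a `GETPWBYNAME` query for `postgres`, and the reply's `found` field is 0: the host's nscd reports that it has no such user, which matches the failure described above.

```python
from __future__ import annotations

import struct

# First entries of glibc's request_type enum (nscd/nscd-client.h).
REQUEST_TYPES = {0: "GETPWBYNAME", 1: "GETPWBYUID", 2: "GETGRBYNAME", 3: "GETGRBYGID"}


def decode_request(buf: bytes) -> tuple[int, str, str]:
    # request_header: int32 version, int32 type, int32 key_len, then the key.
    # "<" assumes a little-endian host (e.g. x86-64).
    version, req_type, key_len = struct.unpack_from("<iii", buf, 0)
    key = buf[12:12 + key_len].rstrip(b"\0").decode()
    return version, REQUEST_TYPES.get(req_type, str(req_type)), key


def decode_pw_reply_prefix(buf: bytes) -> tuple[int, int]:
    # pw_response_header starts with int32 version and int32 found.
    version, found = struct.unpack_from("<ii", buf, 0)
    return version, found


# Bytes copied from the strace output; only the first 32 reply bytes are shown there.
request = b"\x02\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00postgres\x00"
reply_prefix = b"\x02\x00\x00\x00" + b"\x00" * 12 + b"\xff" * 8 + b"\x00" * 8

print(len(request), decode_request(request))  # → 21 (2, 'GETPWBYNAME', 'postgres')
print(decode_pw_reply_prefix(reply_prefix))   # → (2, 0)
```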
bug#30939: shepherd: detailed output should be placed into well-known location and not tty
Hello,

I too have found that debugging is a challenge when a service's stdout/stderr aren't captured automatically. From my point of view, though, the issue is not just that certain binaries lack syslog support: since a service implementation's gexp can do much more than just exec a binary, and since mistakes in gexps usually go unnoticed until runtime, I've found it easy to write scripts that trigger fatal Guile errors before the service binary is even started (syntax errors, missing `use-modules` declarations, etc.).

Will the solution proposed here capture output for this class of errors as well?
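One way to frame the question is whether output is captured at the file-descriptor level, before any of the service's start code runs. Here is a minimal Python sketch of that idea. It does not reflect the Shepherd's actual mechanism or API. The supervisor opens a log file and hands it to the child as stdout and stderr, so even a wrapper that crashes before starting the real binary leaves its error in the log. The wrapper is a stand-in that imports a nonexistent module, and `spawn_with_log` and the log path are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def spawn_with_log(argv: list[str], log_path: Path) -> int:
    """Run argv with stdout and stderr appended to log_path.

    The redirection is set up by the parent before the child starts,
    so any error the child prints, even before it would exec a real
    service binary, ends up in the log."""
    with open(log_path, "ab") as log:
        proc = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for a broken service wrapper: it fails on a missing
    # module before any "real" service could be started.
    broken_wrapper = [sys.executable, "-c", "import no_such_module_for_demo"]
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "service.log"
        status = spawn_with_log(broken_wrapper, log_file)
        print(status, "ModuleNotFoundError" in log_file.read_text())  # → 1 True
```

The design point is where the redirection happens. Because the parent sets it up, it covers the wrapper's own failures and not only the output of the binary it eventually runs.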