bug#41575: [bug#43371] [PATCH] doc: prevent host/container nscd mismatch

2020-09-14 Thread conjaroy
Sure, I'm happy to take a stab at this.

Jason

On Mon, Sep 14, 2020 at 3:28 AM Ludovic Courtès wrote:

> In that case, we can have ‘containerized-operating-system’ provide its
> own NSS configuration with a reduced cache size (or without cache since
> there’s caching happening on the host for host name lookups, for
> instance).
>
> WDYT?  Would you like to give it a try?
>
> Thanks,
> Ludo’.


bug#41575: [bug#43371] [PATCH] doc: prevent host/container nscd mismatch

2020-09-13 Thread conjaroy
Hello Ludo',

A separate nscd per container also seems like a reasonable option. However,
for the sake of machines hosting many long-lived containers, perhaps we
should consider reducing the cache size: currently it's 32MB for each name
service type, with an expiration of 12-24 hours:

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/base.scm?id=1042d269a723360a02b19a2baafef1e24a3bfc73#n1115

Cheers,

Jason

On Sun, Sep 13, 2020 at 5:05 PM Ludovic Courtès wrote:

> Hi,
>
> e...@beaver-labs.com skribis:
>
> > doc/guix.texi: (Name Service Switch) add a workaround for bug #41575
> > ---
> >  doc/guix.texi | 16 +++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guix.texi b/doc/guix.texi
> > index a6e14ea177..a9472e680e 100644
> > --- a/doc/guix.texi
> > +++ b/doc/guix.texi
> > @@ -1706,6 +1706,20 @@ this binary incompatibility problem because those @code{libnss_*.so}
> >  files are loaded in the @command{nscd} process, not in applications
> >  themselves.
> >
> > +For applications running in containers (@pxref{Invoking guix container}),
> > +however, @code{nscd} may leak information from the host to the container.
> > +If there is a configuration mismatch between the two ---e.g., the host
> > +has no @code{sshd} user while the container needs one--- then it may be
>
> I find the example hard to understand.  How about: “applications in
> the container could end up looking up users on the host”?
>
> > +worthwhile to limit which kind of information the host's @code{nscd}
> > +daemon may give to the container by adding the following to
> > +@code{/etc/nscd.conf}.
> > +
> > +@example
> > +enable-cache passwd   no
> > +enable-cache group    no
> > +enable-cache netgroup no
> > +@end example
>
> Actually, perhaps the better fix is to never use the host’s nscd?  We
> could change ‘containerized-operating-system’ accordingly.
>
> That would allow guest OSes to work correctly regardless of the host’s
> nscd config, which seems like an improvement.
>
> Thoughts?
>
> Ludo’.
>


bug#41575: Container with openssh-service requires sshd user on the host

2020-09-13 Thread conjaroy
My pleasure, Edouard. Thanks for the doc update!

Jason

On Sun, Sep 13, 2020 at 6:39 AM  wrote:

> Thank you for this thorough investigation and for finding the
> workaround!
>
> I just submitted a patch to the doc based on your email.
>
> Cheers,
>
> Edouard.
> conjaroy writes:
>
> > In an earlier bug comment [1] I corroborated that nscd was leaking
> > /etc/passwd information from the host OS into the Guix container, and I
> > wondered aloud why the container would use the host OS's nscd if there was
> > a risk of this happening.
> >
> > I've looked into how Guix configures its own nscd [2], and it turns out
> > that by default it enables lookups only for `hosts` and `services` - not for
> > `passwd`, `group`, or `netgroup`. Presumably, then, this configuration is
> > sufficient for nscd to prevent the glibc compatibility issues described in
> > the manual [3].
> >
> > After adding the following 3 lines in nscd.conf on my foreign distro
> > (Debian 10) and restarting nscd, my Guix system containers were able to
> > boot successfully while talking to the daemon:
> >
> > enable-cache passwd   no
> > enable-cache group    no
> > enable-cache netgroup no
> >
> > So I think the bug here is that the Guix manual page advising the use of
> > nscd on a foreign distro [3] doesn't elaborate on which types of service
> > lookups are safe to enable in the daemon. If Guix is used only to build and
> > run binaries then perhaps it could use nscd for all lookups, but this is
> > evidently not the case for Guix system containers.
> >
> >
> > Cheers,
> >
> > Jason
> >
> >
> > [1] https://www.mail-archive.com/bug-guix@gnu.org/msg19915.html
> > [2]
> >
> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/base.scm?h=version-1.1.0#n1238
> > [3] https://guix.gnu.org/manual/en/html_node/Application-Setup.html
> >
> > On Mon, Aug 24, 2020 at 11:15 PM conjaroy wrote:
> >
> >> I've observed this error under similar circumstances: launching a guix
> >> system container script with network sharing enabled, on a foreign distro
> >> (Debian 10) with nscd running.
> >>
> >> Using `strace -f /gnu/store/...-run-container`, we can observe the
> >> container's lookup of user accounts via the foreign distro's nscd socket:
> >>
> >> [pid 16582] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
> >> [pid 16582] connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
> >> [pid 16582] sendto(11, "\2\0\0\0\0\0\0\0\t\0\0\0postgres\0", 21, MSG_NOSIGNAL, NULL, 0) = 21
> >> [pid 16582] poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN}])
> >> [pid 16582] read(11, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"..., 36) = 36
> >> [pid 16582] close(11)   = 0
> >>
> >> Since the user ("postgres") is indeed missing in the foreign distro, the
> >> lookup fails. In this case, disabling nscd on the foreign distro allowed
> >> the container script to run without error.
> >>
> >> Based on comments in https://issues.guix.info/issue/28128, I see that it
> >> was a deliberate choice to bind-mount the foreign distro's nscd socket
> >> inside the container (instead of starting a separate containerized nscd
> >> instance). But I'm having trouble seeing why it's acceptable to leak state
> >> from the foreign distro's user space into the container. Is there something
> >> I'm missing?
> >>
> >> Cheers,
> >>
> >> Jason
> >>
>
>


bug#41575: Container with openssh-service requires sshd user on the host

2020-09-08 Thread conjaroy
In an earlier bug comment [1] I corroborated that nscd was leaking
/etc/passwd information from the host OS into the Guix container, and I
wondered aloud why the container would use the host OS's nscd if there was
a risk of this happening.

I've looked into how Guix configures its own nscd [2], and it turns out
that by default it enables lookups only for `hosts` and `services` - not for
`passwd`, `group`, or `netgroup`. Presumably, then, this configuration is
sufficient for nscd to prevent the glibc compatibility issues described in
the manual [3].

After adding the following 3 lines in nscd.conf on my foreign distro
(Debian 10) and restarting nscd, my Guix system containers were able to
boot successfully while talking to the daemon:

enable-cache passwd   no
enable-cache group    no
enable-cache netgroup no

So I think the bug here is that the Guix manual page advising the use of
nscd on a foreign distro [3] doesn't elaborate on which types of service
lookups are safe to enable in the daemon. If Guix is used only to build and
run binaries then perhaps it could use nscd for all lookups, but this is
evidently not the case for Guix system containers.
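
As a rough way to audit a foreign distro's configuration for the caches discussed above, here is a minimal Python sketch. It assumes nscd.conf entries of the form `enable-cache <service> <yes|no>`; the function name, the RISKY set and the sample text are hypothetical and chosen only for illustration.

```python
def enabled_caches(conf_text):
    """Return the set of services whose cache is enabled in nscd.conf text."""
    enabled = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]          # drop trailing comments
        parts = line.split()
        if len(parts) == 3 and parts[0] == "enable-cache":
            _, service, value = parts
            if value == "yes":
                enabled.add(service)
            else:
                enabled.discard(service)      # a later "no" overrides an earlier "yes"
    return enabled


# Services this thread reports as leaking host state into containers.
RISKY = {"passwd", "group", "netgroup"}

# Hypothetical excerpt of a foreign distro's /etc/nscd.conf.
sample = """\
enable-cache passwd   yes
enable-cache group    no
enable-cache hosts    yes
enable-cache netgroup no
"""

leaky = sorted(enabled_caches(sample) & RISKY)
print(leaky)  # ['passwd']
```

To check a real host, one could pass the contents of /etc/nscd.conf instead of the sample string; an empty result would correspond to the three-line workaround above.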


Cheers,

Jason


[1] https://www.mail-archive.com/bug-guix@gnu.org/msg19915.html
[2]
https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/base.scm?h=version-1.1.0#n1238
[3] https://guix.gnu.org/manual/en/html_node/Application-Setup.html

On Mon, Aug 24, 2020 at 11:15 PM conjaroy wrote:

> I've observed this error under similar circumstances: launching a guix
> system container script with network sharing enabled, on a foreign distro
> (Debian 10) with nscd running.
>
> Using `strace -f /gnu/store/...-run-container`, we can observe the
> container's lookup of user accounts via the foreign distro's nscd socket:
>
> [pid 16582] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
> [pid 16582] connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
> [pid 16582] sendto(11, "\2\0\0\0\0\0\0\0\t\0\0\0postgres\0", 21, MSG_NOSIGNAL, NULL, 0) = 21
> [pid 16582] poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN}])
> [pid 16582] read(11, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"..., 36) = 36
> [pid 16582] close(11)   = 0
>
> Since the user ("postgres") is indeed missing in the foreign distro, the
> lookup fails. In this case, disabling nscd on the foreign distro allowed
> the container script to run without error.
>
> Based on comments in https://issues.guix.info/issue/28128, I see that it
> was a deliberate choice to bind-mount the foreign distro's nscd socket
> inside the container (instead of starting a separate containerized nscd
> instance). But I'm having trouble seeing why it's acceptable to leak state
> from the foreign distro's user space into the container. Is there something
> I'm missing?
>
> Cheers,
>
> Jason
>


bug#43039: Vanilla GUIX 1.1.0 reconfigure fails on nss-certs

2020-08-26 Thread conjaroy
An older bug (https://issues.guix.info/issue/37662) discusses a similar
issue, but for a foreign distro with Guix installed (not a native Guix
distribution).

That bug mentions two things:

- make sure that either ‘glibc-utf8-locales’ or ‘glibc-locales’ is
installed (as root)
- make sure that the Guix daemon is configured to use a UTF-8 locale so it
can handle the UTF-8-encoded filenames in the nss-certs package.

I'm not sure whether these issues apply to a native Guix distribution. What
I do know is that when I encountered the error myself (running Guix on
Debian 10) I needed one additional thing: the environment of the user
installing the package had to include a UTF-8 locale. After switching this
environment from LANG=C to LANG=en_US.utf8, the package installed without
issue.
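
The environment check described above can be sketched in Python. The helper names are hypothetical. The lookup order (LC_ALL, then LC_CTYPE, then LANG) follows the usual POSIX precedence, and the codeset test is only a heuristic on the locale name: it does not verify that the locale is actually installed.

```python
import os


def effective_ctype_locale(env):
    """Return the locale name that governs character encoding for env."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:                 # unset or empty variables are skipped
            return value
    return "C"                    # default when nothing is set


def is_utf8_locale(name):
    """Heuristic: does a name such as 'en_US.utf8' carry a UTF-8 codeset?"""
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


name = effective_ctype_locale(os.environ)
print(name, "looks like UTF-8" if is_utf8_locale(name) else "is not UTF-8")
```

With LANG=C this reports a non-UTF-8 locale, which is the situation in which the nss-certs installation failed for me; with LANG=en_US.utf8 it reports UTF-8.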


bug#41575: Container with openssh-service requires sshd user on the host

2020-08-24 Thread conjaroy
I've observed this error under similar circumstances: launching a guix
system container script with network sharing enabled, on a foreign distro
(Debian 10) with nscd running.

Using `strace -f /gnu/store/...-run-container`, we can observe the
container's lookup of user accounts via the foreign distro's nscd socket:

[pid 16582] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
[pid 16582] connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 16582] sendto(11, "\2\0\0\0\0\0\0\0\t\0\0\0postgres\0", 21, MSG_NOSIGNAL, NULL, 0) = 21
[pid 16582] poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN}])
[pid 16582] read(11, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"..., 36) = 36
[pid 16582] close(11)   = 0

Since the user ("postgres") is indeed missing in the foreign distro, the
lookup fails. In this case, disabling nscd on the foreign distro allowed
the container script to run without error.
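
For anyone who wants to decode the sendto/read payloads in the trace above, here is a minimal Python sketch. It assumes a little-endian host and the nscd client protocol layout from glibc's nscd/nscd-client.h: a request header of three 32-bit integers (version, request type, key length) and a passwd reply header that begins with version and found. The field names and the reading of request type 0 as a lookup by user name come from glibc, not from this thread, so treat them as assumptions. Only the 32 reply bytes that strace printed are decoded; the last 4 of the 36 bytes were elided by strace and are left alone.

```python
import struct

# Request bytes copied from the sendto() call (C octal escapes are valid in Python too).
request = b"\2\0\0\0\0\0\0\0\t\0\0\0postgres\0"

# The 32 reply bytes visible in the read() call, split per assumed field.
reply_prefix = (
    b"\2\0\0\0"            # version
    b"\0\0\0\0"            # found
    b"\0\0\0\0"            # pw_name_len
    b"\0\0\0\0"            # pw_passwd_len
    b"\377\377\377\377"    # pw_uid
    b"\377\377\377\377"    # pw_gid
    b"\0\0\0\0"            # pw_gecos_len
    b"\0\0\0\0"            # pw_dir_len
)

# Request header: three little-endian int32 values, followed by the key.
version, req_type, key_len = struct.unpack_from("<3i", request)
key = request[12:12 + key_len].rstrip(b"\0").decode()
print(version, req_type, key_len, key)   # 2 0 9 postgres

# Reply header prefix: eight little-endian int32 values.
names = ("version", "found", "pw_name_len", "pw_passwd_len",
         "pw_uid", "pw_gid", "pw_gecos_len", "pw_dir_len")
reply = dict(zip(names, struct.unpack("<8i", reply_prefix)))
print(reply["found"], reply["pw_uid"], reply["pw_gid"])   # 0 -1 -1
```

Under those assumptions, the request asks the daemon for the passwd entry named "postgres", and a `found` value of 0 with uid and gid of -1 means the host's nscd answered that no such user exists, which is consistent with the failure described above.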

Based on comments in https://issues.guix.info/issue/28128, I see that it
was a deliberate choice to bind-mount the foreign distro's nscd socket
inside the container (instead of starting a separate containerized nscd
instance). But I'm having trouble seeing why it's acceptable to leak state
from the foreign distro's user space into the container. Is there something
I'm missing?

Cheers,

Jason


bug#30939: shepherd: detailed output should be placed into well-known location and not tty

2020-07-18 Thread conjaroy
Hello -

I too have found that debugging is a challenge when a service's
stdout/stderr aren't captured automatically. From my point of view though,
the issue is not just that certain binaries lack syslog support: since a
service implementation's gexp can do much more than just exec a binary, and
since mistakes in gexps usually go unnoticed until runtime, I've found it
easy to write scripts that trigger fatal Guile errors before the service
binary is even started (syntax errors, missing `use-modules` declarations,
etc.)

Will the solution proposed here capture output for this class of errors as
well?