Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]

2015-05-21 Thread Lennart Poettering
On Wed, 20.05.15 22:40, Martin Pitt (martin.p...@ubuntu.com) wrote:

 Hey Lennart,
 
 Lennart Poettering [2015-05-20 17:49 +0200]:
  Nope, ConditionSecurity=audit is only a simple boolean check that
  holds when audit is enabled at all. It doesn't tell you anything about
  the precise audit feature set of the kernel.
 
 Ah, thanks for the clarification.
 
  I have now conditionalized the unit on CAP_ADMIN_READ, which is the
  cap that you need to read the audit multicast stuff. You container
  manager hence should simply drop that cap fro, the cap set it passes
  and all should be good.
 
 Wonderful! Now it works perfectly in nspawn. (This needs to be fixed
 in unprivileged LXC containers, but that's not a systemd problem; I'll
 talk to LXC upstream about that).
 
 With these two fixes, should we now remove the scary warning in
 README? AFAICS there is no need to turn auditing off on the host any
 more.

As mentioned before: unless you turn auditing off in the kernel,
you cannot even log into any Fedora system running in a container
(unless you have the seccomp trick on and are on x86-64). The message
hence really should stay.

Note that Debian/Ubuntu are not as restrictive regarding audit as
Fedora is. In Fedora due to government craziness failing audit will
result in refused logins, and that's the issue here.

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]

2015-05-21 Thread Serge Hallyn
Quoting Lennart Poettering (lenn...@poettering.net):
 On Wed, 20.05.15 22:40, Martin Pitt (martin.p...@ubuntu.com) wrote:
 
  Hey Lennart,
  
  Lennart Poettering [2015-05-20 17:49 +0200]:
   Nope, ConditionSecurity=audit is only a simple boolean check that
   holds when audit is enabled at all. It doesn't tell you anything about
   the precise audit feature set of the kernel.
  
  Ah, thanks for the clarification.
  
   I have now conditionalized the unit on CAP_ADMIN_READ, which is the
   cap that you need to read the audit multicast stuff. You container
   manager hence should simply drop that cap fro, the cap set it passes
   and all should be good.

I want to clarify this point.  Dropping CAP_ADMIN_READ from the bounding
set means dropping it from the capabilities targeted at your own user
namespace.  The only check in the kernel for CAP_ADMIN_READ currently is
against the initial user namespace.  One day of course (maybe soon) this
may change so that you only need CAP_ADMIN_READ against your own
user_ns.  Following the above, container managers could then again keep
CAP_ADMIN_READ in the bounding set.

But I'm claiming that checking for CAP_ADMIN_READ in your bounding set
is the wrong check here.  It simply has nothing to do with what you
actually want to be able to do.  One could argue that the right answer
is a new kernel facility to check for caps against init_user_ns, but no
that will have the same problem after audit ns becomes possible.  I
think the right check for systemd to perform to check whether this is
allowed is to actuallly try the bind().  That will return the right
answer both now and when namespaced audit is possible, without taking a
probably-wrong unrelated cue from the container manager.

It's not earth-shatteringly important and what you've got is workable,
but I think it may set a better precedent to do it the other way.

-serge

(One might almost think that we should have a new kernel facility to
answer questions such questions.  CAP_MAC_ADMIN is similar.)
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]

2015-05-20 Thread Martin Pitt
Hello Frank and systemd devs,

frank.thalberg at tuta.io [2015-04-12 20:51 +]:
 This fixes an issue within journald aborting when running inside
 archlinux container via systemd-nspawn on a debian host with audit
 enabled kernel.

We have exactly the same problem with both standard nspawn as well as
user LXC containers. This completely breaks journalling in containers
and also prints a lot of error messages.

Using audit=0 on the host is not satisfying, though:

 - It's hard to discover
 - There is no reason to disable audit support on the host when all
   you need to do is to unbreak containers and disable auditing there.
 - We don't want admins to create static configs which are (1)
   always the same workaround everywhere, and (2) become obsolete once
   the kernel is fixed.

So I'd rather have a dynamic solution. Your patch works, but I don't
like it that much:

 +    if (errno == EPERM  detect_container(NULL)  0) {
 +    log_debug(Audit not supported in 
 containers.);
 +    return 0;
 +    }

The detect_container() check is not really related to the question
does audit work. It's certainly that way today, but future kernels
might change this.

 --- a/src/journal/journald-server.c
 +++ b/src/journal/journald-server.c
 @@ -1585,9 +1585,11 @@ int server_init(Server *s) {
   if (r  0)
   return r;
 
 +#ifdef HAVE_AUDIT
   r = server_open_audit(s);
   if (r  0)
   return r;
 +#endif

This would require statically enabling/disabling the complete audit
support in the systemd package, while we can do this check at runtime
without much effort.

Also, with your patch you merely unbreak journald itself, but
systemd-journald-audit.socket and other units which have
ConditionSecurity=audit will still fail.

I created a patch which is a more direct approach which makes the
ConditionSecurity=audit check more thorough and actually working in
containers. With that I can leave audit enabled on the host,
containers will boot fine (including journalling) without audit
support, and as soon as the kernel gets fixed it'll automagically start
working in containers as well.

Lennart, WDYT?

Thanks,

Martin
-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
From 4900d31e22719464ed7208c7013730acb551d961 Mon Sep 17 00:00:00 2001
From: Martin Pitt martin.p...@ubuntu.com
Date: Wed, 20 May 2015 13:36:45 +0200
Subject: [PATCH] audit: Fix journal failing on unsupported audit in containers

Commit cfb1f5df introduced ConditionSecurity=audit via use_audit(). However,
use_audit() is still true in namespaces like nspawn containers as there
creating an audit socket succeeds. What fails in journald is binding to it
(Permission denied). So make the check more thorough to only declare that
audit is supported when bind() works as well. This is now exactly the same
check as journald does.

In journald, check use_audit() to see if auditing is supported before we
actually try to bind to the audit socket (as that will fail hard in
namespaces).

This avoids a failing journald and systemd-journald-audit.socket in nspawn and
similar environments. If/once the kernel gets proper auditing support for
namespaces, this will magically start to work without further changes.

Adjust README accordingly.
---
 README   | 18 --
 src/journal/journald-audit.c |  5 +
 src/shared/audit.c   |  9 -
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/README b/README
index 039110e..da7a86a 100644
--- a/README
+++ b/README
@@ -97,20 +97,6 @@ REQUIREMENTS:
   CONFIG_EFIVAR_FS
   CONFIG_EFI_PARTITION
 
-Note that kernel auditing is broken when used with systemd's
-container code. When using systemd in conjunction with
-containers, please make sure to either turn off auditing at
-runtime using the kernel command line option audit=0, or
-turn it off at kernel compile time using:
-  CONFIG_AUDIT=n
-If systemd is compiled with libseccomp support on
-architectures which do not use socketcall() and where seccomp
-is supported (this effectively means x86-64 and ARM, but
-excludes 32-bit x86!), then nspawn will now install a
-work-around seccomp filter that makes containers boot even
-with audit being enabled. This works correctly only on kernels
-3.14 and newer though. TL;DR: turn audit off, still.
-
 glibc = 2.16
 libcap
 libmount = 2.20 (from util-linux)
@@ -249,6 +235,10 @@ WARNINGS:
 false positives will be triggered by code which violates
 some rules but is actually safe.
 
+Kernel auditing is broken when used with systemd's container
+code. journal support for audit messages will not be available

Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]

2015-05-20 Thread systemd github import bot
Patchset imported to github.
Pull request:
https://github.com/systemd-devs/systemd/compare/master...systemd-mailing-devs:20150520115326.GA32127%40piware.de

--
Generated by https://github.com/haraldh/mail2git
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]

2015-05-20 Thread Lennart Poettering
On Wed, 20.05.15 13:53, Martin Pitt (martin.p...@ubuntu.com) wrote:

 -cached_use = true;
 +/* bind() fails in namespaces (containers), so check 
 that too */
 +static const union sockaddr_union sa = {
 +.nl.nl_family = AF_NETLINK,
 +.nl.nl_pid= 0,
 +.nl.nl_groups = AUDIT_NLGRP_READLOG,
 +};
 +cached_use = (bind(fd, sa.sa, sizeof(sa.nl)) = 0);
  safe_close(fd);

This check is simply not right. With that you check whether the
multicast audit API is available. But given that it has been added
only one or two kernel releases ago, and is protected by its own
capabality the check is definitely too broad.

The fact is simply that the kernel audit subsystem is borked in the
kernel when it comes to namespacing, and there's no nice way to detect
whether it is borked I am aware of. 

And it's not really about this multicast journald feature only. Sooner
or later you will run into other problems: any fedora-based distro
will not allow you to even log in in the container if you leave audit
on in the kernel, and don#t use the seccomp hack we have in place (for
example, because you are on 32bit x86, or because your distro turned
it off).

We could of course add a detect_container() check now to journald. But
I think that would be a big mistake, since there was work on fixing
audit in the kernel for containers (by adding audit namespacing or
so). And we should try to write our code so that things will start
working when the kernel is fixed, but a detect_container() check would
make that impossible.

Anyway, I think people are mostly concerned about bind() failing here,
hence I have now added some code to handle that gracefully.

Right now it will still log a message with LOG_WARNING. I'd be willing
to downgrade this to LOG_DEBUG for select error codes, if you tell me
the ones you run into. EINVAL?

Also, please convince your distro to enable seccomp support!

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]

2015-05-20 Thread Martin Pitt
Lennart Poettering [2015-05-20 14:57 +0200]:
 On Wed, 20.05.15 13:53, Martin Pitt (martin.p...@ubuntu.com) wrote:
 
  -cached_use = true;
  +/* bind() fails in namespaces (containers), so 
  check that too */
  +static const union sockaddr_union sa = {
  +.nl.nl_family = AF_NETLINK,
  +.nl.nl_pid= 0,
  +.nl.nl_groups = AUDIT_NLGRP_READLOG,
  +};
  +cached_use = (bind(fd, sa.sa, sizeof(sa.nl)) = 
  0);
   safe_close(fd);
 
 This check is simply not right. With that you check whether the
 multicast audit API is available. But given that it has been added
 only one or two kernel releases ago, and is protected by its own
 capabality the check is definitely too broad.

OK. I thought the intention of ConditionSecurity=audit was exactly
that, as this condition was added together with adding it to
systemd-journald-audit.socket.

 The fact is simply that the kernel audit subsystem is borked in the
 kernel when it comes to namespacing, and there's no nice way to detect
 whether it is borked I am aware of. 

Right, hence my thinking was that we check for the things we actually
want to do with it.

 We could of course add a detect_container() check now to journald. But
 I think that would be a big mistake, since there was work on fixing
 audit in the kernel for containers (by adding audit namespacing or
 so).

Right, fully agreed. That's why I wrote that I didn't like Frank's original
patch.

 Anyway, I think people are mostly concerned about bind() failing here,
 hence I have now added some code to handle that gracefully.
 
 Right now it will still log a message with LOG_WARNING. I'd be willing
 to downgrade this to LOG_DEBUG for select error codes, if you tell me
 the ones you run into. EINVAL?

bind(7, {sa_family=AF_NETLINK, pid=0, groups=0001}, 12) = -1 EPERM 
(Operation not permitted)

With commit 417a7fdc journald now starts working, but
systemd-journald-audit.socket still fails:

   Active: failed (Result: resources)
   systemd[1]: systemd-journald-audit.socket: Socket service 
systemd-journald.service already active, refusing.
   systemd[1]: Failed to listen on Journal Audit Socket.

That's why I thought tightening the ConditionSecurity=audit check
would make more sense, as systemd-journald-audit.socket is the only
unit which actually uses it. We could add
ConditionVirtualization=!container to it as a distro-level workaround,
but I don't like that for the reasons above. I don't just want to
leave it like that as it makes the system stay in degraded mode.

 Also, please convince your distro to enable seccomp support!

Fair enough, but that hack doesn't work on all platforms we support
(i386, powerpc, ppc64el, etc.), and quite frankly that's an even worse
hack: You'd need to disable that filter once the kernel gets fixed,
and/or conditionalize it based on the running kernel version. I'd like
the same code to work everywhere :-)

Thanks,

Martin

-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


signature.asc
Description: Digital signature
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]

2015-05-20 Thread Lennart Poettering
On Wed, 20.05.15 15:48, Martin Pitt (martin.p...@ubuntu.com) wrote:

 Lennart Poettering [2015-05-20 14:57 +0200]:
  On Wed, 20.05.15 13:53, Martin Pitt (martin.p...@ubuntu.com) wrote:
  
   -cached_use = true;
   +/* bind() fails in namespaces (containers), so 
   check that too */
   +static const union sockaddr_union sa = {
   +.nl.nl_family = AF_NETLINK,
   +.nl.nl_pid= 0,
   +.nl.nl_groups = AUDIT_NLGRP_READLOG,
   +};
   +cached_use = (bind(fd, sa.sa, sizeof(sa.nl)) = 
   0);
safe_close(fd);
  
  This check is simply not right. With that you check whether the
  multicast audit API is available. But given that it has been added
  only one or two kernel releases ago, and is protected by its own
  capabality the check is definitely too broad.
 
 OK. I thought the intention of ConditionSecurity=audit was exactly
 that, as this condition was added together with adding it to
 systemd-journald-audit.socket.

Nope, ConditionSecurity=audit is only a simple boolean check that
holds when audit is enabled at all. It doesn't tell you anything about
the precise audit feature set of the kernel.

  Anyway, I think people are mostly concerned about bind() failing here,
  hence I have now added some code to handle that gracefully.
  
  Right now it will still log a message with LOG_WARNING. I'd be willing
  to downgrade this to LOG_DEBUG for select error codes, if you tell me
  the ones you run into. EINVAL?
 
 bind(7, {sa_family=AF_NETLINK, pid=0, groups=0001}, 12) = -1 EPERM 
 (Operation not permitted)
 
 With commit 417a7fdc journald now starts working, but
 systemd-journald-audit.socket still fails:
 
Active: failed (Result: resources)
systemd[1]: systemd-journald-audit.socket: Socket service 
 systemd-journald.service already active, refusing.
systemd[1]: Failed to listen on Journal Audit Socket.

I have now conditionalized the unit on CAP_ADMIN_READ, which is the
cap that you need to read the audit multicast stuff. You container
manager hence should simply drop that cap fro, the cap set it passes
and all should be good.

I didn't test this though, hence please check if current git fixes
that for you now.

 That's why I thought tightening the ConditionSecurity=audit check
 would make more sense, as systemd-journald-audit.socket is the only
 unit which actually uses it. We could add
 ConditionVirtualization=!container to it as a distro-level workaround,
 but I don't like that for the reasons above. I don't just want to
 leave it like that as it makes the system stay in degraded mode.

Both conditions are now in place, and we need both: one can have the
cap without auditing being enabled, and auditing can be enabled
without the cap available, only if one has both the audit suff in
journal can work.

  Also, please convince your distro to enable seccomp support!
 
 Fair enough, but that hack doesn't work on all platforms we support
 (i386, powerpc, ppc64el, etc.), and quite frankly that's an even worse
 hack: You'd need to disable that filter once the kernel gets fixed,
 and/or conditionalize it based on the running kernel version. I'd like
 the same code to work everywhere :-)

Well, it's relatively simply fixing one container manager than all
userspaces that can run within it...

But anyway, please check if git works for you now.

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]

2015-05-20 Thread Martin Pitt
Hey Lennart,

Lennart Poettering [2015-05-20 17:49 +0200]:
 Nope, ConditionSecurity=audit is only a simple boolean check that
 holds when audit is enabled at all. It doesn't tell you anything about
 the precise audit feature set of the kernel.

Ah, thanks for the clarification.

 I have now conditionalized the unit on CAP_ADMIN_READ, which is the
 cap that you need to read the audit multicast stuff. You container
 manager hence should simply drop that cap fro, the cap set it passes
 and all should be good.

Wonderful! Now it works perfectly in nspawn. (This needs to be fixed
in unprivileged LXC containers, but that's not a systemd problem; I'll
talk to LXC upstream about that).

With these two fixes, should we now remove the scary warning in
README? AFAICS there is no need to turn auditing off on the host any
more.

Thanks!

Martin
-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel