
Re: RFC(v2): Audit Kernel Container IDs

2017-12-11 Thread Steve Grubb

On Monday, December 11, 2017 11:30:57 AM EST Eric Paris wrote:
> > Because a container doesn't have to use namespaces to be a container
> > you still need a mechanism for a process to declare that it is in fact
> > in a container, and to identify the container.
> 
> I like the idea but I'm still tossing it around in my head (and
> thinking about Casey's statement too). Let's say we have a 'docker-like'
> container with pid=100  netns=X,userns=Y,mountns=Z. If I'm on the host
> in all init namespaces and I run
>   nsenter -t 100 -n ip link set eth0 promisc on
> How should this be logged?

If it is a normal process, then everything would match the init name space and 
you wouldn't have entered a container. If it were a container, any generated 
event should have the container ID from registration attached to it.

> Did this command run in its own 'container' unrelated to the 'docker-like'
> container?

That should be determined by what's in the task struct.
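
One way to read this answer is that the container ID lives in per-task state set at registration, and that namespace changes never touch it. The sketch below models that reading in Python. The `Task` class, its fields, and the helper functions are illustrative assumptions, not kernel structures or anything proposed in the RFC.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical stand-in for the per-task state; not the kernel's task_struct.
@dataclass
class Task:
    pid: int
    namespaces: Dict[str, str]
    container_id: Optional[str] = None  # set only by an explicit registration


def fork(parent: Task, pid: int) -> Task:
    """A child copies its parent's namespaces and container ID."""
    return Task(pid, dict(parent.namespaces), parent.container_id)


def setns(task: Task, kind: str, target: Task) -> None:
    """Join one of target's namespaces; the container ID is left untouched."""
    task.namespaces[kind] = target.namespaces[kind]


def audit_event(task: Task, msg: str) -> dict:
    """Attach whatever container ID the task carries to the event."""
    return {"pid": task.pid, "contid": task.container_id, "msg": msg}


host_shell = Task(1, {"net": "init", "user": "init", "mnt": "init"})
container_init = Task(100, {"net": "X", "user": "Y", "mnt": "Z"}, container_id="c1")

# Rough equivalent of: nsenter -t 100 -n ip link set eth0 promisc on
cmd = fork(host_shell, 200)
setns(cmd, "net", container_init)
event = audit_event(cmd, "promisc on")
```

Under this reading the nsenter command is logged without a container ID even though it runs in the container's network namespace. Whether that is the desired outcome is the open question in the thread.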

-Steve

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

Re: RFC(v2): Audit Kernel Container IDs

2017-12-11 Thread Casey Schaufler

On 12/11/2017 8:30 AM, Eric Paris wrote:
> On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
>> Because a container doesn't have to use namespaces to be a container
>> you still need a mechanism for a process to declare that it is in fact
>> in a container, and to identify the container.
> I like the idea but I'm still tossing it around in my head (and
> thinking about Casey's statement too). Let's say we have a 'docker-like'
> container with pid=100  netns=X,userns=Y,mountns=Z. If I'm on the host
> in all init namespaces and I run
>   nsenter -t 100 -n ip link set eth0 promisc on
> How should this be logged? Did this command run in its own 'container'
> unrelated to the 'docker-like' container?

Jose Bollo's PTAGS ( https://gitlab.com/jobol/ptags ) would be
perfect. Any time you declare something to be a container or
enter a namespace you slap a tag on it. Identifying nested
containers would be easy, you'd have multiple tags.
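
The tagging idea can be sketched as a per-task list of tags. This is not the PTAGS interface, which is not described in this thread; the class and method names below are hypothetical and only illustrate how nesting falls out of carrying more than one tag.

```python
class TaggedTask:
    """Toy task that accumulates tags; names here are illustrative only."""

    def __init__(self, pid, tags=()):
        self.pid = pid
        self.tags = list(tags)

    def add_tag(self, tag):
        """Slap a tag on the task when it is declared a container
        or enters a namespace. Duplicate tags are ignored."""
        if tag not in self.tags:
            self.tags.append(tag)

    def fork(self, pid):
        """Children start with a copy of the parent's tags."""
        return TaggedTask(pid, self.tags)


outer = TaggedTask(100)
outer.add_tag("container:outer")

inner = outer.fork(101)
inner.add_tag("container:inner")
```

Here `inner.tags` lists both containers in order of entry, while `outer.tags` still holds only the first.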

PTAGS unfortunately needs module stacking, but how hard could that be?


> -Eric


Re: RFC(v2): Audit Kernel Container IDs

2017-12-11 Thread Eric Paris

On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
> On 12/9/2017 2:20 AM, Mickaël Salaün wrote:

> > What about automatically creating
> > and assigning an ID to a process when it enters a namespace different
> > from those of its parent process? This delegates the (permission)
> > responsibility to the use of namespaces (e.g. /proc/sys/user/max_*
> > limit).
> 
> That gets ugly when you have a container that uses user, filesystem,
> network and whatever else namespaces. If all containers used the same
> set of namespaces I think this would be a fine idea, but they don't.
> 
> > One interesting side effect of this approach would be to be able to
> > identify which processes are in the same set of namespaces, even if
> > not spawned from the container but entered after its creation (i.e.
> > using setns), by creating container IDs as a (deterministic) checksum
> > from the /proc/self/ns/* IDs.
> > 
> > Since the concern is to identify a container, I think the ability to
> > audit the switch from one container ID to another is enough. I don't
> > think we need nested IDs.
> 
> Because a container doesn't have to use namespaces to be a container
> you still need a mechanism for a process to declare that it is in fact
> in a container, and to identify the container.

I like the idea but I'm still tossing it around in my head (and
thinking about Casey's statement too). Let's say we have a 'docker-like'
container with pid=100  netns=X,userns=Y,mountns=Z. If I'm on the host
in all init namespaces and I run
  nsenter -t 100 -n ip link set eth0 promisc on
How should this be logged? Did this command run in its own 'container'
unrelated to the 'docker-like' container?

-Eric


Re: RFC(v2): Audit Kernel Container IDs

2017-12-11 Thread Richard Guy Briggs

On 2017-12-09 11:20, Mickaël Salaün wrote:
> 
> On 12/10/2017 18:33, Casey Schaufler wrote:
> > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> >> Containers are a userspace concept.  The kernel knows nothing of them.
> >>
> >> The Linux audit system needs a way to be able to track the container
> >> provenance of events and actions.  Audit needs the kernel's help to do
> >> this.
> >>
> >> Since the concept of a container is entirely a userspace concept, a
> >> registration from the userspace container orchestration system initiates
> >> this.  This will define a point in time and a set of resources
> >> associated with a particular container with an audit container ID.
> >>
> >> The registration is a pseudo filesystem (proc, since PID tree already
> >> exists) write of a u8[16] UUID representing the container ID to a file
> >> representing a process that will become the first process in a new
> >> container.  This write might place restrictions on mount namespaces
> >> required to define a container, or at least careful checking of
> >> namespaces in the kernel to verify permissions of the orchestrator so it
> >> can't change its own container ID.  A bind mount of nsfs may be
> >> necessary in the container orchestrator's mntNS.
> >> Note: Use a 128-bit scalar rather than a string to make compares faster
> >> and simpler.
> >>
> >> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >> registration.
> > 
> > Hang on. If containers are a user space concept, how can
> > you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> > a container, how can you be asking for a capability to manage
> > them?
> > 
> >>   At that time, record the target container's user-supplied
> >> container identifier along with the target container's first process
> >> (which may become the target container's "init" process) process ID
> >> (referenced from the initial PID namespace), all namespace IDs (in the
> >> form of a nsfs device number and inode number tuple) in a new auxiliary
> >> record AUDIT_CONTAINER with a qualifying op=$action field.
> 
> Here is an idea to avoid privilege problems or the need for a new
> capability: make it automatic. What makes a container a container seems
> to be the use of at least a namespace. What about automatically creating
> and assigning an ID to a process when it enters a namespace different from
> those of its parent process? This delegates the (permission)
> responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit).

A container doesn't imply a namespace and vice versa.

> One interesting side effect of this approach would be to be able to
> identify which processes are in the same set of namespaces, even if not
> spawned from the container but entered after its creation (i.e. using
> setns), by creating container IDs as a (deterministic) checksum from the
> /proc/self/ns/* IDs.

This would be really helpful, but it isn't the case.

> Since the concern is to identify a container, I think the ability to
> audit the switch from one container ID to another is enough. I don't
> think we need nested IDs.

Since container namespace membership is arbitrary between container
orchestrators, this needs a registration process and a way for the
container orchestrator to know the ID.
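
As a rough illustration of the orchestrator side of such a registration, the sketch below generates a 16-byte ID and compares IDs as 128-bit scalars, as the RFC's note suggests. The RFC does not name the proc file, so the write itself is left out; the function names are hypothetical.

```python
import uuid


def new_container_id() -> bytes:
    """Generate a random UUID and return its 16 raw bytes (the u8[16] payload)."""
    return uuid.uuid4().bytes


def as_scalar(contid: bytes) -> int:
    """Interpret the 16 bytes as a single 128-bit integer."""
    if len(contid) != 16:
        raise ValueError("container ID must be exactly 16 bytes")
    return int.from_bytes(contid, "big")


def same_container(a: bytes, b: bytes) -> bool:
    """Compare two IDs as scalars rather than as strings."""
    return as_scalar(a) == as_scalar(b)


contid = new_container_id()
```

The orchestrator generates `contid` itself, so it knows the ID without having to read it back from the kernel.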


I completely agree with Casey here.

> As a side note, you may want to take a look at the Linux-VServer's XID.
> 
> Regards,
>  Mickaël

- RGB

--
Richard Guy Briggs 
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


Re: RFC(v2): Audit Kernel Container IDs

2017-12-10 Thread Mickaël Salaün

On 12/10/2017 18:33, Casey Schaufler wrote:
> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>> Containers are a userspace concept.  The kernel knows nothing of them.
>>
>> The Linux audit system needs a way to be able to track the container
>> provenance of events and actions.  Audit needs the kernel's help to do
>> this.
>>
>> Since the concept of a container is entirely a userspace concept, a
>> registration from the userspace container orchestration system initiates
>> this.  This will define a point in time and a set of resources
>> associated with a particular container with an audit container ID.
>>
>> The registration is a pseudo filesystem (proc, since PID tree already
>> exists) write of a u8[16] UUID representing the container ID to a file
>> representing a process that will become the first process in a new
>> container.  This write might place restrictions on mount namespaces
>> required to define a container, or at least careful checking of
>> namespaces in the kernel to verify permissions of the orchestrator so it
>> can't change its own container ID.  A bind mount of nsfs may be
>> necessary in the container orchestrator's mntNS.
>> Note: Use a 128-bit scalar rather than a string to make compares faster
>> and simpler.
>>
>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>> registration.
> 
> Hang on. If containers are a user space concept, how can
> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> a container, how can you be asking for a capability to manage
> them?
> 
>>   At that time, record the target container's user-supplied
>> container identifier along with the target container's first process
>> (which may become the target container's "init" process) process ID
>> (referenced from the initial PID namespace), all namespace IDs (in the
>> form of a nsfs device number and inode number tuple) in a new auxiliary
>> record AUDIT_CONTAINER with a qualifying op=$action field.

Here is an idea to avoid privilege problems or the need for a new
capability: make it automatic. What makes a container a container seems
to be the use of at least a namespace. What about automatically creating
and assigning an ID to a process when it enters a namespace different from
those of its parent process? This delegates the (permission)
responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit).

One interesting side effect of this approach would be to be able to
identify which processes are in the same set of namespaces, even if not
spawned from the container but entered after its creation (i.e. using
setns), by creating container IDs as a (deterministic) checksum from the
/proc/self/ns/* IDs.
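
A minimal sketch of the checksum idea, assuming SHA-256 truncated to 16 bytes as the hash (the thread does not specify one). `read_ns_ids` reads the symlink targets under /proc/<pid>/ns on Linux; both function names are hypothetical.

```python
import hashlib
import os
from typing import Dict


def read_ns_ids(pid: str = "self") -> Dict[str, str]:
    """Return {namespace name: link target}, e.g. {"net": "net:[4026531992]"}.
    Linux only: relies on the /proc/<pid>/ns symlinks."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in os.listdir(ns_dir)}


def container_id_from_ns(ns_ids: Dict[str, str]) -> bytes:
    """Hash the namespace IDs in sorted order so the result is deterministic."""
    h = hashlib.sha256()
    for name in sorted(ns_ids):
        h.update(f"{name}={ns_ids[name]}\n".encode())
    return h.digest()[:16]  # truncation to 128 bits is an assumption


a = container_id_from_ns({"net": "net:[1]", "mnt": "mnt:[2]"})
b = container_id_from_ns({"mnt": "mnt:[2]", "net": "net:[1]"})
c = container_id_from_ns({"net": "net:[9]", "mnt": "mnt:[2]"})
```

Two processes in the same set of namespaces get the same ID (`a == b`), and changing any one namespace changes it (`a != c`). The replies in this thread point out the limits: containers do not all use the same set of namespaces, and some use none.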

Since the concern is to identify a container, I think the ability to
audit the switch from one container ID to another is enough. I don't
think we need nested IDs.

As a side note, you may want to take a look at the Linux-VServer's XID.

Regards,
 Mickaël


Re: RFC(v2): Audit Kernel Container IDs

2017-12-10 Thread Casey Schaufler

On 12/9/2017 2:20 AM, Mickaël Salaün wrote:
> On 12/10/2017 18:33, Casey Schaufler wrote:
>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>
>>> The Linux audit system needs a way to be able to track the container
>>> provenance of events and actions.  Audit needs the kernel's help to do
>>> this.
>>>
>>> Since the concept of a container is entirely a userspace concept, a
>>> registration from the userspace container orchestration system initiates
>>> this.  This will define a point in time and a set of resources
>>> associated with a particular container with an audit container ID.
>>>
>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container.  This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID.  A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>> Hang on. If containers are a user space concept, how can
>> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
>> a container, how can you be asking for a capability to manage
>> them?
>>
>>>   At that time, record the target container's user-supplied
>>> container identifier along with the target container's first process
>>> (which may become the target container's "init" process) process ID
>>> (referenced from the initial PID namespace), all namespace IDs (in the
>>> form of a nsfs device number and inode number tuple) in a new auxiliary
>>> record AUDIT_CONTAINER with a qualifying op=$action field.
> Here is an idea to avoid privilege problems or the need for a new
> capability: make it automatic. What makes a container a container seems
> to be the use of at least a namespace.

You might think so, but I am assured that you can have a container
without using namespaces. Intel's "Clear Containers", which use
virtualization technology, are one example. I have considered creating
"Smack Containers" using mandatory access control technology, more
to press the point that "containers" is a marketing concept, not
technology.

> What about automatically creating
> and assigning an ID to a process when it enters a namespace different from
> those of its parent process? This delegates the (permission)
> responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit).

That gets ugly when you have a container that uses user, filesystem,
network and whatever else namespaces. If all containers used the same
set of namespaces I think this would be a fine idea, but they don't.

> One interesting side effect of this approach would be to be able to
> identify which processes are in the same set of namespaces, even if not
> spawned from the container but entered after its creation (i.e. using
> setns), by creating container IDs as a (deterministic) checksum from the
> /proc/self/ns/* IDs.
>
> Since the concern is to identify a container, I think the ability to
> audit the switch from one container ID to another is enough. I don't
> think we need nested IDs.

Because a container doesn't have to use namespaces to be a container
you still need a mechanism for a process to declare that it is in fact
in a container, and to identify the container.

>
> As a side note, you may want to take a look at the Linux-VServer's XID.
>
> Regards,
>  Mickaël
>


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Steve Grubb

On Thursday, October 19, 2017 7:11:33 PM EDT Aleksa Sarai wrote:
> >>> The registration is a pseudo filesystem (proc, since PID tree already
> >>> exists) write of a u8[16] UUID representing the container ID to a file
> >>> representing a process that will become the first process in a new
> >>> container.  This write might place restrictions on mount namespaces
> >>> required to define a container, or at least careful checking of
> >>> namespaces in the kernel to verify permissions of the orchestrator so it
> >>> can't change its own container ID.  A bind mount of nsfs may be
> >>> necessary in the container orchestrator's mntNS.
> >>> Note: Use a 128-bit scalar rather than a string to make compares faster
> >>> and simpler.
> >>> 
> >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >>> registration.
> >> 
> >> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
> > 
> > No, because then any process with that capability (vsftpd) could change
> > its own container ID.  This is discussed more in other parts of the
> > thread...

For the record, I changed my mind. CAP_AUDIT_CONTROL is the correct 
capability. 

> Not if we make the container ID append-only (to support nesting), or
> write-once (the other idea thrown around). 

Well...I like to use lessons learned if they can be applied. In the normal 
world without containers we have uid, auid, and session_id. uid is who you are 
now, auid is how you got into the system, session_id distinguishes individual 
auids. We have a default auid of -1 for system objects and a real number for 
people.

I think there should be the equivalent of auid and session_id but tailored for 
containers. Loginuid == container id. It can be set, overridden, or appended 
to (we'll figure this out later) in very limited circumstances. 
Container_session == session which is tamper-proof. This way things can enter 
a container with the same ID but under a different session. And everything 
else gets to inherit the original ID. This way we can trace actions to 
something that entered the container rather than normal system activity in the 
container.
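
The auid/session_id analogy can be sketched as a pair of values carried by each task. This is a toy model under stated assumptions: the `AuditContext` type, the counter-based session numbering, and the function names are all hypothetical, and "tamper-proof" is approximated here by an immutable object.

```python
import itertools
from dataclasses import dataclass

_next_session = itertools.count(1)


@dataclass(frozen=True)  # immutable: fields cannot be reassigned after creation
class AuditContext:
    container_id: str        # analogous to auid
    container_session: int   # analogous to session_id


def start_container(container_id: str) -> AuditContext:
    """Registration: the first process gets the ID and a fresh session."""
    return AuditContext(container_id, next(_next_session))


def inherit(parent: AuditContext) -> AuditContext:
    """Everything spawned inside keeps the original ID and session."""
    return parent


def enter_container(target: AuditContext) -> AuditContext:
    """Something entering later shares the ID but gets its own session."""
    return AuditContext(target.container_id, next(_next_session))


init_ctx = start_container("c1")
worker_ctx = inherit(init_ctx)
intruder_ctx = enter_container(init_ctx)
```

Filtering on `container_session` then separates actions by something that entered the container from normal activity that started inside it.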

What a security officer wants to know is what did people do inside the 
system / container. The system objects we typically don't care about. Sure 
they might get hacked and then work on behalf of someone, but they would 
almost always pop a shell so that they can have freedom. That should set off 
an AVC or create other activity that gets picked up.

-Steve

> In that case, you can't move "out" from a particular container ID, you can
> only go "deeper". These semantics don't make sense for generic containers,
> but since the point of this facility is *specifically* for audit I imagine
> that not being able to move a process from a sub-container's ID is a
> benefit.


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Aleksa Sarai


>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container.  This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID.  A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>>
>> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
>
> No, because then any process with that capability (vsftpd) could change
> its own container ID.  This is discussed more in other parts of the
> thread...


Not if we make the container ID append-only (to support nesting), or 
write-once (the other idea thrown around). In that case, you can't move 
"out" from a particular container ID, you can only go "deeper". These 
semantics don't make sense for generic containers, but since the point 
of this facility is *specifically* for audit I imagine that not being 
able to move a process from a sub-container's ID is a benefit.
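
The two policies mentioned here can be contrasted with a small sketch. Both classes are hypothetical models of the semantics only; nothing in the thread fixes an interface.

```python
class WriteOnceID:
    """Can be set exactly once; any later attempt is refused."""

    def __init__(self):
        self._value = None

    def set(self, value):
        if self._value is not None:
            raise PermissionError("container ID already set")
        self._value = value

    @property
    def value(self):
        return self._value


class AppendOnlyID:
    """Can only grow: a process may go 'deeper' but never back 'out'."""

    def __init__(self, chain=()):
        self._chain = tuple(chain)

    def push(self, value):
        self._chain = self._chain + (value,)

    @property
    def chain(self):
        return self._chain  # a tuple, so callers cannot remove entries


once = WriteOnceID()
once.set("outer")

nested = AppendOnlyID()
nested.push("outer")
nested.push("inner")
```

A second `once.set(...)` raises `PermissionError`, which rules out a nested orchestrator, while `nested.chain` keeps the full path and offers no way to drop the outer ID.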


--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Aleksa Sarai


>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container.  This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID.  A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>>
>> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
>
> No, because then any process with that capability (vsftpd) could change
> its own container ID.  This is discussed more in other parts of the
> thread...


Not if we make the container ID append-only (to support nesting), or 
write-once (the other idea thrown around). In that case, you can't move 
"out" from a particular container ID, you can only go "deeper". These 
semantics don't make sense for generic containers, but since the point 
of this facility is *specifically* for audit I imagine that not being 
able to move a process from a sub-container's ID is a benefit.


[This assumes it's CAP_AUDIT_CONTROL which is what we are discussing in 
a sister thread.]


--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Richard Guy Briggs

On 2017-10-12 15:45, Steve Grubb wrote:
> On Thursday, October 12, 2017 10:14:00 AM EDT Richard Guy Briggs wrote:
> > Containers are a userspace concept.  The kernel knows nothing of them.
> > 
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions.  Audit needs the kernel's help to do
> > this.
> > 
> > Since the concept of a container is entirely a userspace concept, a
> > registration from the userspace container orchestration system initiates
> > this.  This will define a point in time and a set of resources
> > associated with a particular container with an audit container ID.
> 
> The requirements for common criteria around containers should be very closely 
> modeled on the requirements for virtualization. It would be the container 
> manager that is responsible for logging the resource assignment events.

I suspect we are in violent agreement here.

> > The registration is a pseudo filesystem (proc, since PID tree already
> > exists) write of a u8[16] UUID representing the container ID to a file
> > representing a process that will become the first process in a new
> > container.  This write might place restrictions on mount namespaces
> > required to define a container, or at least careful checking of
> > namespaces in the kernel to verify permissions of the orchestrator so it
> > can't change its own container ID.  A bind mount of nsfs may be
> > necessary in the container orchestrator's mntNS.
> > Note: Use a 128-bit scalar rather than a string to make compares faster
> > and simpler.
> > 
> > Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> > registration.
> 
> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.

No, because then any process with that capability (vsftpd) could change
its own container ID.  This is discussed more in other parts of the
thread...

> > At that time, record the target container's user-supplied
> > container identifier along with the target container's first process
> > (which may become the target container's "init" process) process ID
> > (referenced from the initial PID namespace), all namespace IDs (in the
> > form of a nsfs device number and inode number tuple) in a new auxiliary
> > record AUDIT_CONTAINER with a qualifying op=$action field.
> 
> This would be in addition to the normal audit fields.

It was intended that this be an auxiliary record, but this issue is
being debated in threads about other upstream issues currently so I
won't cover that here.

> > Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> > container ID present on an auditable action or event.
> > 
> > Forked and cloned processes inherit their parent's container ID,
> > referenced in the process' task_struct.
> > 
> > Mimic setns(2) and return an error if the process has already initiated
> > threading or forked since this registration should happen before the
> > process execution is started by the orchestrator and hence should not
> > yet have any threads or children.  If this is deemed overly restrictive,
> > switch all threads and children to the new containerID.
> > 
> > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> > 
> > Log the creation of every namespace, inheriting/adding its spawning
> > process' containerID(s), if applicable.  Include the spawning and
> > spawned namespace IDs (device and inode number tuples).
> > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> > Note: At this point it appears only network namespaces may need to track
> > container IDs apart from processes since incoming packets may cause an
> > auditable event before being associated with a process.
> > 
> > Log the destruction of every namespace when it is no longer used by any
> > process, include the namespace IDs (device and inode number tuples).
> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> 
> In the virtualization requirements, we only log removal of resources when 
> something is removed by intention. If the VM shuts down, the manager issues a 
> VIRT_CONTROL stop event and the user space utilities know this means all 
> resources have been unassigned.

Ok, this assumes the orchestrator is waiting on that child process (and
that it is in turn waiting on all its children) so it knows when that
job has exited naturally or errored out.  I don't know if there is any
consensus or best practice with orchestrators out there now.  The kernel
should know, so it seemed reasonable to report what was known.  Besides,
in this case, I was talking specifically about namespace creation and
destruction rather than containers.
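
The namespace lifetime logging described in the RFC can be sketched as a tracker that records a destroy event once the last process leaves. The record names AUDIT_NS_CREATE and AUDIT_NS_DESTROY come from the RFC; the `NamespaceTracker` class and its methods are hypothetical, and the model ignores other things that can keep a namespace alive, such as bind mounts or open file descriptors.

```python
class NamespaceTracker:
    """Toy model: a namespace lives as long as at least one process uses it."""

    def __init__(self):
        self.users = {}  # ns_id (device, inode) -> set of pids
        self.log = []    # list of (record type, ns_id, container ID)

    def create(self, ns_id, pid, contid=None):
        """clone(2)/unshare(2): log creation with the spawner's container ID."""
        self.users[ns_id] = {pid}
        self.log.append(("AUDIT_NS_CREATE", ns_id, contid))

    def join(self, ns_id, pid):
        """setns(2): another process starts using the namespace."""
        self.users[ns_id].add(pid)

    def leave(self, ns_id, pid):
        """Process exit or namespace switch; log once nobody is left."""
        members = self.users[ns_id]
        members.discard(pid)
        if not members:
            del self.users[ns_id]
            self.log.append(("AUDIT_NS_DESTROY", ns_id, None))


tracker = NamespaceTracker()
net_ns = (3, 4026532000)  # made-up (device, inode) tuple
tracker.create(net_ns, pid=100, contid="c1")
tracker.join(net_ns, pid=101)
tracker.leave(net_ns, pid=100)
records_before_last_exit = len(tracker.log)
tracker.leave(net_ns, pid=101)
```

The destroy record appears only after pid 101 leaves, without the orchestrator having to wait on its children.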

> > Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> > the parent and child namespace IDs for any changes to a process'
> > namespaces. [setns(2)]
> > Note: It may be possible to combine AUDIT_NS_* record formats and
> > distinguish them with an op=$action field depending on the fields
> >

Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Paul Moore

On Thu, Oct 19, 2017 at 12:25 PM, Eric W. Biederman
 wrote:
> Paul Moore  writes:
>
>> On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman
>>  wrote:
>>> Aleksa Sarai  writes:
>>>>>> The security implications are that anything that can change the label
>>>>>> could also hide itself and its doings from the audit system and thus
>>>>>> would be used as a means to evade detection.  I actually think this
>>>>>> means the label should be write once (once you've set it, you can't
>>>>>> change it) ...
>>>>>
>>>>> Richard and I have talked about a write once approach, but the
>>>>> thinking was that you may want to allow a nested container
>>>>> orchestrator (Why? I don't know, but people always want to do the
>>>>> craziest things.) and a write-once policy makes that impossible.  If
>>>>> we punt on the nested orchestrator, I believe we can seriously think
>>>>> about a write-once policy to simplify things.
>>>>
>>>> Nested containers are a very widely used use-case (see LXC system containers,
>>>> inside of which people run other container runtimes). So I would definitely
>>>> consider it something that "needs to be supported in some way". While the LXC
>>>> guys might be a *tad* crazy, the use-case isn't. :P
>>
>> No worries, we're all a little crazy in our own special ways ;)
>>
>> Kidding aside, thanks for explaining the use case.
>>
>>> Of course some of that gets to running auditd inside a container which
>>> we don't have yet either.
>>>
>>> So I think to start it is perfectly fine to figure out the non-nested
>>> case first and what makes sense there.  Then to sort out the nested
>>> container case.
>>>
>>> The solution might be that a process gets at most one id per ``audit
>>> namespace''.
>>
>> In an attempt to stay on-topic, let's try to stick with "audit
>> container ID" or "container ID" if you must.  I really want to avoid
>> the term "audit namespace" simply because the term "namespace" implies
>> some things which we aren't planning on doing.
>
> This is 100% on topic.  I am saying that unless we are planning to have
> auditd running in a container with its own set of rules you probably
> don't care about nested containers.  Last time I heard a discussion
> about that the term in use was audit namespace.   So I was referring to
> that support when I said audit namespace, even if the end result only
> loosely fits the term namespace.

My "stay on-topic" comment is directed at, and limited to, your choice
of terminology, not the discussion about container nesting.  I'm
purposefully not using the term "audit namespace" to refer to anything
that Richard has presented, and I'm kindly asking you to do the same,
it simply doesn't fit.

> I could be wrong of course.  I don't fully understand what is driving
> the desire to connect audit and containers.  But my naive guess is that
> from an audit perspective you don't care about nested containers
> unless there is also a nested auditd who is looking at it from a nested
> perspective.

Two motivations that are clear to me: the first is the desire to be
able to associate events in the audit log with a container (much like
how the session ID helped us associate events with a login session),
the second is the desire for users to run an audit daemon instance in
their containers to capture audit events generated by their container.
There is also a security certification motivation, see some of Steve's
comments for more on that.

-- 
paul moore
www.paul-moore.com


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Paul Moore

On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman
 wrote:
> Aleksa Sarai  writes:
>>>> The security implications are that anything that can change the label
>>>> could also hide itself and its doings from the audit system and thus
>>>> would be used as a means to evade detection.  I actually think this
>>>> means the label should be write once (once you've set it, you can't
>>>> change it) ...
>>>
>>> Richard and I have talked about a write once approach, but the
>>> thinking was that you may want to allow a nested container
>>> orchestrator (Why? I don't know, but people always want to do the
>>> craziest things.) and a write-once policy makes that impossible.  If
>>> we punt on the nested orchestrator, I believe we can seriously think
>>> about a write-once policy to simplify things.
>>
>> Nested containers are a very widely used use-case (see LXC system containers,
>> inside of which people run other container runtimes). So I would definitely
>> consider it something that "needs to be supported in some way". While the LXC
>> guys might be a *tad* crazy, the use-case isn't. :P

No worries, we're all a little crazy in our own special ways ;)

Kidding aside, thanks for explaining the use case.

> Of course some of that gets to running auditd inside a container which
> we don't have yet either.
>
> So I think to start it is perfectly fine to figure out the non-nested
> case first and what makes sense there.  Then to sort out the nested
> container case.
>
> The solution might be that a process gets at most one id per ``audit
> namespace''.

In an attempt to stay on-topic, let's try to stick with "audit
container ID" or "container ID" if you must.  I really want to avoid
the term "audit namespace" simply because the term "namespace" implies
some things which we aren't planning on doing.

-- 
paul moore
www.paul-moore.com

Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Eric W. Biederman

Paul Moore  writes:

> On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman
>  wrote:
>> Aleksa Sarai  writes:
>>>>> The security implications are that anything that can change the label
>>>>> could also hide itself and its doings from the audit system and thus
>>>>> would be used as a means to evade detection.  I actually think this
>>>>> means the label should be write once (once you've set it, you can't
>>>>> change it) ...
>>>>
>>>> Richard and I have talked about a write once approach, but the
>>>> thinking was that you may want to allow a nested container
>>>> orchestrator (Why? I don't know, but people always want to do the
>>>> craziest things.) and a write-once policy makes that impossible.  If
>>>> we punt on the nested orchestrator, I believe we can seriously think
>>>> about a write-once policy to simplify things.
>>>
>>> Nested containers are a very widely used use-case (see LXC system containers,
>>> inside of which people run other container runtimes). So I would definitely
>>> consider it something that "needs to be supported in some way". While the LXC
>>> guys might be a *tad* crazy, the use-case isn't. :P
>
> No worries, we're all a little crazy in our own special ways ;)
>
> Kidding aside, thanks for explaining the use case.
>
>> Of course some of that gets to running auditd inside a container which
>> we don't have yet either.
>>
>> So I think to start it is perfectly fine to figure out the non-nested
>> case first and what makes sense there.  Then to sort out the nested
>> container case.
>>
>> The solution might be that a process gets at most one id per ``audit
>> namespace''.
>
> In an attempt to stay on-topic, let's try to stick with "audit
> container ID" or "container ID" if you must.  I really want to avoid
> the term "audit namespace" simply because the term "namespace" implies
> some things which we aren't planning on doing.

This is 100% on topic.  I am saying that unless we are planning to have
auditd running in a container with its own set of rules you probably
don't care about nested containers.  Last time I heard a discussion
about that the term in use was audit namespace.   So I was referring to
that support when I said audit namespace, even if the end result only
loosely fits the term namespace.

I could be wrong of course.  I don't fully understand what is driving
the desire to connect audit and containers.  But my naive guess is that
from an audit perspective you don't care about nested containers
unless there is also a nested auditd that is looking at it from a nested
perspective.

So far we have established with the term container that we are talking
about a running instance of processes, not a filesystem instance that
Docker and friends ship around.   Beyond that I am not certain what you
care about.

Eric

Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Paul Moore

On Thu, Oct 19, 2017 at 9:32 AM, Casey Schaufler  wrote:
> On 10/18/2017 5:05 PM, Richard Guy Briggs wrote:
>> On 2017-10-17 01:10, Casey Schaufler wrote:
>>> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
>>>> On 2017-10-12 16:33, Casey Schaufler wrote:
>>>>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>>>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>>>>
>>>>>> The Linux audit system needs a way to be able to track the container
>>>>>> provenance of events and actions.  Audit needs the kernel's help to do
>>>>>> this.
>>>>>>
>>>>>> Since the concept of a container is entirely a userspace concept, a
>>>>>> registration from the userspace container orchestration system initiates
>>>>>> this.  This will define a point in time and a set of resources
>>>>>> associated with a particular container with an audit container ID.
>>>>>>
>>>>>> The registration is a pseudo filesystem (proc, since PID tree already
>>>>>> exists) write of a u8[16] UUID representing the container ID to a file
>>>>>> representing a process that will become the first process in a new
>>>>>> container.  This write might place restrictions on mount namespaces
>>>>>> required to define a container, or at least careful checking of
>>>>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>>>>> can't change its own container ID.  A bind mount of nsfs may be
>>>>>> necessary in the container orchestrator's mntNS.
>>>>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>>>>> and simpler.
>>>>>>
>>>>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>>>>> registration.
>>>>> Hang on. If containers are a user space concept, how can
>>>>> you want CAP_CONTAINER_ANYTHING? If there's not such thing as
>>>>> a container, how can you be asking for a capability to manage
>>>>> them?
>>>> There is such a thing, but the kernel doesn't know about it yet.
>>> Then how can it be the kernel's place to control access to a
>>> container resource, that is, the containerID.
>> Ok, let me try to address your objections.
>>
>> The kernel can know enough that if it is already set to not allow it to
>> be set again.  Or if the user doesn't have permission to set it that the
>> user be denied this action.  How is this different from loginuid and
>> sessionid?
>>>>   This
>>>> same situation exists for loginuid and sessionid which are userspace
>>>> concepts that the kernel tracks for the convenience of userspace.
>>> Ah, no. Loginuid identifies a user, which is a kernel concept in
>>> that a user is defined by the uid.
>> This simple explanation doesn't help me.  What makes that a kernel
>> concept?  The fact that it is stored and compared in more than one
>> place?
>>
>>> The session ID has well defined kernel semantics. You're trying to say
>>> that the containerID is an opaque value that is meaningless to the
>>> kernel, but you still want the kernel to protect it. How can the
>>> kernel know if it is protecting it correctly?
>> How so?  A userspace process triggers this.  Does the kernel know what
>> these values mean?  Does it do anything with them other than report
>> them or allow audit to filter them?  It is given some instructions on
>> how to treat it.
>>
>> This is what we're trying to do with the containerID.
>>
>>>>   As
>>>> for its name, I'm not particularly picky, so if you don't like
>>>> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
>>>> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
>>>> don't want to give the ability to set a containerID to any process that
>>>> is able to do audit logging (such as vsftpd) and similarly we don't want
>>>> to give the orchestrator the ability to control the setup of the audit
>>>> daemon.
>>> Sorry, but what aspect of the kernel security policy is this
>>> capability supposed to protect? That's what capabilities are
>>> for, not the undefined support of undefined user-space behavior.
>> Similarly, loginuids and sessionIDs are only used for audit tracking and
>> filtering.
>
> Tell me again why you're not reusing either of these?

Ah, granularity arguments, welcome back old friend :)

Once again, we're still trying to sort all this out so I reserve the
right to change my mind, but my current thinking is as follows ...
CAP_AUDIT_WRITE exists to control which applications can submit
userspace generated audit records to the kernel, CAP_AUDIT_CONTROL
exists to control which applications can manage the in-kernel audit
configuration (e.g. filter rules) and the current task's loginuid
value.  Reusing CAP_AUDIT_WRITE here would allow any application that
can submit userspace audit records the ability to change the audit
container ID; this would be bad, we don't allow CAP_AUDIT_WRITE to
change the loginuid, it would be even worse to allow it to change the
audit container ID.  Reusing CAP_AUDIT_CONTROL is less worse than
CAP_AUDIT_WRITE, but it gets

Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Casey Schaufler

On 10/18/2017 5:05 PM, Richard Guy Briggs wrote:
> On 2017-10-17 01:10, Casey Schaufler wrote:
>> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
>>> On 2017-10-12 16:33, Casey Schaufler wrote:
>>>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>>>
>>>>> The Linux audit system needs a way to be able to track the container
>>>>> provenance of events and actions.  Audit needs the kernel's help to do
>>>>> this.
>>>>>
>>>>> Since the concept of a container is entirely a userspace concept, a
>>>>> registration from the userspace container orchestration system initiates
>>>>> this.  This will define a point in time and a set of resources
>>>>> associated with a particular container with an audit container ID.
>>>>>
>>>>> The registration is a pseudo filesystem (proc, since PID tree already
>>>>> exists) write of a u8[16] UUID representing the container ID to a file
>>>>> representing a process that will become the first process in a new
>>>>> container.  This write might place restrictions on mount namespaces
>>>>> required to define a container, or at least careful checking of
>>>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>>>> can't change its own container ID.  A bind mount of nsfs may be
>>>>> necessary in the container orchestrator's mntNS.
>>>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>>>> and simpler.
>>>>>
>>>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>>>> registration.
>>>> Hang on. If containers are a user space concept, how can
>>>> you want CAP_CONTAINER_ANYTHING? If there's not such thing as
>>>> a container, how can you be asking for a capability to manage
>>>> them?
>>> There is such a thing, but the kernel doesn't know about it yet.
>> Then how can it be the kernel's place to control access to a
>> container resource, that is, the containerID.
> Ok, let me try to address your objections.
>
> The kernel can know enough that if it is already set to not allow it to
> be set again.  Or if the user doesn't have permission to set it that the
> user be denied this action.  How is this different from loginuid and
> sessionid?
>>>   This
>>> same situation exists for loginuid and sessionid which are userspace
>>> concepts that the kernel tracks for the convenience of userspace.
>> Ah, no. Loginuid identifies a user, which is a kernel concept in
>> that a user is defined by the uid.
> This simple explanation doesn't help me.  What makes that a kernel
> concept?  The fact that it is stored and compared in more than one
> place?
>
>> The session ID has well defined kernel semantics. You're trying to say
>> that the containerID is an opaque value that is meaningless to the
>> kernel, but you still want the kernel to protect it. How can the
>> kernel know if it is protecting it correctly?
> How so?  A userspace process triggers this.  Does the kernel know what
> these values mean?  Does it do anything with them other than report
> them or allow audit to filter them?  It is given some instructions on
> how to treat it.
>
> This is what we're trying to do with the containerID.
>
>>>   As
>>> for its name, I'm not particularly picky, so if you don't like
>>> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
>>> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
>>> don't want to give the ability to set a containerID to any process that
>>> is able to do audit logging (such as vsftpd) and similarly we don't want
>>> to give the orchestrator the ability to control the setup of the audit
>>> daemon.
>> Sorry, but what aspect of the kernel security policy is this
>> capability supposed to protect? That's what capabilities are
>> for, not the undefined support of undefined user-space behavior.
> Similarly, loginuids and sessionIDs are only used for audit tracking and
> filtering.

Tell me again why you're not reusing either of these?

>
>> If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's
>> more than audit behavior you have to define what system security
>> policy you're dealing with in order to pick the right capability.
> It isn't audit behaviour (yet), it is audit reporting information, a
> level above simply writing logs and a level below controlling daemon
> behaviour.

You are changing audit information. That's CAP_AUDIT_CONTROL.

>
>> We get this request pretty regularly. "I need my own capability
>> because I have a niche thing that isn't part of the system security
>> policy but that is important!" Fit the containerID into the
>> system security policy, and if that results in using CAP_SYS_ADMIN,
>> oh well.
> There's far too much piled in to CAP_SYS_ADMIN already, which is making
> capabilities less and less useful.

No. The value of capabilities is in separating privilege from DAC.
Granularity is a bonus. The current granularity is too fine in some
cases and too coarse in others.

> I

Re: RFC(v2): Audit Kernel Container IDs

2017-10-18 Thread Eric W. Biederman

Aleksa Sarai  writes:

>>> The security implications are that anything that can change the label
>>> could also hide itself and its doings from the audit system and thus
>>> would be used as a means to evade detection.  I actually think this
>>> means the label should be write once (once you've set it, you can't
>>> change it) ...
>>
>> Richard and I have talked about a write once approach, but the
>> thinking was that you may want to allow a nested container
>> orchestrator (Why? I don't know, but people always want to do the
>> craziest things.) and a write-once policy makes that impossible.  If
>> we punt on the nested orchestrator, I believe we can seriously think
>> about a write-once policy to simplify things.
>
> Nested containers are a very widely used use-case (see LXC system containers,
> inside of which people run other container runtimes). So I would definitely
> consider it something that "needs to be supported in some way". While the LXC
> guys might be a *tad* crazy, the use-case isn't. :P

Of course some of that gets to running auditd inside a container which
we don't have yet either.

So I think to start it is perfectly fine to figure out the non-nested
case first and what makes sense there.  Then to sort out the nested
container case.

The solution might be that a process gets at most one id per ``audit
namespace''.

Eric

Re: RFC(v2): Audit Kernel Container IDs

2017-10-18 Thread Aleksa Sarai


>> The security implications are that anything that can change the label
>> could also hide itself and its doings from the audit system and thus
>> would be used as a means to evade detection.  I actually think this
>> means the label should be write once (once you've set it, you can't
>> change it) ...
>
> Richard and I have talked about a write once approach, but the
> thinking was that you may want to allow a nested container
> orchestrator (Why? I don't know, but people always want to do the
> craziest things.) and a write-once policy makes that impossible.  If
> we punt on the nested orchestrator, I believe we can seriously think
> about a write-once policy to simplify things.

Nested containers are a very widely used use-case (see LXC system
containers, inside of which people run other container runtimes). So I
would definitely consider it something that "needs to be supported in
some way". While the LXC guys might be a *tad* crazy, the use-case isn't. :P

>> ... and orchestration systems should begin as unlabelled
>> processes allowing them to do arbitrary forks.
>
> My current thinking is that the default state is to start unlabeled (I
> just vomited a little into my SELinux hat); in other words
> init/systemd/PID-1 in the host system starts with an "unset" audit
> container ID.  This not only helps define the host system (anything
> that has an unset audit container ID) but provides a blank slate for
> the orchestrator(s).
>
>> For nested containers, I actually think the label should be
>> hierarchical, so you can add a label for the new nested container but
>> it still also contains its parents label as well.
>
> I haven't made up my mind on this completely just yet, but I'm
> currently of the mindset that supporting multiple audit container IDs
> on a given process is not a good idea.

As long as creating a new "container" (that is, changing a process's 
"audit container ID") is an audit event then I think that having a 
hierarchy be explicit is not necessary (userspace audit can figure out 
the hierarchy quite easily -- but also there are cases where thinking of 
it as being hierarchical isn't necessarily correct).
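
As a rough illustration of the point above, here is a minimal sketch of how
a userspace tool might rebuild nesting from "audit container ID was set"
events. The event fields (setter_contid, new_contid) and the example IDs are
hypothetical; nothing in this thread defines the record format, and the
sketch assumes each event records the ID of the process that did the setting
(None for the host).

```python
def build_parent_map(events):
    """Map each newly assigned container ID to the ID of whoever set it."""
    parent = {}
    for ev in events:
        # Hypothetical fields: the setter's own ID (None = host) and the new ID.
        parent[ev["new_contid"]] = ev["setter_contid"]
    return parent


def ancestry(parent, contid):
    """Walk from a container ID up to the host, innermost first."""
    chain = []
    seen = set()
    while contid is not None and contid not in seen:
        seen.add(contid)  # guard against a malformed, cyclic event stream
        chain.append(contid)
        contid = parent.get(contid)
    return chain


events = [
    {"setter_contid": None, "new_contid": "lxc-1"},        # host orchestrator
    {"setter_contid": "lxc-1", "new_contid": "docker-7"},  # nested runtime
]
parent = build_parent_map(events)
print(ancestry(parent, "docker-7"))
```

The kernel side stays flat in this model (one ID per process); any hierarchy
is derived after the fact, which is the trade-off being argued for here.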


--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

Re: RFC(v2): Audit Kernel Container IDs

2017-10-18 Thread Paul Moore

On Tue, Oct 17, 2017 at 11:44 AM, James Bottomley
 wrote:
> On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote:
>> > Without a *kernel* policy on containerIDs you can't say what
>> > security policy is being exempted.
>>
>> The policy has been basically stated earlier.
>>
>> A way to track a set of processes from a specific point in time
>> forward. The name used is "container id", but it could be anything.
>> This marker is mostly used by user space to track process hierarchies
>> without races, these processes can be very privileged, and must not
>> be allowed to change the marker themselves when granted the current
>> common capabilities.
>>
>> Is this a good enough description ? If not can you clarify your
>> expectations ?
>
> I think you mean you want to be able to apply a label to a process
> which is inherited across forks.  The label should only be susceptible
> to modification by something possessing a capability (which one TBD).
> The idea is that processes spawned into a container would be labelled
> by the container orchestration system.  It's unclear what should happen
> to processes using nsenter after the fact, but policy for that should
> be up to the orchestration system.
>
> The label will be used as a tag for audit information.
>
> I think you were missing label inheritance above.

That is a pretty good summary of what we want to do, and what Richard
and I have discussed while brainstorming this offline.  The details
may not have translated well into those initial emails from Richard,
but I think you've got the idea, even if some of the smaller details
are still TBD.  FWIW, right now I'm not as worried about the exact
capability or the size of the audit container ID, I think those things
will sort themselves out as we progress through the implementation,
especially once we get to the next stage when we start to allow copies
of the audit records to be routed to audit daemons running inside
containers (note well that I said "copies", the host system still sees
all).

> The security implications are that anything that can change the label
> could also hide itself and its doings from the audit system and thus
> would be used as a means to evade detection.  I actually think this
> means the label should be write once (once you've set it, you can't
> change it) ...

Richard and I have talked about a write once approach, but the
thinking was that you may want to allow a nested container
orchestrator (Why? I don't know, but people always want to do the
craziest things.) and a write-once policy makes that impossible.  If
we punt on the nested orchestrator, I believe we can seriously think
about a write-once policy to simplify things.

A bit off topic, but I've also wondered about not even implementing
read access, just to help ensure the audit container ID wouldn't be
abused, but I'm not sure how practical that will be.  Something else
to sort out during the RFC phase of the implementation with the
container orchestrators.

> ... and orchestration systems should begin as unlabelled
> processes allowing them to do arbitrary forks.

My current thinking is that the default state is to start unlabeled (I
just vomited a little into my SELinux hat); in other words
init/systemd/PID-1 in the host system starts with an "unset" audit
container ID.  This not only helps define the host system (anything
that has an unset audit container ID) but provides a blank slate for
the orchestrator(s).

> For nested containers, I actually think the label should be
> hierarchical, so you can add a label for the new nested container but
> it still also contains its parents label as well.

I haven't made up my mind on this completely just yet, but I'm
currently of the mindset that supporting multiple audit container IDs
on a given process is not a good idea.
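
To make the semantics discussed in this message concrete, here is a toy
userspace model, not kernel code. It assumes three things taken from the
thread: every task starts with an unset audit container ID, the ID is
inherited across fork, and a write-once policy applies. The names Task,
contid and set_contid are invented for this sketch, and the permission
check is deliberately left out.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    pid: int
    # None models the "unset" state that defines the host system.
    contid: Optional[uuid.UUID] = None

    def fork(self, child_pid: int) -> "Task":
        # The child inherits whatever ID (or lack of one) the parent has.
        return Task(pid=child_pid, contid=self.contid)


def set_contid(target: Task, new_id: uuid.UUID) -> None:
    """Write-once: succeed only while the target's ID is still unset."""
    if target.contid is not None:
        raise PermissionError("audit container ID already set")
    target.contid = new_id


init = Task(pid=1)                      # host init, unset ID
orchestrator = init.fork(50)            # still unset, a blank slate
first = orchestrator.fork(100)          # will become the container's first process
set_contid(first, uuid.UUID(int=1))
worker = first.fork(101)                # inherits the container's ID
```

Under this model a nested orchestrator running as `worker` could not label
its own children differently, which is exactly the limitation of write-once
raised above.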

-- 
paul moore
www.paul-moore.com

Re: RFC(v2): Audit Kernel Container IDs

2017-10-18 Thread Paul Moore

On Tue, Oct 17, 2017 at 8:31 AM, Simo Sorce  wrote:
> The container Id can be used also for authorization purposes (by other
> processes on the host), not just audit, I think this is why a separate
> control has been proposed.

Apologies, but I'm just now getting a chance to work my way through
this thread, and I wanted to make a quick comment on this point ...

The audit container ID (note I said "audit container ID" not
"container ID") is intended strictly for use by the audit subsystem at
this point.  Allowing other uses opens the door to a larger set of
problems we are trying to avoid (e.g. handling migration across
hosts).  We would love to have a generic kernel facility that the
audit subsystem could use to identify containers, but we don't, and
previous attempts have failed, so we have to create our own.  We are
intentionally trying to limit its scope in an attempt to limit
problems.  If a more general solution appears in the future I think we
would make every effort to migrate to that; keeping this initial
effort small should make that easier.

-- 
paul moore
www.paul-moore.com

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread Steve Grubb

On Tuesday, October 17, 2017 1:57:43 PM EDT James Bottomley wrote:
> > > > The idea is that processes spawned into a container would be
> > > > labelled by the container orchestration system.  It's unclear
> > > > what should happen to processes using nsenter after the fact, but
> > > > policy for that should be up to the orchestration system.
> > > 
> > > I'm fine with that. The user space policy can be anything y'all
> > > like.
> > 
> > I think there should be a login event.
> 
> I thought you wanted this for containers?  Container creation doesn't
> have login events.  In an unprivileged orchestration system it may be
> hard to synthetically manufacture them.

I realize this. This work is very similar to problems we solved 12 years 
ago. We'll figure out what the right name is for it down the road. But the 
concept is the same. If something enters a container, we need to know about 
it. It needs to get tagged and be associated with the container. The way this 
was solved for the loginuid problem was to add a session identifier so that 
new logins of the same loginuid can coexist and we can trace actions back to a 
specific login. I'd think we can apply lessons learned from a while back to 
make container identification act similarly.

-Steve

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread James Bottomley

On Tue, 2017-10-17 at 13:15 -0400, Steve Grubb wrote:
> On Tuesday, October 17, 2017 12:43:18 PM EDT Casey Schaufler wrote:
> > 
> > > 
> > > The idea is that processes spawned into a container would be
> > > labelled by the container orchestration system.  It's unclear
> > > what should happen to processes using nsenter after the fact, but
> > > policy for that should be up to the orchestration system.
> > 
> > I'm fine with that. The user space policy can be anything y'all
> > like.
> 
> I think there should be a login event.

I thought you wanted this for containers?  Container creation doesn't
have login events.  In an unprivileged orchestration system it may be
hard to synthetically manufacture them.

James

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread Steve Grubb

On Tuesday, October 17, 2017 12:43:18 PM EDT Casey Schaufler wrote:
> > The idea is that processes spawned into a container would be labelled
> > by the container orchestration system.  It's unclear what should happen
> > to processes using nsenter after the fact, but policy for that should
> > be up to the orchestration system.
> 
> I'm fine with that. The user space policy can be anything y'all like.

I think there should be a login event.


> > The label will be used as a tag for audit information.
> 
> Deep breath ...
> 
> Which *is* a kernel security policy mechanism. Since the "label"
> is part of the audit information that the kernel is guaranteeing
> changing it would be covered by CAP_AUDIT_CONTROL. If the kernel
> does not use the "label" for any other purpose this is the only
> capability that makes sense for it.

I agree. The ability to set the container label grants the ability to evade 
rules or modify audit rules. CAP_AUDIT_CONTROL makes sense to me.


> > I think you were missing label inheritance above.
> > 
> > The security implications are that anything that can change the label
> > could also hide itself and its doings from the audit system and thus
> > would be used as a means to evade detection.

Yes. We have the same problem with loginuid. There are restrictions on who can 
change it once set. And then we made an immutable flag so that people that 
want a hard guarantee can get that.

-Steve

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread Casey Schaufler

On 10/17/2017 8:44 AM, James Bottomley wrote:
> On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote:
>>> Without a *kernel* policy on containerIDs you can't say what
>>> security policy is being exempted.
>> The policy has been basically stated earlier.
>>
>> A way to track a set of processes from a specific point in time
>> forward. The name used is "container id", but it could be anything.
>> This marker is mostly used by user space to track process hierarchies
>> without races, these processes can be very privileged, and must not
>> be allowed to change the marker themselves when granted the current
>> common capabilities.
>>
>> Is this a good enough description ? If not can you clarify your
>> expectations ?
> I think you mean you want to be able to apply a label to a process
> which is inherited across forks.

That would be PTAGS. I agree that such a general mechanism
could be very useful for a variety of purposes, not just
containers. I do not agree that a single integer (e.g. a
containerID) warrants more than trivial mechanism.

> The label should only be susceptible
> to modification by something possessing a capability (which one TBD).

I think we're going to have crying and gnashing of teeth whatever
capability is used. There will always be an issue of the capability
granted being less specific than the application security model
would like.

And no, we're not going down the 330 capabilities road. It's been
done in the UNIX world. Application security models hate that
just as much as they hate the coarser granularity.

> The idea is that processes spawned into a container would be labelled
> by the container orchestration system.  It's unclear what should happen
> to processes using nsenter after the fact, but policy for that should
> be up to the orchestration system.

I'm fine with that. The user space policy can be anything y'all like.

> The label will be used as a tag for audit information.

Deep breath ...

Which *is* a kernel security policy mechanism. Since the "label"
is part of the audit information that the kernel is guaranteeing
changing it would be covered by CAP_AUDIT_CONTROL. If the kernel
does not use the "label" for any other purpose this is the only
capability that makes sense for it.
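
A small sketch of what that position implies, as a toy model rather than
kernel code. The Cap flag values and the helper names are made up for
illustration; only the two capability names come from the thread. A daemon
holding just CAP_AUDIT_WRITE (the vsftpd case mentioned earlier) can log but
cannot set the ID, while anything holding CAP_AUDIT_CONTROL can do both that
and reconfigure audit.

```python
from enum import Flag


class Cap(Flag):
    # Illustrative bit values only, not the kernel's capability numbers.
    NONE = 0
    AUDIT_WRITE = 1
    AUDIT_CONTROL = 2


def may_write_audit_records(caps: Cap) -> bool:
    return bool(caps & Cap.AUDIT_WRITE)


def may_change_audit_rules(caps: Cap) -> bool:
    return bool(caps & Cap.AUDIT_CONTROL)


def may_set_contid(caps: Cap) -> bool:
    # The position argued here: setting the label is covered by the same
    # capability that guards the audit configuration.
    return bool(caps & Cap.AUDIT_CONTROL)


logging_daemon = Cap.AUDIT_WRITE
orchestrator = Cap.AUDIT_CONTROL
```

The coupling visible in `orchestrator` (it may set the ID and also change
audit rules) is the granularity objection raised on the other side of this
thread; the sketch takes no side, it only shows the consequence.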

> I think you were missing label inheritance above.
>
> The security implications are that anything that can change the label
> could also hide itself and its doings from the audit system and thus
> would be used as a means to evade detection.  

Yes. This is a consequence of the capability granularity. There is
no way we can make the capability granularity sufficiently fine to
prevent this. No one wants the 330 capabilities that Data General
had in their secure UNIX system. 

> I actually think this
> means the label should be write once (once you've set it, you can't
> change it) and orchestration systems should begin as unlabelled
> processes allowing them to do arbitrary forks.
>
> For nested containers, I actually think the label should be
> hierarchical, so you can add a label for the new nested container but
> it still also contains its parents label as well.

You can't support this reasonably with a single containerID.
You want PTAGS for this. I know that there is resistance to
requiring anything beyond what's in the base kernel (and for
good reasons) for containers. Especially something that is
pending future work. But let's not jam something into the base
kernel that isn't really going to address the issue.

> James

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread Casey Schaufler

On 10/17/2017 8:28 AM, Simo Sorce wrote:
> On Tue, 2017-10-17 at 07:59 -0700, Casey Schaufler wrote:
>> On 10/17/2017 5:31 AM, Simo Sorce wrote:
>>> On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
>>>> On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs
>>>> wrote:
>>>>> There is such a thing, but the kernel doesn't know about it
>>>>> yet.  This same situation exists for loginuid and sessionid which
>>>>> are userspace concepts that the kernel tracks for the convenience
>>>>> of userspace.  As for its name, I'm not particularly picky, so if
>>>>> you don't like CAP_CONTAINER_* then I'm fine with
>>>>> CAP_AUDIT_CONTAINERID.  It really needs to be distinct from
>>>>> CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to give
>>>>> the ability to set a containerID to any process that is able to do
>>>>> audit logging (such as vsftpd) and similarly we don't want to give
>>>>> the orchestrator the ability to control the setup of the audit
>>>>> daemon.
>>>> A long time ago, we were debating what should guard against rogue
>>>> processes setting the loginuid. Casey argued that the ability to
>>>> set the loginuid means they have the ability to control the audit
>>>> trail. That means that it should be guarded by CAP_AUDIT_CONTROL. I
>>>> think the same logic applies today.
>>> The difference is that with loginuid you needed to give processes able
>>> to audit also the ability to change it. You do not want to tie the
>>> ability to change container ids to the ability to audit. You want to be
>>> able to do audit stuff (within the container) without allowing it to
>>> change the container id.
>> Without a *kernel* policy on containerIDs you can't say what
>> security policy is being exempted.
> The policy has been basically stated earlier.

No. The expected user space behavior has been stated.

> A way to track a set of processes from a specific point in time
> forward. The name used is "container id", but it could be anything.

Then you want Jose Bollo's PTAGS. It's insane to add yet another
arbitrary ID to the task for a special purpose. Add a general tagging
mechanism instead. We could add a gazillion new IDs, each with its
own capability, if we head down this road.

> This marker is mostly used by user space to track process hierarchies
> without races, these processes can be very privileged, and must not be
> allowed to change the marker themselves when granted the current common
> capabilities.

Let's be clear. What happens in user space stays in user space.
The kernel does not give a fig about user space policy. There has
to be a kernel policy involved that a capability can exempt.

> Is this a good enough description ? If not can you clarify your
> expectations ?

The kernel enforces kernel policy. Capabilities provide a mechanism
to mark a process as exempt from some aspect of kernel policy. If
you don't have a kernel policy, you don't get a capability. Clear?
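
To illustrate the relationship described above, here is a minimal sketch, not kernel code. It assumes a toy policy ("a loginuid that is already set may not be changed") and models CAP_AUDIT_CONTROL as the exemption from it, as Steve describes for loginuid elsewhere in this thread. The `Task` class and `set_loginuid` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Real capability name, represented here as a plain string for illustration.
CAP_AUDIT_CONTROL = "CAP_AUDIT_CONTROL"


@dataclass
class Task:
    """Hypothetical stand-in for the few task fields this sketch needs."""
    caps: frozenset = frozenset()
    loginuid: Optional[int] = None  # None means "not yet set"


def set_loginuid(actor: Task, target: Task, new_uid: int) -> None:
    """Toy policy: a loginuid that is already set may not be changed.

    The capability is the exemption: an actor holding CAP_AUDIT_CONTROL
    is allowed to change it anyway.
    """
    already_set = target.loginuid is not None
    if already_set and CAP_AUDIT_CONTROL not in actor.caps:
        raise PermissionError("loginuid already set")
    target.loginuid = new_uid
```

The check only makes sense because there is a stated policy for the capability to exempt; remove the policy line and the capability has nothing to guard.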

>
>>  Without that you can't say what capability is (or isn't)
>> appropriate.
> See if the above is sufficient please.
>
>> You need a reason to have a capability check that makes sense in the
>> context of the kernel security policy.
> I think the proposal had a reason, we may debate on whether that reason
> is good enough.
>
>> Since we don't know what a container is in the kernel,
> Please do not fixate on the word container.
>
>>  that's pretty hard. We don't create "fuzzy" capabilities
>> based on the trendy application behavior of the moment. If the
>> behavior is not related to audit, there's no reason for it, and
>> if it is, CAP_AUDIT_CONTROL works just fine. If this doesn't work
>> in your application security model I suggest that is where you
>> need to make changes.
> The authors of the proposal came to the conclusion that kernel
> assistance is needed. It would be nice to discuss the merits of it.
> If you do not understand why the request has been made it would be more
> useful to ask specific questions to understand what is being asked and why.

I understand pretty darn well.

> Pushing back is fine, if you have understood the problem and have valid
> arguments against a kernel level solution (and possibly suggestions for
> a working user space solution), otherwise you are not adding value to
> the discussion.

The presumption is that the request is reasonable. Adding a capability
in support of an undefined behavior is unreasonable. Based on the discussion,
CAP_AUDIT_CONTROL is completely rational. I understand that it would be
difficult to support your application privilege model. I would like to look
into helping out with that, but have too many burning knives in the air
just now.

>
> Simo.
>

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread James Bottomley

On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote:
> > Without a *kernel* policy on containerIDs you can't say what
> > security policy is being exempted.
> 
> The policy has been basically stated earlier.
> 
> A way to track a set of processes from a specific point in time
> forward. The name used is "container id", but it could be anything.
> This marker is mostly used by user space to track process hierarchies
> without races, these processes can be very privileged, and must not
> be allowed to change the marker themselves when granted the current
> common capabilities.
> 
> Is this a good enough description ? If not can you clarify your
> expectations ?

I think you mean you want to be able to apply a label to a process
which is inherited across forks.  The label should only be susceptible
to modification by something possessing a capability (which one TBD).
 The idea is that processes spawned into a container would be labelled
by the container orchestration system.  It's unclear what should happen
to processes using nsenter after the fact, but policy for that should
be up to the orchestration system.

The label will be used as a tag for audit information.

I think you were missing label inheritance above.

The security implications are that anything that can change the label
could also hide itself and its doings from the audit system and thus
would be used as a means to evade detection.  I actually think this
means the label should be write once (once you've set it, you can't
change it) and orchestration systems should begin as unlabelled
processes allowing them to do arbitrary forks.

For nested containers, I actually think the label should be
hierarchical, so you can add a label for the new nested container but
it still contains its parent's label as well.
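
One possible reading of the label semantics above, as a minimal sketch rather than a proposed implementation: labels are inherited on fork, each task gets a single chance to write a label, and a nested label is appended to the inherited one so the parent's label always stays visible. The `Task` class and its methods are hypothetical, and the capability check is left out because which capability applies is still TBD.

```python
class Task:
    """Hypothetical model of a task carrying a hierarchical audit label."""

    def __init__(self, path=()):
        # Label components, outermost container first. Empty means unlabelled.
        self.path = tuple(path)
        self._written = False  # each task may write its label only once

    def fork(self):
        # Children inherit the parent's label and get their own single write.
        return Task(self.path)

    def set_label(self, name):
        # Write-once: a task that has set its label cannot change it later.
        if self._written:
            raise PermissionError("label is write-once")
        # Nesting only appends, so the parent's label is never lost.
        self.path = self.path + (name,)
        self._written = True


# The orchestrator starts unlabelled and labels a freshly forked child.
orchestrator = Task()
container = orchestrator.fork()
container.set_label("c1")

# A nested container keeps "c1" and adds its own component.
nested = container.fork()
nested.set_label("n1")
```

Under these assumptions a labelled process can neither drop nor rewrite what it inherited, which is the property that keeps it from hiding from the audit trail.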

James

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread Simo Sorce

On Tue, 2017-10-17 at 07:59 -0700, Casey Schaufler wrote:
> On 10/17/2017 5:31 AM, Simo Sorce wrote:
> > On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
> > > On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs
> > > wrote:
> > > > There is such a thing, but the kernel doesn't know about it
> > > > yet.  This same situation exists for loginuid and sessionid
> > > > which
> > > > are userspace concepts that the kernel tracks for the
> > > > convenience
> > > > of userspace.  As for its name, I'm not particularly picky, so
> > > > if
> > > > you don't like CAP_CONTAINER_* then I'm fine with
> > > > CAP_AUDIT_CONTAINERID.  It really needs to be distinct from
> > > > CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to
> > > > give
> > > > the ability to set a containerID to any process that is able to
> > > > do
> > > > audit logging (such as vsftpd) and similarly we don't want to
> > > > give
> > > > the orchestrator the ability to control the setup of the audit
> > > > daemon.
> > > 
> > > A long time ago, we were debating what should guard against rogue
> > > processes setting the loginuid. Casey argued that the
> > > ability to
> > > set the loginuid means they have the ability to control the audit
> > > trail. That means that it should be guarded by CAP_AUDIT_CONTROL.
> > > I
> > > think the same logic applies today. 
> > 
> > The difference is that with loginuid you needed to give processes
> > able
> > to audit also the ability to change it. You do not want to tie the
> > ability to change container ids to the ability to audit. You want
> > to be
> > able to do audit stuff (within the container) without allowing it
> > to
> > change the container id.
> 
> Without a *kernel* policy on containerIDs you can't say what
> security policy is being exempted.

The policy has been basically stated earlier.

A way to track a set of processes from a specific point in time
forward. The name used is "container id", but it could be anything.
This marker is mostly used by user space to track process hierarchies
without races, these processes can be very privileged, and must not be
allowed to change the marker themselves when granted the current common
capabilities.

Is this a good enough description ? If not can you clarify your
expectations ?

>  Without that you can't say what capability is (or isn't)
> appropriate.

See if the above is sufficient please.

> You need a reason to have a capability check that makes sense in the
> context of the kernel security policy.

I think the proposal had a reason, we may debate on whether that reason
is good enough.

> Since we don't know what a container is in the kernel,

Please do not fixate on the word container.

>  that's pretty hard. We don't create "fuzzy" capabilities
> based on the trendy application behavior of the moment. If the
> behavior is not related to audit, there's no reason for it, and
> if it is, CAP_AUDIT_CONTROL works just fine. If this doesn't work
> in your application security model I suggest that is where you
> need to make changes.

The authors of the proposal came to the conclusion that kernel
assistance is needed. It would be nice to discuss the merits of it.
If you do not understand why the request has been made it would be more
useful to ask specific questions to understand what is being asked and why.

Pushing back is fine, if you have understood the problem and have valid
arguments against a kernel level solution (and possibly suggestions for
a working user space solution), otherwise you are not adding value to
the discussion. 

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread Simo Sorce

On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
> On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs wrote:

> > There is such a thing, but the kernel doesn't know about it
> > yet.  This same situation exists for loginuid and sessionid which
> > are userspace concepts that the kernel tracks for the convenience
> > of userspace.  As for its name, I'm not particularly picky, so if
> > you don't like CAP_CONTAINER_* then I'm fine with
> > CAP_AUDIT_CONTAINERID.  It really needs to be distinct from
> > CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to give
> > the ability to set a containerID to any process that is able to do
> > audit logging (such as vsftpd) and similarly we don't want to give
> > the orchestrator the ability to control the setup of the audit
> > daemon.
> 
> A long time ago, we were debating what should guard against rogue
> processes setting the loginuid. Casey argued that the ability to
> set the loginuid means they have the ability to control the audit
> trail. That means that it should be guarded by CAP_AUDIT_CONTROL. I
> think the same logic applies today. 

The difference is that with loginuid you needed to give processes able
to audit also the ability to change it. You do not want to tie the
ability to change container ids to the ability to audit. You want to be
able to do audit stuff (within the container) without allowing it to
change the container id.
Of course if we made container id a write-once property maybe there is
no need for controls at all, but I'm pretty sure there will be
situations where write-once may not be usable in practice.

> The ability to arbitrarily set a container ID means the process has
> the ability to indirectly control the audit trail.

The container ID can also be used for authorization purposes (by other
processes on the host), not just audit. I think this is why a separate
control has been proposed.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

Re: RFC(v2): Audit Kernel Container IDs

2017-10-16 Thread Steve Grubb

On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs wrote:
> On 2017-10-12 16:33, Casey Schaufler wrote:
> > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> > > Containers are a userspace concept.  The kernel knows nothing of them.
> > > 
> > > The Linux audit system needs a way to be able to track the container
> > > provenance of events and actions.  Audit needs the kernel's help to do
> > > this.
> > > 
> > > Since the concept of a container is entirely a userspace concept, a
> > > registration from the userspace container orchestration system initiates
> > > this.  This will define a point in time and a set of resources
> > > associated with a particular container with an audit container ID.
> > > 
> > > The registration is a pseudo filesystem (proc, since PID tree already
> > > exists) write of a u8[16] UUID representing the container ID to a file
> > > representing a process that will become the first process in a new
> > > container.  This write might place restrictions on mount namespaces
> > > required to define a container, or at least careful checking of
> > > namespaces in the kernel to verify permissions of the orchestrator so it
> > > can't change its own container ID.  A bind mount of nsfs may be
> > > necessary in the container orchestrator's mntNS.
> > > Note: Use a 128-bit scalar rather than a string to make compares faster
> > > and simpler.
> > > 
> > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> > > registration.
> > 
> > Hang on. If containers are a user space concept, how can
> > > you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> > a container, how can you be asking for a capability to manage
> > them?
> 
> There is such a thing, but the kernel doesn't know about it yet.  This
> same situation exists for loginuid and sessionid which are userspace
> concepts that the kernel tracks for the convenience of userspace.  As
> for its name, I'm not particularly picky, so if you don't like
> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
> don't want to give the ability to set a containerID to any process that
> is able to do audit logging (such as vsftpd) and similarly we don't want
> to give the orchestrator the ability to control the setup of the audit
> daemon.

A long time ago, we were debating what should guard against rogue processes
setting the loginuid. Casey argued that the ability to set the loginuid
means they have the ability to control the audit trail. That means that it 
should be guarded by CAP_AUDIT_CONTROL. I think the same logic applies today. 

The ability to arbitrarily set a container ID means the process has the 
ability to indirectly control the audit trail.

-Steve

Re: RFC(v2): Audit Kernel Container IDs

2017-10-16 Thread Casey Schaufler

On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
> On 2017-10-12 16:33, Casey Schaufler wrote:
>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>
>>> The Linux audit system needs a way to be able to track the container
>>> provenance of events and actions.  Audit needs the kernel's help to do
>>> this.
>>>
>>> Since the concept of a container is entirely a userspace concept, a
>>> registration from the userspace container orchestration system initiates
>>> this.  This will define a point in time and a set of resources
>>> associated with a particular container with an audit container ID.
>>>
>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container.  This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID.  A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>> Hang on. If containers are a user space concept, how can
>> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
>> a container, how can you be asking for a capability to manage
>> them?
> There is such a thing, but the kernel doesn't know about it yet.

Then how can it be the kernel's place to control access to a
container resource, that is, the containerID?

>   This
> same situation exists for loginuid and sessionid which are userspace
> concepts that the kernel tracks for the convenience of userspace.

Ah, no. Loginuid identifies a user, which is a kernel concept in
that a user is defined by the uid. The session ID has well defined
kernel semantics. You're trying to say that the containerID is an
opaque value that is meaningless to the kernel, but you still want
the kernel to protect it. How can the kernel know if it is protecting
it correctly?

>   As
> for its name, I'm not particularly picky, so if you don't like
> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
> don't want to give the ability to set a containerID to any process that
> is able to do audit logging (such as vsftpd) and similarly we don't want
> to give the orchestrator the ability to control the setup of the audit
> daemon.

Sorry, but what aspect of the kernel security policy is this
capability supposed to protect? That's what capabilities are
for, not the undefined support of undefined user-space behavior.

If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's
more than audit behavior you have to define what system security
policy you're dealing with in order to pick the right capability.

We get this request pretty regularly. "I need my own capability
because I have a niche thing that isn't part of the system security
policy but that is important!" Fit the containerID into the
system security policy, and if that results in using CAP_SYS_ADMIN,
oh well.

>>>   At that time, record the target container's user-supplied
>>> container identifier along with the target container's first process
>>> (which may become the target container's "init" process) process ID
>>> (referenced from the initial PID namespace), all namespace IDs (in the
>>> form of a nsfs device number and inode number tuple) in a new auxiliary
>>> record AUDIT_CONTAINER with a qualifying op=$action field.
>>>
>>> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
>>> container ID present on an auditable action or event.
>>>
>>> Forked and cloned processes inherit their parent's container ID,
>>> referenced in the process' task_struct.
>>>
>>> Mimic setns(2) and return an error if the process has already initiated
>>> threading or forked since this registration should happen before the
>>> process execution is started by the orchestrator and hence should not
>>> yet have any threads or children.  If this is deemed overly restrictive,
>>> switch all threads and children to the new containerID.
>>>
>>> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>>>
>>> Log the creation of every namespace, inheriting/adding its spawning
>>> process' containerID(s), if applicable.  Include the spawning and
>>> spawned namespace IDs (device and inode number tuples).
>>> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
>>> Note: At this point it appears only network namespaces may need to track
>>> container IDs apart from processes since incoming packets may cause an
>>> auditable event before being associated with a

Re: RFC(v2): Audit Kernel Container IDs

2017-10-16 Thread Richard Guy Briggs

On 2017-10-12 16:33, Casey Schaufler wrote:
> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> > Containers are a userspace concept.  The kernel knows nothing of them.
> >
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions.  Audit needs the kernel's help to do
> > this.
> >
> > Since the concept of a container is entirely a userspace concept, a
> > registration from the userspace container orchestration system initiates
> > this.  This will define a point in time and a set of resources
> > associated with a particular container with an audit container ID.
> >
> > The registration is a pseudo filesystem (proc, since PID tree already
> > exists) write of a u8[16] UUID representing the container ID to a file
> > representing a process that will become the first process in a new
> > container.  This write might place restrictions on mount namespaces
> > required to define a container, or at least careful checking of
> > namespaces in the kernel to verify permissions of the orchestrator so it
> > can't change its own container ID.  A bind mount of nsfs may be
> > necessary in the container orchestrator's mntNS.
> > Note: Use a 128-bit scalar rather than a string to make compares faster
> > and simpler.
> >
> > Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> > registration.
> 
> Hang on. If containers are a user space concept, how can
> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> a container, how can you be asking for a capability to manage
> them?

There is such a thing, but the kernel doesn't know about it yet.  This
same situation exists for loginuid and sessionid which are userspace
concepts that the kernel tracks for the convenience of userspace.  As
for its name, I'm not particularly picky, so if you don't like
CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
don't want to give the ability to set a containerID to any process that
is able to do audit logging (such as vsftpd) and similarly we don't want
to give the orchestrator the ability to control the setup of the audit
daemon.
> 
> >   At that time, record the target container's user-supplied
> > container identifier along with the target container's first process
> > (which may become the target container's "init" process) process ID
> > (referenced from the initial PID namespace), all namespace IDs (in the
> > form of a nsfs device number and inode number tuple) in a new auxiliary
> > record AUDIT_CONTAINER with a qualifying op=$action field.
> >
> > Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> > container ID present on an auditable action or event.
> >
> > Forked and cloned processes inherit their parent's container ID,
> > referenced in the process' task_struct.
> >
> > Mimic setns(2) and return an error if the process has already initiated
> > threading or forked since this registration should happen before the
> > process execution is started by the orchestrator and hence should not
> > yet have any threads or children.  If this is deemed overly restrictive,
> > switch all threads and children to the new containerID.
> >
> > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> >
> > Log the creation of every namespace, inheriting/adding its spawning
> > process' containerID(s), if applicable.  Include the spawning and
> > spawned namespace IDs (device and inode number tuples).
> > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> > Note: At this point it appears only network namespaces may need to track
> > container IDs apart from processes since incoming packets may cause an
> > auditable event before being associated with a process.
> >
> > Log the destruction of every namespace when it is no longer used by any
> > process, include the namespace IDs (device and inode number tuples).
> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> >
> > Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> > the parent and child namespace IDs for any changes to a process'
> > namespaces. [setns(2)]
> > Note: It may be possible to combine AUDIT_NS_* record formats and
> > distinguish them with an op=$action field depending on the fields
> > required for each message type.
> >
> > When a container ceases to exist because the last process in that
> > container has exited and hence the last namespace has been destroyed and
> > its refcount dropping to zero, log the fact.
> > (This latter is likely needed for certification accountability.)  A
> > container object may need a list of processes and/or namespaces.
> >
> > A namespace cannot directly migrate from one container to another but
> > could be assigned to a newly spawned container.  A namespace can be
> > moved from one container to another indirectly by having that namespace
> > used in a second process in another

Re: RFC(v2): Audit Kernel Container IDs

2017-10-13 Thread Alan Cox

On Thu, 12 Oct 2017 10:14:00 -0400
Richard Guy Briggs  wrote:

> Containers are a userspace concept.  The kernel knows nothing of them.
> 
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions.  Audit needs the kernel's help to do
> this.
> 
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this.  This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.

I don't think this has anything to do with containers directly. If I
read it right you need a subtree of stuff to be assigned a (possibly
irrevocable) magic identifier that you can use for other purposes.

Traditional Unix in the more 'secure' space had that decades ago in the
form of luid. At login time you did a setluid() and that set an
irrevocable tag on the session which was (traditionally) the uid of the
login process so that audit and other related tools always knew how to
tie the process back to the login session.

That doesn't quite work by itself (if you log in you'd get luid set and
not be able to change it for the container), but it seems something
similarly trivial like a "setauditid(void)" would do the trick providing
the kernel picked the UUID randomly [otherwise I can copy another known
UUID to confuse or hide].

As you say a container is a userspace concept. So IMHO any audit
interface should be about auditing and what needs tracking, not about
containers. If the container management tool wants to set a suitable tag
then let it. If not then it doesn't.

Then it's as simple as checking CAP_AUDIT_WRITE to see if you are allowed
to setauditid(), generating a random uuid and a matching getauditid() to
copy it back.
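
A minimal sketch of that interface, under stated assumptions: the `Task` class is a hypothetical stand-in for a process, the capability is modelled as a string in a set, and inheritance across fork is assumed from the "subtree" wording above. `setauditid()` takes no argument, so the caller cannot copy a known UUID; the random value here comes from Python's `uuid.uuid4()` as a placeholder for whatever the kernel would use.

```python
import uuid
from typing import Optional

# Real capability name, represented here as a plain string for illustration.
CAP_AUDIT_WRITE = "CAP_AUDIT_WRITE"


class Task:
    """Hypothetical model of a process carrying an audit ID."""

    def __init__(self, caps=frozenset(), audit_id: Optional[uuid.UUID] = None):
        self.caps = frozenset(caps)
        self._audit_id = audit_id

    def fork(self) -> "Task":
        # Assumption: the tag follows the process subtree.
        return Task(self.caps, self._audit_id)

    def setauditid(self) -> None:
        # Only the permission is checked; the caller never supplies the value.
        if CAP_AUDIT_WRITE not in self.caps:
            raise PermissionError("CAP_AUDIT_WRITE required")
        self._audit_id = uuid.uuid4()

    def getauditid(self) -> Optional[uuid.UUID]:
        # Copy the current tag back to the caller (None if never set).
        return self._audit_id
```

Whether the tag should also be irrevocable once set is left open here, as it is in the text above.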

Alan

Re: RFC(v2): Audit Kernel Container IDs

2017-10-12 Thread Eric W. Biederman

Richard Guy Briggs  writes:

> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container.  A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.

Ugh no.  The semantics here are way too mushy.  We need a clean crisp
unambiguous definition or it will be impossible to get this correct and
impossible to use for any security purpose.

I understand the challenge.  Some of the container managers share
namespaces between containers, leading to things that are not really
contained.

Please make this concept like an indelible dye.  Once you are stained
with it you cannot escape.  If you don't meet all of the criteria you
aren't stained.

The justification that I heard, and that seems legitimate is that it is
not timely and it is hard to make the connection between the distinct
unshare, setns, and clone events and what is happening in the kernel.

With that justification definitely the network namespace needs to be
stained if it is appropriate.

I also don't see why this can't be a special dedicated audit message.
I just looked at the code in the kernel and nlmsg_type is a u16.  There
are only a handful of audit message types defined.  There is absolutely
no reason to bring proc into this.

I have the same reservation as the others about defining a new cap for
this.  It should be enough to make setting the container id a one time
thing for a set of processes and namespaces.

If this is going to be security it needs to be very simple and very well 
defined.

Eric

Re: RFC(v2): Audit Kernel Container IDs

2017-10-12 Thread Casey Schaufler

On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> Containers are a userspace concept.  The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions.  Audit needs the kernel's help to do
> this.
>
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this.  This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.
>
> The registration is a pseudo filesystem (proc, since PID tree already
> exists) write of a u8[16] UUID representing the container ID to a file
> representing a process that will become the first process in a new
> container.  This write might place restrictions on mount namespaces
> required to define a container, or at least careful checking of
> namespaces in the kernel to verify permissions of the orchestrator so it
> can't change its own container ID.  A bind mount of nsfs may be
> necessary in the container orchestrator's mntNS.
> Note: Use a 128-bit scalar rather than a string to make compares faster
> and simpler.
>
> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> registration.

Hang on. If containers are a user space concept, how can
you want CAP_CONTAINER_ANYTHING? If there's no such thing as
a container, how can you be asking for a capability to manage
them?

>   At that time, record the target container's user-supplied
> container identifier along with the target container's first process
> (which may become the target container's "init" process) process ID
> (referenced from the initial PID namespace), all namespace IDs (in the
> form of a nsfs device number and inode number tuple) in a new auxiliary
> record AUDIT_CONTAINER with a qualifying op=$action field.
>
> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
>
> Forked and cloned processes inherit their parent's container ID,
> referenced in the process' task_struct.
>
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children.  If this is deemed overly restrictive,
> switch all threads and children to the new containerID.
>
> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>
> Log the creation of every namespace, inheriting/adding its spawning
> process' containerID(s), if applicable.  Include the spawning and
> spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process.
>
> Log the destruction of every namespace when it is no longer used by any
> process, include the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>
> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
>
> When a container ceases to exist because the last process in that
> container has exited and hence the last namespace has been destroyed and
> its refcount dropping to zero, log the fact.
> (This latter is likely needed for certification accountability.)  A
> container object may need a list of processes and/or namespaces.
>
> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container.  A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.
>
> (v2)
> - switch from u64 to u128 UUID
> - switch from "signal" and "trigger" to "register"
> - restrict registration to single process or force all threads and children 
> into same container
>
> - RGB
>
> --
> Richard Guy Briggs 
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
>

Re: RFC(v2): Audit Kernel Container IDs

2017-10-12 Thread Steve Grubb

On Thursday, October 12, 2017 10:14:00 AM EDT Richard Guy Briggs wrote:
> Containers are a userspace concept.  The kernel knows nothing of them.
> 
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions.  Audit needs the kernel's help to do
> this.
> 
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this.  This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.

The requirements for Common Criteria around containers should be very closely
modeled on the requirements for virtualization. It would be the container
manager that is responsible for logging the resource assignment events.


> The registration is a pseudo filesystem (proc, since PID tree already
> exists) write of a u8[16] UUID representing the container ID to a file
> representing a process that will become the first process in a new
> container.  This write might place restrictions on mount namespaces
> required to define a container, or at least careful checking of
> namespaces in the kernel to verify permissions of the orchestrator so it
> can't change its own container ID.  A bind mount of nsfs may be
> necessary in the container orchestrator's mntNS.
> Note: Use a 128-bit scalar rather than a string to make compares faster
> and simpler.
> 
> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> registration.

Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.


> At that time, record the target container's user-supplied
> container identifier, the process ID of the target container's first
> process (which may become the target container's "init" process) as
> seen from the initial PID namespace, and all namespace IDs (each in the
> form of an nsfs device number and inode number tuple) in a new auxiliary
> record AUDIT_CONTAINER with a qualifying op=$action field.

This would be in addition to the normal audit fields.

> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
> 
> Forked and cloned processes inherit their parent's container ID,
> referenced in the process' task_struct.
> 
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children.  If this is deemed overly restrictive,
> switch all threads and children to the new containerID.
> 
> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> 
> Log the creation of every namespace, inheriting/adding its spawning
> process' containerID(s), if applicable.  Include the spawning and
> spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process.
> 
> Log the destruction of every namespace when it is no longer used by any
> process.  Include the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]

In the virtualization requirements, we only log removal of resources when
something is removed by intention. If the VM shuts down, the manager issues a
VIRT_CONTROL stop event and the user space utilities know this means all
resources have been unassigned.

> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
> 
> When a container ceases to exist because the last process in that
> container has exited, and hence the last namespace has been destroyed and
> its refcount has dropped to zero, log the fact.
> (This latter is likely needed for certification accountability.)  A
> container object may need a list of processes and/or namespaces.
> 
> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container.  A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.

I'm thinking that there needs to be a clear delineation between what the 
container manager is responsible for and what the kernel needs to do. The 
kernel needs the registration system and to associate an identifier with 
events inside the container.

But would the container manager be mostly responsible for auditing

RFC(v2): Audit Kernel Container IDs

2017-10-12 Thread Richard Guy Briggs

Containers are a userspace concept.  The kernel knows nothing of them.

The Linux audit system needs a way to be able to track the container
provenance of events and actions.  Audit needs the kernel's help to do
this.

Since the concept of a container is entirely a userspace concept, a
registration from the userspace container orchestration system initiates
this.  This will define a point in time and a set of resources
associated with a particular container with an audit container ID.

The registration is a pseudo filesystem (proc, since PID tree already
exists) write of a u8[16] UUID representing the container ID to a file
representing a process that will become the first process in a new
container.  This write might place restrictions on mount namespaces
required to define a container, or at least careful checking of
namespaces in the kernel to verify permissions of the orchestrator so it
can't change its own container ID.  A bind mount of nsfs may be
necessary in the container orchestrator's mntNS.
Note: Use a 128-bit scalar rather than a string to make compares faster
and simpler.
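
As a rough illustration of the registration write described above, the
sketch below writes the raw 16 bytes of a UUID to a file and compares two
IDs as 128-bit integers. The RFC does not name the proc file, so the path
argument is a placeholder (something like /proc/<pid>/containerid is assumed
purely for illustration), and register_container() and same_container() are
hypothetical helpers, not an existing interface.

```python
import os
import uuid


def register_container(path: str, container_id: uuid.UUID) -> int:
    """Write the raw 16-byte container ID (u8[16]) to the file at path.

    path is a placeholder for the per-process proc file; the RFC does not
    name it.  Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(fd, container_id.bytes)
    finally:
        os.close(fd)


def same_container(a: uuid.UUID, b: uuid.UUID) -> bool:
    """Compare two container IDs as 128-bit scalars rather than strings."""
    return a.int == b.int
```

An orchestrator would call register_container() on the target process's
file before starting execution in that process; comparing the integer form
avoids any string parsing on the hot path.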

Require a new CAP_CONTAINER_ADMIN to be able to carry out the
registration.  At that time, record the target container's user-supplied
container identifier, the process ID of the target container's first
process (which may become the target container's "init" process) as
seen from the initial PID namespace, and all namespace IDs (each in the
form of an nsfs device number and inode number tuple) in a new auxiliary
record AUDIT_CONTAINER with a qualifying op=$action field.

Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
container ID present on an auditable action or event.

Forked and cloned processes inherit their parent's container ID,
referenced in the process' task_struct.

Mimic setns(2) and return an error if the process has already initiated
threading or forked since this registration should happen before the
process execution is started by the orchestrator and hence should not
yet have any threads or children.  If this is deemed overly restrictive,
switch all threads and children to the new containerID.
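
The inheritance and registration rules above can be modelled in userspace
as a toy sketch. Task is a stand-in for the relevant task_struct fields,
not kernel code, and the choice of EBUSY is an assumption, since the text
only says that an error is returned. Only the strict variant (refuse once
threads or children exist) is shown.

```python
import errno
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Toy stand-in for the fields of task_struct that matter here."""
    pid: int
    container_id: Optional[uuid.UUID] = None
    thread_count: int = 1
    children: List["Task"] = field(default_factory=list)

    def fork(self, child_pid: int) -> "Task":
        # A forked or cloned task starts with its parent's container ID.
        child = Task(pid=child_pid, container_id=self.container_id)
        self.children.append(child)
        return child


def register(task: Task, container_id: uuid.UUID) -> None:
    """Strict variant: refuse once the task has threads or children."""
    if task.thread_count > 1 or task.children:
        # EBUSY is an assumption; the RFC only says "return an error".
        raise OSError(errno.EBUSY, "task already has threads or children")
    task.container_id = container_id
```

With this model, registering a fresh task succeeds, its children pick up
the same ID through fork(), and a second register() call on the parent
fails because it now has a child.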

Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.

Log the creation of every namespace, inheriting/adding its spawning
process' containerID(s), if applicable.  Include the spawning and
spawned namespace IDs (device and inode number tuples).
[AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
Note: At this point it appears only network namespaces may need to track
container IDs apart from processes since incoming packets may cause an
auditable event before being associated with a process.
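
For reference, the (device, inode) tuple that identifies a namespace can
already be read from userspace by calling stat(2) on the entries under
/proc/<pid>/ns. A minimal sketch, assuming a Linux system with /proc
mounted; the helper names ns_id() and namespace_ids() are made up for this
example.

```python
import os
from typing import Dict, Tuple


def ns_id(path: str) -> Tuple[int, int]:
    """Return the (device, inode) pair that identifies the file at path."""
    st = os.stat(path)  # follows the /proc/<pid>/ns/* link to the nsfs inode
    return (st.st_dev, st.st_ino)


def namespace_ids(pid: str = "self") -> Dict[str, Tuple[int, int]]:
    """Map each namespace name under /proc/<pid>/ns to its (dev, ino) pair."""
    ns_dir = os.path.join("/proc", pid, "ns")
    return {name: ns_id(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}
```

Two processes share a namespace exactly when the corresponding tuples are
equal, which is what makes the tuple usable as a namespace ID in a record.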

Log the destruction of every namespace when it is no longer used by any
process.  Include the namespace IDs (device and inode number tuples).
[AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]

Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
the parent and child namespace IDs for any changes to a process'
namespaces. [setns(2)]
Note: It may be possible to combine AUDIT_NS_* record formats and
distinguish them with an op=$action field depending on the fields
required for each message type.
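
To make the combined-format idea concrete, here is one possible shape for a
single namespace record distinguished by op=. Only the op= field comes from
the text above; the other field names (contid, parent_ns, child_ns), the
set of op values, and the dev:ino rendering are placeholders chosen for
illustration, not a proposed format.

```python
import uuid
from typing import Tuple

NsId = Tuple[int, int]  # (device number, inode number)


def format_ns_record(op: str, contid: uuid.UUID,
                     parent: NsId, child: NsId) -> str:
    """Render one combined namespace record as key=value pairs.

    Only op= comes from the RFC text; the other field names are
    placeholders for illustration.
    """
    if op not in ("create", "destroy", "change"):
        raise ValueError(f"unknown op: {op}")
    return (f"op={op} contid={contid} "
            f"parent_ns={parent[0]}:{parent[1]} "
            f"child_ns={child[0]}:{child[1]}")
```

Whether one format is enough depends on whether create, destroy and change
really need the same fields; a destroy record, for instance, may have no
meaningful child namespace.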

When a container ceases to exist because the last process in that
container has exited, and hence the last namespace has been destroyed and
its refcount has dropped to zero, log the fact.
(This latter is likely needed for certification accountability.)  A
container object may need a list of processes and/or namespaces.
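
A toy model of the bookkeeping this implies: keep the set of member
processes per container ID and report when the last one leaves.
ContainerTable is a hypothetical userspace illustration only; a kernel
implementation would presumably use a refcounted object instead of a
dictionary of sets.

```python
import uuid
from typing import Dict, Set


class ContainerTable:
    """Toy bookkeeping: which PIDs currently belong to which container."""

    def __init__(self) -> None:
        self._members: Dict[uuid.UUID, Set[int]] = {}

    def add(self, contid: uuid.UUID, pid: int) -> None:
        self._members.setdefault(contid, set()).add(pid)

    def exit(self, contid: uuid.UUID, pid: int) -> bool:
        """Drop pid; return True if that emptied (ended) the container."""
        members = self._members.get(contid)
        if members is None:
            return False
        members.discard(pid)
        if members:
            return False
        del self._members[contid]  # last member gone: log the end here
        return True
```

The True return marks the point where the "container ceased to exist"
record would be emitted.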

A namespace cannot directly migrate from one container to another but
could be assigned to a newly spawned container.  A namespace can be
moved from one container to another indirectly by having that namespace
used in a second process in another container and then ending all the
processes in the first container.
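
The indirect move can be pictured by deriving a namespace's containers from
the processes that currently use it. This is a sketch under simplifying
assumptions: container IDs and namespace IDs are shortened to string
labels, and Proc and containers_using() are names invented for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Proc:
    pid: int
    container: str               # container ID, shortened to a label here
    namespaces: FrozenSet[str]   # labels for the namespaces the process uses


def containers_using(ns: str, procs: Iterable[Proc]) -> Set[str]:
    """Containers that currently have at least one process in namespace ns."""
    return {p.container for p in procs if ns in p.namespaces}
```

If a process in container A uses a namespace, the result is {A}. Once a
process in container B joins the same namespace it becomes {A, B}, and
after every process in A has exited it is {B}: the namespace has moved
without ever being reassigned directly.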

(v2)
- switch from u64 to u128 UUID
- switch from "signal" and "trigger" to "register"
- restrict registration to single process or force all threads and children 
into same container

- RGB

--
Richard Guy Briggs 
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

