Re: RFC(v2): Audit Kernel Container IDs
On Monday, December 11, 2017 11:30:57 AM EST Eric Paris wrote:
> > Because a container doesn't have to use namespaces to be a container
> > you still need a mechanism for a process to declare that it is in
> > fact in a container, and to identify the container.
>
> I like the idea but I'm still tossing it around in my head (and
> thinking about Casey's statement too). Let's say we have a 'docker-like'
> container with pid=100 netns=X,userns=Y,mountns=Z. If I'm on the host
> in all init namespaces and I run
>   nsenter -t 100 -n ip link set eth0 promisc on
> How should this be logged?

If it is a normal process, then everything would match the init namespaces and you wouldn't have entered a container. If it were a container, any generated event should have the container ID from registration attached to it.

> Did this command run in its own 'container' unrelated to the 'docker-like'
> container?

That should be determined by what's in the task struct.

-Steve

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit
Re: RFC(v2): Audit Kernel Container IDs
On 12/11/2017 8:30 AM, Eric Paris wrote:
> On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
>> Because a container doesn't have to use namespaces to be a container
>> you still need a mechanism for a process to declare that it is in
>> fact in a container, and to identify the container.
> I like the idea but I'm still tossing it around in my head (and
> thinking about Casey's statement too). Let's say we have a 'docker-like'
> container with pid=100 netns=X,userns=Y,mountns=Z. If I'm on the host
> in all init namespaces and I run
>   nsenter -t 100 -n ip link set eth0 promisc on
> How should this be logged? Did this command run in its own 'container'
> unrelated to the 'docker-like' container?

Jose Bollo's PTAGS ( https://gitlab.com/jobol/ptags ) would be perfect. Any time you declare something to be a container or enter a namespace you slap a tag on it. Identifying nested containers would be easy, you'd have multiple tags. PTAGS unfortunately needs module stacking, but how hard could that be?

> -Eric
Re: RFC(v2): Audit Kernel Container IDs
On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
> On 12/9/2017 2:20 AM, Mickaël Salaün wrote:
> > What about automatically create
> > and assign an ID to a process when it enters a namespace different
> > than one of its parent process? This delegates the (permission)
> > responsibility to the use of namespaces (e.g. /proc/sys/user/max_*
> > limit).
>
> That gets ugly when you have a container that uses user, filesystem,
> network and whatever else namespaces. If all containers used the same
> set of namespaces I think this would be a fine idea, but they don't.
>
> > One interesting side effect of this approach would be to be able to
> > identify which processes are in the same set of namespaces, even if
> > not spawn from the container but entered after its creation (i.e. using
> > setns), by creating container IDs as a (deterministic) checksum
> > from the /proc/self/ns/* IDs.
> >
> > Since the concern is to identify a container, I think the ability
> > to audit the switch from one container ID to another is enough. I
> > don't think we need nested IDs.
>
> Because a container doesn't have to use namespaces to be a container
> you still need a mechanism for a process to declare that it is in
> fact in a container, and to identify the container.

I like the idea but I'm still tossing it around in my head (and thinking about Casey's statement too). Let's say we have a 'docker-like' container with pid=100 netns=X,userns=Y,mountns=Z. If I'm on the host in all init namespaces and I run

  nsenter -t 100 -n ip link set eth0 promisc on

How should this be logged? Did this command run in its own 'container' unrelated to the 'docker-like' container?

-Eric
Re: RFC(v2): Audit Kernel Container IDs
On 2017-12-09 11:20, Mickaël Salaün wrote: > > On 12/10/2017 18:33, Casey Schaufler wrote: > > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: > >> Containers are a userspace concept. The kernel knows nothing of them. > >> > >> The Linux audit system needs a way to be able to track the container > >> provenance of events and actions. Audit needs the kernel's help to do > >> this. > >> > >> Since the concept of a container is entirely a userspace concept, a > >> registration from the userspace container orchestration system initiates > >> this. This will define a point in time and a set of resources > >> associated with a particular container with an audit container ID. > >> > >> The registration is a pseudo filesystem (proc, since PID tree already > >> exists) write of a u8[16] UUID representing the container ID to a file > >> representing a process that will become the first process in a new > >> container. This write might place restrictions on mount namespaces > >> required to define a container, or at least careful checking of > >> namespaces in the kernel to verify permissions of the orchestrator so it > >> can't change its own container ID. A bind mount of nsfs may be > >> necessary in the container orchestrator's mntNS. > >> Note: Use a 128-bit scalar rather than a string to make compares faster > >> and simpler. > >> > >> Require a new CAP_CONTAINER_ADMIN to be able to carry out the > >> registration. > > > > Hang on. If containers are a user space concept, how can > > you want CAP_CONTAINER_ANYTHING? If there's not such thing as > > a container, how can you be asking for a capability to manage > > them? 
> > > >> At that time, record the target container's user-supplied > >> container identifier along with the target container's first process > >> (which may become the target container's "init" process) process ID > >> (referenced from the initial PID namespace), all namespace IDs (in the > >> form of a nsfs device number and inode number tuple) in a new auxilliary > >> record AUDIT_CONTAINER with a qualifying op=$action field. > > Here is an idea to avoid privilege problems or the need for a new > capability: make it automatic. What makes a container a container seems > to be the use of at least a namespace. What about automatically create > and assign an ID to a process when it enters a namespace different than > one of its parent process? This delegates the (permission) > responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit). A container doesn't imply a namespace and vice versa. > One interesting side effect of this approach would be to be able to > identify which processes are in the same set of namespaces, even if not > spawn from the container but entered after its creation (i.e. using > setns), by creating container IDs as a (deterministic) checksum from the > /proc/self/ns/* IDs. This would be really helpful, but it isn't the case. > Since the concern is to identify a container, I think the ability to > audit the switch from one container ID to another is enough. I don't > think we need nested IDs. Since container namespace membership is arbitrary between container orchestrators, this needs a registration process and a way for the container orchestrator to know the ID. I completely agree with Casey here. > As a side note, you may want to take a look at the Linux-VServer's XID. > > Regards, > Mickaël - RGB -- Richard Guy BriggsSr. 
S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635 -- Linux-audit mailing list Linux-audit@redhat.com https://www.redhat.com/mailman/listinfo/linux-audit
Re: RFC(v2): Audit Kernel Container IDs
On 12/10/2017 18:33, Casey Schaufler wrote: > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: >> Containers are a userspace concept. The kernel knows nothing of them. >> >> The Linux audit system needs a way to be able to track the container >> provenance of events and actions. Audit needs the kernel's help to do >> this. >> >> Since the concept of a container is entirely a userspace concept, a >> registration from the userspace container orchestration system initiates >> this. This will define a point in time and a set of resources >> associated with a particular container with an audit container ID. >> >> The registration is a pseudo filesystem (proc, since PID tree already >> exists) write of a u8[16] UUID representing the container ID to a file >> representing a process that will become the first process in a new >> container. This write might place restrictions on mount namespaces >> required to define a container, or at least careful checking of >> namespaces in the kernel to verify permissions of the orchestrator so it >> can't change its own container ID. A bind mount of nsfs may be >> necessary in the container orchestrator's mntNS. >> Note: Use a 128-bit scalar rather than a string to make compares faster >> and simpler. >> >> Require a new CAP_CONTAINER_ADMIN to be able to carry out the >> registration. > > Hang on. If containers are a user space concept, how can > you want CAP_CONTAINER_ANYTHING? If there's not such thing as > a container, how can you be asking for a capability to manage > them? > >> At that time, record the target container's user-supplied >> container identifier along with the target container's first process >> (which may become the target container's "init" process) process ID >> (referenced from the initial PID namespace), all namespace IDs (in the >> form of a nsfs device number and inode number tuple) in a new auxilliary >> record AUDIT_CONTAINER with a qualifying op=$action field. 
Here is an idea to avoid privilege problems or the need for a new capability: make it automatic. What makes a container a container seems to be the use of at least one namespace. What about automatically creating and assigning an ID to a process when it enters a namespace different from that of its parent process? This delegates the (permission) responsibility to the use of namespaces (e.g. the /proc/sys/user/max_* limits).

One interesting side effect of this approach would be the ability to identify which processes are in the same set of namespaces, even if they were not spawned from the container but entered it after its creation (i.e. using setns), by creating container IDs as a (deterministic) checksum of the /proc/self/ns/* IDs.

Since the concern is to identify a container, I think the ability to audit the switch from one container ID to another is enough. I don't think we need nested IDs.

As a side note, you may want to take a look at the Linux-VServer's XID.

Regards,
Mickaël
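The checksum idea in this message can be sketched roughly as follows. This is only an illustration, not anything proposed as kernel code in the thread: the helper names (read_namespace_ids, derive_container_id), the choice of SHA-256, and the truncation to 16 bytes (to match the u8[16] ID size in the RFC) are all assumptions made for the example.

```python
import hashlib
import os


def read_namespace_ids(pid="self"):
    """Return {namespace name: link target} for /proc/<pid>/ns/*.

    Each link target looks like "net:[4026531992]"; the number is the
    namespace's inode on nsfs.
    """
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in os.listdir(ns_dir)}


def derive_container_id(namespace_ids):
    """Hash a namespace set into a 16-byte ID (hypothetical scheme).

    Sorting by name makes the result independent of directory order, so
    two processes in the same set of namespaces get the same ID.
    """
    digest = hashlib.sha256()
    for name in sorted(namespace_ids):
        digest.update(f"{name}={namespace_ids[name]}\n".encode())
    return digest.digest()[:16]
```

Under such a scheme, a process that joins only one of a container's namespaces (for example with nsenter -n) would hash to an ID that matches neither the host nor the container, which is the ambiguity raised elsewhere in the thread.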
Re: RFC(v2): Audit Kernel Container IDs
On 12/9/2017 2:20 AM, Mickaël Salaün wrote: > On 12/10/2017 18:33, Casey Schaufler wrote: >> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: >>> Containers are a userspace concept. The kernel knows nothing of them. >>> >>> The Linux audit system needs a way to be able to track the container >>> provenance of events and actions. Audit needs the kernel's help to do >>> this. >>> >>> Since the concept of a container is entirely a userspace concept, a >>> registration from the userspace container orchestration system initiates >>> this. This will define a point in time and a set of resources >>> associated with a particular container with an audit container ID. >>> >>> The registration is a pseudo filesystem (proc, since PID tree already >>> exists) write of a u8[16] UUID representing the container ID to a file >>> representing a process that will become the first process in a new >>> container. This write might place restrictions on mount namespaces >>> required to define a container, or at least careful checking of >>> namespaces in the kernel to verify permissions of the orchestrator so it >>> can't change its own container ID. A bind mount of nsfs may be >>> necessary in the container orchestrator's mntNS. >>> Note: Use a 128-bit scalar rather than a string to make compares faster >>> and simpler. >>> >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the >>> registration. >> Hang on. If containers are a user space concept, how can >> you want CAP_CONTAINER_ANYTHING? If there's not such thing as >> a container, how can you be asking for a capability to manage >> them? 
>> >>> At that time, record the target container's user-supplied >>> container identifier along with the target container's first process >>> (which may become the target container's "init" process) process ID >>> (referenced from the initial PID namespace), all namespace IDs (in the >>> form of a nsfs device number and inode number tuple) in a new auxilliary >>> record AUDIT_CONTAINER with a qualifying op=$action field. > Here is an idea to avoid privilege problems or the need for a new > capability: make it automatic. What makes a container a container seems > to be the use of at least a namespace. You might think so, but I am assured that you can have a container without using namespaces. Intel's "Clear Containers", which use virtualization technology, are one example. I have considered creating "Smack Containers" using mandatory access control technology, more to press the point that "containers" is a marketing concept, not technology. > What about automatically create > and assign an ID to a process when it enters a namespace different than > one of its parent process? This delegates the (permission) > responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit). That gets ugly when you have a container that uses user, filesystem, network and whatever else namespaces. If all containers used the same set of namespaces I think this would be a fine idea, but they don't. > One interesting side effect of this approach would be to be able to > identify which processes are in the same set of namespaces, even if not > spawn from the container but entered after its creation (i.e. using > setns), by creating container IDs as a (deterministic) checksum from the > /proc/self/ns/* IDs. > > Since the concern is to identify a container, I think the ability to > audit the switch from one container ID to another is enough. I don't > think we need nested IDs. 
Because a container doesn't have to use namespaces to be a container you still need a mechanism for a process to declare that it is in fact in a container, and to identify the container. > > As a side note, you may want to take a look at the Linux-VServer's XID. > > Regards, > Mickaël
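The RFC text quoted in this message identifies each namespace by an nsfs device number and inode number tuple. From userspace, that tuple can be read with stat(2) on the /proc/<pid>/ns/* entries. The sketch below shows one way to do that; the helper names stat_id, namespace_id and same_namespace are made up for this example and are not part of any proposed interface.

```python
import os


def stat_id(path):
    """Return the (device, inode) pair that stat(2) reports for a path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def namespace_id(pid="self", ns="net"):
    """Return the (device, inode) tuple of one namespace of a process.

    os.stat() follows the /proc/<pid>/ns/<ns> link, so the result
    describes the namespace object on nsfs, not the link itself.
    """
    return stat_id(f"/proc/{pid}/ns/{ns}")


def same_namespace(pid_a, pid_b, ns="net"):
    """True if both processes are in the same namespace of type ns."""
    return namespace_id(pid_a, ns) == namespace_id(pid_b, ns)
```

Comparing these tuples per namespace type is enough to tell whether two processes share, say, a network namespace, but as noted in this message it says nothing about containers that are not built from namespaces.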
Re: RFC(v2): Audit Kernel Container IDs
On Thursday, October 19, 2017 7:11:33 PM EDT Aleksa Sarai wrote: > >>> The registration is a pseudo filesystem (proc, since PID tree already > >>> exists) write of a u8[16] UUID representing the container ID to a file > >>> representing a process that will become the first process in a new > >>> container. This write might place restrictions on mount namespaces > >>> required to define a container, or at least careful checking of > >>> namespaces in the kernel to verify permissions of the orchestrator so it > >>> can't change its own container ID. A bind mount of nsfs may be > >>> necessary in the container orchestrator's mntNS. > >>> Note: Use a 128-bit scalar rather than a string to make compares faster > >>> and simpler. > >>> > >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the > >>> registration. > >> > >> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing. > > > > No, because then any process with that capability (vsftpd) could change > > its own container ID. This is discussed more in other parts of the > > thread... For the record, I changed my mind. CAP_AUDIT_CONTROL is the correct capability. > Not if we make the container ID append-only (to support nesting), or > write-once (the other idea thrown around). Well...I like to use lessons learned if they can be applied. In the normal world without containers we have uid, auid, and session_id. uid is who you are now, auid is how you got into the system, session_id distinguishes individual auids. We have a default auid of -1 for system objects and a real number for people. I think there should be the equivalent of auid and session_id but tailored for containers. Loginuid == container id. It can be set, overridden, or appended to (we'll figure this out later) in very limited circumstances. Container_session == session which is tamper-proof. This way things can enter a container with the same ID but under a different session. And everything else gets to inherit the original ID. 
This way we can trace actions to something that entered the container rather than normal system activity in the container. What a security officer wants to know is what did people do inside the system / container. The system objects we typically don't care about. Sure they might get hacked and then work on behalf of someone, but they would almost always pop a shell so that they can have freedom. That should set off an AVC or create other activity that gets picked up. -Steve > In that case, you can't move "out" from a particular container ID, you can > only go "deeper". These semantics don't make sense for generic containers, > but since the point of this facility is *specifically* for audit I imagine > that not being able to move a process from a sub-container's ID is a > benefit. -- Linux-audit mailing list Linux-audit@redhat.com https://www.redhat.com/mailman/listinfo/linux-audit
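The auid/session_id analogy in this message can be modelled as a small data structure. The following is a loose userspace illustration under stated assumptions: the class and function names are invented here, the IDs are plain integers rather than the 128-bit values in the RFC, and the session counter is a stand-in for whatever tamper-proof mechanism the kernel would actually use.

```python
import itertools
from dataclasses import dataclass, replace

UNSET = -1  # mirrors the default auid of -1 for system objects

_next_session = itertools.count(1)  # hypothetical session allocator


@dataclass(frozen=True)
class ContainerAuditContext:
    """Per-task audit state: which container, and which entry into it."""
    container_id: int = UNSET
    container_session: int = UNSET


def fork(parent):
    """Children inherit both fields unchanged from their parent."""
    return replace(parent)


def enter_container(ctx, container_id):
    """Entering assigns the target's ID and a fresh session number."""
    return ContainerAuditContext(container_id, next(_next_session))
```

With this shape, a process that enters an existing container later carries the same container_id but a different container_session from the container's original processes, so its actions can be told apart from normal activity inside the container.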
Re: RFC(v2): Audit Kernel Container IDs
>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container. This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID. A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>>
>> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
>
> No, because then any process with that capability (vsftpd) could change
> its own container ID. This is discussed more in other parts of the
> thread...

Not if we make the container ID append-only (to support nesting), or write-once (the other idea thrown around). In that case, you can't move "out" from a particular container ID, you can only go "deeper". These semantics don't make sense for generic containers, but since the point of this facility is *specifically* for audit I imagine that not being able to move a process from a sub-container's ID is a benefit.

[This assumes it's CAP_AUDIT_CONTROL which is what we are discussing in a sister thread.]

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
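The two policies mentioned in this message, write-once and append-only, differ only in what a second write does. A minimal sketch of that difference follows; the class names and the list-of-IDs representation are assumptions made for illustration and do not reflect how the kernel would store the label.

```python
class ContainerIdError(Exception):
    """Raised when a policy refuses to change the label."""


class WriteOnceLabel:
    """Label that can be set exactly once and never changed afterwards."""

    def __init__(self):
        self.ids = []

    def set(self, container_id):
        if self.ids:
            raise ContainerIdError("container ID already set")
        self.ids.append(container_id)


class AppendOnlyLabel:
    """Label that records nesting: IDs can be pushed but never removed."""

    def __init__(self, inherited=()):
        # A nested container starts from its parent's chain of IDs.
        self.ids = list(inherited)

    def set(self, container_id):
        self.ids.append(container_id)
```

Neither class offers a way to remove or overwrite an ID, which is the "you can only go deeper" property: a process cannot use a later write to drop out of the container it was registered in.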
Re: RFC(v2): Audit Kernel Container IDs
On 2017-10-12 15:45, Steve Grubb wrote: > On Thursday, October 12, 2017 10:14:00 AM EDT Richard Guy Briggs wrote: > > Containers are a userspace concept. The kernel knows nothing of them. > > > > The Linux audit system needs a way to be able to track the container > > provenance of events and actions. Audit needs the kernel's help to do > > this. > > > > Since the concept of a container is entirely a userspace concept, a > > registration from the userspace container orchestration system initiates > > this. This will define a point in time and a set of resources > > associated with a particular container with an audit container ID. > > The requirements for common criteria around containers should be very closely > modeled on the requirements for virtualization. It would be the container > manager that is responsible for logging the resource assignment events. I suspect we are in violent agreement here. > > The registration is a pseudo filesystem (proc, since PID tree already > > exists) write of a u8[16] UUID representing the container ID to a file > > representing a process that will become the first process in a new > > container. This write might place restrictions on mount namespaces > > required to define a container, or at least careful checking of > > namespaces in the kernel to verify permissions of the orchestrator so it > > can't change its own container ID. A bind mount of nsfs may be > > necessary in the container orchestrator's mntNS. > > Note: Use a 128-bit scalar rather than a string to make compares faster > > and simpler. > > > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the > > registration. > > Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing. No, because then any process with that capability (vsftpd) could change its own container ID. This is discussed more in other parts of the thread... 
> > At that time, record the target container's user-supplied > > container identifier along with the target container's first process > > (which may become the target container's "init" process) process ID > > (referenced from the initial PID namespace), all namespace IDs (in the > > form of a nsfs device number and inode number tuple) in a new auxilliary > > record AUDIT_CONTAINER with a qualifying op=$action field. > > This would be in addition to the normal audit fields. It was intended that this be an auxilliary record, but this issue is being debated in threads about other upstream issues currently so I won't cover that here. > > Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid > > container ID present on an auditable action or event. > > > > Forked and cloned processes inherit their parent's container ID, > > referenced in the process' task_struct. > > > > Mimic setns(2) and return an error if the process has already initiated > > threading or forked since this registration should happen before the > > process execution is started by the orchestrator and hence should not > > yet have any threads or children. If this is deemed overly restrictive, > > switch all threads and children to the new containerID. > > > > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN. > > > > Log the creation of every namespace, inheriting/adding its spawning > > process' containerID(s), if applicable. Include the spawning and > > spawned namespace IDs (device and inode number tuples). > > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)] > > Note: At this point it appears only network namespaces may need to track > > container IDs apart from processes since incoming packets may cause an > > auditable event before being associated with a process. > > > > Log the destruction of every namespace when it is no longer used by any > > process, include the namespace IDs (device and inode number tuples). 
> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)] > > In the virtualization requirements, we only log removal of resources when > something is removed by intention. If the VM shuts down, the manager issues a > VIRT_CONTROL stop event and the user space utilities knows this means all > resources have been unassigned. Ok, this assumes the orchestrator is waiting on that child process (and that it is in turn waiting on all its children) so it knows when that job has exited naturally or errored out. I don't know if there is any consensus or best practice with orchestrators out there now. The kernel should know, so it seemed reasonable to report what was known. Besides, in this case, I was talking specifically about namespace creation and destruction rather than containers. > > Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action) > > the parent and child namespace IDs for any changes to a process' > > namespaces. [setns(2)] > > Note: It may be possible to combine AUDIT_NS_* record formats and > > distinguish them with an op=$action field depending on the fields > >
Re: RFC(v2): Audit Kernel Container IDs
On Thu, Oct 19, 2017 at 12:25 PM, Eric W. Biederman wrote:
> Paul Moore writes:
>
>> On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman wrote:
>>> Aleksa Sarai writes:
>>>>>> The security implications are that anything that can change the label
>>>>>> could also hide itself and its doings from the audit system and thus
>>>>>> would be used as a means to evade detection. I actually think this
>>>>>> means the label should be write once (once you've set it, you can't
>>>>>> change it) ...
>>>>>
>>>>> Richard and I have talked about a write once approach, but the
>>>>> thinking was that you may want to allow a nested container
>>>>> orchestrator (Why? I don't know, but people always want to do the
>>>>> craziest things.) and a write-once policy makes that impossible. If
>>>>> we punt on the nested orchestrator, I believe we can seriously think
>>>>> about a write-once policy to simplify things.
>>>>
>>>> Nested containers are a very widely used use-case (see LXC system
>>>> containers, inside of which people run other container runtimes). So I
>>>> would definitely consider it something that "needs to be supported in
>>>> some way". While the LXC guys might be a *tad* crazy, the use-case
>>>> isn't. :P
>>
>> No worries, we're all a little crazy in our own special ways ;)
>>
>> Kidding aside, thanks for explaining the use case.
>>
>>> Of course some of that gets to running auditd inside a container which
>>> we don't have yet either.
>>>
>>> So I think to start it is perfectly fine to figure out the non-nested
>>> case first and what makes sense there. Then to sort out the nested
>>> container case.
>>>
>>> The solution might be that a process gets at most one id per ``audit
>>> namespace''.
>>
>> In an attempt to stay on-topic, let's try to stick with "audit
>> container ID" or "container ID" if you must. I really want to avoid
>> the term "audit namespace" simply because the term "namespace" implies
>> some things which we aren't planning on doing.
>
> This is 100% on topic. I am saying that unless we are planning to have
> auditd running in a container with its own set of rules you probably
> don't care about nested containers. Last time I heard a discussion
> about that the term in use was audit namespace. So I was referring to
> that support when I said audit namespace, even if the end result only
> loosely fits the term namespace.

My "stay on-topic" comment is directed at, and limited to, your choice of terminology, not the discussion about container nesting. I'm purposefully not using the term "audit namespace" to refer to anything that Richard has presented, and I'm kindly asking you to do the same; it simply doesn't fit.

> I could be wrong of course. I don't fully understand what is driving
> the desire to connect audit and containers. But my naive guess is that
> from an audit perspective you don't care about nested containers
> unless there is also a nested auditd who is looking at it from a nested
> perspective.

Two motivations that are clear to me: the first is the desire to be able to associate events in the audit log with a container (much like how the session ID helped us associate events with a login session), the second is the desire for users to run an audit daemon instance in their containers to capture audit events generated by their container. There is also a security certification motivation, see some of Steve's comments for more on that.

--
paul moore
www.paul-moore.com
Re: RFC(v2): Audit Kernel Container IDs
On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman wrote:
> Aleksa Sarai writes:
>>>> The security implications are that anything that can change the label
>>>> could also hide itself and its doings from the audit system and thus
>>>> would be used as a means to evade detection. I actually think this
>>>> means the label should be write once (once you've set it, you can't
>>>> change it) ...
>>>
>>> Richard and I have talked about a write once approach, but the
>>> thinking was that you may want to allow a nested container
>>> orchestrator (Why? I don't know, but people always want to do the
>>> craziest things.) and a write-once policy makes that impossible. If
>>> we punt on the nested orchestrator, I believe we can seriously think
>>> about a write-once policy to simplify things.
>>
>> Nested containers are a very widely used use-case (see LXC system containers,
>> inside of which people run other container runtimes). So I would definitely
>> consider it something that "needs to be supported in some way". While the LXC
>> guys might be a *tad* crazy, the use-case isn't. :P

No worries, we're all a little crazy in our own special ways ;)

Kidding aside, thanks for explaining the use case.

> Of course some of that gets to running auditd inside a container which
> we don't have yet either.
>
> So I think to start it is perfectly fine to figure out the non-nested
> case first and what makes sense there. Then to sort out the nested
> container case.
>
> The solution might be that a process gets at most one id per ``audit
> namespace''.

In an attempt to stay on-topic, let's try to stick with "audit container ID" or "container ID" if you must. I really want to avoid the term "audit namespace" simply because the term "namespace" implies some things which we aren't planning on doing.

--
paul moore
www.paul-moore.com
Re: RFC(v2): Audit Kernel Container IDs
Paul Moorewrites: > On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman > wrote: >> Aleksa Sarai writes: > The security implications are that anything that can change the label > could also hide itself and its doings from the audit system and thus > would be used as a means to evade detection. I actually think this > means the label should be write once (once you've set it, you can't > change it) ... Richard and I have talked about a write once approach, but the thinking was that you may want to allow a nested container orchestrator (Why? I don't know, but people always want to do the craziest things.) and a write-once policy makes that impossible. If we punt on the nested orchestrator, I believe we can seriously think about a write-once policy to simplify things. >>> >>> Nested containers are a very widely used use-case (see LXC system >>> containers, >>> inside of which people run other container runtimes). So I would definitely >>> consider it something that "needs to be supported in some way". While the >>> LXC >>> guys might be a *tad* crazy, the use-case isn't. :P > > No worries, we're all a little crazy in our own special ways ;) > > Kidding aside, thanks for explaining the use case. > >> Of course some of that gets to running auditd inside a container which >> we don't have yet either. >> >> So I think to start it is perfectly fine to figure out the non-nested >> case first and what makes sense there. Then to sort out the nested >> container case. >> >> The solution might be that a process gets at most one id per ``audit >> namespace''. > > In an attempt to stay on-topic, let's try to stick with "audit > container ID" or "container ID" if you must. I really want to avoid > the term "audit namespace" simply because the term "namespace" implies > some things which we aren't planning on doing. This is 100% on topic. 
I am saying that unless we are planning to have auditd running in a container with its own set of rules, you probably don't care about nested containers. The last time I heard a discussion about that, the term in use was audit namespace. So I was referring to that support when I said audit namespace, even if the end result only loosely fits the term namespace.

I could be wrong of course. I don't fully understand what is driving the desire to connect audit and containers. But my naive guess is that from an audit perspective you don't care about nested containers unless there is also a nested auditd looking at it from a nested perspective.

So far we have established with the term container that we are talking about a running instance of processes, not a filesystem instance that Docker and friends ship around. Beyond that I am not certain what you care about.

Eric
Re: RFC(v2): Audit Kernel Container IDs
On Thu, Oct 19, 2017 at 9:32 AM, Casey Schauflerwrote: > On 10/18/2017 5:05 PM, Richard Guy Briggs wrote: >> On 2017-10-17 01:10, Casey Schaufler wrote: >>> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote: On 2017-10-12 16:33, Casey Schaufler wrote: > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: >> Containers are a userspace concept. The kernel knows nothing of them. >> >> The Linux audit system needs a way to be able to track the container >> provenance of events and actions. Audit needs the kernel's help to do >> this. >> >> Since the concept of a container is entirely a userspace concept, a >> registration from the userspace container orchestration system initiates >> this. This will define a point in time and a set of resources >> associated with a particular container with an audit container ID. >> >> The registration is a pseudo filesystem (proc, since PID tree already >> exists) write of a u8[16] UUID representing the container ID to a file >> representing a process that will become the first process in a new >> container. This write might place restrictions on mount namespaces >> required to define a container, or at least careful checking of >> namespaces in the kernel to verify permissions of the orchestrator so it >> can't change its own container ID. A bind mount of nsfs may be >> necessary in the container orchestrator's mntNS. >> Note: Use a 128-bit scalar rather than a string to make compares faster >> and simpler. >> >> Require a new CAP_CONTAINER_ADMIN to be able to carry out the >> registration. > Hang on. If containers are a user space concept, how can > you want CAP_CONTAINER_ANYTHING? If there's not such thing as > a container, how can you be asking for a capability to manage > them? There is such a thing, but the kernel doesn't know about it yet. >>> Then how can it be the kernel's place to control access to a >>> container resource, that is, the containerID. >> Ok, let me try to address your objections. 
>> >> The kernel can know enough that if it is already set to not allow it to >> be set again. Or if the user doesn't have permission to set it that the >> user be denied this action. How is this different from loginuid and >> sessionid? This same situation exists for loginuid and sessionid which are userspace concepts that the kernel tracks for the convenience of userspace. >>> Ah, no. Loginuid identifies a user, which is a kernel concept in >>> that a user is defined by the uid. >> This simple explanation doesn't help me. What makes that a kernel >> concept? The fact that it is stored and compared in more than one >> place? >> >>> The session ID has well defined kernel semantics. You're trying to say >>> that the containerID is an opaque value that is meaningless to the >>> kernel, but you still want the kernel to protect it. How can the >>> kernel know if it is protecting it correctly? >> How so? A userspace process triggers this. Does the kernel know what >> these values mean? Does it do anything with them other than report >> them or allow audit to filter them? It is given some instructions on >> how to treat it. >> >> This is what we're trying to do with the containerID. >> As for its name, I'm not particularly picky, so if you don't like CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID. It really needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to give the ability to set a containerID to any process that is able to do audit logging (such as vsftpd) and similarly we don't want to give the orchestrator the ability to control the setup of the audit daemon. >>> Sorry, but what aspect of the kernel security policy is this >>> capability supposed to protect? That's what capabilities are >>> for, not the undefined support of undefined user-space behavior. >> Similarly, loginuids and sessionIDs are only used for audit tracking and >> filtering. > > Tell me again why you're not reusing either of these? 
Ah, granularity arguments, welcome back old friend :)

Once again, we're still trying to sort all this out so I reserve the right to change my mind, but my current thinking is as follows ...

CAP_AUDIT_WRITE exists to control which applications can submit userspace generated audit records to the kernel; CAP_AUDIT_CONTROL exists to control which applications can manage the in-kernel audit configuration (e.g. filter rules) and the current task's loginuid value. Reusing CAP_AUDIT_WRITE here would give any application that can submit userspace audit records the ability to change the audit container ID. That would be bad: we don't allow CAP_AUDIT_WRITE to change the loginuid, and it would be even worse to allow it to change the audit container ID. Reusing CAP_AUDIT_CONTROL is less worse than CAP_AUDIT_WRITE, but it gets
Re: RFC(v2): Audit Kernel Container IDs
On 10/18/2017 5:05 PM, Richard Guy Briggs wrote: > On 2017-10-17 01:10, Casey Schaufler wrote: >> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote: >>> On 2017-10-12 16:33, Casey Schaufler wrote: On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: > Containers are a userspace concept. The kernel knows nothing of them. > > The Linux audit system needs a way to be able to track the container > provenance of events and actions. Audit needs the kernel's help to do > this. > > Since the concept of a container is entirely a userspace concept, a > registration from the userspace container orchestration system initiates > this. This will define a point in time and a set of resources > associated with a particular container with an audit container ID. > > The registration is a pseudo filesystem (proc, since PID tree already > exists) write of a u8[16] UUID representing the container ID to a file > representing a process that will become the first process in a new > container. This write might place restrictions on mount namespaces > required to define a container, or at least careful checking of > namespaces in the kernel to verify permissions of the orchestrator so it > can't change its own container ID. A bind mount of nsfs may be > necessary in the container orchestrator's mntNS. > Note: Use a 128-bit scalar rather than a string to make compares faster > and simpler. > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the > registration. Hang on. If containers are a user space concept, how can you want CAP_CONTAINER_ANYTHING? If there's not such thing as a container, how can you be asking for a capability to manage them? >>> There is such a thing, but the kernel doesn't know about it yet. >> Then how can it be the kernel's place to control access to a >> container resource, that is, the containerID. > Ok, let me try to address your objections. > > The kernel can know enough that if it is already set to not allow it to > be set again. 
Or if the user doesn't have permission to set it that the > user be denied this action. How is this different from loginuid and > sessionid? >>> This >>> same situation exists for loginuid and sessionid which are userspace >>> concepts that the kernel tracks for the convenience of userspace. >> Ah, no. Loginuid identifies a user, which is a kernel concept in >> that a user is defined by the uid. > This simple explanation doesn't help me. What makes that a kernel > concept? The fact that it is stored and compared in more than one > place? > >> The session ID has well defined kernel semantics. You're trying to say >> that the containerID is an opaque value that is meaningless to the >> kernel, but you still want the kernel to protect it. How can the >> kernel know if it is protecting it correctly? > How so? A userspace process triggers this. Does the kernel know what > these values mean? Does it do anything with them other than report > them or allow audit to filter them? It is given some instructions on > how to treat it. > > This is what we're trying to do with the containerID. > >>> As >>> for its name, I'm not particularly picky, so if you don't like >>> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID. It really >>> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we >>> don't want to give the ability to set a containerID to any process that >>> is able to do audit logging (such as vsftpd) and similarly we don't want >>> to give the orchestrator the ability to control the setup of the audit >>> daemon. >> Sorry, but what aspect of the kernel security policy is this >> capability supposed to protect? That's what capabilities are >> for, not the undefined support of undefined user-space behavior. > Similarly, loginuids and sessionIDs are only used for audit tracking and > filtering. Tell me again why you're not reusing either of these? > >> If it's audit behavior, you want CAP_AUDIT_CONTROL. 
>> If it's more than audit behavior you have to define what system security
>> policy you're dealing with in order to pick the right capability.
> It isn't audit behaviour (yet), it is audit reporting information, a
> level above simply writing logs and a level below controlling daemon
> behaviour.

You are changing audit information. That's CAP_AUDIT_CONTROL.

>> We get this request pretty regularly. "I need my own capability
>> because I have a niche thing that isn't part of the system security
>> policy but that is important!" Fit the containerID into the
>> system security policy, and if that results in using CAP_SYS_ADMIN,
>> oh well.
> There's far too much piled in to CAP_SYS_ADMIN already, which is making
> capabilities less and less useful.

No. The value of capabilities is in separating privilege from DAC. Granularity is a bonus. The current granularity is too fine in some cases and too coarse in others.

> I
Re: RFC(v2): Audit Kernel Container IDs
Aleksa Sarai writes:

>>> The security implications are that anything that can change the label
>>> could also hide itself and its doings from the audit system and thus
>>> would be used as a means to evade detection. I actually think this
>>> means the label should be write once (once you've set it, you can't
>>> change it) ...
>>
>> Richard and I have talked about a write once approach, but the
>> thinking was that you may want to allow a nested container
>> orchestrator (Why? I don't know, but people always want to do the
>> craziest things.) and a write-once policy makes that impossible. If
>> we punt on the nested orchestrator, I believe we can seriously think
>> about a write-once policy to simplify things.
>
> Nested containers are a very widely used use-case (see LXC system containers,
> inside of which people run other container runtimes). So I would definitely
> consider it something that "needs to be supported in some way". While the LXC
> guys might be a *tad* crazy, the use-case isn't. :P

Of course some of that gets to running auditd inside a container which we don't have yet either.

So I think to start it is perfectly fine to figure out the non-nested case first and what makes sense there. Then to sort out the nested container case.

The solution might be that a process gets at most one id per ``audit namespace''.

Eric
Re: RFC(v2): Audit Kernel Container IDs
>> The security implications are that anything that can change the label
>> could also hide itself and its doings from the audit system and thus
>> would be used as a means to evade detection. I actually think this
>> means the label should be write once (once you've set it, you can't
>> change it) ...
>
> Richard and I have talked about a write once approach, but the
> thinking was that you may want to allow a nested container
> orchestrator (Why? I don't know, but people always want to do the
> craziest things.) and a write-once policy makes that impossible. If
> we punt on the nested orchestrator, I believe we can seriously think
> about a write-once policy to simplify things.

Nested containers are a very widely used use-case (see LXC system containers, inside of which people run other container runtimes). So I would definitely consider it something that "needs to be supported in some way". While the LXC guys might be a *tad* crazy, the use-case isn't. :P

>> ... and orchestration systems should begin as unlabelled
>> processes allowing them to do arbitrary forks.
>
> My current thinking is that the default state is to start unlabeled (I
> just vomited a little into my SELinux hat); in other words
> init/systemd/PID-1 in the host system starts with an "unset" audit
> container ID. This not only helps define the host system (anything
> that has an unset audit container ID) but provides a blank slate for
> the orchestrator(s).
>
>> For nested containers, I actually think the label should be
>> hierarchical, so you can add a label for the new nested container but
>> it still also contains its parent's label as well.
>
> I haven't made up my mind on this completely just yet, but I'm
> currently of the mindset that supporting multiple audit container IDs
> on a given process is not a good idea.

As long as creating a new "container" (that is, changing a process's "audit container ID") is an audit event then I think that having a hierarchy be explicit is not necessary (userspace audit can figure out the hierarchy quite easily -- but also there are cases where thinking of it as being hierarchical isn't necessarily correct).

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
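To illustrate the point that userspace could recover nesting from the event stream alone, here is a minimal sketch. It assumes a hypothetical event record of the form (pid, old_id, new_id), where None stands for an unset audit container ID; no such record layout has been defined in this thread, and the sketch also assumes that a nested orchestrator is allowed to set a new ID on an already labelled process.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event layout (an assumption, not a real audit record):
# (pid, old_id, new_id), with None meaning "unset".
Event = Tuple[int, Optional[str], Optional[str]]


def build_parent_map(events: List[Event]) -> Dict[str, Optional[str]]:
    """Map each new container ID to the ID the process carried before the change."""
    parents: Dict[str, Optional[str]] = {}
    for _pid, old_id, new_id in events:
        if new_id is not None and new_id not in parents:
            parents[new_id] = old_id
    return parents


def ancestry(container_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of IDs from the outermost container down to container_id."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = container_id
    while current is not None and current not in seen:
        seen.add(current)          # guard against a malformed, cyclic log
        chain.append(current)
        current = parents.get(current)
    chain.reverse()
    return chain


events = [
    (100, None, "lxc-a"),        # host orchestrator labels a system container
    (250, "lxc-a", "docker-b"),  # a runtime inside lxc-a labels a nested one
    (300, None, "lxc-c"),        # an unrelated top-level container
]
parents = build_parent_map(events)
```

With this log, `ancestry("docker-b", parents)` walks back through "lxc-a" without the kernel ever storing more than one ID per process. Cases where the relationship is not really hierarchical would need more than this parent map.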
Re: RFC(v2): Audit Kernel Container IDs
On Tue, Oct 17, 2017 at 11:44 AM, James Bottomleywrote: > On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote: >> > Without a *kernel* policy on containerIDs you can't say what >> > security policy is being exempted. >> >> The policy has been basically stated earlier. >> >> A way to track a set of processes from a specific point in time >> forward. The name used is "container id", but it could be anything. >> This marker is mostly used by user space to track process hierarchies >> without races, these processes can be very privileged, and must not >> be allowed to change the marker themselves when granted the current >> common capabilities. >> >> Is this a good enough description ? If not can you clarify your >> expectations ? > > I think you mean you want to be able to apply a label to a process > which is inherited across forks. The label should only be susceptible > to modification by something possessing a capability (which one TBD). > The idea is that processes spawned into a container would be labelled > by the container orchestration system. It's unclear what should happen > to processes using nsenter after the fact, but policy for that should > be up to the orchestration system. > > The label will be used as a tag for audit information. > > I think you were missing label inheritance above. That is a pretty good summary of what we want to do, and what Richard and I have discussed while brainstorming this offline. The details may not have translated well into those initial emails from Richard, but I think you've got the idea, even if some of the smaller details are still TBD. 
FWIW, right now I'm not as worried about the exact capability or the size of the audit container ID, I think those things will sort themselves out as we progress through the implementation, especially once we get to the next stage when we start to allow copies of the audit records to be routed to audit daemons running inside containers (note well that I said "copies", the host system still sees all). > The security implications are that anything that can change the label > could also hide itself and its doings from the audit system and thus > would be used as a means to evade detection. I actually think this > means the label should be write once (once you've set it, you can't > change it) ... Richard and I have talked about a write once approach, but the thinking was that you may want to allow a nested container orchestrator (Why? I don't know, but people always want to do the craziest things.) and a write-once policy makes that impossible. If we punt on the nested orchestrator, I believe we can seriously think about a write-once policy to simplify things. A bit off topic, but I've also wondered about not even implementing read access, just to help ensure the audit container ID wouldn't be abused, but I'm not sure how practical that will be. Something else to sort out during the RFC phase of the implementation with the container orchestrators. > ... and orchestration systems should begin as unlabelled > processes allowing them to do arbitrary forks. My current thinking is that the default state is to start unlabeled (I just vomited a little into my SELinux hat); in other words init/systemd/PID-1 in the host system starts with an "unset" audit container ID. This not only helps define the host system (anything that has an unset audit container ID) but provides a blank slate for the orchestrator(s). 
> For nested containers, I actually think the label should be > hierarchical, so you can add a label for the new nested container but > it still also contains its parents label as well. I haven't made up my mind on this completely just yet, but I'm currently of the mindset that supporting multiple audit container IDs on a given process is not a good idea. -- paul moore www.paul-moore.com -- Linux-audit mailing list Linux-audit@redhat.com https://www.redhat.com/mailman/listinfo/linux-audit
Re: RFC(v2): Audit Kernel Container IDs
On Tue, Oct 17, 2017 at 8:31 AM, Simo Sorce wrote:
> The container Id can be used also for authorization purposes (by other
> processes on the host), not just audit, I think this is why a separate
> control has been proposed.

Apologies, but I'm just now getting a chance to work my way through this thread, and I wanted to make a quick comment on this point ...

The audit container ID (note I said "audit container ID" not "container ID") is intended strictly for use by the audit subsystem at this point. Allowing other uses opens the door to a larger set of problems we are trying to avoid (e.g. handling migration across hosts). We would love to have a generic kernel facility that the audit subsystem could use to identify containers, but we don't, and previous attempts have failed, so we have to create our own. We are intentionally trying to limit its scope in an attempt to limit problems. If a more general solution appears in the future I think we would make every effort to migrate to that; keeping this initial effort small should make that easier.

--
paul moore
www.paul-moore.com
Re: RFC(v2): Audit Kernel Container IDs
On Tuesday, October 17, 2017 1:57:43 PM EDT James Bottomley wrote:
> > > > The idea is that processes spawned into a container would be
> > > > labelled by the container orchestration system. It's unclear
> > > > what should happen to processes using nsenter after the fact, but
> > > > policy for that should be up to the orchestration system.
> > >
> > > I'm fine with that. The user space policy can be anything y'all
> > > like.
> >
> > I think there should be a login event.
>
> I thought you wanted this for containers? Container creation doesn't
> have login events. In an unprivileged orchestration system it may be
> hard to synthetically manufacture them.

I realize this. This work is very similar to problems we solved 12 years ago. We'll figure out what the right name is for it down the road. But the concept is the same. If something enters a container, we need to know about it. It needs to get tagged and be associated with the container.

The way this was solved for the loginuid problem was to add a session identifier so that new logins of the same loginuid can coexist and we can trace actions back to a specific login. I'd think we can apply lessons learned from a while back to make container identification act similarly.

-Steve
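The session identifier idea can be sketched in a few lines. This is a toy model only: the counter, the dictionary layout and the function names are hypothetical and are not how the kernel stores loginuid or sessionid.

```python
import itertools

# Hypothetical model: a global counter hands out session IDs.
_next_session = itertools.count(1)


def login(loginuid):
    """Start a new session: the loginuid may repeat, the session ID is always fresh."""
    return {"loginuid": loginuid, "sessionid": next(_next_session)}


def actions_for(session, records):
    """Return the actions recorded under exactly this login session."""
    key = (session["loginuid"], session["sessionid"])
    return [action for (uid, sid, action) in records if (uid, sid) == key]


first = login(1000)
second = login(1000)   # the same user logs in again; both sessions coexist
records = [
    (first["loginuid"], first["sessionid"], "edit /etc/hosts"),
    (second["loginuid"], second["sessionid"], "ip link set eth0 promisc on"),
]
```

Because every record carries the pair (loginuid, sessionid), `actions_for(second, records)` picks out only the second login's activity. An "entered a container" event could tag records in the same way.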
Re: RFC(v2): Audit Kernel Container IDs
On Tue, 2017-10-17 at 13:15 -0400, Steve Grubb wrote:
> On Tuesday, October 17, 2017 12:43:18 PM EDT Casey Schaufler wrote:
> > > The idea is that processes spawned into a container would be
> > > labelled by the container orchestration system. It's unclear
> > > what should happen to processes using nsenter after the fact, but
> > > policy for that should be up to the orchestration system.
> >
> > I'm fine with that. The user space policy can be anything y'all
> > like.
>
> I think there should be a login event.

I thought you wanted this for containers? Container creation doesn't have login events. In an unprivileged orchestration system it may be hard to synthetically manufacture them.

James
Re: RFC(v2): Audit Kernel Container IDs
On Tuesday, October 17, 2017 12:43:18 PM EDT Casey Schaufler wrote:
> > The idea is that processes spawned into a container would be labelled
> > by the container orchestration system. It's unclear what should happen
> > to processes using nsenter after the fact, but policy for that should
> > be up to the orchestration system.
>
> I'm fine with that. The user space policy can be anything y'all like.

I think there should be a login event.

> > The label will be used as a tag for audit information.
>
> Deep breath ...
>
> Which *is* a kernel security policy mechanism. Since the "label"
> is part of the audit information that the kernel is guaranteeing,
> changing it would be covered by CAP_AUDIT_CONTROL. If the kernel
> does not use the "label" for any other purpose this is the only
> capability that makes sense for it.

I agree. The ability to set the container label grants the ability to evade rules or modify audit rules. CAP_AUDIT_CONTROL makes sense to me.

> > I think you were missing label inheritance above.
> >
> > The security implications are that anything that can change the label
> > could also hide itself and its doings from the audit system and thus
> > would be used as a means to evade detection.

Yes. We have the same problem with loginuid. There are restrictions on who can change it once set. And then we made an immutable flag so that people that want a hard guarantee can get that.

-Steve
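The loginuid rules mentioned here (restrictions once the value is set, plus an immutable flag for a hard guarantee) can be written down as a small decision function. This is a simplified model, not the kernel's code: the function name and parameters are hypothetical, and treating the first assignment as unprivileged is an assumption of the sketch.

```python
UNSET = None


def may_set_id(current_id, has_audit_control, immutable):
    """Decide whether a task may change its ID under the rules sketched above.

    current_id        -- the ID the task carries now, or UNSET
    has_audit_control -- stands in for holding CAP_AUDIT_CONTROL
    immutable         -- stands in for the "immutable" flag being enabled
    """
    if current_id is UNSET:
        return True            # assumption: the first assignment is always allowed
    if immutable:
        return False           # hard guarantee: never change once set
    return has_audit_control   # otherwise only a privileged task may change it
```

The same shape would apply to an audit container ID if it were guarded by CAP_AUDIT_CONTROL. A strict write-once policy corresponds to the immutable flag always being on.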
Re: RFC(v2): Audit Kernel Container IDs
On 10/17/2017 8:44 AM, James Bottomley wrote: > On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote: >>> Without a *kernel* policy on containerIDs you can't say what >>> security policy is being exempted. >> The policy has been basically stated earlier. >> >> A way to track a set of processes from a specific point in time >> forward. The name used is "container id", but it could be anything. >> This marker is mostly used by user space to track process hierarchies >> without races, these processes can be very privileged, and must not >> be allowed to change the marker themselves when granted the current >> common capabilities. >> >> Is this a good enough description ? If not can you clarify your >> expectations ? > I think you mean you want to be able to apply a label to a process > which is inherited across forks. That would be PTAGS. I agree that such a general mechanism could be very useful for a variety of purposes, not just containers. I do not agree that a single integer (e.g. a containerID) warrants more than trivial mechanism. > The label should only be susceptible > to modification by something possessing a capability (which one TBD). I think that the reason we're going to have crying and gnashing of teeth is that whatever capability is used. There will always be an issue of the capability granted being less specific than the application security model would like. And no, we're not going down the 330 capabilities road. It's been done in the UNIX world. Application security models hate that just as much as they hate the coarser granularity. > The idea is that processes spawned into a container would be labelled > by the container orchestration system. It's unclear what should happen > to processes using nsenter after the fact, but policy for that should > be up to the orchestration system. I'm fine with that. The user space policy can be anything y'all like. > The label will be used as a tag for audit information. Deep breath ... 
Which *is* a kernel security policy mechanism. Since the "label" is part of the audit information that the kernel is guaranteeing, changing it would be covered by CAP_AUDIT_CONTROL. If the kernel does not use the "label" for any other purpose this is the only capability that makes sense for it.

> I think you were missing label inheritance above.
>
> The security implications are that anything that can change the label
> could also hide itself and its doings from the audit system and thus
> would be used as a means to evade detection.

Yes. This is a consequence of the capability granularity. There is no way we can make the capability granularity sufficiently fine to prevent this. No one wants the 330 capabilities that Data General had in their secure UNIX system.

> I actually think this
> means the label should be write once (once you've set it, you can't
> change it) and orchestration systems should begin as unlabelled
> processes allowing them to do arbitrary forks.
>
> For nested containers, I actually think the label should be
> hierarchical, so you can add a label for the new nested container but
> it still also contains its parent's label as well.

You can't support this reasonably with a single containerID. You want PTAGS for this. I know that there is resistance to requiring anything beyond what's in the base kernel (and for good reasons) for containers, especially something that is pending future work. But let's not jam something into the base kernel that isn't really going to address the issue.

> James
Re: RFC(v2): Audit Kernel Container IDs
On 10/17/2017 8:28 AM, Simo Sorce wrote: > On Tue, 2017-10-17 at 07:59 -0700, Casey Schaufler wrote: >> On 10/17/2017 5:31 AM, Simo Sorce wrote: >>> On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote: On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs wrote: > There is such a thing, but the kernel doesn't know about it > yet. This same situation exists for loginuid and sessionid > which > are userspace concepts that the kernel tracks for the > convenience > of userspace. As for its name, I'm not particularly picky, so > if > you don't like CAP_CONTAINER_* then I'm fine with > CAP_AUDIT_CONTAINERID. It really needs to be distinct from > CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to > give > the ability to set a containerID to any process that is able to > do > audit logging (such as vsftpd) and similarly we don't want to > give > the orchestrator the ability to control the setup of the audit > daemon. A long time ago, we were debating what should guard against rouge processes from setting the loginuid. Casey argued that the ability to set the loginuid means they have the ability to control the audit trail. That means that it should be guarded by CAP_AUDIT_CONTROL. I think the same logic applies today. >>> The difference is that with loginuid you needed to give processes >>> able >>> to audit also the ability to change it. You do not want to tie the >>> ability to change container ids to the ability to audit. You want >>> to be >>> able to do audit stuff (within the container) without allowing it >>> to >>> change the container id. >> Without a *kernel* policy on containerIDs you can't say what >> security policy is being exempted. > The policy has been basically stated earlier. No. The expected user space behavior has been stated. > A way to track a set of processes from a specific point in time > forward. The name used is "container id", but it could be anything. Then you want Jose Bollo's PTAGS. 
It's insane to add yet another arbitrary ID to the task for a special purpose. Add a general tagging mechanism instead. We could add a gazillion new IDs, each with its own capability, if we head down this road.

> This marker is mostly used by user space to track process hierarchies
> without races, these processes can be very privileged, and must not be
> allowed to change the marker themselves when granted the current common
> capabilities.

Let's be clear. What happens in user space stays in user space. The kernel does not give a fig about user space policy. There has to be a kernel policy involved that a capability can exempt.

> Is this a good enough description ? If not can you clarify your
> expectations ?

The kernel enforces kernel policy. Capabilities provide a mechanism to mark a process as exempt from some aspect of kernel policy. If you don't have a kernel policy, you don't get a capability. Clear?

>
>> Without that you can't say what capability is (or isn't)
>> appropriate.
> See if the above is sufficient please.
>
>> You need a reason to have a capability check that makes sense in the
>> context of the kernel security policy.
> I think the proposal had a reason, we may debate on whether that reason
> is good enough.
>
>> Since we don't know what a container is in the kernel,
> Please do not fixate on the word container.
>
>> that's pretty hard. We don't create "fuzzy" capabilities
>> based on the trendy application behavior of the moment. If the
>> behavior is not related to audit, there's no reason for it, and
>> if it is, CAP_AUDIT_CONTROL works just fine. If this doesn't work
>> in your application security model I suggest that is where you
>> need to make changes.
> The authors of the proposal came to the conclusion that kernel
> assistance is needed. It would be nice to discuss the merits of it.
> If you do not understand why the request has been made it would be more
> useful to ask specific questions to understand what and why is the ask.

I understand pretty darn well.

> Pushing back is fine, if you have understood the problem and have valid
> arguments against a kernel level solution (and possibly suggestions for
> a working user space solution), otherwise you are not adding value to
> the discussion.

The presumption is that the request is reasonable. Adding a capability in support of an undefined behavior is unreasonable. Based on the discussion, CAP_AUDIT_CONTROL is completely rational. I understand that it would be difficult to support your application privilege model. I would like to look into helping out with that, but have too many burning knives in the air just now.

>
> Simo.
>
Re: RFC(v2): Audit Kernel Container IDs
On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote:
> > Without a *kernel* policy on containerIDs you can't say what
> > security policy is being exempted.
>
> The policy has been basically stated earlier.
>
> A way to track a set of processes from a specific point in time
> forward. The name used is "container id", but it could be anything.
> This marker is mostly used by user space to track process hierarchies
> without races, these processes can be very privileged, and must not
> be allowed to change the marker themselves when granted the current
> common capabilities.
>
> Is this a good enough description ? If not can you clarify your
> expectations ?

I think you mean you want to be able to apply a label to a process which is inherited across forks. The label should only be susceptible to modification by something possessing a capability (which one TBD). The idea is that processes spawned into a container would be labelled by the container orchestration system. It's unclear what should happen to processes using nsenter after the fact, but policy for that should be up to the orchestration system.

The label will be used as a tag for audit information.

I think you were missing label inheritance above.

The security implications are that anything that can change the label could also hide itself and its doings from the audit system and thus would be used as a means to evade detection. I actually think this means the label should be write once (once you've set it, you can't change it) and orchestration systems should begin as unlabelled processes allowing them to do arbitrary forks.

For nested containers, I actually think the label should be hierarchical, so you can add a label for the new nested container but it still also contains its parent's label as well.

James
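The label semantics summarized in this message (inherited across forks, set only by a privileged actor, write once, orchestrators starting unlabelled) can be modelled in a few lines. This is a toy model under stated assumptions: `Task`, `set_label` and the `can_set_label` flag are hypothetical names, and the flag merely stands in for whichever capability ends up being chosen.

```python
class Task:
    """Toy process: carries a label that is inherited on fork and set at most once."""

    def __init__(self, label=None, can_set_label=False):
        self.label = label                  # None means "unlabelled"
        self.can_set_label = can_set_label  # stands in for the (TBD) capability

    def fork(self):
        # A child starts with a copy of the parent's label and privilege.
        return Task(self.label, self.can_set_label)


def set_label(actor, target, label):
    """Label target on behalf of actor; raise PermissionError when not allowed."""
    if not actor.can_set_label:
        raise PermissionError("actor lacks the labelling capability")
    if target.label is not None:
        raise PermissionError("label is write once and already set")
    target.label = label


orchestrator = Task(can_set_label=True)  # begins unlabelled, free to fork
init_proc = orchestrator.fork()          # will become the container's first process
init_proc.can_set_label = False          # the container itself gets no such privilege
set_label(orchestrator, init_proc, "c1")
worker = init_proc.fork()                # inherits "c1" automatically
```

In this model every descendant of `init_proc` carries "c1" and cannot shed it, which is the property that stops a labelled process from hiding from audit. The hierarchical variant for nested containers is deliberately left out.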
Re: RFC(v2): Audit Kernel Container IDs
On Tue, 2017-10-17 at 07:59 -0700, Casey Schaufler wrote: > On 10/17/2017 5:31 AM, Simo Sorce wrote: > > On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote: > > > On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs > > > wrote: > > > > There is such a thing, but the kernel doesn't know about it > > > > yet. This same situation exists for loginuid and sessionid > > > > which > > > > are userspace concepts that the kernel tracks for the > > > > convenience > > > > of userspace. As for its name, I'm not particularly picky, so > > > > if > > > > you don't like CAP_CONTAINER_* then I'm fine with > > > > CAP_AUDIT_CONTAINERID. It really needs to be distinct from > > > > CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to > > > > give > > > > the ability to set a containerID to any process that is able to > > > > do > > > > audit logging (such as vsftpd) and similarly we don't want to > > > > give > > > > the orchestrator the ability to control the setup of the audit > > > > daemon. > > > > > > A long time ago, we were debating what should guard against rouge > > > processes from setting the loginuid. Casey argued that the > > > ability to > > > set the loginuid means they have the ability to control the audit > > > trail. That means that it should be guarded by CAP_AUDIT_CONTROL. > > > I > > > think the same logic applies today. > > > > The difference is that with loginuid you needed to give processes > > able > > to audit also the ability to change it. You do not want to tie the > > ability to change container ids to the ability to audit. You want > > to be > > able to do audit stuff (within the container) without allowing it > > to > > change the container id. > > Without a *kernel* policy on containerIDs you can't say what > security policy is being exempted. The policy has been basically stated earlier. A way to track a set of processes from a specific point in time forward. The name used is "container id", but it could be anything. 
This marker is mostly used by user space to track process hierarchies without races, these processes can be very privileged, and must not be allowed to change the marker themselves when granted the current common capabilities.

Is this a good enough description? If not can you clarify your expectations?

> Without that you can't say what capability is (or isn't) appropriate.

See if the above is sufficient please.

> You need a reason to have a capability check that makes sense in the context of the kernel security policy.

I think the proposal had a reason, we may debate on whether that reason is good enough.

> Since we don't know what a container is in the kernel,

Please do not fixate on the word container.

> that's pretty hard. We don't create "fuzzy" capabilities based on the trendy application behavior of the moment. If the behavior is not related to audit, there's no reason for it, and if it is, CAP_AUDIT_CONTROL works just fine. If this doesn't work in your application security model I suggest that is where you need to make changes.

The authors of the proposal came to the conclusion that kernel assistance is needed. It would be nice to discuss the merits of it. If you do not understand why the request has been made it would be more useful to ask specific questions to understand what is being asked and why.

Pushing back is fine, if you have understood the problem and have valid arguments against a kernel level solution (and possibly suggestions for a working user space solution), otherwise you are not adding value to the discussion.

Simo.

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
Re: RFC(v2): Audit Kernel Container IDs
On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
> On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs wrote:
> > There is such a thing, but the kernel doesn't know about it yet. This same situation exists for loginuid and sessionid which are userspace concepts that the kernel tracks for the convenience of userspace. As for its name, I'm not particularly picky, so if you don't like CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID. It really needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to give the ability to set a containerID to any process that is able to do audit logging (such as vsftpd) and similarly we don't want to give the orchestrator the ability to control the setup of the audit daemon.
>
> A long time ago, we were debating what should guard against rogue processes setting the loginuid. Casey argued that the ability to set the loginuid means they have the ability to control the audit trail. That means that it should be guarded by CAP_AUDIT_CONTROL. I think the same logic applies today.

The difference is that with loginuid you needed to give processes able to audit also the ability to change it. You do not want to tie the ability to change container ids to the ability to audit. You want to be able to do audit stuff (within the container) without allowing it to change the container id.

Of course if we made container id a write-once property maybe there is no need for controls at all, but I'm pretty sure there will be situations where write-once may not be usable in practice.

> The ability to arbitrarily set a container ID means the process has the ability to indirectly control the audit trail.

The container ID can also be used for authorization purposes (by other processes on the host), not just audit; I think this is why a separate control has been proposed.

Simo.

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
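The capability disagreement in this sub-thread comes down to which processes end up able to do what under each candidate guard. The following is a rough sketch of that reasoning only. Capabilities are modelled as plain strings, `CAP_AUDIT_CONTAINERID` is merely the name floated in the thread and does not exist in the kernel, and the `consequences` helper is hypothetical.

```python
from typing import Dict

CAP_AUDIT_WRITE = "CAP_AUDIT_WRITE"
CAP_AUDIT_CONTROL = "CAP_AUDIT_CONTROL"
CAP_AUDIT_CONTAINERID = "CAP_AUDIT_CONTAINERID"  # proposed name only, hypothetical


def consequences(guard: str) -> Dict[str, bool]:
    """Side effects of guarding 'set container ID' with the given capability."""
    # The orchestrator must hold whatever capability guards the container ID.
    orchestrator = frozenset({guard})
    # An ordinary audit-logging daemon (vsftpd is the thread's example)
    # holds CAP_AUDIT_WRITE and nothing else relevant.
    logger = frozenset({CAP_AUDIT_WRITE})
    return {
        # Undesired: any logging daemon could then set container IDs.
        "logger_can_set_container_id": guard in logger,
        # Undesired: the orchestrator could then reconfigure the audit daemon.
        "orchestrator_can_configure_audit": CAP_AUDIT_CONTROL in orchestrator,
    }
```

Under this model only the separate capability avoids both side effects, which is the argument Richard and Simo make. It does not address Casey's objection that a capability should protect a defined part of the kernel security policy.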
Re: RFC(v2): Audit Kernel Container IDs
On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs wrote: > On 2017-10-12 16:33, Casey Schaufler wrote: > > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: > > > Containers are a userspace concept. The kernel knows nothing of them. > > > > > > The Linux audit system needs a way to be able to track the container > > > provenance of events and actions. Audit needs the kernel's help to do > > > this. > > > > > > Since the concept of a container is entirely a userspace concept, a > > > registration from the userspace container orchestration system initiates > > > this. This will define a point in time and a set of resources > > > associated with a particular container with an audit container ID. > > > > > > The registration is a pseudo filesystem (proc, since PID tree already > > > exists) write of a u8[16] UUID representing the container ID to a file > > > representing a process that will become the first process in a new > > > container. This write might place restrictions on mount namespaces > > > required to define a container, or at least careful checking of > > > namespaces in the kernel to verify permissions of the orchestrator so it > > > can't change its own container ID. A bind mount of nsfs may be > > > necessary in the container orchestrator's mntNS. > > > Note: Use a 128-bit scalar rather than a string to make compares faster > > > and simpler. > > > > > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the > > > registration. > > > > Hang on. If containers are a user space concept, how can > > you want CAP_CONTAINER_ANYTHING? If there's not such thing as > > a container, how can you be asking for a capability to manage > > them? > > There is such a thing, but the kernel doesn't know about it yet. This > same situation exists for loginuid and sessionid which are userspace > concepts that the kernel tracks for the convenience of userspace. 
> As for its name, I'm not particularly picky, so if you don't like CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID. It really needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to give the ability to set a containerID to any process that is able to do audit logging (such as vsftpd) and similarly we don't want to give the orchestrator the ability to control the setup of the audit daemon.

A long time ago, we were debating what should guard against rogue processes setting the loginuid. Casey argued that the ability to set the loginuid means they have the ability to control the audit trail. That means that it should be guarded by CAP_AUDIT_CONTROL. I think the same logic applies today. The ability to arbitrarily set a container ID means the process has the ability to indirectly control the audit trail.

-Steve
Re: RFC(v2): Audit Kernel Container IDs
On 10/16/2017 5:33 PM, Richard Guy Briggs wrote: > On 2017-10-12 16:33, Casey Schaufler wrote: >> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: >>> Containers are a userspace concept. The kernel knows nothing of them. >>> >>> The Linux audit system needs a way to be able to track the container >>> provenance of events and actions. Audit needs the kernel's help to do >>> this. >>> >>> Since the concept of a container is entirely a userspace concept, a >>> registration from the userspace container orchestration system initiates >>> this. This will define a point in time and a set of resources >>> associated with a particular container with an audit container ID. >>> >>> The registration is a pseudo filesystem (proc, since PID tree already >>> exists) write of a u8[16] UUID representing the container ID to a file >>> representing a process that will become the first process in a new >>> container. This write might place restrictions on mount namespaces >>> required to define a container, or at least careful checking of >>> namespaces in the kernel to verify permissions of the orchestrator so it >>> can't change its own container ID. A bind mount of nsfs may be >>> necessary in the container orchestrator's mntNS. >>> Note: Use a 128-bit scalar rather than a string to make compares faster >>> and simpler. >>> >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the >>> registration. >> Hang on. If containers are a user space concept, how can >> you want CAP_CONTAINER_ANYTHING? If there's not such thing as >> a container, how can you be asking for a capability to manage >> them? > There is such a thing, but the kernel doesn't know about it yet. Then how can it be the kernel's place to control access to a container resource, that is, the containerID. > This > same situation exists for loginuid and sessionid which are userspace > concepts that the kernel tracks for the convenience of userspace. Ah, no. 
Loginuid identifies a user, which is a kernel concept in that a user is defined by the uid. The session ID has well defined kernel semantics. You're trying to say that the containerID is an opaque value that is meaningless to the kernel, but you still want the kernel to protect it. How can the kernel know if it is protecting it correctly? > As > for its name, I'm not particularly picky, so if you don't like > CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID. It really > needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we > don't want to give the ability to set a containerID to any process that > is able to do audit logging (such as vsftpd) and similarly we don't want > to give the orchestrator the ability to control the setup of the audit > daemon. Sorry, but what aspect of the kernel security policy is this capability supposed to protect? That's what capabilities are for, not the undefined support of undefined user-space behavior. If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's more than audit behavior you have to define what system security policy you're dealing with in order to pick the right capability. We get this request pretty regularly. "I need my own capability because I have a niche thing that isn't part of the system security policy but that is important!" Fit the containerID into the system security policy, and if that results in using CAP_SYS_ADMIN, oh well. >>> At that time, record the target container's user-supplied >>> container identifier along with the target container's first process >>> (which may become the target container's "init" process) process ID >>> (referenced from the initial PID namespace), all namespace IDs (in the >>> form of a nsfs device number and inode number tuple) in a new auxilliary >>> record AUDIT_CONTAINER with a qualifying op=$action field. >>> >>> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid >>> container ID present on an auditable action or event. 
>>> >>> Forked and cloned processes inherit their parent's container ID, >>> referenced in the process' task_struct. >>> >>> Mimic setns(2) and return an error if the process has already initiated >>> threading or forked since this registration should happen before the >>> process execution is started by the orchestrator and hence should not >>> yet have any threads or children. If this is deemed overly restrictive, >>> switch all threads and children to the new containerID. >>> >>> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN. >>> >>> Log the creation of every namespace, inheriting/adding its spawning >>> process' containerID(s), if applicable. Include the spawning and >>> spawned namespace IDs (device and inode number tuples). >>> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)] >>> Note: At this point it appears only network namespaces may need to track >>> container IDs apart from processes since incoming packets may cause an >>> auditable event before being associated with a
Re: RFC(v2): Audit Kernel Container IDs
On 2017-10-12 16:33, Casey Schaufler wrote: > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: > > Containers are a userspace concept. The kernel knows nothing of them. > > > > The Linux audit system needs a way to be able to track the container > > provenance of events and actions. Audit needs the kernel's help to do > > this. > > > > Since the concept of a container is entirely a userspace concept, a > > registration from the userspace container orchestration system initiates > > this. This will define a point in time and a set of resources > > associated with a particular container with an audit container ID. > > > > The registration is a pseudo filesystem (proc, since PID tree already > > exists) write of a u8[16] UUID representing the container ID to a file > > representing a process that will become the first process in a new > > container. This write might place restrictions on mount namespaces > > required to define a container, or at least careful checking of > > namespaces in the kernel to verify permissions of the orchestrator so it > > can't change its own container ID. A bind mount of nsfs may be > > necessary in the container orchestrator's mntNS. > > Note: Use a 128-bit scalar rather than a string to make compares faster > > and simpler. > > > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the > > registration. > > Hang on. If containers are a user space concept, how can > you want CAP_CONTAINER_ANYTHING? If there's not such thing as > a container, how can you be asking for a capability to manage > them? There is such a thing, but the kernel doesn't know about it yet. This same situation exists for loginuid and sessionid which are userspace concepts that the kernel tracks for the convenience of userspace. As for its name, I'm not particularly picky, so if you don't like CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID. 
It really needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to give the ability to set a containerID to any process that is able to do audit logging (such as vsftpd) and similarly we don't want to give the orchestrator the ability to control the setup of the audit daemon. > > > At that time, record the target container's user-supplied > > container identifier along with the target container's first process > > (which may become the target container's "init" process) process ID > > (referenced from the initial PID namespace), all namespace IDs (in the > > form of a nsfs device number and inode number tuple) in a new auxilliary > > record AUDIT_CONTAINER with a qualifying op=$action field. > > > > Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid > > container ID present on an auditable action or event. > > > > Forked and cloned processes inherit their parent's container ID, > > referenced in the process' task_struct. > > > > Mimic setns(2) and return an error if the process has already initiated > > threading or forked since this registration should happen before the > > process execution is started by the orchestrator and hence should not > > yet have any threads or children. If this is deemed overly restrictive, > > switch all threads and children to the new containerID. > > > > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN. > > > > Log the creation of every namespace, inheriting/adding its spawning > > process' containerID(s), if applicable. Include the spawning and > > spawned namespace IDs (device and inode number tuples). > > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)] > > Note: At this point it appears only network namespaces may need to track > > container IDs apart from processes since incoming packets may cause an > > auditable event before being associated with a process. 
> > > > Log the destruction of every namespace when it is no longer used by any > > process, include the namespace IDs (device and inode number tuples). > > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)] > > > > Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action) > > the parent and child namespace IDs for any changes to a process' > > namespaces. [setns(2)] > > Note: It may be possible to combine AUDIT_NS_* record formats and > > distinguish them with an op=$action field depending on the fields > > required for each message type. > > > > When a container ceases to exist because the last process in that > > container has exited and hence the last namespace has been destroyed and > > its refcount dropping to zero, log the fact. > > (This latter is likely needed for certification accountability.) A > > container object may need a list of processes and/or namespaces. > > > > A namespace cannot directly migrate from one container to another but > > could be assigned to a newly spawned container. A namespace can be > > moved from one container to another indirectly by having that namespace > > used in a second process in another
Re: RFC(v2): Audit Kernel Container IDs
On Thu, 12 Oct 2017 10:14:00 -0400 Richard Guy Briggs wrote:
> Containers are a userspace concept. The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container provenance of events and actions. Audit needs the kernel's help to do this.
>
> Since the concept of a container is entirely a userspace concept, a registration from the userspace container orchestration system initiates this. This will define a point in time and a set of resources associated with a particular container with an audit container ID.

I don't think this has anything to do with containers directly. If I read it right you need a subtree of stuff to be assigned a (possibly irrevocable) magic identifier that you can use for other purposes.

Traditional Unix in the more 'secure' space had that decades ago in the form of luid. At login time you did a setluid() and that set an irrevocable tag on the session which was (traditionally) the uid of the login process so that audit and other related tools always knew how to tie the process back to the login session.

That doesn't quite work as of itself (if you login you'd get luid set and not be able to change it for the container), but it seems something similarly trivial like a "setauditid(void)" would do the trick providing the kernel picked the UUID randomly [otherwise I can copy another known UUID to confuse or hide].

As you say a container is a userspace concept. So IMHO any audit interface should be about auditing and what needs tracking, not about containers. If the container management tool wants to set a suitable tag then let it. If not then it doesn't.

Then it's as simple as checking CAP_AUDIT_WRITE to see if you are allowed to setauditid(), generating a random uuid and a matching getauditid() to copy it back.

Alan
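Alan's suggestion can be sketched as a small userspace model. The names `setauditid()` and `getauditid()` come from his message, but no such system calls exist; the `AuditTask` class, the `can_audit_write` flag standing in for a CAP_AUDIT_WRITE check, and the choice to treat the ID as irrevocable and inherited on fork are assumptions made for illustration.

```python
import uuid
from typing import Optional


class AuditTask:
    """Toy model of a process carrying a kernel-chosen audit ID."""

    def __init__(self, can_audit_write: bool,
                 audit_id: Optional[uuid.UUID] = None) -> None:
        self.can_audit_write = can_audit_write  # stands in for CAP_AUDIT_WRITE
        self._audit_id = audit_id

    def setauditid(self) -> uuid.UUID:
        # Permission check, as in "checking CAP_AUDIT_WRITE".
        if not self.can_audit_write:
            raise PermissionError("CAP_AUDIT_WRITE required")
        # Treated as irrevocable here: once set it cannot be replaced.
        if self._audit_id is not None:
            raise PermissionError("audit ID already set")
        # The "kernel" picks the value at random, so a caller cannot copy
        # another known UUID to confuse or hide.
        self._audit_id = uuid.uuid4()
        return self._audit_id

    def getauditid(self) -> Optional[uuid.UUID]:
        return self._audit_id

    def fork(self) -> "AuditTask":
        # The whole subtree carries the same tag.
        return AuditTask(self.can_audit_write, self._audit_id)
```

The point of the design is that the caller supplies no value at all: the container manager calls `setauditid()`, reads the result back with `getauditid()`, and records the mapping to its own container name in userspace.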
Re: RFC(v2): Audit Kernel Container IDs
Richard Guy Briggs writes:
> A namespace cannot directly migrate from one container to another but could be assigned to a newly spawned container. A namespace can be moved from one container to another indirectly by having that namespace used in a second process in another container and then ending all the processes in the first container.

Ugh no. The semantics here are way too mushy. We need a clean crisp unambiguous definition or it will be impossible to get this correct and impossible to use for any security purpose.

I understand the challenge. Some of the container managers share namespaces between containers. Leading to things that are not really contained.

Please make this concept like an indelible dye. Once you are stained with it you can not escape. If you don't meet all of the criteria you aren't stained.

The justification that I heard, and that seems legitimate is that it is not timely and it is hard to make the connection between the distinct unshare, setns, and clone events and what is happening in the kernel.

With that justification definitely the network namespace needs to be stained if it is appropriate.

I also don't see why this can't be a special dedicated audit message. I just looked at the code in the kernel and nlmsg_type is a u16. There are only a handful of audit message types defined. There is absolutely no reason to bring proc into this.

I have the same reservation as the others about defining a new cap for this. It should be enough to make setting the container id a one time thing for a set of processes and namespaces.

If this is going to be security it needs to be very simple and very well defined.

Eric
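To make the "dedicated audit message" idea concrete, here is a minimal sketch of how a 16-byte container ID could be framed as a netlink message instead of a proc write. The 16-byte header layout (u32 length, u16 type, u16 flags, u32 sequence, u32 port ID) is the standard `struct nlmsghdr`. The message type value below is a made-up placeholder, not an assigned audit message type, and nothing here opens a socket or talks to the kernel.

```python
import struct
import uuid
from typing import Tuple

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32), in native byte order.
NLMSG_HDR = struct.Struct("=IHHII")
NLM_F_REQUEST = 0x1

# Placeholder only: NOT a real audit message type. It just has to fit in u16.
HYPOTHETICAL_SET_CONTAINERID = 65000


def build_message(container_id: uuid.UUID, seq: int, pid: int = 0) -> bytes:
    """Frame a 16-byte container ID behind a netlink header."""
    payload = container_id.bytes  # u8[16]
    length = NLMSG_HDR.size + len(payload)
    header = NLMSG_HDR.pack(length, HYPOTHETICAL_SET_CONTAINERID,
                            NLM_F_REQUEST, seq, pid)
    return header + payload


def parse_message(msg: bytes) -> Tuple[int, uuid.UUID]:
    """Return (message type, container ID) from a framed message."""
    length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(msg)
    payload = msg[NLMSG_HDR.size:length]
    return msg_type, uuid.UUID(bytes=payload)
```

Since `nlmsg_type` is 16 bits wide, there is room for one more type; which number it would get and what the payload would contain beyond the ID are left open here.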
Re: RFC(v2): Audit Kernel Container IDs
On 10/12/2017 7:14 AM, Richard Guy Briggs wrote: > Containers are a userspace concept. The kernel knows nothing of them. > > The Linux audit system needs a way to be able to track the container > provenance of events and actions. Audit needs the kernel's help to do > this. > > Since the concept of a container is entirely a userspace concept, a > registration from the userspace container orchestration system initiates > this. This will define a point in time and a set of resources > associated with a particular container with an audit container ID. > > The registration is a pseudo filesystem (proc, since PID tree already > exists) write of a u8[16] UUID representing the container ID to a file > representing a process that will become the first process in a new > container. This write might place restrictions on mount namespaces > required to define a container, or at least careful checking of > namespaces in the kernel to verify permissions of the orchestrator so it > can't change its own container ID. A bind mount of nsfs may be > necessary in the container orchestrator's mntNS. > Note: Use a 128-bit scalar rather than a string to make compares faster > and simpler. > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the > registration. Hang on. If containers are a user space concept, how can you want CAP_CONTAINER_ANYTHING? If there's not such thing as a container, how can you be asking for a capability to manage them? > At that time, record the target container's user-supplied > container identifier along with the target container's first process > (which may become the target container's "init" process) process ID > (referenced from the initial PID namespace), all namespace IDs (in the > form of a nsfs device number and inode number tuple) in a new auxilliary > record AUDIT_CONTAINER with a qualifying op=$action field. > > Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid > container ID present on an auditable action or event. 
> > Forked and cloned processes inherit their parent's container ID, > referenced in the process' task_struct. > > Mimic setns(2) and return an error if the process has already initiated > threading or forked since this registration should happen before the > process execution is started by the orchestrator and hence should not > yet have any threads or children. If this is deemed overly restrictive, > switch all threads and children to the new containerID. > > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN. > > Log the creation of every namespace, inheriting/adding its spawning > process' containerID(s), if applicable. Include the spawning and > spawned namespace IDs (device and inode number tuples). > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)] > Note: At this point it appears only network namespaces may need to track > container IDs apart from processes since incoming packets may cause an > auditable event before being associated with a process. > > Log the destruction of every namespace when it is no longer used by any > process, include the namespace IDs (device and inode number tuples). > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)] > > Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action) > the parent and child namespace IDs for any changes to a process' > namespaces. [setns(2)] > Note: It may be possible to combine AUDIT_NS_* record formats and > distinguish them with an op=$action field depending on the fields > required for each message type. > > When a container ceases to exist because the last process in that > container has exited and hence the last namespace has been destroyed and > its refcount dropping to zero, log the fact. > (This latter is likely needed for certification accountability.) A > container object may need a list of processes and/or namespaces. 
> > A namespace cannot directly migrate from one container to another but > could be assigned to a newly spawned container. A namespace can be > moved from one container to another indirectly by having that namespace > used in a second process in another container and then ending all the > processes in the first container. > > (v2) > - switch from u64 to u128 UUID > - switch from "signal" and "trigger" to "register" > - restrict registration to single process or force all threads and children > into same container > > - RGB > > -- > Richard Guy Briggs> Sr. S/W Engineer, Kernel Security, Base Operating Systems > Remote, Ottawa, Red Hat Canada > IRC: rgb, SunRaycer > Voice: +1.647.777.2635, Internal: (81) 32635 > > -- > Linux-audit mailing list > Linux-audit@redhat.com > https://www.redhat.com/mailman/listinfo/linux-audit > -- Linux-audit mailing list Linux-audit@redhat.com https://www.redhat.com/mailman/listinfo/linux-audit
Re: RFC(v2): Audit Kernel Container IDs
On Thursday, October 12, 2017 10:14:00 AM EDT Richard Guy Briggs wrote: > Containers are a userspace concept. The kernel knows nothing of them. > > The Linux audit system needs a way to be able to track the container > provenance of events and actions. Audit needs the kernel's help to do > this. > > Since the concept of a container is entirely a userspace concept, a > registration from the userspace container orchestration system initiates > this. This will define a point in time and a set of resources > associated with a particular container with an audit container ID. The requirements for common criteria around containers should be very closely modeled on the requirements for virtualization. It would be the container manager that is responsible for logging the resource assignment events. > The registration is a pseudo filesystem (proc, since PID tree already > exists) write of a u8[16] UUID representing the container ID to a file > representing a process that will become the first process in a new > container. This write might place restrictions on mount namespaces > required to define a container, or at least careful checking of > namespaces in the kernel to verify permissions of the orchestrator so it > can't change its own container ID. A bind mount of nsfs may be > necessary in the container orchestrator's mntNS. > Note: Use a 128-bit scalar rather than a string to make compares faster > and simpler. > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the > registration. Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing. > At that time, record the target container's user-supplied > container identifier along with the target container's first process > (which may become the target container's "init" process) process ID > (referenced from the initial PID namespace), all namespace IDs (in the > form of a nsfs device number and inode number tuple) in a new auxilliary > record AUDIT_CONTAINER with a qualifying op=$action field. 
This would be in addition to the normal audit fields. > Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid > container ID present on an auditable action or event. > > Forked and cloned processes inherit their parent's container ID, > referenced in the process' task_struct. > > Mimic setns(2) and return an error if the process has already initiated > threading or forked since this registration should happen before the > process execution is started by the orchestrator and hence should not > yet have any threads or children. If this is deemed overly restrictive, > switch all threads and children to the new containerID. > > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN. > > Log the creation of every namespace, inheriting/adding its spawning > process' containerID(s), if applicable. Include the spawning and > spawned namespace IDs (device and inode number tuples). > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)] > Note: At this point it appears only network namespaces may need to track > container IDs apart from processes since incoming packets may cause an > auditable event before being associated with a process. > > Log the destruction of every namespace when it is no longer used by any > process, include the namespace IDs (device and inode number tuples). > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)] In the virtualization requirements, we only log removal of resources when something is removed by intention. If the VM shuts down, the manager issues a VIRT_CONTROL stop event and the user space utilities knows this means all resources have been unassigned. > Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action) > the parent and child namespace IDs for any changes to a process' > namespaces. [setns(2)] > Note: It may be possible to combine AUDIT_NS_* record formats and > distinguish them with an op=$action field depending on the fields > required for each message type. 
> > When a container ceases to exist because the last process in that > container has exited and hence the last namespace has been destroyed and > its refcount dropping to zero, log the fact. > (This latter is likely needed for certification accountability.) A > container object may need a list of processes and/or namespaces. > > A namespace cannot directly migrate from one container to another but > could be assigned to a newly spawned container. A namespace can be > moved from one container to another indirectly by having that namespace > used in a second process in another container and then ending all the > processes in the first container. I'm thinking that there needs to be a clear delineation between what the container manager is responsible for and what the kernel needs to do. The kernel needs the registration system and to associate an identifier with events inside the container. But would the container manager be mostly responsible for auditing
RFC(v2): Audit Kernel Container IDs
Containers are a userspace concept. The kernel knows nothing of them.

The Linux audit system needs a way to be able to track the container
provenance of events and actions. Audit needs the kernel's help to do
this.

Since the concept of a container is entirely a userspace concept, a
registration from the userspace container orchestration system initiates
this. This will define a point in time and a set of resources
associated with a particular container with an audit container ID.

The registration is a pseudo filesystem (proc, since PID tree already
exists) write of a u8[16] UUID representing the container ID to a file
representing a process that will become the first process in a new
container. This write might place restrictions on mount namespaces
required to define a container, or at least require careful checking of
namespaces in the kernel to verify permissions of the orchestrator so it
can't change its own container ID. A bind mount of nsfs may be
necessary in the container orchestrator's mntNS.
Note: Use a 128-bit scalar rather than a string to make compares faster
and simpler.

Require a new CAP_CONTAINER_ADMIN to be able to carry out the
registration. At that time, record the target container's user-supplied
container identifier along with the process ID of the target container's
first process (which may become the target container's "init" process),
referenced from the initial PID namespace, and all namespace IDs (in the
form of nsfs device number and inode number tuples) in a new
auxiliary record AUDIT_CONTAINER with a qualifying op=$action field.

Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
container ID present on an auditable action or event.

Forked and cloned processes inherit their parent's container ID,
referenced in the process' task_struct.
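
To make the registration and inheritance rules above concrete, here is a
rough userspace model in C. It is only a sketch, not kernel code: the
names container_id, task_model, cid_equal, cid_is_set, cid_register and
task_fork are all hypothetical, the all-zero value meaning "unregistered"
is an assumption, and so is the choice of -EPERM as the error. The
capability is reduced to a boolean standing in for CAP_CONTAINER_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of the proposal; none of these names
 * exist in the kernel. */
struct container_id {
    uint8_t b[16];              /* the u8[16] UUID written by the orchestrator */
};

struct task_model {
    int pid;                    /* PID as seen from the initial PID namespace */
    bool cap_container_admin;   /* stands in for CAP_CONTAINER_ADMIN */
    struct container_id cid;    /* all-zero = unregistered (assumption) */
};

/* Fixed-size compare: no length scan or terminator handling as with a string. */
static bool cid_equal(const struct container_id *a, const struct container_id *b)
{
    return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

static bool cid_is_set(const struct container_id *id)
{
    static const struct container_id unset;    /* zero-initialized */
    return !cid_equal(id, &unset);
}

/* Models the proc write: the orchestrator `orch` registers `target`. */
static int cid_register(const struct task_model *orch, struct task_model *target,
                        const struct container_id *id)
{
    assert(orch && target && id);
    if (!orch->cap_container_admin)
        return -EPERM;          /* registration requires the capability */
    if (orch == target)
        return -EPERM;          /* orchestrator must not change its own ID */
    target->cid = *id;
    return 0;
}

/* Models fork/clone: the child inherits the parent's container ID. */
static struct task_model task_fork(const struct task_model *parent, int child_pid)
{
    struct task_model child = *parent;
    child.pid = child_pid;
    return child;
}
```

In this model an orchestrator holding the capability calls cid_register()
on the task that will become the container's first process, and every
task_fork() of that task carries the same ID. A real implementation would
keep the ID in task_struct and emit the audit records described above,
which this sketch leaves out.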

Mimic setns(2) and return an error if the process has already initiated
threading or forked since this registration should happen before the
process execution is started by the orchestrator and hence should not
yet have any threads or children. If this is deemed overly restrictive,
switch all threads and children to the new containerID.

Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.

Log the creation of every namespace, inheriting/adding its spawning
process' containerID(s), if applicable. Include the spawning and
spawned namespace IDs (device and inode number tuples).
[AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
Note: At this point it appears only network namespaces may need to track
container IDs apart from processes since incoming packets may cause an
auditable event before being associated with a process.

Log the destruction of every namespace when it is no longer used by any
process. Include the namespace IDs (device and inode number tuples).
[AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]

Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
the parent and child namespace IDs for any changes to a process'
namespaces. [setns(2)]
Note: It may be possible to combine AUDIT_NS_* record formats and
distinguish them with an op=$action field depending on the fields
required for each message type.

When a container ceases to exist because the last process in that
container has exited and hence the last namespace has been destroyed and
its refcount has dropped to zero, log the fact.
(This latter is likely needed for certification accountability.) A
container object may need a list of processes and/or namespaces.

A namespace cannot directly migrate from one container to another but
could be assigned to a newly spawned container.
A namespace can be moved from one container to another indirectly
by having that namespace used in a second process in another container
and then ending all the processes in the first container.

(v2)
- switch from u64 to u128 UUID
- switch from "signal" and "trigger" to "register"
- restrict registration to single process or force all threads and
  children into same container

- RGB

--
Richard Guy Briggs
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635