Re: [PATCH 0/2] namespaces: log namespaces per task
Le 06/05/2014 23:15, Richard Guy Briggs a écrit : On 14/05/05, Nicolas Dichtel wrote: Le 02/05/2014 16:28, Richard Guy Briggs a écrit : On 14/05/02, Serge E. Hallyn wrote: Quoting Richard Guy Briggs (r...@redhat.com): I saw no replies to my questions when I replied a year after Aris' posting, so I don't know if it was ignored or got lost in stale threads: https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html) https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html I've tried to answer a number of questions that were raised in that thread. The goal is not quite identical to Aris' patchset. The purpose is to track namespaces in use by logged processes from the perspective of init_*_ns. The first patch defines a function to list them. The second patch provides an example of usage for audit_log_task_info(), which is used by syscall audits, among others. audit_log_task() and audit_common_recv_message() would be other potential use cases. Use a serial number per namespace (unique across one boot of one kernel) instead of the inode number (for which the right to change has been reserved, and which is not necessarily unique if there is more than one proc fs). It could be argued that the inode numbers have now become a de facto interface and can't change now, but I'm proposing this approach to see if it helps address some of the objections to the earlier patchset. Messages could also be added to track the creation and destruction of namespaces, listing the parent for hierarchical namespaces such as pidns and userns, and listing other ids for non-hierarchical namespaces, as well as other information to help identify a namespace. There has been some progress made for audit in net namespaces and pid namespaces since this previous thread. 
net namespaces are now served as peers by one auditd in the init_net namespace, with processes in a non-init_net namespace being able to write records if they are in the init_user_ns and have CAP_AUDIT_WRITE. Processes in a non-init_pid_ns can now similarly write records. As for CAP_AUDIT_READ, I just posted a patchset to check capabilities of userspace processes that try to join netlink broadcast groups. Questions: Is there a way to link serial numbers of namespaces involved in migration of a container to another kernel? (I had a brief look at CRIU.) Is there a unique identifier for each running instance of a kernel? Or at least some identifier within the container migration realm? Eric Biederman has always been adamantly opposed to adding new namespaces of namespaces, so the fact that you're asking this question concerns me. I have seen that position and I don't fully understand the justification for it other than added complexity. Just FYI, have you seen this thread: http://thread.gmane.org/gmane.linux.network/286572/ There are some explanations/examples about this topic. Thanks for that reference. I read it through, but will need to do so again to get it to sink in. I think audit faces the same problem as x-netns netdevices: being able to identify a peer netns when a userland app "reads" a message from the kernel. The main problem with file descriptors is that you cannot use them when you broadcast a message from kernel to userland. Maybe we can use the local-names concept (like file descriptors, but without their constraints), i.e. having an identifier for a peer (net)ns which is only valid in the current (net)ns. When the kernel needs to identify a peer (net)ns, it uses this identifier (or allocates it the first time). After that, the userland app may reuse this identifier to configure things in the peer (net)ns. Eric, any thoughts about this? 
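[Editor's illustration, not part of the patchset: the identifiers being debated above are the /proc/self/ns/* symlink targets. A small userspace sketch that reads them; the thread's argument is that these inode numbers carry no device qualifier, may be reused, and are not guaranteed stable.]

```c
/* Sketch only: read the inode-based namespace identifiers currently
 * exposed in /proc/self/ns/*.  The serial-number proposal exists
 * because these inode numbers are not guaranteed unique or stable. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read the symlink target for one namespace file into buf,
 * e.g. "net:[4026531992]".  Returns 0 on success, -1 on error. */
int read_ns_id(const char *ns, char *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/self/ns/%s", ns);
    ssize_t n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;   /* older kernels lack some of these files */
    buf[n] = '\0';
    return 0;
}
```

Two tasks can only be distinguished by the number inside the brackets, and /proc makes no promise that the number stays unique once a namespace is destroyed and another is created.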
Regards, Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] namespaces: log namespaces per task
On Tue, 2014-05-06 at 17:41 -0400, Richard Guy Briggs wrote: > On 14/05/05, James Bottomley wrote: > > On May 5, 2014 3:36:38 PM PDT, Serge Hallyn wrote: > > >Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > >> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote: > > >> > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > > >> > > > On 14/05/05, Serge E. Hallyn wrote: > > >> > > > > Quoting James Bottomley > > >(james.bottom...@hansenpartnership.com): > > >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs > > >wrote: > > >> > > > > > > Questions: > > >> > > > > > > Is there a way to link serial numbers of namespaces > > >involved in migration of a > > >> > > > > > > container to another kernel? (I had a brief look at > > >CRIU.) Is there a unique > > >> > > > > > > identifier for each running instance of a kernel? Or at > > >least some identifier > > >> > > > > > > within the container migration realm? > > >> > > > > > > > >> > > > > > Are you asking for a way of distinguishing an migrated > > >container from an > > >> > > > > > unmigrated one? The answer is pretty much "no" because the > > >job of > > >> > > > > > migration is to restore to the same state as much as > > >possible. > > >> > > > > > > > >> > > > > > Reading between the lines, I think your goal is to > > >correlate audit > > >> > > > > > information across a container migration, right? Ideally > > >the management > > >> > > > > > system should be able to cough up an audit trail for a > > >container > > >> > > > > > wherever it's running and however many times it's been > > >migrated? > > >> > > > > > > > >> > > > > > In that case, I think your idea of a numeric serial number > > >in a dense > > >> > > > > > range is wrong. Because the range is dense you're > > >obviously never going > > >> > > > > > to be able to use the same serial number across a > > >migration. 
However, > > >> > > > > > > >> > > > > Ah, but I was being silly before, we can actually address > > >this pretty > > >> > > > > simply. If we just (for instance) add > > >> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the > > >serial number > > >> > > > > for the relevant ns for the task, then criu can dump this > > >info at > > >> > > > > checkpoint. Then at restart it can dump an audit message per > > >task and > > >> > > > > ns saying old_serial=%x,new_serial=%x. That way the audit > > >log reader > > >> > > > > can if it cares keep track. > > >> > > > > > >> > > > This is the sort of idea I had in mind... > > >> > > > > >> > > OK, but I don't understand then why you need a serial number. > > >There are > > >> > > plenty of things we preserve across a migration, like namespace > > >name for > > >> > > instance. Could you explain what function it performs because I > > >think I > > >> > > might be missing something. > > >> > > > >> > We're looking ahead to a time when audit is namespaced, and a > > >container > > >> > can keep its own audit logs (without limiting what the host audits > > >of > > >> > course). So if a container is auditing suspicious activity by some > > >> > task in a sub-namesapce, then the whole parent container gets > > >migrated, > > >> > after migration we want to continue being able to correlate the > > >namespaces. > > >> > > > >> > We're also looking at audit trails on a host that is up for years. > > >We > > >> > would like every namespace to be uniquely logged there. That is > > >why > > >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a > > >generation > > >> > # (which would end more complicated, not less, than a serial #). > > >> > > >> Right, but when the contaner has an audit namespace, that namespace > > >has > > >> a name, > > > > > >What ns has a name? > > > > The netns for instance. 
> > > > > The audit ns can be tied to 50 pid namespaces, and > > >we > > >want to log which pidns is responsible for something. > > > > > >If you mean the pidns has a name, that's the problem... it does not, > > >it > > >only has an inode # which may later be re-used. > > I still think there's a miscommunication somewhere: I believe you just > > need a stable id to tie the audit to, so why not just give the audit > > namespace a name like net? The id would then be durable across > > migrations. > > Audit does not have its own namespace (yet). So it would make the most sense to do this if audit were a separately attachable capability the orchestrator would like to control. I'm not sure about that so I'll consider some use cases below. > That idea is being > considered, but we would prefer to avoid it if it makes sense to tie it > in with an existing namespace. The pid and user namespaces, being > hierarchical, seem to make the most sense so far, but we are proceeding > very carefully to avoid creating a security nightmare in the process. pid ns might be. You need th
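[Editor's illustration: the old_serial=%x,new_serial=%x records suggested above would let an audit log consumer stitch a namespace's identity back together across migrations. A minimal sketch of such a consumer-side table; all names here are hypothetical, nothing like this exists in the kernel, criu, or audit userspace.]

```c
/* Hypothetical log-reader helper: remember "old_serial -> new_serial"
 * migration records and chase them back to the earliest serial a
 * namespace ever had, even across repeated migrations. */
#include <stdint.h>
#include <stddef.h>

#define MAX_REMAPS 256

struct serial_remap {
    uint64_t old_serial;
    uint64_t new_serial;
};

static struct serial_remap remap_tab[MAX_REMAPS];
static size_t remap_cnt;

/* Record one migration event parsed from the audit stream. */
int remap_add(uint64_t old_serial, uint64_t new_serial)
{
    if (remap_cnt >= MAX_REMAPS)
        return -1;
    remap_tab[remap_cnt].old_serial = old_serial;
    remap_tab[remap_cnt].new_serial = new_serial;
    remap_cnt++;
    return 0;
}

/* Map a serial seen in a post-migration record back to the original
 * serial, following chains for namespaces migrated more than once.
 * A serial with no recorded migration maps to itself. */
uint64_t remap_origin(uint64_t serial)
{
    for (size_t i = remap_cnt; i-- > 0; )
        if (remap_tab[i].new_serial == serial)
            return remap_origin(remap_tab[i].old_serial);
    return serial;
}
```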
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/05, James Bottomley wrote: > On Tue, 2014-05-06 at 03:27 +, Serge Hallyn wrote: > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > >> Right, but when the container has an audit namespace, that namespace > > > >has > > > >> a name, > > > > > > > >What ns has a name? > > > > > > The netns for instance. > > > > And what is its name? > > As I think you know ip netns list will show you all of them. The way > they're applied is via mapped files in /var/run/netns/ which hold the > names. > > > The only name I know that we could log in an > > audit message is the /proc/self/ns/net inode number (which does not > > suffice) > > OK, so I think this is the confusion: You're thinking the container > itself doesn't know what name the namespace has been given by the > system, all it knows is the inode number corresponding to a file which > it may or may not be able to see, right? I guess if that container hasn't mounted /proc, it couldn't find out. The same would be true of /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq to find out its namespace serial numbers, but that doesn't stop that container from initiating an audit message with the information it knows, which can be supplemented by information the kernel already knows about it. > I'm thinking that the system > that set up the container gave those files names and usually they're the > same name for all the namespaces. The point is that the orchestration > system (whatever set up the container) will be responsible for the > migration. It will be the thing that has a unique handle for the > container. The handle is usually ascii representable, either a human > readable name or some uuid/guid. It's that handle that we should be > using to prefix the audit message, It is now possible to send audit messages while in another non-init namespace, so from there, it could record that handle and have the namespace serial numbers from the kernel logged with that message. 
This would be recorded by the host audit daemon, not the container audit daemon. The container management system could talk to this host audit daemon to re-assemble an audit record trail for that container. > so when you set up an audit > namespace, it gets supplied with a prefix string corresponding to the > well known name for the container. This is the string we'd preserve > across migration as part of the audit namespace state ... so the audit > messages all correlate to the container wherever it's migrated to; no > need to do complex tracking of changes to serial numbers. That is a further step: having a container have its own audit daemon. > > > > The audit ns can be tied to 50 pid namespaces, and > > > >we > > > >want to log which pidns is responsible for something. > > > > > > > >If you mean the pidns has a name, that's the problem... it does not, > > > >it > > > >only has a inode # which may later be re-use. > > > > > > I still think there's a miscommunication somewhere: I believe you just > > > need a stable id to tie the audit to, so why not just give the audit > > > namespace a name like net? The id would then be durable across > > > migrations. > > > > Maybe this is where we're confusing each other - I'm not talking > > about giving the audit ns a name. I'm talking about being able to > > identify the other namespaces inside an audit message. In a way > > that (a) is unique across bare metals' entire uptime, and (b) > > can be tracked across migrations. > > OK, so that is different from what I'm thinking. I'm thinking unique > name for migrateable entity, you want a unique name for each component > of the migrateable entity? Yes. > My instinct still tells me the orchestration > system is going to have a unique identifier for each different sub > container. So what is a sub container? A nested container? We still want to track component namespaces of each nested container. 
> However, I have to point out that a serial number isn't what you want > either if you really mean bare metal. We do a lot of deployments where > the containers run in a hypervisor, there the serial numbers won't be > unique per box (only per vm) and we'll have to do vm correlation > separately. whereas a scheme which allows the orchestration system to > supply the names would still be unique in that situation. Unique per _running kernel_ was my intention. I don't care if it is bare metal or not. > James - RGB -- Richard Guy Briggs Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat Remote, Ottawa, Canada Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/05, James Bottomley wrote: > On May 5, 2014 3:36:38 PM PDT, Serge Hallyn wrote: > >Quoting James Bottomley (james.bottom...@hansenpartnership.com): > >> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote: > >> > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > >> > > > On 14/05/05, Serge E. Hallyn wrote: > >> > > > > Quoting James Bottomley > >(james.bottom...@hansenpartnership.com): > >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs > >wrote: > >> > > > > > > Questions: > >> > > > > > > Is there a way to link serial numbers of namespaces > >involved in migration of a > >> > > > > > > container to another kernel? (I had a brief look at > >CRIU.) Is there a unique > >> > > > > > > identifier for each running instance of a kernel? Or at > >least some identifier > >> > > > > > > within the container migration realm? > >> > > > > > > >> > > > > > Are you asking for a way of distinguishing an migrated > >container from an > >> > > > > > unmigrated one? The answer is pretty much "no" because the > >job of > >> > > > > > migration is to restore to the same state as much as > >possible. > >> > > > > > > >> > > > > > Reading between the lines, I think your goal is to > >correlate audit > >> > > > > > information across a container migration, right? Ideally > >the management > >> > > > > > system should be able to cough up an audit trail for a > >container > >> > > > > > wherever it's running and however many times it's been > >migrated? > >> > > > > > > >> > > > > > In that case, I think your idea of a numeric serial number > >in a dense > >> > > > > > range is wrong. Because the range is dense you're > >obviously never going > >> > > > > > to be able to use the same serial number across a > >migration. However, > >> > > > > > >> > > > > Ah, but I was being silly before, we can actually address > >this pretty > >> > > > > simply. 
If we just (for instance) add > >> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the > >serial number > >> > > > > for the relevant ns for the task, then criu can dump this > >info at > >> > > > > checkpoint. Then at restart it can dump an audit message per > >task and > >> > > > > ns saying old_serial=%x,new_serial=%x. That way the audit > >log reader > >> > > > > can if it cares keep track. > >> > > > > >> > > > This is the sort of idea I had in mind... > >> > > > >> > > OK, but I don't understand then why you need a serial number. > >There are > >> > > plenty of things we preserve across a migration, like namespace > >name for > >> > > instance. Could you explain what function it performs because I > >think I > >> > > might be missing something. > >> > > >> > We're looking ahead to a time when audit is namespaced, and a > >container > >> > can keep its own audit logs (without limiting what the host audits > >of > >> > course). So if a container is auditing suspicious activity by some > >> > task in a sub-namesapce, then the whole parent container gets > >migrated, > >> > after migration we want to continue being able to correlate the > >namespaces. > >> > > >> > We're also looking at audit trails on a host that is up for years. > >We > >> > would like every namespace to be uniquely logged there. That is > >why > >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a > >generation > >> > # (which would end more complicated, not less, than a serial #). > >> > >> Right, but when the contaner has an audit namespace, that namespace > >has > >> a name, > > > >What ns has a name? > > The netns for instance. > > > The audit ns can be tied to 50 pid namespaces, and > >we > >want to log which pidns is responsible for something. > > > >If you mean the pidns has a name, that's the problem... it does not, > >it > >only has a inode # which may later be re-use. 
> > I still think there's a miscommunication somewhere: I believe you just > need a stable id to tie the audit to, so why not just give the audit > namespace a name like net? The id would then be durable across > migrations. Audit does not have its own namespace (yet). That idea is being considered, but we would prefer to avoid it if it makes sense to tie it in with an existing namespace. The pid and user namespaces, being hierarchical, seem to make the most sense so far, but we are proceeding very carefully to avoid creating a security nightmare in the process. From the kernel's perspective, none of the namespaces have a name. A container concept of a group of namespaces may have been assigned one, but that isn't apparent to the layer that is logging this information. > >> which CRIU would migrate, so why not use that name for the > >> log .. no need for numbers (unless you make the name a number, of > >> course)? There would certainly need to be a way to tie these namespace identifiers to container names in log messages. > >> James > > > >Sorry if I'm being
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/05, Nicolas Dichtel wrote: > Le 02/05/2014 16:28, Richard Guy Briggs a ?crit : > >On 14/05/02, Serge E. Hallyn wrote: > >>Quoting Richard Guy Briggs (r...@redhat.com): > >>>I saw no replies to my questions when I replied a year after Aris' > >>>posting, so > >>>I don't know if it was ignored or got lost in stale threads: > >>> > >>> https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html > >>> > >>> https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html > >>> > >>> (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html) > >>> > >>> https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html > >>> > >>>I've tried to answer a number of questions that were raised in that thread. > >>> > >>>The goal is not quite identical to Aris' patchset. > >>> > >>>The purpose is to track namespaces in use by logged processes from the > >>>perspective of init_*_ns. The first patch defines a function to list them. > >>>The second patch provides an example of usage for audit_log_task_info() > >>>which > >>>is used by syscall audits, among others. audit_log_task() and > >>>audit_common_recv_message() would be other potential use cases. > >>> > >>>Use a serial number per namespace (unique across one boot of one kernel) > >>>instead of the inode number (which is claimed to have had the right to > >>>change > >>>reserved and is not necessarily unique if there is more than one proc fs). > >>> It > >>>could be argued that the inode numbers have now become a defacto interface > >>>and > >>>can't change now, but I'm proposing this approach to see if this helps > >>>address > >>>some of the objections to the earlier patchset. 
> >>> > >>>There could also have messages added to track the creation and the > >>>destruction > >>>of namespaces, listing the parent for hierarchical namespaces such as > >>>pidns, > >>>userns, and listing other ids for non-hierarchical namespaces, as well as > >>>other > >>>information to help identify a namespace. > >>> > >>>There has been some progress made for audit in net namespaces and pid > >>>namespaces since this previous thread. net namespaces are now served as > >>>peers > >>>by one auditd in the init_net namespace with processes in a non-init_net > >>>namespace being able to write records if they are in the init_user_ns and > >>>have > >>>CAP_AUDIT_WRITE. Processes in a non-init_pid_ns can now similarly write > >>>records. As for CAP_AUDIT_READ, I just posted a patchset to check > >>>capabilities > >>>of userspace processes that try to join netlink broadcast groups. > >>> > >>> > >>>Questions: > >>>Is there a way to link serial numbers of namespaces involved in migration > >>>of a > >>>container to another kernel? (I had a brief look at CRIU.) Is there a > >>>unique > >>>identifier for each running instance of a kernel? Or at least some > >>>identifier > >>>within the container migration realm? > >> > >>Eric Biederman has always been adamantly opposed to adding new namespaces > >>of namespaces, so the fact that you're asking this question concerns me. > > > >I have seen that position and I don't fully understand the justification > >for it other than added complexity. > Just FYI, have you seen this thread: > http://thread.gmane.org/gmane.linux.network/286572/ > > There is some explanations/examples about this topic. Thanks for that reference. I read it through, but will need to do so again to get it to sink in. 
> Nicolas - RGB
Re: [PATCH 0/2] namespaces: log namespaces per task
Quoting James Bottomley (james.bottom...@hansenpartnership.com): > On Tue, 2014-05-06 at 03:27 +, Serge Hallyn wrote: > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > >> Right, but when the container has an audit namespace, that namespace > > > >has > > > >> a name, > > > > > > > >What ns has a name? > > > > > > The netns for instance. > > > > And what is its name? > > As I think you know ip netns list will show you all of them. The way Ah. Now I see, thanks :) I never actually use that feature (other than when debugging how mount propagation affects how that's implemented), which is why it did not occur to me that this might be what you meant. However these names are (a) not in the kernel, (b) not unique per-boot, and (c) not applicable to other namespaces (without more userspace tweaking). So these are not a substitute for what Richard is proposing. > they're applied is via mapped files in /var/run/netns/ which hold the > names. > > > The only name I know that we could log in an > > audit message is the /proc/self/ns/net inode number (which does not > > suffice) > > OK, so I think this is the confusion: You're thinking the container > itself doesn't know what name the namespace has been given by the > system, all it knows is the inode number corresponding to a file which > it may or may not be able to see, right? I'm thinking that the system > that set up the container gave those files names and usually they're the > same name for all the namespaces. The point is that the orchestration > system (whatever set up the container) will be responsible for the > migration. It will be the thing that has a unique handle for the > container. (Several things to reply to there but I'll pick just one,) We are not looking for a unique name for a container, that's far too coarse. Within that container there may be many daemons which have unshared their own namespaces, i.e. cgmanager unshared a mntns, vsftpd unshared a netns, etc. 
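[Editor's illustration of Serge's point that these names are "not in the kernel": an ip netns name is just a file under /var/run/netns bind-mounted over a /proc/<pid>/ns/net entry, per the iproute2 convention. A hedged sketch of that purely userspace mapping; the helper names are invented here.]

```c
/* Hypothetical helpers showing that a netns "name" is only a path
 * convention maintained by iproute2, never seen by the kernel. */
#include <stdio.h>
#include <sys/stat.h>

/* Build the path "ip netns" would use for a given name.
 * Returns 0 on success, -1 if the name is empty or too long. */
int netns_name_to_path(const char *name, char *buf, size_t len)
{
    if (name[0] == '\0')
        return -1;
    int n = snprintf(buf, len, "/var/run/netns/%s", name);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}

/* A name "exists" only if userspace created the mount point file. */
int netns_name_exists(const char *name)
{
    char path[256];
    struct stat st;
    if (netns_name_to_path(name, path, sizeof(path)) < 0)
        return 0;
    return stat(path, &st) == 0;
}
```

Destroy the file and the name is gone while the namespace may live on, which is exactly why it cannot serve as a kernel-side audit identifier.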
We want the namespace identified in the audit messages. We want, within an audit record for a system boot, for each namespace to be *uniquely* identified. I don't know how many people are still doing capp/lspp type installs, but that's the level I'm thinking at for this. It's not syslog, it's audit. > The handle is usually ascii representable, either a human > readable name or some uuid/guid. It's that handle that we should be > using to prefix the audit message, so when you set up an audit > namespace, it gets supplied with a prefix string corresponding to the > well known name for the container. This is the string we'd preserve > across migration as part of the audit namespace state ... so the audit > messages all correlate to the container wherever it's migrated to; no > need to do complex tracking of changes to serial numbers. > > > > > The audit ns can be tied to 50 pid namespaces, and > > > >we > > > >want to log which pidns is responsible for something. > > > > > > > >If you mean the pidns has a name, that's the problem... it does not, > > > >it > > > >only has a inode # which may later be re-use. > > > > > > I still think there's a miscommunication somewhere: I believe you just > > > need a stable id to tie the audit to, so why not just give the audit > > > namespace a name like net? The id would then be durable across > > > migrations. > > > > Maybe this is where we're confusing each other - I'm not talking > > about giving the audit ns a name. I'm talking about being able to > > identify the other namespaces inside an audit message. In a way > > that (a) is unique across bare metals' entire uptime, and (b) > > can be tracked across migrations. > > OK, so that is different from what I'm thinking. I'm thinking unique > name for migrateable entity, you want a unique name for each component > of the migrateable entity? My instinct still tells me the orchestration > system is going to have a unique identifier for each different sub > container. 
> > However, I have to point out that a serial number isn't what you want > either if you really mean bare metal. We do a lot of deployments where > the containers run in a hypervisor, there the serial numbers won't be > unique per box (only per vm) and we'll have to do vm correlation > separately. whereas a scheme which allows the orchestration system to > supply the names would still be unique in that situation. > > James
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/06, Serge Hallyn wrote: > Quoting Richard Guy Briggs (r...@redhat.com): > > On 14/05/03, James Bottomley wrote: > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > > Questions: > > > > Is there a way to link serial numbers of namespaces involved in > > > > migration of a > > > > container to another kernel? (I had a brief look at CRIU.) Is there a > > > > unique > > > > identifier for each running instance of a kernel? Or at least some > > > > identifier > > > > within the container migration realm? > > > > > > Are you asking for a way of distinguishing a migrated container from an > > > unmigrated one? The answer is pretty much "no" because the job of > > > migration is to restore to the same state as much as possible. > > > > I hadn't thought to distinguish a migrated container from an unmigrated > > one, but rather I'm more interested in the underlying namespaces. The > > use of a generation number to identify a migrated namespace may be > > useful along with the logging to tie them together. > > > > > Reading between the lines, I think your goal is to correlate audit > > > information across a container migration, right? Ideally the management > > > system should be able to cough up an audit trail for a container > > > wherever it's running and however many times it's been migrated? > > > > The original intent was to track the underlying namespaces themselves. > > This sounds like another layer on top of that which sounds useful but > > that I had not yet considered. > > > > But yes, that sounds like a good eventual goal. > > Right and we don't need that now, all *I* wanted to convince myself of > was that a serial # as you were using it was not going to be a roadblock > to that, since once we introduce a serial #, we're stuck with that as > user-space facing api. Understood. 
If a container gets migrated somewhere along with its namespace, the namespace elsewhere is going to have a new serial number, but the migration log is going to hopefully show both serial numbers. If that container gets migrated back, the supporting namespace will get yet another new serial number, with its log trail connecting to the previous remote one. Those logs can be used by a higher-layer audit aggregator to piece together those log crumbs. The serial number was intended to be an alternative to the inode numbers, which had the issues of needing a qualifying device number accompanying them, plus the reservation that the inode number could change in the future to solve unforeseen technical problems. I saw no other stable identifiers common to all namespace types with which I could work. Containers may have their own names, but I didn't see any consistent way to identify namespace instances. > > > In that case, I think your idea of a numeric serial number in a dense > > > range is wrong. Because the range is dense you're obviously never going > > > to be able to use the same serial number across a migration. However, > > > if you look at all the management systems for containers, they all have > > > a concept of some unique ID per container, be it name, UUID or even > > > GUID. I suspect it's that you should be using to tag the audit trail > > > with. > > > > That does sound potentially useful but for the fact that several > > containers could share one or more types of namespaces. > > > > Would logging just a container ID be sufficient for audit purposes? I'm > > going to have to dig a bit to understand that one because I was unaware > > each container had a unique ID. > > They don't :) Ok, so I'd be looking in vain... > > I did originally consider a UUID/GUID for namespaces. > > So I think that apart from resending to address the serial # overflow > comment, I'm happy to ack the patches. Then we probably need to convince > Eric that we're not torturing kittens. 
I've already fixed the overflow issues. I'll resend with the fixes. This patch pair was intended to sort out some of my understanding of the problem I perceived, and has helped me understand there are other layers that need work too to make it useful, but this is a good base. A subsequent piece would be to expose that serial number in the proc filesystem. > -serge - RGB
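[Editor's sketch of the kind of allocator the patchset and the overflow discussion imply, modeled in userspace with hypothetical names; the real patch is in-kernel and would need atomic64 or locking.]

```c
/* Per-boot namespace serial allocator, as the thread describes it:
 * a single counter handed out at namespace creation, unique across
 * one boot of one kernel.  64 bits makes wrap-around a non-issue:
 * even one allocation per nanosecond takes centuries to overflow. */
#include <stdint.h>

static uint64_t ns_serial_last;   /* 0 is reserved as "no serial" */

uint64_t ns_alloc_serial(void)
{
    return ++ns_serial_last;      /* in-kernel: atomic64_inc_return() */
}
```

Unlike a /proc inode number, a serial never needs a qualifying device number and is never reused within a boot, which is the property the audit records require.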
Re: [PATCH 0/2] namespaces: log namespaces per task
Le 06/05/2014 01:23, James Bottomley a écrit : On May 5, 2014 3:36:38 PM PDT, Serge Hallyn wrote: Quoting James Bottomley (james.bottom...@hansenpartnership.com): On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote: Quoting James Bottomley (james.bottom...@hansenpartnership.com): On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: On 14/05/05, Serge E. Hallyn wrote: Quoting James Bottomley (james.bottom...@hansenpartnership.com): On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: Questions: Is there a way to link serial numbers of namespaces involved in migration of a container to another kernel? (I had a brief look at CRIU.) Is there a unique identifier for each running instance of a kernel? Or at least some identifier within the container migration realm? Are you asking for a way of distinguishing an migrated container from an unmigrated one? The answer is pretty much "no" because the job of migration is to restore to the same state as much as possible. Reading between the lines, I think your goal is to correlate audit information across a container migration, right? Ideally the management system should be able to cough up an audit trail for a container wherever it's running and however many times it's been migrated? In that case, I think your idea of a numeric serial number in a dense range is wrong. Because the range is dense you're obviously never going to be able to use the same serial number across a migration. However, Ah, but I was being silly before, we can actually address this pretty simply. If we just (for instance) add /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number for the relevant ns for the task, then criu can dump this info at checkpoint. Then at restart it can dump an audit message per task and ns saying old_serial=%x,new_serial=%x. That way the audit log reader can if it cares keep track. This is the sort of idea I had in mind... OK, but I don't understand then why you need a serial number. 
There are plenty of things we preserve across a migration, like namespace name for instance. Could you explain what function it performs because I think I might be missing something. We're looking ahead to a time when audit is namespaced, and a container can keep its own audit logs (without limiting what the host audits, of course). So if a container is auditing suspicious activity by some task in a sub-namespace, then the whole parent container gets migrated, after migration we want to continue being able to correlate the namespaces. We're also looking at audit trails on a host that is up for years. We would like every namespace to be uniquely logged there. That is why inode #s on /proc/self/ns/* are not sufficient, unless we add a generation # (which would end up more complicated, not less, than a serial #). Right, but when the container has an audit namespace, that namespace has a name, What ns has a name? The netns for instance. netns does not have names. iproute2 uses names (a filename in fact, to hold a reference on the netns), but the kernel never sees this name. It only gets a file descriptor (or a pid). Regards, Nicolas
Re: [PATCH 0/2] namespaces: log namespaces per task
On Tue, 2014-05-06 at 03:27 +, Serge Hallyn wrote: > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > >> Right, but when the container has an audit namespace, that namespace > > >has > > >> a name, > > > > > >What ns has a name? > > > > The netns for instance. > > And what is its name? As I think you know, ip netns list will show you all of them. The way they're applied is via mapped files in /var/run/netns/ which hold the names. > The only name I know that we could log in an > audit message is the /proc/self/ns/net inode number (which does not > suffice) OK, so I think this is the confusion: You're thinking the container itself doesn't know what name the namespace has been given by the system, all it knows is the inode number corresponding to a file which it may or may not be able to see, right? I'm thinking that the system that set up the container gave those files names and usually they're the same name for all the namespaces. The point is that the orchestration system (whatever set up the container) will be responsible for the migration. It will be the thing that has a unique handle for the container. The handle is usually ASCII-representable, either a human-readable name or some UUID/GUID. It's that handle that we should be using to prefix the audit message, so when you set up an audit namespace, it gets supplied with a prefix string corresponding to the well-known name for the container. This is the string we'd preserve across migration as part of the audit namespace state ... so the audit messages all correlate to the container wherever it's migrated to; no need to do complex tracking of changes to serial numbers. > > > The audit ns can be tied to 50 pid namespaces, and > > >we > > >want to log which pidns is responsible for something. > > > > > >If you mean the pidns has a name, that's the problem... it does not, > > >it > > >only has an inode # which may later be re-used.
> > > > I still think there's a miscommunication somewhere: I believe you just need > > a stable id to tie the audit to, so why not just give the audit namespace a > > name like net? The id would then be durable across migrations. > > Maybe this is where we're confusing each other - I'm not talking > about giving the audit ns a name. I'm talking about being able to > identify the other namespaces inside an audit message. In a way > that (a) is unique across bare metals' entire uptime, and (b) > can be tracked across migrations. OK, so that is different from what I'm thinking. I'm thinking unique name for migratable entity, you want a unique name for each component of the migratable entity? My instinct still tells me the orchestration system is going to have a unique identifier for each different sub-container. However, I have to point out that a serial number isn't what you want either if you really mean bare metal. We do a lot of deployments where the containers run in a hypervisor; there the serial numbers won't be unique per box (only per vm) and we'll have to do vm correlation separately, whereas a scheme which allows the orchestration system to supply the names would still be unique in that situation. James
Re: [PATCH 0/2] namespaces: log namespaces per task
Quoting Richard Guy Briggs (r...@redhat.com): > On 14/05/03, James Bottomley wrote: > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > Questions: > > > Is there a way to link serial numbers of namespaces involved in migration > > > of a > > > container to another kernel? (I had a brief look at CRIU.) Is there a > > > unique > > > identifier for each running instance of a kernel? Or at least some > > > identifier > > > within the container migration realm? > > > > Are you asking for a way of distinguishing a migrated container from an > > unmigrated one? The answer is pretty much "no" because the job of > > migration is to restore to the same state as much as possible. > > I hadn't thought to distinguish a migrated container from an unmigrated > one, but rather I'm more interested in the underlying namespaces. The > use of a generation number to identify a migrated namespace may be > useful along with the logging to tie them together. > > > Reading between the lines, I think your goal is to correlate audit > > information across a container migration, right? Ideally the management > > system should be able to cough up an audit trail for a container > > wherever it's running and however many times it's been migrated? > > The original intent was to track the underlying namespaces themselves. > This sounds like another layer on top of that which sounds useful but > that I had not yet considered. > > But yes, that sounds like a good eventual goal. Right, and we don't need that now; all *I* wanted to convince myself of was that a serial # as you were using it was not going to be a roadblock to that, since once we introduce a serial #, we're stuck with it as user-space-facing API. > > In that case, I think your idea of a numeric serial number in a dense > > range is wrong. Because the range is dense you're obviously never going > > to be able to use the same serial number across a migration.
However, > > if you look at all the management systems for containers, they all have > > a concept of some unique ID per container, be it name, UUID or even > > GUID. I suspect it's that you should be using to tag the audit trail > > with. > > That does sound potentially useful but for the fact that several > containers could share one or more types of namespaces. > > Would logging just a container ID be sufficient for audit purposes? I'm > going to have to dig a bit to understand that one because I was unaware > each container had a unique ID. They don't :) > I did originally consider a UUID/GUID for namespaces. So I think that apart from resending to address the serial # overflow comment, I'm happy to ack the patches. Then we probably need to convince Eric that we're not torturing kittens. -serge
Re: [PATCH 0/2] namespaces: log namespaces per task
Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > On May 5, 2014 3:36:38 PM PDT, Serge Hallyn wrote: > >Quoting James Bottomley (james.bottom...@hansenpartnership.com): > >> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote: > >> > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > >> > > > On 14/05/05, Serge E. Hallyn wrote: > >> > > > > Quoting James Bottomley > >(james.bottom...@hansenpartnership.com): > >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs > >wrote: > >> > > > > > > Questions: > >> > > > > > > Is there a way to link serial numbers of namespaces > >involved in migration of a > >> > > > > > > container to another kernel? (I had a brief look at > >CRIU.) Is there a unique > >> > > > > > > identifier for each running instance of a kernel? Or at > >least some identifier > >> > > > > > > within the container migration realm? > >> > > > > > > >> > > > > > Are you asking for a way of distinguishing an migrated > >container from an > >> > > > > > unmigrated one? The answer is pretty much "no" because the > >job of > >> > > > > > migration is to restore to the same state as much as > >possible. > >> > > > > > > >> > > > > > Reading between the lines, I think your goal is to > >correlate audit > >> > > > > > information across a container migration, right? Ideally > >the management > >> > > > > > system should be able to cough up an audit trail for a > >container > >> > > > > > wherever it's running and however many times it's been > >migrated? > >> > > > > > > >> > > > > > In that case, I think your idea of a numeric serial number > >in a dense > >> > > > > > range is wrong. Because the range is dense you're > >obviously never going > >> > > > > > to be able to use the same serial number across a > >migration. However, > >> > > > > > >> > > > > Ah, but I was being silly before, we can actually address > >this pretty > >> > > > > simply. 
If we just (for instance) add > >> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the > >serial number > >> > > > > for the relevant ns for the task, then criu can dump this > >info at > >> > > > > checkpoint. Then at restart it can dump an audit message per > >task and > >> > > > > ns saying old_serial=%x,new_serial=%x. That way the audit > >log reader > >> > > > > can if it cares keep track. > >> > > > > >> > > > This is the sort of idea I had in mind... > >> > > > >> > > OK, but I don't understand then why you need a serial number. > >There are > >> > > plenty of things we preserve across a migration, like namespace > >name for > >> > > instance. Could you explain what function it performs because I > >think I > >> > > might be missing something. > >> > > >> > We're looking ahead to a time when audit is namespaced, and a > >container > >> > can keep its own audit logs (without limiting what the host audits > >of > >> > course). So if a container is auditing suspicious activity by some > >> > task in a sub-namesapce, then the whole parent container gets > >migrated, > >> > after migration we want to continue being able to correlate the > >namespaces. > >> > > >> > We're also looking at audit trails on a host that is up for years. > >We > >> > would like every namespace to be uniquely logged there. That is > >why > >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a > >generation > >> > # (which would end more complicated, not less, than a serial #). > >> > >> Right, but when the contaner has an audit namespace, that namespace > >has > >> a name, > > > >What ns has a name? > > The netns for instance. And what is its name? The only name I know that we could log in an audit message is the /proc/self/ns/net inode number (which does not suffice) > > The audit ns can be tied to 50 pid namespaces, and > >we > >want to log which pidns is responsible for something. > > > >If you mean the pidns has a name, that's the problem... 
it does not, > >it > >only has a inode # which may later be re-use. > > I still think there's a miscommunication somewhere: I believe you just need a > stable id to tie the audit to, so why not just give the audit namespace a > name like net? The id would then be durable across migrations. Maybe this is where we're confusing each other - I'm not talking about giving the audit ns a name. I'm talking about being able to identify the other namespaces inside an audit message. In a way that (a) is unique across bare metals' entire uptime, and (b) can be tracked across migrations. And again we don't need to actually implement all that now - all I wanted to make sure of was that the serial # as proposed by Richard could be made to work for those purposes, and I now believe they can. > >> which CRIU would migrate, so why not use that name for the > >> log .. no need for numbers (unless you make the name a number, of > >> course)? > >> > >> James > > > >Sorry
Re: [PATCH 0/2] namespaces: log namespaces per task
On May 5, 2014 3:36:38 PM PDT, Serge Hallyn wrote: >Quoting James Bottomley (james.bottom...@hansenpartnership.com): >> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote: >> > Quoting James Bottomley (james.bottom...@hansenpartnership.com): >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: >> > > > On 14/05/05, Serge E. Hallyn wrote: >> > > > > Quoting James Bottomley >(james.bottom...@hansenpartnership.com): >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs >wrote: >> > > > > > > Questions: >> > > > > > > Is there a way to link serial numbers of namespaces >involved in migration of a >> > > > > > > container to another kernel? (I had a brief look at >CRIU.) Is there a unique >> > > > > > > identifier for each running instance of a kernel? Or at >least some identifier >> > > > > > > within the container migration realm? >> > > > > > >> > > > > > Are you asking for a way of distinguishing an migrated >container from an >> > > > > > unmigrated one? The answer is pretty much "no" because the >job of >> > > > > > migration is to restore to the same state as much as >possible. >> > > > > > >> > > > > > Reading between the lines, I think your goal is to >correlate audit >> > > > > > information across a container migration, right? Ideally >the management >> > > > > > system should be able to cough up an audit trail for a >container >> > > > > > wherever it's running and however many times it's been >migrated? >> > > > > > >> > > > > > In that case, I think your idea of a numeric serial number >in a dense >> > > > > > range is wrong. Because the range is dense you're >obviously never going >> > > > > > to be able to use the same serial number across a >migration. However, >> > > > > >> > > > > Ah, but I was being silly before, we can actually address >this pretty >> > > > > simply. 
If we just (for instance) add >> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the >serial number >> > > > > for the relevant ns for the task, then criu can dump this >info at >> > > > > checkpoint. Then at restart it can dump an audit message per >task and >> > > > > ns saying old_serial=%x,new_serial=%x. That way the audit >log reader >> > > > > can if it cares keep track. >> > > > >> > > > This is the sort of idea I had in mind... >> > > >> > > OK, but I don't understand then why you need a serial number. >There are >> > > plenty of things we preserve across a migration, like namespace >name for >> > > instance. Could you explain what function it performs because I >think I >> > > might be missing something. >> > >> > We're looking ahead to a time when audit is namespaced, and a >container >> > can keep its own audit logs (without limiting what the host audits >of >> > course). So if a container is auditing suspicious activity by some >> > task in a sub-namesapce, then the whole parent container gets >migrated, >> > after migration we want to continue being able to correlate the >namespaces. >> > >> > We're also looking at audit trails on a host that is up for years. >We >> > would like every namespace to be uniquely logged there. That is >why >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a >generation >> > # (which would end more complicated, not less, than a serial #). >> >> Right, but when the contaner has an audit namespace, that namespace >has >> a name, > >What ns has a name? The netns for instance. > The audit ns can be tied to 50 pid namespaces, and >we >want to log which pidns is responsible for something. > >If you mean the pidns has a name, that's the problem... it does not, >it >only has a inode # which may later be re-use. I still think there's a miscommunication somewhere: I believe you just need a stable id to tie the audit to, so why not just give the audit namespace a name like net? 
The id would then be durable across migrations. >> which CRIU would migrate, so why not use that name for the >> log .. no need for numbers (unless you make the name a number, of >> course)? >> >> James > >Sorry if I'm being dense... No I think our assumptions are mismatched. I just can't figure out where. James -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH 0/2] namespaces: log namespaces per task
Quoting James Bottomley (james.bottom...@hansenpartnership.com): > On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote: > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > > > > On 14/05/05, Serge E. Hallyn wrote: > > > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > > > > > Questions: > > > > > > > Is there a way to link serial numbers of namespaces involved in > > > > > > > migration of a > > > > > > > container to another kernel? (I had a brief look at CRIU.) Is > > > > > > > there a unique > > > > > > > identifier for each running instance of a kernel? Or at least > > > > > > > some identifier > > > > > > > within the container migration realm? > > > > > > > > > > > > Are you asking for a way of distinguishing an migrated container > > > > > > from an > > > > > > unmigrated one? The answer is pretty much "no" because the job of > > > > > > migration is to restore to the same state as much as possible. > > > > > > > > > > > > Reading between the lines, I think your goal is to correlate audit > > > > > > information across a container migration, right? Ideally the > > > > > > management > > > > > > system should be able to cough up an audit trail for a container > > > > > > wherever it's running and however many times it's been migrated? > > > > > > > > > > > > In that case, I think your idea of a numeric serial number in a > > > > > > dense > > > > > > range is wrong. Because the range is dense you're obviously never > > > > > > going > > > > > > to be able to use the same serial number across a migration. > > > > > > However, > > > > > > > > > > Ah, but I was being silly before, we can actually address this pretty > > > > > simply. 
If we just (for instance) add > > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial > > > > > number > > > > > for the relevant ns for the task, then criu can dump this info at > > > > > checkpoint. Then at restart it can dump an audit message per task and > > > > > ns saying old_serial=%x,new_serial=%x. That way the audit log reader > > > > > can if it cares keep track. > > > > > > > > This is the sort of idea I had in mind... > > > > > > OK, but I don't understand then why you need a serial number. There are > > > plenty of things we preserve across a migration, like namespace name for > > > instance. Could you explain what function it performs because I think I > > > might be missing something. > > > > We're looking ahead to a time when audit is namespaced, and a container > > can keep its own audit logs (without limiting what the host audits of > > course). So if a container is auditing suspicious activity by some > > task in a sub-namespace, then the whole parent container gets migrated, > > after migration we want to continue being able to correlate the namespaces. > > > > We're also looking at audit trails on a host that is up for years. We > > would like every namespace to be uniquely logged there. That is why > > inode #s on /proc/self/ns/* are not sufficient, unless we add a generation > > # (which would end up more complicated, not less, than a serial #). > > Right, but when the container has an audit namespace, that namespace has > a name, What ns has a name? The audit ns can be tied to 50 pid namespaces, and we want to log which pidns is responsible for something. If you mean the pidns has a name, that's the problem... it does not, it only has an inode # which may later be re-used. > which CRIU would migrate, so why not use that name for the > log .. no need for numbers (unless you make the name a number, of > course)? > > James Sorry if I'm being dense...
-serge
Re: [PATCH 0/2] namespaces: log namespaces per task
On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote: > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > > > On 14/05/05, Serge E. Hallyn wrote: > > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > > > > Questions: > > > > > > Is there a way to link serial numbers of namespaces involved in > > > > > > migration of a > > > > > > container to another kernel? (I had a brief look at CRIU.) Is > > > > > > there a unique > > > > > > identifier for each running instance of a kernel? Or at least some > > > > > > identifier > > > > > > within the container migration realm? > > > > > > > > > > Are you asking for a way of distinguishing an migrated container from > > > > > an > > > > > unmigrated one? The answer is pretty much "no" because the job of > > > > > migration is to restore to the same state as much as possible. > > > > > > > > > > Reading between the lines, I think your goal is to correlate audit > > > > > information across a container migration, right? Ideally the > > > > > management > > > > > system should be able to cough up an audit trail for a container > > > > > wherever it's running and however many times it's been migrated? > > > > > > > > > > In that case, I think your idea of a numeric serial number in a dense > > > > > range is wrong. Because the range is dense you're obviously never > > > > > going > > > > > to be able to use the same serial number across a migration. However, > > > > > > > > Ah, but I was being silly before, we can actually address this pretty > > > > simply. If we just (for instance) add > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number > > > > for the relevant ns for the task, then criu can dump this info at > > > > checkpoint. 
Then at restart it can dump an audit message per task and > > > > ns saying old_serial=%x,new_serial=%x. That way the audit log reader > > > > can if it cares keep track. > > > > > > This is the sort of idea I had in mind... > > > > OK, but I don't understand then why you need a serial number. There are > > plenty of things we preserve across a migration, like namespace name for > > instance. Could you explain what function it performs because I think I > > might be missing something. > > We're looking ahead to a time when audit is namespaced, and a container > can keep its own audit logs (without limiting what the host audits of > course). So if a container is auditing suspicious activity by some > task in a sub-namespace, then the whole parent container gets migrated, > after migration we want to continue being able to correlate the namespaces. > > We're also looking at audit trails on a host that is up for years. We > would like every namespace to be uniquely logged there. That is why > inode #s on /proc/self/ns/* are not sufficient, unless we add a generation > # (which would end up more complicated, not less, than a serial #). Right, but when the container has an audit namespace, that namespace has a name, which CRIU would migrate, so why not use that name for the log .. no need for numbers (unless you make the name a number, of course)? James
Re: [PATCH 0/2] namespaces: log namespaces per task
Quoting James Bottomley (james.bottom...@hansenpartnership.com): > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > > On 14/05/05, Serge E. Hallyn wrote: > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > > > Questions: > > > > > Is there a way to link serial numbers of namespaces involved in > > > > > migration of a > > > > > container to another kernel? (I had a brief look at CRIU.) Is there > > > > > a unique > > > > > identifier for each running instance of a kernel? Or at least some > > > > > identifier > > > > > within the container migration realm? > > > > > > > > Are you asking for a way of distinguishing an migrated container from an > > > > unmigrated one? The answer is pretty much "no" because the job of > > > > migration is to restore to the same state as much as possible. > > > > > > > > Reading between the lines, I think your goal is to correlate audit > > > > information across a container migration, right? Ideally the management > > > > system should be able to cough up an audit trail for a container > > > > wherever it's running and however many times it's been migrated? > > > > > > > > In that case, I think your idea of a numeric serial number in a dense > > > > range is wrong. Because the range is dense you're obviously never going > > > > to be able to use the same serial number across a migration. However, > > > > > > Ah, but I was being silly before, we can actually address this pretty > > > simply. If we just (for instance) add > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number > > > for the relevant ns for the task, then criu can dump this info at > > > checkpoint. Then at restart it can dump an audit message per task and > > > ns saying old_serial=%x,new_serial=%x. That way the audit log reader > > > can if it cares keep track. > > > > This is the sort of idea I had in mind... 
> > OK, but I don't understand then why you need a serial number. There are > plenty of things we preserve across a migration, like namespace name for > instance. Could you explain what function it performs because I think I > might be missing something. We're looking ahead to a time when audit is namespaced, and a container can keep its own audit logs (without limiting what the host audits of course). So if a container is auditing suspicious activity by some task in a sub-namespace, then the whole parent container gets migrated, after migration we want to continue being able to correlate the namespaces. We're also looking at audit trails on a host that is up for years. We would like every namespace to be uniquely logged there. That is why inode #s on /proc/self/ns/* are not sufficient, unless we add a generation # (which would end up more complicated, not less, than a serial #). -serge
Re: [PATCH 0/2] namespaces: log namespaces per task
On Mon, 2014-05-05 at 18:11 -0400, Richard Guy Briggs wrote: > On 14/05/05, James Bottomley wrote: > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > > > On 14/05/05, Serge E. Hallyn wrote: > > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > > > > Questions: > > > > > > Is there a way to link serial numbers of namespaces involved in > > > > > > migration of a > > > > > > container to another kernel? (I had a brief look at CRIU.) Is > > > > > > there a unique > > > > > > identifier for each running instance of a kernel? Or at least some > > > > > > identifier > > > > > > within the container migration realm? > > > > > > > > > > Are you asking for a way of distinguishing an migrated container from > > > > > an > > > > > unmigrated one? The answer is pretty much "no" because the job of > > > > > migration is to restore to the same state as much as possible. > > > > > > > > > > Reading between the lines, I think your goal is to correlate audit > > > > > information across a container migration, right? Ideally the > > > > > management > > > > > system should be able to cough up an audit trail for a container > > > > > wherever it's running and however many times it's been migrated? > > > > > > > > > > In that case, I think your idea of a numeric serial number in a dense > > > > > range is wrong. Because the range is dense you're obviously never > > > > > going > > > > > to be able to use the same serial number across a migration. However, > > > > > > > > Ah, but I was being silly before, we can actually address this pretty > > > > simply. If we just (for instance) add > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number > > > > for the relevant ns for the task, then criu can dump this info at > > > > checkpoint. Then at restart it can dump an audit message per task and > > > > ns saying old_serial=%x,new_serial=%x. 
That way the audit log reader > > > > can if it cares keep track. > > > > > > This is the sort of idea I had in mind... > > > > OK, but I don't understand then why you need a serial number. There are > > plenty of things we preserve across a migration, like namespace name for > > instance. Could you explain what function it performs because I think I > > might be missing something. > > If a container was defined as an entity with 6 namespaces to itself, > this would make sense. As Eric P. put it, containers/namespaces seem to > be a bucket of semi-related nuts and bolts, with any namespace being > optional depending on the application. That's right. An IaaS container has a well defined composition, since it has to contain a full OS, but an application container is variable. It's the usual procedure with container management systems to have one name for the container and give this name to all the namespaces, but I agree, it doesn't have to. > My understanding is a > container could be migrated to another host requiring the creation of > (none,) some or all of its namespaces, potentially leaving behind some > of its shared namespaces and/or clashing names of namespaces on the > destination host. Well, no, the environment gets migrated as well so when the migration is over, the namespaces the migrated entity is in will look the same as before the migration ... if they didn't exist on the recipient, they'll be created. If a namespace already exists the restore fails ... this is because we support the usual container case where you're migrating to a disjoint set of namespaces. Even if there were some reason for supporting shared namespaces, the fundamental invariant is still the namespace names (i.e. the namespaces have the same names before and after migration), so how does a serial number help? 
James > > James > > > > > > -serge > > > > > > > > (Another, more heavyweight approach would be to track all ns hierarchies > > > > and make the serial numbers per-namespace-instance. So my container's > > > > pidns serial might be 0x2, and if it clones a new pidns that would be > > > > "(0x2,0x1)" on the host, or just 0x1 inside the container. But we don't > > > > need that if the simple userspace approach suffices) > > > > > > This sounds manageable... > > > > > > - RGB > > > > > > -- > > > Richard Guy Briggs > > > Senior Software Engineer, Kernel Security, AMER ENG Base Operating > > > Systems, Red Hat > > > Remote, Ottawa, Canada > > > Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545 > > > > > > > > - RGB > > -- > Richard Guy Briggs > Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, > Red Hat > Remote, Ottawa, Canada > Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/maj
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/05, James Bottomley wrote: > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > > On 14/05/05, Serge E. Hallyn wrote: > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > > > Questions: > > > > > Is there a way to link serial numbers of namespaces involved in > > > > > migration of a > > > > > container to another kernel? (I had a brief look at CRIU.) Is there > > > > > a unique > > > > > identifier for each running instance of a kernel? Or at least some > > > > > identifier > > > > > within the container migration realm? > > > > > > > > Are you asking for a way of distinguishing an migrated container from an > > > > unmigrated one? The answer is pretty much "no" because the job of > > > > migration is to restore to the same state as much as possible. > > > > > > > > Reading between the lines, I think your goal is to correlate audit > > > > information across a container migration, right? Ideally the management > > > > system should be able to cough up an audit trail for a container > > > > wherever it's running and however many times it's been migrated? > > > > > > > > In that case, I think your idea of a numeric serial number in a dense > > > > range is wrong. Because the range is dense you're obviously never going > > > > to be able to use the same serial number across a migration. However, > > > > > > Ah, but I was being silly before, we can actually address this pretty > > > simply. If we just (for instance) add > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number > > > for the relevant ns for the task, then criu can dump this info at > > > checkpoint. Then at restart it can dump an audit message per task and > > > ns saying old_serial=%x,new_serial=%x. That way the audit log reader > > > can if it cares keep track. > > > > This is the sort of idea I had in mind... 
> > OK, but I don't understand then why you need a serial number. There are > plenty of things we preserve across a migration, like namespace name for > instance. Could you explain what function it performs because I think I > might be missing something. If a container was defined as an entity with 6 namespaces to itself, this would make sense. As Eric P. put it, containers/namespaces seem to be a bucket of semi-related nuts and bolts, with any namespace being optional depending on the application. My understanding is a container could be migrated to another host requiring the creation of (none,) some or all of its namespaces, potentially leaving behind some of its shared namespaces and/or clashing names of namespaces on the destination host. > James > > > > -serge > > > > > > (Another, more heavyweight approach would be to track all ns hierarchies > > > and make the serial numbers per-namespace-instance. So my container's > > > pidns serial might be 0x2, and if it clones a new pidns that would be > > > "(0x2,0x1)" on the host, or just 0x1 inside the container. But we don't > > > need that if the simple userspace approach suffices) > > This sounds manageable... > > - RGB > > -- > > Richard Guy Briggs > > Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, > > Red Hat > > Remote, Ottawa, Canada > > Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545 > > > - RGB -- Richard Guy Briggs Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat Remote, Ottawa, Canada Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Re: [PATCH 0/2] namespaces: log namespaces per task
On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote: > On 14/05/05, Serge E. Hallyn wrote: > > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > > Questions: > > > > Is there a way to link serial numbers of namespaces involved in > > > > migration of a > > > > container to another kernel? (I had a brief look at CRIU.) Is there a > > > > unique > > > > identifier for each running instance of a kernel? Or at least some > > > > identifier > > > > within the container migration realm? > > > > > > Are you asking for a way of distinguishing an migrated container from an > > > unmigrated one? The answer is pretty much "no" because the job of > > > migration is to restore to the same state as much as possible. > > > > > > Reading between the lines, I think your goal is to correlate audit > > > information across a container migration, right? Ideally the management > > > system should be able to cough up an audit trail for a container > > > wherever it's running and however many times it's been migrated? > > > > > > In that case, I think your idea of a numeric serial number in a dense > > > range is wrong. Because the range is dense you're obviously never going > > > to be able to use the same serial number across a migration. However, > > > > Ah, but I was being silly before, we can actually address this pretty > > simply. If we just (for instance) add > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number > > for the relevant ns for the task, then criu can dump this info at > > checkpoint. Then at restart it can dump an audit message per task and > > ns saying old_serial=%x,new_serial=%x. That way the audit log reader > > can if it cares keep track. > > This is the sort of idea I had in mind... OK, but I don't understand then why you need a serial number. There are plenty of things we preserve across a migration, like namespace name for instance. 
Could you explain what function it performs because I think I might be missing something. Thanks, James > > -serge > > > > (Another, more heavyweight approach would be to track all ns hierarchies > > and make the serial numbers per-namespace-instance. So my container's > > pidns serial might be 0x2, and if it clones a new pidns that would be > > "(0x2,0x1)" on the host, or just 0x1 inside the container. But we don't > > need that if the simple userspace approach suffices) > > This sounds manageable... > > - RGB > > -- > Richard Guy Briggs > Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, > Red Hat > Remote, Ottawa, Canada > Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/05, Serge E. Hallyn wrote: > Quoting James Bottomley (james.bottom...@hansenpartnership.com): > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > > Questions: > > > Is there a way to link serial numbers of namespaces involved in migration > > > of a > > > container to another kernel? (I had a brief look at CRIU.) Is there a > > > unique > > > identifier for each running instance of a kernel? Or at least some > > > identifier > > > within the container migration realm? > > > > Are you asking for a way of distinguishing an migrated container from an > > unmigrated one? The answer is pretty much "no" because the job of > > migration is to restore to the same state as much as possible. > > > > Reading between the lines, I think your goal is to correlate audit > > information across a container migration, right? Ideally the management > > system should be able to cough up an audit trail for a container > > wherever it's running and however many times it's been migrated? > > > > In that case, I think your idea of a numeric serial number in a dense > > range is wrong. Because the range is dense you're obviously never going > > to be able to use the same serial number across a migration. However, > > Ah, but I was being silly before, we can actually address this pretty > simply. If we just (for instance) add > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number > for the relevant ns for the task, then criu can dump this info at > checkpoint. Then at restart it can dump an audit message per task and > ns saying old_serial=%x,new_serial=%x. That way the audit log reader > can if it cares keep track. This is the sort of idea I had in mind... > -serge > > (Another, more heavyweight approach would be to track all ns hierarchies > and make the serial numbers per-namespace-instance. So my container's > pidns serial might be 0x2, and if it clones a new pidns that would be > "(0x2,0x1)" on the host, or just 0x1 inside the container. 
But we don't > need that if the simple userspace approach suffices) This sounds manageable... - RGB -- Richard Guy Briggs Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat Remote, Ottawa, Canada Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
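[Editor's note: for illustration, a minimal userspace sketch of the "heavyweight" hierarchical-serial idea quoted above, where a nested pidns is "(0x2,0x1)" on the host but just "0x1" inside the container. All names are hypothetical; a kernel implementation would look quite different.]

```c
#include <stdio.h>
#include <stdint.h>

/* Format a hierarchical namespace serial, outermost level first.
 * depth == 1 is the view from inside the innermost container ("0x1");
 * depth > 1 is the host view ("(0x2,0x1)"). */
static int format_hier_serial(char *buf, size_t len,
                              const uint64_t *serials, int depth)
{
    int n = 0;

    if (depth == 1)
        return snprintf(buf, len, "0x%llx",
                        (unsigned long long)serials[0]);

    n += snprintf(buf + n, len - n, "(");
    for (int i = 0; i < depth; i++)
        n += snprintf(buf + n, len - n, "%s0x%llx",
                      i ? "," : "", (unsigned long long)serials[i]);
    n += snprintf(buf + n, len - n, ")");
    return n;
}
```

A caller holding the per-level serials {0x2, 0x1} would get "(0x2,0x1)" for the host-side audit record, and passing only the innermost serial reproduces the in-container view.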
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/03, James Bottomley wrote: > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > Questions: > > Is there a way to link serial numbers of namespaces involved in migration > > of a > > container to another kernel? (I had a brief look at CRIU.) Is there a > > unique > > identifier for each running instance of a kernel? Or at least some > > identifier > > within the container migration realm? > > Are you asking for a way of distinguishing an migrated container from an > unmigrated one? The answer is pretty much "no" because the job of > migration is to restore to the same state as much as possible. I hadn't thought to distinguish a migrated container from an unmigrated one, but rather I'm more interested in the underlying namespaces. The use of a generation number to identify a migrated namespace may be useful along with the logging to tie them together. > Reading between the lines, I think your goal is to correlate audit > information across a container migration, right? Ideally the management > system should be able to cough up an audit trail for a container > wherever it's running and however many times it's been migrated? The original intent was to track the underlying namespaces themselves. This sounds like another layer on top of that which sounds useful but that I had not yet considered. But yes, that sounds like a good eventual goal. > In that case, I think your idea of a numeric serial number in a dense > range is wrong. Because the range is dense you're obviously never going > to be able to use the same serial number across a migration. However, > if you look at all the management systems for containers, they all have > a concept of some unique ID per container, be it name, UUID or even > GUID. I suspect it's that you should be using to tag the audit trail > with. That does sound potentially useful but for the fact that several containers could share one or more types of namespaces. 
Would logging just a container ID be sufficient for audit purposes? I'm going to have to dig a bit to understand that one because I was unaware each container had a unique ID. I did originally consider a UUID/GUID for namespaces. > James - RGB -- Richard Guy Briggs Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat Remote, Ottawa, Canada Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/02, Serge Hallyn wrote: > Quoting Richard Guy Briggs (r...@redhat.com): > > On 14/05/02, Serge E. Hallyn wrote: > > > Quoting Richard Guy Briggs (r...@redhat.com): > > > > I saw no replies to my questions when I replied a year after Aris' > > > > posting, so > > > > I don't know if it was ignored or got lost in stale threads: > > > > > > > > https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html > > > > > > > > https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html > > > > > > > > (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html) > > > > > > > > https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html > > > > > > > > I've tried to answer a number of questions that were raised in that > > > > thread. > > > > > > > > The goal is not quite identical to Aris' patchset. > > > > > > > > The purpose is to track namespaces in use by logged processes from the > > > > perspective of init_*_ns. The first patch defines a function to list > > > > them. > > > > The second patch provides an example of usage for audit_log_task_info() > > > > which > > > > is used by syscall audits, among others. audit_log_task() and > > > > audit_common_recv_message() would be other potential use cases. > > > > > > > > Use a serial number per namespace (unique across one boot of one kernel) > > > > instead of the inode number (which is claimed to have had the right to > > > > change > > > > reserved and is not necessarily unique if there is more than one proc > > > > fs). It > > > > could be argued that the inode numbers have now become a defacto > > > > interface and > > > > can't change now, but I'm proposing this approach to see if this helps > > > > address > > > > some of the objections to the earlier patchset. 
> > > > > > > > There could also have messages added to track the creation and the > > > > destruction > > > > of namespaces, listing the parent for hierarchical namespaces such as > > > > pidns, > > > > userns, and listing other ids for non-hierarchical namespaces, as well > > > > as other > > > > information to help identify a namespace. > > > > > > > > There has been some progress made for audit in net namespaces and pid > > > > namespaces since this previous thread. net namespaces are now served > > > > as peers > > > > by one auditd in the init_net namespace with processes in a non-init_net > > > > namespace being able to write records if they are in the init_user_ns > > > > and have > > > > CAP_AUDIT_WRITE. Processes in a non-init_pid_ns can now similarly write > > > > records. As for CAP_AUDIT_READ, I just posted a patchset to check > > > > capabilities > > > > of userspace processes that try to join netlink broadcast groups. > > > > > > > > > > > > Questions: > > > > Is there a way to link serial numbers of namespaces involved in > > > > migration of a > > > > container to another kernel? (I had a brief look at CRIU.) Is there a > > > > unique > > > > identifier for each running instance of a kernel? Or at least some > > > > identifier > > > > within the container migration realm? > > > > > > Eric Biederman has always been adamantly opposed to adding new namespaces > > > of namespaces, so the fact that you're asking this question concerns me. > > > > I have seen that position and I don't fully understand the justification > > for it other than added complexity. > > > > One way that occured to me to be able to identify a kernel instance was > > to look at CPU serial numbers or other CPU entity intended to be > > globally unique, but that isn't universally available. > > That's one issue, which is uniqueness of namespaces cross-machines. 
> > But it gets worse if we consider that after allowing in-container audit, > we'll have a nested container running, then have the parent container > migrated to another host (or just checkpointed and restarted); now the > nested container's indexes will all be changed. Is there any way audit > can track who's who after the migration? Presumably the namespace serial numbers before and after would be logged in one message to tie them together. > That's not an indictment of the serial # approach, since (a) we don't > have in-container audit yet and (b) we don't have c/r/migration of nested > containers. But it's worth considering whether we can solve the issue > with serial #s, and, if not, whether we can solve it with any other > approach. > > I guess one approach to solve it would be to allow userspace to request > a next serial #. Which will immediately lead us to a namespace of serial > #s (since the requested # might be lower than the last used one on the > new host). :P > As you've said inode #s for /proc/self/ns/* probably aren't sufficiently > unique, though perhaps we could attach a generation # for the sake of > audit. Then after a c/r/migration the generation # may be different, > but we may have a better shot at at least using the same ino#.
Re: [PATCH 0/2] namespaces: log namespaces per task
On 02/05/2014 16:28, Richard Guy Briggs wrote: On 14/05/02, Serge E. Hallyn wrote: Quoting Richard Guy Briggs (r...@redhat.com): I saw no replies to my questions when I replied a year after Aris' posting, so I don't know if it was ignored or got lost in stale threads: https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html) https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html I've tried to answer a number of questions that were raised in that thread. The goal is not quite identical to Aris' patchset. The purpose is to track namespaces in use by logged processes from the perspective of init_*_ns. The first patch defines a function to list them. The second patch provides an example of usage for audit_log_task_info() which is used by syscall audits, among others. audit_log_task() and audit_common_recv_message() would be other potential use cases. Use a serial number per namespace (unique across one boot of one kernel) instead of the inode number (for which the right to change is said to be reserved, and which is not necessarily unique if there is more than one proc fs). It could be argued that the inode numbers have now become a de facto interface and can't change now, but I'm proposing this approach to see if this helps address some of the objections to the earlier patchset. Messages could also be added to track the creation and the destruction of namespaces, listing the parent for hierarchical namespaces such as pidns, userns, and listing other ids for non-hierarchical namespaces, as well as other information to help identify a namespace. There has been some progress made for audit in net namespaces and pid namespaces since this previous thread. 
net namespaces are now served as peers by one auditd in the init_net namespace with processes in a non-init_net namespace being able to write records if they are in the init_user_ns and have CAP_AUDIT_WRITE. Processes in a non-init_pid_ns can now similarly write records. As for CAP_AUDIT_READ, I just posted a patchset to check capabilities of userspace processes that try to join netlink broadcast groups. Questions: Is there a way to link serial numbers of namespaces involved in migration of a container to another kernel? (I had a brief look at CRIU.) Is there a unique identifier for each running instance of a kernel? Or at least some identifier within the container migration realm? Eric Biederman has always been adamantly opposed to adding new namespaces of namespaces, so the fact that you're asking this question concerns me. I have seen that position and I don't fully understand the justification for it other than added complexity. Just FYI, have you seen this thread: http://thread.gmane.org/gmane.linux.network/286572/ There are some explanations/examples about this topic. Regards, Nicolas
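[Editor's note: the cover letter quoted above proposes a serial number per namespace, unique across one boot of one kernel. A minimal userspace model of that allocation policy, for illustration only; the in-kernel version would use an atomic counter such as atomic64_inc_return on a global.]

```c
#include <stdint.h>

/* Per-boot namespace serial: a counter that only moves forward, so every
 * namespace created during one boot of one kernel gets a distinct id.
 * Single-threaded model; the kernel needs atomics here. */
static uint64_t ns_serial_ctr;

static uint64_t ns_alloc_serial(void)
{
    return ++ns_serial_ctr;
}
```

Unlike a proc inode number, a serial allocated this way is never reused within a boot, which is the property the audit records rely on; the trade-off discussed in the thread is that it cannot survive a migration to another kernel.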
Re: [PATCH 0/2] namespaces: log namespaces per task
Quoting James Bottomley (james.bottom...@hansenpartnership.com): > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > > Questions: > > Is there a way to link serial numbers of namespaces involved in migration > > of a > > container to another kernel? (I had a brief look at CRIU.) Is there a > > unique > > identifier for each running instance of a kernel? Or at least some > > identifier > > within the container migration realm? > > Are you asking for a way of distinguishing an migrated container from an > unmigrated one? The answer is pretty much "no" because the job of > migration is to restore to the same state as much as possible. > > Reading between the lines, I think your goal is to correlate audit > information across a container migration, right? Ideally the management > system should be able to cough up an audit trail for a container > wherever it's running and however many times it's been migrated? > > In that case, I think your idea of a numeric serial number in a dense > range is wrong. Because the range is dense you're obviously never going > to be able to use the same serial number across a migration. However, Ah, but I was being silly before, we can actually address this pretty simply. If we just (for instance) add /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number for the relevant ns for the task, then criu can dump this info at checkpoint. Then at restart it can dump an audit message per task and ns saying old_serial=%x,new_serial=%x. That way the audit log reader can if it cares keep track. -serge (Another, more heavyweight approach would be to track all ns hierarchies and make the serial numbers per-namespace-instance. So my container's pidns serial might be 0x2, and if it clones a new pidns that would be "(0x2,0x1)" on the host, or just 0x1 inside the container. 
But we don't need that if the simple userspace approach suffices)
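[Editor's note: the message above proposes that CRIU record each task's namespace serials at checkpoint and emit an audit message per task and namespace at restore, in the form "old_serial=%x,new_serial=%x". A minimal userspace sketch of formatting such a record; the struct and function names are hypothetical.]

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical serial pair recorded by CRIU for one task/namespace. */
struct ns_serial_map {
    const char *ns_type;   /* "pid", "net", ... */
    uint64_t old_serial;   /* serial on the source kernel */
    uint64_t new_serial;   /* serial assigned after restore */
};

/* Format the per-namespace migration record described in the message:
 * "old_serial=%x,new_serial=%x".  Returns bytes written, like snprintf. */
static int format_ns_migration_record(char *buf, size_t len,
                                      const struct ns_serial_map *m)
{
    return snprintf(buf, len, "type=%s old_serial=%llx,new_serial=%llx",
                    m->ns_type,
                    (unsigned long long)m->old_serial,
                    (unsigned long long)m->new_serial);
}
```

An audit log reader that cares about continuity across a migration would use these records to translate serials from the source boot into serials on the destination boot.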
Re: [PATCH 0/2] namespaces: log namespaces per task
On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote: > Questions: > Is there a way to link serial numbers of namespaces involved in migration of a > container to another kernel? (I had a brief look at CRIU.) Is there a unique > identifier for each running instance of a kernel? Or at least some identifier > within the container migration realm? Are you asking for a way of distinguishing a migrated container from an unmigrated one? The answer is pretty much "no" because the job of migration is to restore to the same state as much as possible. Reading between the lines, I think your goal is to correlate audit information across a container migration, right? Ideally the management system should be able to cough up an audit trail for a container wherever it's running and however many times it's been migrated? In that case, I think your idea of a numeric serial number in a dense range is wrong. Because the range is dense you're obviously never going to be able to use the same serial number across a migration. However, if you look at all the management systems for containers, they all have a concept of some unique ID per container, be it name, UUID or even GUID. I suspect it's that you should be using to tag the audit trail with. James
Re: [PATCH 0/2] namespaces: log namespaces per task
Quoting Richard Guy Briggs (r...@redhat.com): > On 14/05/02, Serge E. Hallyn wrote: > > Quoting Richard Guy Briggs (r...@redhat.com): > > > I saw no replies to my questions when I replied a year after Aris' > > > posting, so > > > I don't know if it was ignored or got lost in stale threads: > > > > > > https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html > > > > > > https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html > > > > > > (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html) > > > > > > https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html > > > > > > I've tried to answer a number of questions that were raised in that > > > thread. > > > > > > The goal is not quite identical to Aris' patchset. > > > > > > The purpose is to track namespaces in use by logged processes from the > > > perspective of init_*_ns. The first patch defines a function to list > > > them. > > > The second patch provides an example of usage for audit_log_task_info() > > > which > > > is used by syscall audits, among others. audit_log_task() and > > > audit_common_recv_message() would be other potential use cases. > > > > > > Use a serial number per namespace (unique across one boot of one kernel) > > > instead of the inode number (which is claimed to have had the right to > > > change > > > reserved and is not necessarily unique if there is more than one proc > > > fs). It > > > could be argued that the inode numbers have now become a defacto > > > interface and > > > can't change now, but I'm proposing this approach to see if this helps > > > address > > > some of the objections to the earlier patchset. 
> > > > > > There could also have messages added to track the creation and the > > > destruction > > > of namespaces, listing the parent for hierarchical namespaces such as > > > pidns, > > > userns, and listing other ids for non-hierarchical namespaces, as well as > > > other > > > information to help identify a namespace. > > > > > > There has been some progress made for audit in net namespaces and pid > > > namespaces since this previous thread. net namespaces are now served as > > > peers > > > by one auditd in the init_net namespace with processes in a non-init_net > > > namespace being able to write records if they are in the init_user_ns and > > > have > > > CAP_AUDIT_WRITE. Processes in a non-init_pid_ns can now similarly write > > > records. As for CAP_AUDIT_READ, I just posted a patchset to check > > > capabilities > > > of userspace processes that try to join netlink broadcast groups. > > > > > > > > > Questions: > > > Is there a way to link serial numbers of namespaces involved in migration > > > of a > > > container to another kernel? (I had a brief look at CRIU.) Is there a > > > unique > > > identifier for each running instance of a kernel? Or at least some > > > identifier > > > within the container migration realm? > > > > Eric Biederman has always been adamantly opposed to adding new namespaces > > of namespaces, so the fact that you're asking this question concerns me. > > I have seen that position and I don't fully understand the justification > for it other than added complexity. > > One way that occured to me to be able to identify a kernel instance was > to look at CPU serial numbers or other CPU entity intended to be > globally unique, but that isn't universally available. That's one issue, which is uniqueness of namespaces cross-machines. 
But it gets worse if we consider that after allowing in-container audit, we'll have a nested container running, then have the parent container migrated to another host (or just checkpointed and restarted); now the nested container's indexes will all be changed. Is there any way audit can track who's who after the migration? That's not an indictment of the serial # approach, since (a) we don't have in-container audit yet and (b) we don't have c/r/migration of nested containers. But it's worth considering whether we can solve the issue with serial #s, and, if not, whether we can solve it with any other approach. I guess one approach to solve it would be to allow userspace to request a next serial #. Which will immediately lead us to a namespace of serial #s (since the requested # might be lower than the last used one on the new host). As you've said inode #s for /proc/self/ns/* probably aren't sufficiently unique, though perhaps we could attach a generation # for the sake of audit. Then after a c/r/migration the generation # may be different, but we may have a better shot at at least using the same ino#. > Another possibility was RTC reading at time of boot, but that isn't good > enough either. > > Both are dubious in VMs anyways. > > > The way things are right now, since audit belongs to the init userns, > > we can get away with saying if a container 'migrates', the new kernel > > will see a different set of serials,
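[Editor's note: a tiny sketch of the "inode # plus generation #" identifier floated in the message above, where the nsfs inode number is kept as today but a generation number distinguishes instances across a checkpoint/restore. Both field names are hypothetical.]

```c
#include <stdint.h>
#include <stdbool.h>

/* Namespace identity as (ino, gen): the ino may be preserved or reused
 * across a c/r/migration, while gen differs per instance, so a log
 * reader can tell two lives of the "same" namespace apart. */
struct ns_ident {
    uint64_t ino;   /* nsfs inode number, as today */
    uint64_t gen;   /* bumped per instance / per restore */
};

static bool ns_same_instance(const struct ns_ident *a,
                             const struct ns_ident *b)
{
    return a->ino == b->ino && a->gen == b->gen;
}
```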
Re: [PATCH 0/2] namespaces: log namespaces per task
On 14/05/02, Serge E. Hallyn wrote:
> Quoting Richard Guy Briggs (r...@redhat.com):
> > [...]
> >
> > Questions:
> > Is there a way to link serial numbers of namespaces involved in
> > migration of a container to another kernel?  (I had a brief look at
> > CRIU.)  Is there a unique identifier for each running instance of a
> > kernel?  Or at least some identifier within the container migration
> > realm?
>
> Eric Biederman has always been adamantly opposed to adding new
> namespaces of namespaces, so the fact that you're asking this question
> concerns me.

I have seen that position and I don't fully understand the justification
for it other than added complexity.

One way that occurred to me to be able to identify a kernel instance was
to look at CPU serial numbers or other CPU entity intended to be
globally unique, but that isn't universally available.

Another possibility was RTC reading at time of boot, but that isn't good
enough either.

Both are dubious in VMs anyways.

> The way things are right now, since audit belongs to the init userns,
> we can get away with saying if a container 'migrates', the new kernel
> will see a different set of serials, and no one should care.  However,
> if we're going to be allowing containers to have their own audit
> namespace/layer/whatever, then this becomes more of a concern.

Having a container have its own audit daemon (partitioned appropriately
in the kernel) would be a long-term goal.

> That said, I'll now look at the patches while pretending that problem
> does not exist :)  If I ack, it'll be on correctness of the code, but
> we'll still have to deal with this issue.
Getting some discussion about this migration challenge was a significant
motivation for posting this patchset, so I'm hoping others will weigh
in.  Thanks for your review, Serge.

> > What additional events should list this information?
> >
> > Does this present any kind of information leak?  Only
> > CAP_AUDIT_CONTROL (and proposed CAP_AUDIT_READ) in init_user_ns can
> > get to this information in the init namespace at the moment.
> >
> > Proposed output format:
> > This differs slightly from Aristeu's patch because of the label
> > conflict with "pid=" due to including it in existing records rather
> > than it being a separate record:
> > type=SYSCALL msg=audit(1398112249.996:65): arch=c03e
> > syscall=272 success=yes exit=0 a0=4000 a1= a2=0 a3=22
> > items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0
> > fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295
> > comm="(t-daemon)" ex
Re: [PATCH 0/2] namespaces: log namespaces per task
Quoting Richard Guy Briggs (r...@redhat.com):
> [...]
>
> Questions:
> Is there a way to link serial numbers of namespaces involved in
> migration of a container to another kernel?  (I had a brief look at
> CRIU.)  Is there a unique identifier for each running instance of a
> kernel?  Or at least some identifier within the container migration
> realm?

Eric Biederman has always been adamantly opposed to adding new
namespaces of namespaces, so the fact that you're asking this question
concerns me.

The way things are right now, since audit belongs to the init userns, we
can get away with saying if a container 'migrates', the new kernel will
see a different set of serials, and no one should care.  However, if
we're going to be allowing containers to have their own audit
namespace/layer/whatever, then this becomes more of a concern.

That said, I'll now look at the patches while pretending that problem
does not exist :)  If I ack, it'll be on correctness of the code, but
we'll still have to deal with this issue.

> What additional events should list this information?
>
> Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL
> (and proposed CAP_AUDIT_READ) in init_user_ns can get to this
> information in the init namespace at the moment.
> Proposed output format:
> This differs slightly from Aristeu's patch because of the label
> conflict with "pid=" due to including it in existing records rather
> than it being a separate record:
> type=SYSCALL msg=audit(1398112249.996:65): arch=c03e syscall=272
> success=yes exit=0 a0=4000 a1= a2=0 a3=22 items=0 ppid=1 pid=566
> auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
> fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)"
> exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1
> pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
>
> Note: This set does not try to solve the non-init namespace audit
> messages and auditd problem yet.  That will come later, likely with
> additional auditd instances running in another namespace with a limited
> ability to influence the master auditd.  I echo Eric B's idea that
> messages destined for different namespaces would have to be tailored
> for that namespace with references that make sense (such as the right
> pid number reported to that pid namespace, and not leaking info about
> parents or peers).
>
> Richard Guy Briggs (2):
>   namespaces: give each namespace a serial number
>   audit: log namespace serial numbers
>
>  fs/mount.h                    |    1 +
>  fs/namespace.c                |    1 +
>  include/linux/audit.h         |    7 +++
>  include/linux/ipc_namespace.h |    1 +
>  include/linux/nsproxy