Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-07 Thread Nicolas Dichtel

On 06/05/2014 23:15, Richard Guy Briggs wrote:

On 14/05/05, Nicolas Dichtel wrote:

On 02/05/2014 16:28, Richard Guy Briggs wrote:

On 14/05/02, Serge E. Hallyn wrote:

Quoting Richard Guy Briggs (r...@redhat.com):

I saw no replies to my questions when I replied a year after Aris' posting, so
I don't know if it was ignored or got lost in stale threads:
 https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
 https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html

(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
 https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html

I've tried to answer a number of questions that were raised in that thread.

The goal is not quite identical to Aris' patchset.

The purpose is to track namespaces in use by logged processes from the
perspective of init_*_ns.  The first patch defines a function to list them.
The second patch provides an example of usage for audit_log_task_info() which
is used by syscall audits, among others.  audit_log_task() and
audit_common_recv_message() would be other potential use cases.

Use a serial number per namespace (unique across one boot of one kernel)
instead of the inode number (for which the right to change has been reserved,
and which is not necessarily unique if there is more than one proc fs).  It
could be argued that the inode numbers have now become a de facto interface
and can't change now, but I'm proposing this approach to see if it helps
address some of the objections to the earlier patchset.

Messages could also be added to track the creation and destruction of
namespaces, listing the parent for hierarchical namespaces such as pidns and
userns, and listing other ids for non-hierarchical namespaces, as well as
other information to help identify a namespace.

There has been some progress made for audit in net namespaces and pid
namespaces since this previous thread.  net namespaces are now served as peers
by one auditd in the init_net namespace with processes in a non-init_net
namespace being able to write records if they are in the init_user_ns and have
CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
of userspace processes that try to join netlink broadcast groups.


Questions:
Is there a way to link serial numbers of namespaces involved in migration of a
container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
identifier for each running instance of a kernel?  Or at least some identifier
within the container migration realm?


Eric Biederman has always been adamantly opposed to adding new namespaces
of namespaces, so the fact that you're asking this question concerns me.


I have seen that position and I don't fully understand the justification
for it other than added complexity.

Just FYI, have you seen this thread:
http://thread.gmane.org/gmane.linux.network/286572/

There are some explanations and examples about this topic.


Thanks for that reference.  I read it through, but will need to do so
again to get it to sink in.


I think audit has the same problem as x-netns netdevices: being able to
identify a peer netns when a userland app "reads" a message from the kernel.


The main problem with file descriptors is that you cannot use them when you
broadcast a message from the kernel to userland.

Maybe we can use the local-names concept (like file descriptors, but without
their constraints), i.e. having an identifier for a peer (net)ns which is only
valid in the current (net)ns.  When the kernel needs to identify a peer
(net)ns, it uses this identifier (or allocates it the first time).  After
that, userland apps may reuse this identifier to configure things in the
peer (net)ns.
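The local-names concept can be modelled in userspace C as follows.  The
identifiers and the fixed-size table are hypothetical simplifications made up
for illustration; an in-kernel version would presumably use an idr or similar
id allocator per namespace:

```c
#include <stddef.h>

/* Hypothetical model of "local names": each (net)ns keeps a small table
 * mapping peer namespaces to ids.  An id is allocated on first use and is
 * meaningful only inside the owning namespace, like a file descriptor,
 * but it can be carried in broadcast messages because the kernel, not a
 * process, hands it out and keeps the mapping. */
#define MAX_PEERS 16

struct peer_table {
    const void *peer[MAX_PEERS];        /* NULL marks a free slot */
};

/* Return the local id for `ns`, allocating one on first reference;
 * returns -1 if the table is full. */
static int peer_nsid(struct peer_table *t, const void *ns)
{
    int i, free_slot = -1;

    for (i = 0; i < MAX_PEERS; i++) {
        if (t->peer[i] == ns)
            return i;                   /* known peer: same id every time */
        if (t->peer[i] == NULL && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0)
        t->peer[free_slot] = ns;        /* first reference: allocate */
    return free_slot;
}
```

The key property is that repeated lookups of the same peer return the same
small id, so userland can quote it back to the kernel later.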

Eric, any thoughts about this?

Regards,
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-06 Thread James Bottomley
On Tue, 2014-05-06 at 17:41 -0400, Richard Guy Briggs wrote:
> On 14/05/05, James Bottomley wrote:
> > On May 5, 2014 3:36:38 PM PDT, Serge Hallyn  wrote:
> > >Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > >> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote:
> > >> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > >> > > > On 14/05/05, Serge E. Hallyn wrote:
> > >> > > > > Quoting James Bottomley
> > >(james.bottom...@hansenpartnership.com):
> > >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs
> > >wrote:
> > >> > > > > > > Questions:
> > >> > > > > > > Is there a way to link serial numbers of namespaces
> > >involved in migration of a
> > >> > > > > > > container to another kernel?  (I had a brief look at
> > >CRIU.)  Is there a unique
> > >> > > > > > > identifier for each running instance of a kernel?  Or at
> > >least some identifier
> > >> > > > > > > within the container migration realm?
> > >> > > > > > 
> > >> > > > > > Are you asking for a way of distinguishing a migrated
> > >container from an
> > >> > > > > > unmigrated one?  The answer is pretty much "no" because the
> > >job of
> > >> > > > > > migration is to restore to the same state as much as
> > >possible.
> > >> > > > > > 
> > >> > > > > > Reading between the lines, I think your goal is to
> > >correlate audit
> > >> > > > > > information across a container migration, right?  Ideally
> > >the management
> > >> > > > > > system should be able to cough up an audit trail for a
> > >container
> > >> > > > > > wherever it's running and however many times it's been
> > >migrated?
> > >> > > > > > 
> > >> > > > > > In that case, I think your idea of a numeric serial number
> > >in a dense
> > >> > > > > > range is wrong.  Because the range is dense you're
> > >obviously never going
> > >> > > > > > to be able to use the same serial number across a
> > >migration.  However,
> > >> > > > > 
> > >> > > > > Ah, but I was being silly before, we can actually address
> > >this pretty
> > >> > > > > simply.  If we just (for instance) add
> > >> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the
> > >serial number
> > >> > > > > for the relevant ns for the task, then criu can dump this
> > >info at
> > >> > > > > checkpoint.  Then at restart it can dump an audit message per
> > >task and
> > >> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit
> > >log reader
> > >> > > > > can if it cares keep track.
> > >> > > > 
> > >> > > > This is the sort of idea I had in mind...
> > >> > > 
> > >> > > OK, but I don't understand then why you need a serial number. 
> > >There are
> > >> > > plenty of things we preserve across a migration, like namespace
> > >name for
> > >> > > instance.  Could you explain what function it performs because I
> > >think I
> > >> > > might be missing something.
> > >> > 
> > >> > We're looking ahead to a time when audit is namespaced, and a
> > >container
> > >> > can keep its own audit logs (without limiting what the host audits
> > >of
> > >> > course).  So if a container is auditing suspicious activity by some
> > >> > task in a sub-namespace, then the whole parent container gets
> > >migrated,
> > >> > after migration we want to continue being able to correlate the
> > >namespaces.
> > >> > 
> > >> > We're also looking at audit trails on a host that is up for years. 
> > >We
> > >> > would like every namespace to be uniquely logged there.  That is
> > >why
> > >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a
> > >generation
> > >> > # (which would end more complicated, not less, than a serial #).
> > >> 
> > >> Right, but when the container has an audit namespace, that namespace
> > >has
> > >> a name,
> > >
> > >What ns has a name?
> > 
> > The netns for instance.
> > 
> > >  The audit ns can be tied to 50 pid namespaces, and
> > >we
> > >want to log which pidns is responsible for something.
> > >
> > >If you mean the pidns has a name, that's the problem...  it does not,
> > >it
> > >only has an inode # which may later be re-used.
> > 
> > I still think there's a miscommunication somewhere: I believe you just
> > need a stable id to tie the audit to, so why not just give the audit
> > namespace a name like net?  The id would then be durable across
> > migrations.
> 
> Audit does not have its own namespace (yet).

So it would make the most sense to do this if audit were a separately
attachable capability the orchestrator would like to control.  I'm not
sure about that so I'll consider some use cases below.

>   That idea is being
> considered, but we would prefer to avoid it if it makes sense to tie it
> in with an existing namespace.  The pid and user namespaces, being
> hierarchical, seem to make the most sense so far, but we are proceeding
> very carefully to avoid creating a security nightmare in the process.

pid ns might be.  You need th

Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-06 Thread Richard Guy Briggs
On 14/05/05, James Bottomley wrote:
> On Tue, 2014-05-06 at 03:27 +, Serge Hallyn wrote:
> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > >> Right, but when the container has an audit namespace, that namespace
> > > >has
> > > >> a name,
> > > >
> > > >What ns has a name?
> > > 
> > > The netns for instance.
> > 
> > And what is its name?
> 
> As I think you know ip netns list will show you all of them.  The way
> they're applied is via mapped files in /var/run/netns/ which hold the
> names.
> 
> >   The only name I know that we could log in an
> > audit message is the /proc/self/ns/net inode number (which does not
> > suffice)
> 
> OK, so I think this is the confusion: You're thinking the container
> itself doesn't know what name the namespace has been given by the
> system, all it knows is the inode number corresponding to a file which
> it may or may not be able to see, right?

I guess if that container hasn't mounted /proc, it couldn't find out.
The same would be true of /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq to
find out its namespace serial numbers, but that doesn't stop that
container from initiating an audit message with the information it
knows, which can be supplemented by information the kernel already knows
about it.

> I'm thinking that the system
> that set up the container gave those files names and usually they're the
> same name for all the namespaces.  The point is that the orchestration
> system (whatever set up the container) will be responsible for the
> migration.  It will be the thing that has a unique handle for the
> container.  The handle is usually ascii representable, either a human
> readable name or some uuid/guid.  It's that handle that we should be
> using to prefix the audit message,

It is now possible to send audit messages while in another non-init
namespace, so from there, it could record that handle and have the
namespace serial numbers from the kernel logged with that message.  This
would be recorded by the host audit daemon, not the container audit
daemon.  The container management system could talk to this host audit
daemon to re-assemble an audit record trail for that container.

> so when you set up an audit
> namespace, it gets supplied with a prefix string corresponding to the
> well known name for the container.  This is the string we'd preserve
> across migration as part of the audit namespace state ... so the audit
> messages all correlate to the container wherever it's migrated to; no
> need to do complex tracking of changes to serial numbers.

That is a further step: having a container have its own audit daemon.

> > > >  The audit ns can be tied to 50 pid namespaces, and
> > > >we
> > > >want to log which pidns is responsible for something.
> > > >
> > > >If you mean the pidns has a name, that's the problem...  it does not,
> > > >it
> > > >only has an inode # which may later be re-used.
> > > 
> > > I still think there's a miscommunication somewhere: I believe you just 
> > > need a stable id to tie the audit to, so why not just give the audit 
> > > namespace a name like net?  The id would then be durable across 
> > > migrations.
> > 
> > Maybe this is where we're confusing each other - I'm not talking
> > about giving the audit ns a name.  I'm talking about being able to
> > identify the other namespaces inside an audit message.  In a way
> > that (a) is unique across bare metals' entire uptime, and (b)
> > can be tracked across migrations.
> 
> OK, so that is different from what I'm thinking.  I'm thinking unique
> name for migrateable entity, you want a unique name for each component
> of the migrateable entity?

Yes.

>  My instinct still tells me the orchestration
> system is going to have a unique identifier for each different sub
> container.

So what is a sub container?  A nested container?  We still want to track
component namespaces of each nested container.

> However, I have to point out that a serial number isn't what you want
> either if you really mean bare metal.  We do a lot of deployments where
> the containers run in a hypervisor, there the serial numbers won't be
> unique per box (only per vm) and we'll have to do vm correlation
> separately.  whereas a scheme which allows the orchestration system to
> supply the names would still be unique in that situation.

Unique per _running kernel_ was my intention.  I don't care if it is
bare metal or not.

> James

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-06 Thread Richard Guy Briggs
On 14/05/05, James Bottomley wrote:
> On May 5, 2014 3:36:38 PM PDT, Serge Hallyn  wrote:
> >Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> >> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote:
> >> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> >> > > > On 14/05/05, Serge E. Hallyn wrote:
> >> > > > > Quoting James Bottomley
> >(james.bottom...@hansenpartnership.com):
> >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs
> >wrote:
> >> > > > > > > Questions:
> >> > > > > > > Is there a way to link serial numbers of namespaces
> >involved in migration of a
> >> > > > > > > container to another kernel?  (I had a brief look at
> >CRIU.)  Is there a unique
> >> > > > > > > identifier for each running instance of a kernel?  Or at
> >least some identifier
> >> > > > > > > within the container migration realm?
> >> > > > > > 
> >> > > > > > Are you asking for a way of distinguishing a migrated
> >container from an
> >> > > > > > unmigrated one?  The answer is pretty much "no" because the
> >job of
> >> > > > > > migration is to restore to the same state as much as
> >possible.
> >> > > > > > 
> >> > > > > > Reading between the lines, I think your goal is to
> >correlate audit
> >> > > > > > information across a container migration, right?  Ideally
> >the management
> >> > > > > > system should be able to cough up an audit trail for a
> >container
> >> > > > > > wherever it's running and however many times it's been
> >migrated?
> >> > > > > > 
> >> > > > > > In that case, I think your idea of a numeric serial number
> >in a dense
> >> > > > > > range is wrong.  Because the range is dense you're
> >obviously never going
> >> > > > > > to be able to use the same serial number across a
> >migration.  However,
> >> > > > > 
> >> > > > > Ah, but I was being silly before, we can actually address
> >this pretty
> >> > > > > simply.  If we just (for instance) add
> >> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the
> >serial number
> >> > > > > for the relevant ns for the task, then criu can dump this
> >info at
> >> > > > > checkpoint.  Then at restart it can dump an audit message per
> >task and
> >> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit
> >log reader
> >> > > > > can if it cares keep track.
> >> > > > 
> >> > > > This is the sort of idea I had in mind...
> >> > > 
> >> > > OK, but I don't understand then why you need a serial number. 
> >There are
> >> > > plenty of things we preserve across a migration, like namespace
> >name for
> >> > > instance.  Could you explain what function it performs because I
> >think I
> >> > > might be missing something.
> >> > 
> >> > We're looking ahead to a time when audit is namespaced, and a
> >container
> >> > can keep its own audit logs (without limiting what the host audits
> >of
> >> > course).  So if a container is auditing suspicious activity by some
> >> > task in a sub-namespace, then the whole parent container gets
> >migrated,
> >> > after migration we want to continue being able to correlate the
> >namespaces.
> >> > 
> >> > We're also looking at audit trails on a host that is up for years. 
> >We
> >> > would like every namespace to be uniquely logged there.  That is
> >why
> >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a
> >generation
> >> > # (which would end more complicated, not less, than a serial #).
> >> 
> >> Right, but when the container has an audit namespace, that namespace
> >has
> >> a name,
> >
> >What ns has a name?
> 
> The netns for instance.
> 
> >  The audit ns can be tied to 50 pid namespaces, and
> >we
> >want to log which pidns is responsible for something.
> >
> >If you mean the pidns has a name, that's the problem...  it does not,
> >it
> >only has an inode # which may later be re-used.
> 
> I still think there's a miscommunication somewhere: I believe you just
> need a stable id to tie the audit to, so why not just give the audit
> namespace a name like net?  The id would then be durable across
> migrations.

Audit does not have its own namespace (yet).  That idea is being
considered, but we would prefer to avoid it if it makes sense to tie it
in with an existing namespace.  The pid and user namespaces, being
hierarchical, seem to make the most sense so far, but we are proceeding
very carefully to avoid creating a security nightmare in the process.

From the kernel's perspective, none of the namespaces have a name.  A
container concept of a group of namespaces may have been assigned one,
but that isn't apparent to the layer that is logging this information.

> >> which CRIU would migrate, so why not use that name for the
> >> log .. no need for numbers (unless you make the name a number, of
> >> course)?

There would certainly need to be a way to tie these namespace
identifiers to container names in log messages.

> >> James
> >
> >Sorry if I'm being

Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-06 Thread Richard Guy Briggs
On 14/05/05, Nicolas Dichtel wrote:
> On 02/05/2014 16:28, Richard Guy Briggs wrote:
> >On 14/05/02, Serge E. Hallyn wrote:
> >>Quoting Richard Guy Briggs (r...@redhat.com):
> >>>I saw no replies to my questions when I replied a year after Aris' 
> >>>posting, so
> >>>I don't know if it was ignored or got lost in stale threads:
> >>> 
> >>> https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> >>> 
> >>> https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> >>>   
> >>> (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> >>> 
> >>> https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> >>>
> >>>I've tried to answer a number of questions that were raised in that thread.
> >>>
> >>>The goal is not quite identical to Aris' patchset.
> >>>
> >>>The purpose is to track namespaces in use by logged processes from the
> >>>perspective of init_*_ns.  The first patch defines a function to list them.
> >>>The second patch provides an example of usage for audit_log_task_info() 
> >>>which
> >>>is used by syscall audits, among others.  audit_log_task() and
> >>>audit_common_recv_message() would be other potential use cases.
> >>>
> >>>Use a serial number per namespace (unique across one boot of one kernel)
> >>>instead of the inode number (for which the right to change has been
> >>>reserved, and which is not necessarily unique if there is more than one
> >>>proc fs).  It could be argued that the inode numbers have now become a
> >>>de facto interface and can't change now, but I'm proposing this approach
> >>>to see if it helps address some of the objections to the earlier patchset.
> >>>
> >>>Messages could also be added to track the creation and destruction of
> >>>namespaces, listing the parent for hierarchical namespaces such as pidns
> >>>and userns, and listing other ids for non-hierarchical namespaces, as
> >>>well as other information to help identify a namespace.
> >>>
> >>>There has been some progress made for audit in net namespaces and pid
> >>>namespaces since this previous thread.  net namespaces are now served as 
> >>>peers
> >>>by one auditd in the init_net namespace with processes in a non-init_net
> >>>namespace being able to write records if they are in the init_user_ns and 
> >>>have
> >>>CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> >>>records.  As for CAP_AUDIT_READ, I just posted a patchset to check 
> >>>capabilities
> >>>of userspace processes that try to join netlink broadcast groups.
> >>>
> >>>
> >>>Questions:
> >>>Is there a way to link serial numbers of namespaces involved in migration 
> >>>of a
> >>>container to another kernel?  (I had a brief look at CRIU.)  Is there a 
> >>>unique
> >>>identifier for each running instance of a kernel?  Or at least some 
> >>>identifier
> >>>within the container migration realm?
> >>
> >>Eric Biederman has always been adamantly opposed to adding new namespaces
> >>of namespaces, so the fact that you're asking this question concerns me.
> >
> >I have seen that position and I don't fully understand the justification
> >for it other than added complexity.
> Just FYI, have you seen this thread:
> http://thread.gmane.org/gmane.linux.network/286572/
> 
> There are some explanations and examples about this topic.

Thanks for that reference.  I read it through, but will need to do so
again to get it to sink in.

> Nicolas

- RGB



Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-06 Thread Serge Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> On Tue, 2014-05-06 at 03:27 +, Serge Hallyn wrote:
> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > >> Right, but when the container has an audit namespace, that namespace
> > > >has
> > > >> a name,
> > > >
> > > >What ns has a name?
> > > 
> > > The netns for instance.
> > 
> > And what is its name?
> 
> As I think you know ip netns list will show you all of them.  The way

Ah.  Now I see, thanks :)  I never actually use that feature (other
than when debugging how mount propagation affects how it's implemented),
which is why it did not occur to me that this might be what you meant.

However these names are (a) not in the kernel, (b) not unique per-boot,
and (c) not applicable to other namespaces (without more userspace
tweaking).  So these are not a substitute for what Richard is proposing.

> they're applied is via mapped files in /var/run/netns/ which hold the
> names.
> 
> >   The only name I know that we could log in an
> > audit message is the /proc/self/ns/net inode number (which does not
> > suffice)
> 
> OK, so I think this is the confusion: You're thinking the container
> itself doesn't know what name the namespace has been given by the
> system, all it knows is the inode number corresponding to a file which
> it may or may not be able to see, right?  I'm thinking that the system
> that set up the container gave those files names and usually they're the
> same name for all the namespaces.  The point is that the orchestration
> system (whatever set up the container) will be responsible for the
> migration.  It will be the thing that has a unique handle for the
> container.

(Several things to reply to there but I'll pick just one,)

We are not looking for a unique name for a container; that's far too
coarse.  Within that container there may be many daemons which have
unshared their own namespaces, e.g. cgmanager unshared a mntns,
vsftpd unshared a netns, etc.  We want the namespace identified in
the audit messages.  We want, within the audit trail for a system
boot, each namespace to be *uniquely* identified.  I don't know
how many people are still doing CAPP/LSPP-type installs, but that's
the level I'm thinking at for this.  It's not syslog, it's audit.

> The handle is usually ascii representable, either a human
> readable name or some uuid/guid.  It's that handle that we should be
> using to prefix the audit message, so when you set up an audit
> namespace, it gets supplied with a prefix string corresponding to the
> well known name for the container.  This is the string we'd preserve
> across migration as part of the audit namespace state ... so the audit
> messages all correlate to the container wherever it's migrated to; no
> need to do complex tracking of changes to serial numbers.
> 
> > > >  The audit ns can be tied to 50 pid namespaces, and
> > > >we
> > > >want to log which pidns is responsible for something.
> > > >
> > > >If you mean the pidns has a name, that's the problem...  it does not,
> > > >it
> > > >only has an inode # which may later be re-used.
> > > 
> > > I still think there's a miscommunication somewhere: I believe you just 
> > > need a stable id to tie the audit to, so why not just give the audit 
> > > namespace a name like net?  The id would then be durable across 
> > > migrations.
> > 
> > Maybe this is where we're confusing each other - I'm not talking
> > about giving the audit ns a name.  I'm talking about being able to
> > identify the other namespaces inside an audit message.  In a way
> > that (a) is unique across bare metals' entire uptime, and (b)
> > can be tracked across migrations.
> 
> OK, so that is different from what I'm thinking.  I'm thinking unique
> name for migrateable entity, you want a unique name for each component
> of the migrateable entity?  My instinct still tells me the orchestration
> system is going to have a unique identifier for each different sub
> container.
> 
> However, I have to point out that a serial number isn't what you want
> either if you really mean bare metal.  We do a lot of deployments where
> the containers run in a hypervisor, there the serial numbers won't be
> unique per box (only per vm) and we'll have to do vm correlation
> separately.  whereas a scheme which allows the orchestration system to
> supply the names would still be unique in that situation.
> 
> James
> 
> 


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-06 Thread Richard Guy Briggs
On 14/05/06, Serge Hallyn wrote:
> Quoting Richard Guy Briggs (r...@redhat.com):
> > On 14/05/03, James Bottomley wrote:
> > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > Questions:
> > > > Is there a way to link serial numbers of namespaces involved in 
> > > > migration of a
> > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a 
> > > > unique
> > > > identifier for each running instance of a kernel?  Or at least some 
> > > > identifier
> > > > within the container migration realm?
> > > 
> > > Are you asking for a way of distinguishing a migrated container from an
> > > unmigrated one?  The answer is pretty much "no" because the job of
> > > migration is to restore to the same state as much as possible.
> > 
> > I hadn't thought to distinguish a migrated container from an unmigrated
> > one, but rather I'm more interested in the underlying namespaces.  The
> > use of a generation number to identify a migrated namespace may be
> > useful along with the logging to tie them together.
> > 
> > > Reading between the lines, I think your goal is to correlate audit
> > > information across a container migration, right?  Ideally the management
> > > system should be able to cough up an audit trail for a container
> > > wherever it's running and however many times it's been migrated?
> > 
> > The original intent was to track the underlying namespaces themselves.
> > This sounds like another layer on top of that which sounds useful but
> > that I had not yet considered.
> > 
> > But yes, that sounds like a good eventual goal.
> 
> Right, and we don't need that now; all *I* wanted to convince myself of
> was that a serial # as you were using it was not going to be a roadblock
> to that, since once we introduce a serial #, we're stuck with that as a
> user-space facing API.

Understood.  If a container gets migrated somewhere along with its
namespace, the namespace elsewhere is going to have a new serial number,
but the migration log is going to hopefully show both serial numbers.
If that container gets migrated back, the supporting namespace will get
yet a new serial number, with its log trail connecting the previous
remote one.  Those logs can be used by a higher layer audit aggregator
to piece together those log crumbs.
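That aggregation step can be sketched as follows.  This is a hypothetical
model of what a log reader might do with old_serial/new_serial migration
records; no real audit record format or field names are implied:

```c
#include <stdint.h>

/* Hypothetical migration record: at restore time, a namespace that had
 * `old_serial` on the source kernel is assigned `new_serial` here. */
struct migration_rec {
    uint64_t old_serial;
    uint64_t new_serial;
};

/* Follow migration records backwards from a current serial to the serial
 * the namespace was first created with.  Assumes the records form a chain
 * with no cycles, which holds if serials are never reused within a boot. */
static uint64_t first_serial(const struct migration_rec *log, int n,
                             uint64_t cur)
{
    int i, changed = 1;

    while (changed) {
        changed = 0;
        for (i = 0; i < n; i++) {
            if (log[i].new_serial == cur) {
                cur = log[i].old_serial;
                changed = 1;
            }
        }
    }
    return cur;
}
```

For example, a namespace migrated away and back might leave records
(5 -> 12) and (12 -> 31); walking them from 31 recovers the original 5.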

The serial number was intended to be an alternative to the inode numbers,
which have the drawbacks of needing a qualifying device number to accompany
them, plus the reservation that an inode number could change in the
future to solve unforeseen technical problems.  I saw no other stable
identifiers common to all namespace types with which I could work.

Containers may have their own names, but I didn't see any consistent way
to identify namespace instances.
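For contrast, the identifier the kernel does expose today can be read like
this (a sketch, assuming Linux with /proc mounted and the "type:[inode]"
symlink format of the /proc/[pid]/ns/ entries; the helper name is made up):

```c
#include <unistd.h>

/* Read the inode-based identity of a namespace file such as
 * /proc/self/ns/net, e.g. "net:[4026531840]".  This is the identifier
 * argued above to be insufficient on its own: it needs a qualifying
 * device number, and the number may be reused after the namespace dies. */
static ssize_t ns_identity(const char *path, char *buf, size_t len)
{
    ssize_t n = readlink(path, buf, len - 1);

    if (n >= 0)
        buf[n] = '\0';                  /* readlink() does not NUL-terminate */
    return n;
}
```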

> > > In that case, I think your idea of a numeric serial number in a dense
> > > range is wrong.  Because the range is dense you're obviously never going
> > > to be able to use the same serial number across a migration.  However,
> > > if you look at all the management systems for containers, they all have
> > > a concept of some unique ID per container, be it name, UUID or even
> > > GUID.  I suspect it's that you should be using to tag the audit trail
> > > with.
> > 
> > That does sound potentially useful but for the fact that several
> > containers could share one or more types of namespaces.
> > 
> > Would logging just a container ID be sufficient for audit purposes?  I'm
> > going to have to dig a bit to understand that one because I was unaware
> > each container had a unique ID.
> 
> They don't :)

Ok, so I'd be looking in vain...

> > I did originally consider a UUID/GUID for namespaces.
> 
> So I think that apart from resending to address the serial # overflow
> comment, I'm happy to ack the patches.  Then we probably need to convince
> Eric that we're not torturing kittens.

I've already fixed the overflow issues.  I'll resend with the fixes.

This patch pair was intended to sort out some of my understanding of the
problem I perceived, and has helped me understand there are other layers
that need work too to make it useful, but this is a good base.

A subsequent piece would be to expose that serial number in the proc
filesystem.
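As a sketch of what that could look like from userspace: assuming hypothetical per-namespace files such as /proc/self/ns/net_seq (which do not exist in any current kernel; the name is invented here for illustration), a consumer could prefer the serial and fall back to today's inode identity:

```python
import os

def ns_identity(ns, pid="self"):
    """Prefer a proposed per-namespace serial file, else fall back to the inode.

    The "<ns>_seq" files are hypothetical (the proposal in this thread);
    on any existing kernel only the readlink fallback path will run."""
    try:
        with open(f"/proc/{pid}/ns/{ns}_seq") as f:
            return ("serial", int(f.read().strip(), 16))
    except (FileNotFoundError, ValueError):
        return ("inode", os.readlink(f"/proc/{pid}/ns/{ns}"))

print(ns_identity("mnt"))
```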

> -serge

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-06 Thread Nicolas Dichtel

On 06/05/2014 01:23, James Bottomley wrote:

> On May 5, 2014 3:36:38 PM PDT, Serge Hallyn  wrote:
> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote:
> > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > > > > > On 14/05/05, Serge E. Hallyn wrote:
> > > > > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > > > > > Questions:
> > > > > > > > > Is there a way to link serial numbers of namespaces involved
> > > > > > > > > in migration of a container to another kernel?  (I had a
> > > > > > > > > brief look at CRIU.)  Is there a unique identifier for each
> > > > > > > > > running instance of a kernel?  Or at least some identifier
> > > > > > > > > within the container migration realm?
> > > > > > > >
> > > > > > > > Are you asking for a way of distinguishing a migrated
> > > > > > > > container from an unmigrated one?  The answer is pretty much
> > > > > > > > "no" because the job of migration is to restore to the same
> > > > > > > > state as much as possible.
> > > > > > > >
> > > > > > > > Reading between the lines, I think your goal is to correlate
> > > > > > > > audit information across a container migration, right?
> > > > > > > > Ideally the management system should be able to cough up an
> > > > > > > > audit trail for a container wherever it's running and however
> > > > > > > > many times it's been migrated?
> > > > > > > >
> > > > > > > > In that case, I think your idea of a numeric serial number in
> > > > > > > > a dense range is wrong.  Because the range is dense you're
> > > > > > > > obviously never going to be able to use the same serial
> > > > > > > > number across a migration.  However,
> > > > > > >
> > > > > > > Ah, but I was being silly before, we can actually address this
> > > > > > > pretty simply.  If we just (for instance) add
> > > > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the
> > > > > > > serial number for the relevant ns for the task, then criu can
> > > > > > > dump this info at checkpoint.  Then at restart it can dump an
> > > > > > > audit message per task and ns saying
> > > > > > > old_serial=%x,new_serial=%x.  That way the audit log reader can
> > > > > > > if it cares keep track.
> > > > > >
> > > > > > This is the sort of idea I had in mind...
> > > > >
> > > > > OK, but I don't understand then why you need a serial number.
> > > > > There are plenty of things we preserve across a migration, like
> > > > > namespace name for instance.  Could you explain what function it
> > > > > performs because I think I might be missing something.
> > > >
> > > > We're looking ahead to a time when audit is namespaced, and a
> > > > container can keep its own audit logs (without limiting what the host
> > > > audits of course).  So if a container is auditing suspicious activity
> > > > by some task in a sub-namespace, then the whole parent container gets
> > > > migrated, after migration we want to continue being able to correlate
> > > > the namespaces.
> > > >
> > > > We're also looking at audit trails on a host that is up for years.
> > > > We would like every namespace to be uniquely logged there.  That is
> > > > why inode #s on /proc/self/ns/* are not sufficient, unless we add a
> > > > generation # (which would end up more complicated, not less, than a
> > > > serial #).
> > >
> > > Right, but when the container has an audit namespace, that namespace
> > > has a name,
> >
> > What ns has a name?
>
> The netns for instance.

netns does not have names.  iproute2 uses names (a filename, in fact, to
hold a reference on the netns), but the kernel never gets this name; it
only gets a file descriptor (or a pid).


Regards,
Nicolas


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread James Bottomley
On Tue, 2014-05-06 at 03:27 +, Serge Hallyn wrote:
> Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > >> Right, but when the container has an audit namespace, that namespace
> > >> has a name,
> > >
> > >What ns has a name?
> > 
> > The netns for instance.
> 
> And what is its name?

As I think you know ip netns list will show you all of them.  The way
they're applied is via mapped files in /var/run/netns/ which hold the
names.

>   The only name I know that we could log in an
> audit message is the /proc/self/ns/net inode number (which does not
> suffice)

OK, so I think this is the confusion: You're thinking the container
itself doesn't know what name the namespace has been given by the
system, all it knows is the inode number corresponding to a file which
it may or may not be able to see, right?  I'm thinking that the system
that set up the container gave those files names and usually they're the
same name for all the namespaces.  The point is that the orchestration
system (whatever set up the container) will be responsible for the
migration.  It will be the thing that has a unique handle for the
container.  The handle is usually ascii representable, either a human
readable name or some uuid/guid.  It's that handle that we should be
using to prefix the audit message, so when you set up an audit
namespace, it gets supplied with a prefix string corresponding to the
well known name for the container.  This is the string we'd preserve
across migration as part of the audit namespace state ... so the audit
messages all correlate to the container wherever it's migrated to; no
need to do complex tracking of changes to serial numbers.
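A toy sketch of that correlation, with invented handles and record layout (nothing below reflects a real audit format): if the orchestration system prefixes each record with the container's well-known handle, tying a trail together across migrations is a simple group-by.

```python
from collections import defaultdict

# Invented records: (handle, host, message).  The handle is the stable
# name the orchestration system gave the container; the host changes
# across migrations, the handle does not.
records = [
    ("web-frontend", "hostA", "suspicious exec"),
    ("web-frontend", "hostB", "suspicious exec"),  # same container, post-migration
    ("db", "hostA", "login"),
]

def by_container(recs):
    """Group audit records by the orchestration-supplied handle."""
    trail = defaultdict(list)
    for handle, host, msg in recs:
        trail[handle].append((host, msg))
    return dict(trail)

trails = by_container(records)
print(sorted(trails))
```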

> > >  The audit ns can be tied to 50 pid namespaces, and
> > >we
> > >want to log which pidns is responsible for something.
> > >
> > >If you mean the pidns has a name, that's the problem...  it does not, it
> > >only has an inode # which may later be re-used.
> > 
> > I still think there's a miscommunication somewhere: I believe you just need 
> > a stable id to tie the audit to, so why not just give the audit namespace a 
> > name like net?  The id would then be durable across migrations.
> 
> Maybe this is where we're confusing each other - I'm not talking
> about giving the audit ns a name.  I'm talking about being able to
> identify the other namespaces inside an audit message.  In a way
> that (a) is unique across bare metals' entire uptime, and (b)
> can be tracked across migrations.

OK, so that is different from what I'm thinking.  I'm thinking unique
name for migrateable entity, you want a unique name for each component
of the migrateable entity?  My instinct still tells me the orchestration
system is going to have a unique identifier for each different sub
container.

However, I have to point out that a serial number isn't what you want
either if you really mean bare metal.  We do a lot of deployments where
the containers run in a hypervisor, there the serial numbers won't be
unique per box (only per vm) and we'll have to do vm correlation
separately, whereas a scheme which allows the orchestration system to
supply the names would still be unique in that situation.

James




Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Serge Hallyn
Quoting Richard Guy Briggs (r...@redhat.com):
> On 14/05/03, James Bottomley wrote:
> > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration 
> > > of a
> > > container to another kernel?  (I had a brief look at CRIU.)  Is there a 
> > > unique
> > > identifier for each running instance of a kernel?  Or at least some 
> > > identifier
> > > within the container migration realm?
> > 
> > Are you asking for a way of distinguishing a migrated container from an
> > unmigrated one?  The answer is pretty much "no" because the job of
> > migration is to restore to the same state as much as possible.
> 
> I hadn't thought to distinguish a migrated container from an unmigrated
> one, but rather I'm more interested in the underlying namespaces.  The
> use of a generation number to identify a migrated namespace may be
> useful along with the logging to tie them together.
> 
> > Reading between the lines, I think your goal is to correlate audit
> > information across a container migration, right?  Ideally the management
> > system should be able to cough up an audit trail for a container
> > wherever it's running and however many times it's been migrated?
> 
> The original intent was to track the underlying namespaces themselves.
> This sounds like another layer on top of that which sounds useful but
> that I had not yet considered.
> 
> But yes, that sounds like a good eventual goal.

Right and we don't need that now, all *I* wanted to convince myself of
was that a serial # as you were using it was not going to be a roadblock
to that, since once we introduce a serial #, we're stuck with that as
user-space facing api.

> > In that case, I think your idea of a numeric serial number in a dense
> > range is wrong.  Because the range is dense you're obviously never going
> > to be able to use the same serial number across a migration.  However,
> > if you look at all the management systems for containers, they all have
> > a concept of some unique ID per container, be it name, UUID or even
> > GUID.  I suspect it's that you should be using to tag the audit trail
> > with.
> 
> That does sound potentially useful but for the fact that several
> containers could share one or more types of namespaces.
> 
> Would logging just a container ID be sufficient for audit purposes?  I'm
> going to have to dig a bit to understand that one because I was unaware
> each container had a unique ID.

They don't :)

> I did originally consider a UUID/GUID for namespaces.

So I think that apart from resending to address the serial # overflow
comment, I'm happy to ack the patches.  Then we probably need to convince
Eric that we're not torturing kittens.

-serge


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Serge Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> 
> 
> On May 5, 2014 3:36:38 PM PDT, Serge Hallyn  wrote:
> >Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> >> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote:
> >> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> >> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> >> > > > On 14/05/05, Serge E. Hallyn wrote:
> >> > > > > Quoting James Bottomley
> >(james.bottom...@hansenpartnership.com):
> >> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs
> >wrote:
> >> > > > > > > Questions:
> >> > > > > > > Is there a way to link serial numbers of namespaces
> >involved in migration of a
> >> > > > > > > container to another kernel?  (I had a brief look at
> >CRIU.)  Is there a unique
> >> > > > > > > identifier for each running instance of a kernel?  Or at
> >least some identifier
> >> > > > > > > within the container migration realm?
> >> > > > > > 
> >> > > > > > Are you asking for a way of distinguishing a migrated
> >container from an
> >> > > > > > unmigrated one?  The answer is pretty much "no" because the
> >job of
> >> > > > > > migration is to restore to the same state as much as
> >possible.
> >> > > > > > 
> >> > > > > > Reading between the lines, I think your goal is to
> >correlate audit
> >> > > > > > information across a container migration, right?  Ideally
> >the management
> >> > > > > > system should be able to cough up an audit trail for a
> >container
> >> > > > > > wherever it's running and however many times it's been
> >migrated?
> >> > > > > > 
> >> > > > > > In that case, I think your idea of a numeric serial number
> >in a dense
> >> > > > > > range is wrong.  Because the range is dense you're
> >obviously never going
> >> > > > > > to be able to use the same serial number across a
> >migration.  However,
> >> > > > > 
> >> > > > > Ah, but I was being silly before, we can actually address
> >this pretty
> >> > > > > simply.  If we just (for instance) add
> >> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the
> >serial number
> >> > > > > for the relevant ns for the task, then criu can dump this
> >info at
> >> > > > > checkpoint.  Then at restart it can dump an audit message per
> >task and
> >> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit
> >log reader
> >> > > > > can if it cares keep track.
> >> > > > 
> >> > > > This is the sort of idea I had in mind...
> >> > > 
> >> > > OK, but I don't understand then why you need a serial number. 
> >There are
> >> > > plenty of things we preserve across a migration, like namespace
> >name for
> >> > > instance.  Could you explain what function it performs because I
> >think I
> >> > > might be missing something.
> >> > 
> >> > We're looking ahead to a time when audit is namespaced, and a
> >container
> >> > can keep its own audit logs (without limiting what the host audits
> >of
> >> > course).  So if a container is auditing suspicious activity by some
> >> > task in a sub-namespace, then the whole parent container gets
> >migrated,
> >> > after migration we want to continue being able to correlate the
> >namespaces.
> >> > 
> >> > We're also looking at audit trails on a host that is up for years. 
> >We
> >> > would like every namespace to be uniquely logged there.  That is
> >why
> >> > inode #s on /proc/self/ns/* are not sufficient, unless we add a
> >generation
> >> > # (which would end up more complicated, not less, than a serial #).
> >> 
> >> Right, but when the container has an audit namespace, that namespace
> >> has a name,
> >
> >What ns has a name?
> 
> The netns for instance.

And what is its name?  The only name I know that we could log in an
audit message is the /proc/self/ns/net inode number (which does not
suffice)

> >  The audit ns can be tied to 50 pid namespaces, and
> >we
> >want to log which pidns is responsible for something.
> >
> >If you mean the pidns has a name, that's the problem...  it does not, it
> >only has an inode # which may later be re-used.
> 
> I still think there's a miscommunication somewhere: I believe you just need a 
> stable id to tie the audit to, so why not just give the audit namespace a 
> name like net?  The id would then be durable across migrations.

Maybe this is where we're confusing each other - I'm not talking
about giving the audit ns a name.  I'm talking about being able to
identify the other namespaces inside an audit message.  In a way
that (a) is unique across bare metals' entire uptime, and (b)
can be tracked across migrations.

And again we don't need to actually implement all that now - all
I wanted to make sure of was that the serial # as proposed by Richard
could be made to work for those purposes, and I now believe they can.

> >> which CRIU would migrate, so why not use that name for the
> >> log .. no need for numbers (unless you make the name a number, of
> >> course)?
> >> 
> >> James
> >
> >Sorry 

Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread James Bottomley


On May 5, 2014 3:36:38 PM PDT, Serge Hallyn  wrote:
>Quoting James Bottomley (james.bottom...@hansenpartnership.com):
>> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote:
>> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
>> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
>> > > > On 14/05/05, Serge E. Hallyn wrote:
>> > > > > Quoting James Bottomley
>(james.bottom...@hansenpartnership.com):
>> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs
>wrote:
>> > > > > > > Questions:
>> > > > > > > Is there a way to link serial numbers of namespaces
>involved in migration of a
>> > > > > > > container to another kernel?  (I had a brief look at
>CRIU.)  Is there a unique
>> > > > > > > identifier for each running instance of a kernel?  Or at
>least some identifier
>> > > > > > > within the container migration realm?
>> > > > > > 
>> > > > > > Are you asking for a way of distinguishing a migrated
>container from an
>> > > > > > unmigrated one?  The answer is pretty much "no" because the
>job of
>> > > > > > migration is to restore to the same state as much as
>possible.
>> > > > > > 
>> > > > > > Reading between the lines, I think your goal is to
>correlate audit
>> > > > > > information across a container migration, right?  Ideally
>the management
>> > > > > > system should be able to cough up an audit trail for a
>container
>> > > > > > wherever it's running and however many times it's been
>migrated?
>> > > > > > 
>> > > > > > In that case, I think your idea of a numeric serial number
>in a dense
>> > > > > > range is wrong.  Because the range is dense you're
>obviously never going
>> > > > > > to be able to use the same serial number across a
>migration.  However,
>> > > > > 
>> > > > > Ah, but I was being silly before, we can actually address
>this pretty
>> > > > > simply.  If we just (for instance) add
>> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the
>serial number
>> > > > > for the relevant ns for the task, then criu can dump this
>info at
>> > > > > checkpoint.  Then at restart it can dump an audit message per
>task and
>> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit
>log reader
>> > > > > can if it cares keep track.
>> > > > 
>> > > > This is the sort of idea I had in mind...
>> > > 
>> > > OK, but I don't understand then why you need a serial number. 
>There are
>> > > plenty of things we preserve across a migration, like namespace
>name for
>> > > instance.  Could you explain what function it performs because I
>think I
>> > > might be missing something.
>> > 
>> > We're looking ahead to a time when audit is namespaced, and a
>container
>> > can keep its own audit logs (without limiting what the host audits
>of
>> > course).  So if a container is auditing suspicious activity by some
>> > task in a sub-namespace, then the whole parent container gets
>migrated,
>> > after migration we want to continue being able to correlate the
>namespaces.
>> > 
>> > We're also looking at audit trails on a host that is up for years. 
>We
>> > would like every namespace to be uniquely logged there.  That is
>why
>> > inode #s on /proc/self/ns/* are not sufficient, unless we add a
>generation
>> > # (which would end up more complicated, not less, than a serial #).
>> 
>> Right, but when the container has an audit namespace, that namespace
>> has a name,
>
>What ns has a name?

The netns for instance.

>  The audit ns can be tied to 50 pid namespaces, and
>we
>want to log which pidns is responsible for something.
>
>If you mean the pidns has a name, that's the problem...  it does not, it
>only has an inode # which may later be re-used.

I still think there's a miscommunication somewhere: I believe you just need a 
stable id to tie the audit to, so why not just give the audit namespace a name 
like net?  The id would then be durable across migrations.

>> which CRIU would migrate, so why not use that name for the
>> log .. no need for numbers (unless you make the name a number, of
>> course)?
>> 
>> James
>
>Sorry if I'm being dense...

No I think our assumptions are mismatched. I just can't figure out where.

James

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Serge Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote:
> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > > > On 14/05/05, Serge E. Hallyn wrote:
> > > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > > > Questions:
> > > > > > > Is there a way to link serial numbers of namespaces involved in 
> > > > > > > migration of a
> > > > > > > container to another kernel?  (I had a brief look at CRIU.)  Is 
> > > > > > > there a unique
> > > > > > > identifier for each running instance of a kernel?  Or at least 
> > > > > > > some identifier
> > > > > > > within the container migration realm?
> > > > > > 
> > > > > > Are you asking for a way of distinguishing a migrated container
> > > > > > from an
> > > > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > > > migration is to restore to the same state as much as possible.
> > > > > > 
> > > > > > Reading between the lines, I think your goal is to correlate audit
> > > > > > information across a container migration, right?  Ideally the 
> > > > > > management
> > > > > > system should be able to cough up an audit trail for a container
> > > > > > wherever it's running and however many times it's been migrated?
> > > > > > 
> > > > > > In that case, I think your idea of a numeric serial number in a 
> > > > > > dense
> > > > > > range is wrong.  Because the range is dense you're obviously never 
> > > > > > going
> > > > > > to be able to use the same serial number across a migration.  
> > > > > > However,
> > > > > 
> > > > > Ah, but I was being silly before, we can actually address this pretty
> > > > > simply.  If we just (for instance) add
> > > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial 
> > > > > number
> > > > > for the relevant ns for the task, then criu can dump this info at
> > > > > checkpoint.  Then at restart it can dump an audit message per task and
> > > > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > > > can if it cares keep track.
> > > > 
> > > > This is the sort of idea I had in mind...
> > > 
> > > OK, but I don't understand then why you need a serial number.  There are
> > > plenty of things we preserve across a migration, like namespace name for
> > > instance.  Could you explain what function it performs because I think I
> > > might be missing something.
> > 
> > We're looking ahead to a time when audit is namespaced, and a container
> > can keep its own audit logs (without limiting what the host audits of
> > course).  So if a container is auditing suspicious activity by some
> > > > > task in a sub-namespace, then the whole parent container gets migrated,
> > after migration we want to continue being able to correlate the namespaces.
> > 
> > We're also looking at audit trails on a host that is up for years.  We
> > would like every namespace to be uniquely logged there.  That is why
> > inode #s on /proc/self/ns/* are not sufficient, unless we add a generation
> > # (which would end up more complicated, not less, than a serial #).
> 
> Right, but when the container has an audit namespace, that namespace has
> a name,

What ns has a name?  The audit ns can be tied to 50 pid namespaces, and we
want to log which pidns is responsible for something.

If you mean the pidns has a name, that's the problem...  it does not, it
> only has an inode # which may later be re-used.

> which CRIU would migrate, so why not use that name for the
> log .. no need for numbers (unless you make the name a number, of
> course)?
> 
> James

Sorry if I'm being dense...

-serge


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread James Bottomley
On Mon, 2014-05-05 at 22:27 +, Serge Hallyn wrote:
> Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > > On 14/05/05, Serge E. Hallyn wrote:
> > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > > Questions:
> > > > > > Is there a way to link serial numbers of namespaces involved in 
> > > > > > migration of a
> > > > > > container to another kernel?  (I had a brief look at CRIU.)  Is 
> > > > > > there a unique
> > > > > > identifier for each running instance of a kernel?  Or at least some 
> > > > > > identifier
> > > > > > within the container migration realm?
> > > > > 
> > > > > Are you asking for a way of distinguishing a migrated container from
> > > > > an
> > > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > > migration is to restore to the same state as much as possible.
> > > > > 
> > > > > Reading between the lines, I think your goal is to correlate audit
> > > > > information across a container migration, right?  Ideally the 
> > > > > management
> > > > > system should be able to cough up an audit trail for a container
> > > > > wherever it's running and however many times it's been migrated?
> > > > > 
> > > > > In that case, I think your idea of a numeric serial number in a dense
> > > > > range is wrong.  Because the range is dense you're obviously never 
> > > > > going
> > > > > to be able to use the same serial number across a migration.  However,
> > > > 
> > > > Ah, but I was being silly before, we can actually address this pretty
> > > > simply.  If we just (for instance) add
> > > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number
> > > > for the relevant ns for the task, then criu can dump this info at
> > > > checkpoint.  Then at restart it can dump an audit message per task and
> > > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > > can if it cares keep track.
> > > 
> > > This is the sort of idea I had in mind...
> > 
> > OK, but I don't understand then why you need a serial number.  There are
> > plenty of things we preserve across a migration, like namespace name for
> > instance.  Could you explain what function it performs because I think I
> > might be missing something.
> 
> We're looking ahead to a time when audit is namespaced, and a container
> can keep its own audit logs (without limiting what the host audits of
> course).  So if a container is auditing suspicious activity by some
> > task in a sub-namespace, then the whole parent container gets migrated,
> after migration we want to continue being able to correlate the namespaces.
> 
> We're also looking at audit trails on a host that is up for years.  We
> would like every namespace to be uniquely logged there.  That is why
> inode #s on /proc/self/ns/* are not sufficient, unless we add a generation
> # (which would end up more complicated, not less, than a serial #).

Right, but when the container has an audit namespace, that namespace has
a name, which CRIU would migrate, so why not use that name for the
log .. no need for numbers (unless you make the name a number, of
course)?

James




Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Serge Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > On 14/05/05, Serge E. Hallyn wrote:
> > > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > Questions:
> > > > > Is there a way to link serial numbers of namespaces involved in 
> > > > > migration of a
> > > > > container to another kernel?  (I had a brief look at CRIU.)  Is there 
> > > > > a unique
> > > > > identifier for each running instance of a kernel?  Or at least some 
> > > > > identifier
> > > > > within the container migration realm?
> > > > 
> > > > Are you asking for a way of distinguishing a migrated container from an
> > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > migration is to restore to the same state as much as possible.
> > > > 
> > > > Reading between the lines, I think your goal is to correlate audit
> > > > information across a container migration, right?  Ideally the management
> > > > system should be able to cough up an audit trail for a container
> > > > wherever it's running and however many times it's been migrated?
> > > > 
> > > > In that case, I think your idea of a numeric serial number in a dense
> > > > range is wrong.  Because the range is dense you're obviously never going
> > > > to be able to use the same serial number across a migration.  However,
> > > 
> > > Ah, but I was being silly before, we can actually address this pretty
> > > simply.  If we just (for instance) add
> > > /proc/self/ns/{ic,mnt,net,pid,user,uts}_seq containing the serial number
> > > for the relevant ns for the task, then criu can dump this info at
> > > checkpoint.  Then at restart it can dump an audit message per task and
> > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > can if it cares keep track.
> > 
> > This is the sort of idea I had in mind...
> 
> OK, but I don't understand then why you need a serial number.  There are
> plenty of things we preserve across a migration, like namespace name for
> instance.  Could you explain what function it performs because I think I
> might be missing something.

We're looking ahead to a time when audit is namespaced, and a container
can keep its own audit logs (without limiting what the host audits of
course).  So if a container is auditing suspicious activity by some
task in a sub-namespace, then the whole parent container gets migrated,
after migration we want to continue being able to correlate the namespaces.

We're also looking at audit trails on a host that is up for years.  We
would like every namespace to be uniquely logged there.  That is why
inode #s on /proc/self/ns/* are not sufficient, unless we add a generation
# (which would end up more complicated, not less, than a serial #).

-serge


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread James Bottomley
On Mon, 2014-05-05 at 18:11 -0400, Richard Guy Briggs wrote:
> On 14/05/05, James Bottomley wrote:
> > On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > > On 14/05/05, Serge E. Hallyn wrote:
> > > > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > > Questions:
> > > > > > Is there a way to link serial numbers of namespaces involved in 
> > > > > > migration of a
> > > > > > container to another kernel?  (I had a brief look at CRIU.)  Is 
> > > > > > there a unique
> > > > > > identifier for each running instance of a kernel?  Or at least some 
> > > > > > identifier
> > > > > > within the container migration realm?
> > > > > 
> > > > > Are you asking for a way of distinguishing a migrated container from 
> > > > > an
> > > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > > migration is to restore to the same state as much as possible.
> > > > > 
> > > > > Reading between the lines, I think your goal is to correlate audit
> > > > > information across a container migration, right?  Ideally the 
> > > > > management
> > > > > system should be able to cough up an audit trail for a container
> > > > > wherever it's running and however many times it's been migrated?
> > > > > 
> > > > > In that case, I think your idea of a numeric serial number in a dense
> > > > > range is wrong.  Because the range is dense you're obviously never 
> > > > > going
> > > > > to be able to use the same serial number across a migration.  However,
> > > > 
> > > > Ah, but I was being silly before, we can actually address this pretty
> > > > simply.  If we just (for instance) add
> > > > /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
> > > > for the relevant ns for the task, then criu can dump this info at
> > > > checkpoint.  Then at restart it can dump an audit message per task and
> > > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > > can if it cares keep track.
> > > 
> > > This is the sort of idea I had in mind...
> > 
> > OK, but I don't understand then why you need a serial number.  There are
> > plenty of things we preserve across a migration, like namespace name for
> > instance.  Could you explain what function it performs because I think I
> > might be missing something.
> 
> If a container was defined as an entity with 6 namespaces to itself,
> this would make sense.  As Eric P. put it, containers/namespaces seem to
> be a bucket of semi-related nuts and bolts, with any namespace being
> optional depending on the application.

That's right.  An IaaS container has a well defined composition, since
it has to contain a full OS, but an application container is variable.

It's usual for container management systems to have one name for the
container and give that name to all of its namespaces, but I agree, it
doesn't have to be that way.

>   My understanding is a
> container could be migrated to another host requiring the creation of
> (none,) some or all of its namespaces, potentially leaving behind some
> of its shared namespaces and/or clashing names of namespaces on the
> destination host.

Well, no, the environment gets migrated as well so when the migration is
over, the namespaces the migrated entity is in will look the same as
before the migration ... if they didn't exist on the recipient, they'll
be created.  If a namespace already exists the restore fails ... this is
because we support the usual container case where you're migrating to a
disjoint set of namespaces.

Even if there were some reason for supporting shared namespaces, the
fundamental invariant is still the namespace names (i.e. the namespaces
have the same names before and after migration), so how does a serial
number help?

James

> > James
> > 
> > > > -serge
> > > > 
> > > > (Another, more heavyweight approach would be to track all ns hierarchies
> > > > and make the serial numbers per-namespace-instance.  So my container's
> > > > pidns serial might be 0x2, and if it clones a new pidns that would be
> > > > "(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
> > > > need that if the simple userspace approach suffices)
> > > 
> > > This sounds manageable...
> > > 
> > > - RGB
> > > 
> > > --
> > > Richard Guy Briggs 
> > > Senior Software Engineer, Kernel Security, AMER ENG Base Operating 
> > > Systems, Red Hat
> > > Remote, Ottawa, Canada
> > > Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
> > 
> > 
> > 
> 
> - RGB
> 
> --
> Richard Guy Briggs 
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, 
> Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545




Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Richard Guy Briggs
On 14/05/05, James Bottomley wrote:
> On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> > On 14/05/05, Serge E. Hallyn wrote:
> > > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > > Questions:
> > > > > Is there a way to link serial numbers of namespaces involved in 
> > > > > migration of a
> > > > > container to another kernel?  (I had a brief look at CRIU.)  Is there 
> > > > > a unique
> > > > > identifier for each running instance of a kernel?  Or at least some 
> > > > > identifier
> > > > > within the container migration realm?
> > > > 
> > > > Are you asking for a way of distinguishing a migrated container from an
> > > > unmigrated one?  The answer is pretty much "no" because the job of
> > > > migration is to restore to the same state as much as possible.
> > > > 
> > > > Reading between the lines, I think your goal is to correlate audit
> > > > information across a container migration, right?  Ideally the management
> > > > system should be able to cough up an audit trail for a container
> > > > wherever it's running and however many times it's been migrated?
> > > > 
> > > > In that case, I think your idea of a numeric serial number in a dense
> > > > range is wrong.  Because the range is dense you're obviously never going
> > > > to be able to use the same serial number across a migration.  However,
> > > 
> > > Ah, but I was being silly before, we can actually address this pretty
> > > simply.  If we just (for instance) add
> > > /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
> > > for the relevant ns for the task, then criu can dump this info at
> > > checkpoint.  Then at restart it can dump an audit message per task and
> > > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > > can if it cares keep track.
> > 
> > This is the sort of idea I had in mind...
> 
> OK, but I don't understand then why you need a serial number.  There are
> plenty of things we preserve across a migration, like namespace name for
> instance.  Could you explain what function it performs because I think I
> might be missing something.

If a container was defined as an entity with 6 namespaces to itself,
this would make sense.  As Eric P. put it, containers/namespaces seem to
be a bucket of semi-related nuts and bolts, with any namespace being
optional depending on the application.  My understanding is a
container could be migrated to another host requiring the creation of
(none,) some or all of its namespaces, potentially leaving behind some
of its shared namespaces and/or clashing names of namespaces on the
destination host.

> James
> 
> > > -serge
> > > 
> > > (Another, more heavyweight approach would be to track all ns hierarchies
> > > and make the serial numbers per-namespace-instance.  So my container's
> > > pidns serial might be 0x2, and if it clones a new pidns that would be
> > > "(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
> > > need that if the simple userspace approach suffices)
> > 
> > This sounds manageable...
> > 
> > - RGB
> > 
> > --
> > Richard Guy Briggs 
> > Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, 
> > Red Hat
> > Remote, Ottawa, Canada
> > Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
> 
> 
> 

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red 
Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread James Bottomley
On Mon, 2014-05-05 at 17:48 -0400, Richard Guy Briggs wrote:
> On 14/05/05, Serge E. Hallyn wrote:
> > Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > > Questions:
> > > > Is there a way to link serial numbers of namespaces involved in 
> > > > migration of a
> > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a 
> > > > unique
> > > > identifier for each running instance of a kernel?  Or at least some 
> > > > identifier
> > > > within the container migration realm?
> > > 
> > > Are you asking for a way of distinguishing a migrated container from an
> > > unmigrated one?  The answer is pretty much "no" because the job of
> > > migration is to restore to the same state as much as possible.
> > > 
> > > Reading between the lines, I think your goal is to correlate audit
> > > information across a container migration, right?  Ideally the management
> > > system should be able to cough up an audit trail for a container
> > > wherever it's running and however many times it's been migrated?
> > > 
> > > In that case, I think your idea of a numeric serial number in a dense
> > > range is wrong.  Because the range is dense you're obviously never going
> > > to be able to use the same serial number across a migration.  However,
> > 
> > Ah, but I was being silly before, we can actually address this pretty
> > simply.  If we just (for instance) add
> > /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
> > for the relevant ns for the task, then criu can dump this info at
> > checkpoint.  Then at restart it can dump an audit message per task and
> > ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> > can if it cares keep track.
> 
> This is the sort of idea I had in mind...

OK, but I don't understand then why you need a serial number.  There are
plenty of things we preserve across a migration, like namespace name for
instance.  Could you explain what function it performs because I think I
might be missing something.

Thanks,

James


> > -serge
> > 
> > (Another, more heavyweight approach would be to track all ns hierarchies
> > and make the serial numbers per-namespace-instance.  So my container's
> > pidns serial might be 0x2, and if it clones a new pidns that would be
> > "(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
> > need that if the simple userspace approach suffices)
> 
> This sounds manageable...
> 
> - RGB
> 
> --
> Richard Guy Briggs 
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, 
> Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545





Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Richard Guy Briggs
On 14/05/05, Serge E. Hallyn wrote:
> Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration 
> > > of a
> > > container to another kernel?  (I had a brief look at CRIU.)  Is there a 
> > > unique
> > > identifier for each running instance of a kernel?  Or at least some 
> > > identifier
> > > within the container migration realm?
> > 
> > Are you asking for a way of distinguishing a migrated container from an
> > unmigrated one?  The answer is pretty much "no" because the job of
> > migration is to restore to the same state as much as possible.
> > 
> > Reading between the lines, I think your goal is to correlate audit
> > information across a container migration, right?  Ideally the management
> > system should be able to cough up an audit trail for a container
> > wherever it's running and however many times it's been migrated?
> > 
> > In that case, I think your idea of a numeric serial number in a dense
> > range is wrong.  Because the range is dense you're obviously never going
> > to be able to use the same serial number across a migration.  However,
> 
> Ah, but I was being silly before, we can actually address this pretty
> simply.  If we just (for instance) add
> /proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
> for the relevant ns for the task, then criu can dump this info at
> checkpoint.  Then at restart it can dump an audit message per task and
> ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
> can if it cares keep track.

This is the sort of idea I had in mind...

> -serge
> 
> (Another, more heavyweight approach would be to track all ns hierarchies
> and make the serial numbers per-namespace-instance.  So my container's
> pidns serial might be 0x2, and if it clones a new pidns that would be
> "(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
> need that if the simple userspace approach suffices)

This sounds manageable...

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red 
Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Richard Guy Briggs
On 14/05/03, James Bottomley wrote:
> On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > Questions:
> > Is there a way to link serial numbers of namespaces involved in migration 
> > of a
> > container to another kernel?  (I had a brief look at CRIU.)  Is there a 
> > unique
> > identifier for each running instance of a kernel?  Or at least some 
> > identifier
> > within the container migration realm?
> 
> Are you asking for a way of distinguishing a migrated container from an
> unmigrated one?  The answer is pretty much "no" because the job of
> migration is to restore to the same state as much as possible.

I hadn't thought to distinguish a migrated container from an unmigrated
one, but rather I'm more interested in the underlying namespaces.  The
use of a generation number to identify a migrated namespace may be
useful along with the logging to tie them together.

> Reading between the lines, I think your goal is to correlate audit
> information across a container migration, right?  Ideally the management
> system should be able to cough up an audit trail for a container
> wherever it's running and however many times it's been migrated?

The original intent was to track the underlying namespaces themselves.
This sounds like another layer on top of that which sounds useful but
that I had not yet considered.

But yes, that sounds like a good eventual goal.

> In that case, I think your idea of a numeric serial number in a dense
> range is wrong.  Because the range is dense you're obviously never going
> to be able to use the same serial number across a migration.  However,
> if you look at all the management systems for containers, they all have
> a concept of some unique ID per container, be it name, UUID or even
> GUID.  I suspect it's that you should be using to tag the audit trail
> with.

That does sound potentially useful but for the fact that several
containers could share one or more types of namespaces.

Would logging just a container ID be sufficient for audit purposes?  I'm
going to have to dig a bit to understand that one because I was unaware
each container had a unique ID.

I did originally consider a UUID/GUID for namespaces.

> James

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red 
Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Richard Guy Briggs
On 14/05/02, Serge Hallyn wrote:
> Quoting Richard Guy Briggs (r...@redhat.com):
> > On 14/05/02, Serge E. Hallyn wrote:
> > > Quoting Richard Guy Briggs (r...@redhat.com):
> > > > I saw no replies to my questions when I replied a year after Aris' 
> > > > posting, so
> > > > I don't know if it was ignored or got lost in stale threads:
> > > > 
> > > > https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > > > 
> > > > https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > > > 
> > > > (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > > > 
> > > > https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > > > 
> > > > I've tried to answer a number of questions that were raised in that 
> > > > thread.
> > > > 
> > > > The goal is not quite identical to Aris' patchset.
> > > > 
> > > > The purpose is to track namespaces in use by logged processes from the
> > > > perspective of init_*_ns.  The first patch defines a function to list 
> > > > them.
> > > > The second patch provides an example of usage for audit_log_task_info() 
> > > > which
> > > > is used by syscall audits, among others.  audit_log_task() and
> > > > audit_common_recv_message() would be other potential use cases.
> > > > 
> > > > Use a serial number per namespace (unique across one boot of one kernel)
> > > > instead of the inode number (the right to change which is claimed to
> > > > have been reserved, and which is not necessarily unique if there is
> > > > more than one proc fs).  It could be argued that the inode numbers have
> > > > by now become a de facto interface and can't change, but I'm proposing
> > > > this approach to see if it helps address some of the objections to the
> > > > earlier patchset.
> > > > 
> > > > Messages could also be added to track the creation and destruction of
> > > > namespaces, listing the parent for hierarchical namespaces such as
> > > > pidns and userns, listing other ids for non-hierarchical namespaces,
> > > > and carrying other information to help identify a namespace.
> > > > 
> > > > There has been some progress made for audit in net namespaces and pid
> > > > namespaces since this previous thread.  net namespaces are now served 
> > > > as peers
> > > > by one auditd in the init_net namespace with processes in a non-init_net
> > > > namespace being able to write records if they are in the init_user_ns 
> > > > and have
> > > > CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> > > > records.  As for CAP_AUDIT_READ, I just posted a patchset to check 
> > > > capabilities
> > > > of userspace processes that try to join netlink broadcast groups.
> > > > 
> > > > 
> > > > Questions:
> > > > Is there a way to link serial numbers of namespaces involved in 
> > > > migration of a
> > > > container to another kernel?  (I had a brief look at CRIU.)  Is there a 
> > > > unique
> > > > identifier for each running instance of a kernel?  Or at least some 
> > > > identifier
> > > > within the container migration realm?
> > > 
> > > Eric Biederman has always been adamantly opposed to adding new namespaces
> > > of namespaces, so the fact that you're asking this question concerns me.
> > 
> > I have seen that position and I don't fully understand the justification
> > for it other than added complexity.
> > 
> > One way that occurred to me to be able to identify a kernel instance was
> > to look at CPU serial numbers or other CPU entity intended to be
> > globally unique, but that isn't universally available.
> 
> That's one issue, which is uniqueness of namespaces cross-machines.
> 
> But it gets worse if we consider that after allowing in-container audit,
> we'll have a nested container running, then have the parent container
> migrated to another host (or just checkpointed and restarted);  Now the
> nested container's indexes will all be changed.  Is there any way audit
> can track who's who after the migration?

Presumably the namespace serial numbers before and after would be logged
in one message to tie them together.

> That's not an indictment of the serial # approach, since (a) we don't
> have in-container audit yet and (b) we don't have c/r/migration of nested
> containers.  But it's worth considering whether we can solve the issue
> with serial #s, and, if not, whether we can solve it with any other
> approach.
> 
> I guess one approach to solve it would be to allow userspace to request
> a next serial #.  Which will immediately lead us to a namespace of serial
> #s (since the requested # might be lower than the last used one on the
> new host).

:P

> As you've said inode #s for /proc/self/ns/* probably aren't sufficiently
> unique, though perhaps we could attach a generation # for the sake of
> audit.  Then after a c/r/migration the generation # may be different,
> but we may have a better shot at 

Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-05 Thread Nicolas Dichtel

Le 02/05/2014 16:28, Richard Guy Briggs a écrit :

On 14/05/02, Serge E. Hallyn wrote:

Quoting Richard Guy Briggs (r...@redhat.com):

I saw no replies to my questions when I replied a year after Aris' posting, so
I don't know if it was ignored or got lost in stale threads:
 https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
 https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html

(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
 https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html

I've tried to answer a number of questions that were raised in that thread.

The goal is not quite identical to Aris' patchset.

The purpose is to track namespaces in use by logged processes from the
perspective of init_*_ns.  The first patch defines a function to list them.
The second patch provides an example of usage for audit_log_task_info() which
is used by syscall audits, among others.  audit_log_task() and
audit_common_recv_message() would be other potential use cases.

Use a serial number per namespace (unique across one boot of one kernel)
instead of the inode number (the right to change which is claimed to have
been reserved, and which is not necessarily unique if there is more than one
proc fs).  It could be argued that the inode numbers have by now become a
de facto interface and can't change, but I'm proposing this approach to see
if it helps address some of the objections to the earlier patchset.

Messages could also be added to track the creation and destruction of
namespaces, listing the parent for hierarchical namespaces such as pidns and
userns, listing other ids for non-hierarchical namespaces, and carrying other
information to help identify a namespace.
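The per-boot serial idea above can be sketched in a few lines. This is only
an illustration of the numbering scheme, not the patchset's code; the names
`alloc_ns_serial` and `_ns_serial` are ours, and the real implementation
would allocate these in the kernel (e.g. from an atomic counter at namespace
creation):

```python
import itertools

# One monotonic counter shared by all namespace types, so every namespace
# created since boot gets a number that is unique for this boot of this
# kernel -- unlike a procfs inode number, it is never reused.
_ns_serial = itertools.count(1)

def alloc_ns_serial():
    """Hand out the next per-boot namespace serial number."""
    return next(_ns_serial)

a, b, c = alloc_ns_serial(), alloc_ns_serial(), alloc_ns_serial()
print(a, b, c)   # 1 2 3
```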

There has been some progress made for audit in net namespaces and pid
namespaces since this previous thread.  net namespaces are now served as peers
by one auditd in the init_net namespace with processes in a non-init_net
namespace being able to write records if they are in the init_user_ns and have
CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
of userspace processes that try to join netlink broadcast groups.


Questions:
Is there a way to link serial numbers of namespaces involved in migration of a
container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
identifier for each running instance of a kernel?  Or at least some identifier
within the container migration realm?


Eric Biederman has always been adamantly opposed to adding new namespaces
of namespaces, so the fact that you're asking this question concerns me.


I have seen that position and I don't fully understand the justification
for it other than added complexity.

Just FYI, have you seen this thread:
http://thread.gmane.org/gmane.linux.network/286572/

There are some explanations and examples on this topic.


Regards,
Nicolas


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-04 Thread Serge E. Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> > Questions:
> > Is there a way to link serial numbers of namespaces involved in migration 
> > of a
> > container to another kernel?  (I had a brief look at CRIU.)  Is there a 
> > unique
> > identifier for each running instance of a kernel?  Or at least some 
> > identifier
> > within the container migration realm?
> 
> Are you asking for a way of distinguishing a migrated container from an
> unmigrated one?  The answer is pretty much "no" because the job of
> migration is to restore to the same state as much as possible.
> 
> Reading between the lines, I think your goal is to correlate audit
> information across a container migration, right?  Ideally the management
> system should be able to cough up an audit trail for a container
> wherever it's running and however many times it's been migrated?
> 
> In that case, I think your idea of a numeric serial number in a dense
> range is wrong.  Because the range is dense you're obviously never going
> to be able to use the same serial number across a migration.  However,

Ah, but I was being silly before, we can actually address this pretty
simply.  If we just (for instance) add
/proc/self/ns/{ipc,mnt,net,pid,user,uts}_seq containing the serial number
for the relevant ns for the task, then criu can dump this info at
checkpoint.  Then at restart it can dump an audit message per task and
ns saying old_serial=%x,new_serial=%x.  That way the audit log reader
can if it cares keep track.
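To make the proposal concrete: the *_seq files and the record shape below are
hypothetical, taken only from the sketch in this message, but a restore tool
could emit the per-task correlation records roughly like this:

```python
# Simulated serials as a checkpoint/restore tool might see them: the serials
# captured from the (proposed, nonexistent) /proc/self/ns/*_seq files at
# checkpoint, and the ones allocated on the new host at restore.  The
# "type=NS_SERIAL" record name is invented for this illustration.

NAMESPACES = ("ipc", "mnt", "net", "pid", "user", "uts")

def serial_remap_records(old, new):
    """old/new map namespace type -> per-boot serial number (int);
    returns one audit-style correlation line per namespace."""
    records = []
    for ns in NAMESPACES:
        records.append(
            "type=NS_SERIAL ns=%s old_serial=%x,new_serial=%x"
            % (ns, old[ns], new[ns])
        )
    return records

old = {ns: i + 0x10 for i, ns in enumerate(NAMESPACES)}   # checkpoint host
new = {ns: i + 0x80 for i, ns in enumerate(NAMESPACES)}   # restore host
for rec in serial_remap_records(old, new):
    print(rec)   # e.g. type=NS_SERIAL ns=ipc old_serial=10,new_serial=80
```

A log reader that cares about continuity would join on old_serial to keep
following a namespace across the migration.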

-serge

(Another, more heavyweight approach would be to track all ns hierarchies
and make the serial numbers per-namespace-instance.  So my container's
pidns serial might be 0x2, and if it clones a new pidns that would be
"(0x2,0x1)" on the host, or just 0x1 inside the container.  But we don't
need that if the simple userspace approach suffices)
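The heavyweight per-namespace-instance numbering in the parenthetical above
can be represented, purely illustratively (nothing like this exists in the
kernel; the helper names are ours):

```python
# Each namespace's identity is the path of serials from the init namespace
# down; a container sees only the suffix of the path below itself.

def host_view(path):
    """Full hierarchical id as seen from init_*_ns, e.g. (0x2,0x1)."""
    return "(" + ",".join("0x%x" % s for s in path) + ")"

def container_view(path, depth):
    """The same id as seen from a container `depth` levels down."""
    return host_view(path[depth:])

pidns = (0x2, 0x1)   # container with serial 0x2 cloned a child pidns 0x1
print(host_view(pidns))          # (0x2,0x1)
print(container_view(pidns, 1))  # (0x1)
```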


Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-03 Thread James Bottomley
On Tue, 2014-04-22 at 14:12 -0400, Richard Guy Briggs wrote:
> Questions:
> Is there a way to link serial numbers of namespaces involved in migration of a
> container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> identifier for each running instance of a kernel?  Or at least some identifier
> within the container migration realm?

Are you asking for a way of distinguishing a migrated container from an
unmigrated one?  The answer is pretty much "no" because the job of
migration is to restore to the same state as much as possible.

Reading between the lines, I think your goal is to correlate audit
information across a container migration, right?  Ideally the management
system should be able to cough up an audit trail for a container
wherever it's running and however many times it's been migrated?

In that case, I think your idea of a numeric serial number in a dense
range is wrong.  Because the range is dense you're obviously never going
to be able to use the same serial number across a migration.  However,
if you look at all the management systems for containers, they all have
a concept of some unique ID per container, be it name, UUID or even
GUID.  I suspect it's that you should be using to tag the audit trail
with.

James




Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-02 Thread Serge Hallyn
Quoting Richard Guy Briggs (r...@redhat.com):
> On 14/05/02, Serge E. Hallyn wrote:
> > Quoting Richard Guy Briggs (r...@redhat.com):
> > > I saw no replies to my questions when I replied a year after Aris' 
> > > posting, so
> > > I don't know if it was ignored or got lost in stale threads:
> > > 
> > > https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > > 
> > > https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > >   
> > > (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > > 
> > > https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > > 
> > > I've tried to answer a number of questions that were raised in that 
> > > thread.
> > > 
> > > The goal is not quite identical to Aris' patchset.
> > > 
> > > The purpose is to track namespaces in use by logged processes from the
> > > perspective of init_*_ns.  The first patch defines a function to list 
> > > them.
> > > The second patch provides an example of usage for audit_log_task_info() 
> > > which
> > > is used by syscall audits, among others.  audit_log_task() and
> > > audit_common_recv_message() would be other potential use cases.
> > > 
> > > Use a serial number per namespace (unique across one boot of one kernel)
> > > instead of the inode number (the right to change which is claimed to
> > > have been reserved, and which is not necessarily unique if there is more
> > > than one proc fs).  It could be argued that the inode numbers have by
> > > now become a de facto interface and can't change, but I'm proposing this
> > > approach to see if it helps address some of the objections to the
> > > earlier patchset.
> > > 
> > > Messages could also be added to track the creation and destruction of
> > > namespaces, listing the parent for hierarchical namespaces such as pidns
> > > and userns, listing other ids for non-hierarchical namespaces, and
> > > carrying other information to help identify a namespace.
> > > 
> > > There has been some progress made for audit in net namespaces and pid
> > > namespaces since this previous thread.  net namespaces are now served
> > > as peers by one auditd in the init_net namespace with processes in a
> > > non-init_net namespace being able to write records if they are in the
> > > init_user_ns and have CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns
> > > can now similarly write records.  As for CAP_AUDIT_READ, I just posted
> > > a patchset to check capabilities of userspace processes that try to
> > > join netlink broadcast groups.
> > > 
> > > 
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in
> > > migration of a container to another kernel?  (I had a brief look at
> > > CRIU.)  Is there a unique identifier for each running instance of a
> > > kernel?  Or at least some identifier within the container migration
> > > realm?
> > 
> > Eric Biederman has always been adamantly opposed to adding new namespaces
> > of namespaces, so the fact that you're asking this question concerns me.
> 
> I have seen that position and I don't fully understand the justification
> for it other than added complexity.
> 
> One way that occurred to me to identify a kernel instance was to look
> at CPU serial numbers or some other CPU identifier intended to be
> globally unique, but that isn't universally available.

That's one issue: uniqueness of namespaces across machines.

But it gets worse if we consider that after allowing in-container audit,
we'll have a nested container running, then have the parent container
migrated to another host (or just checkpointed and restarted);  Now the
nested container's indexes will all be changed.  Is there any way audit
can track who's who after the migration?

That's not an indictment of the serial # approach, since (a) we don't
have in-container audit yet and (b) we don't have c/r/migration of nested
containers.  But it's worth considering whether we can solve the issue
with serial #s, and, if not, whether we can solve it with any other
approach.

I guess one approach to solve it would be to allow userspace to request
a next serial #.  Which will immediately lead us to a namespace of serial
#s (since the requested # might be lower than the last used one on the
new host).

As you've said inode #s for /proc/self/ns/* probably aren't sufficiently
unique, though perhaps we could attach a generation # for the sake of
audit.  Then after a c/r/migration the generation # may be different,
but we may have a better shot at at least using the same ino#.
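The generation idea above can be sketched as a small data structure: pair the /proc/[pid]/ns/* inode number with a generation counter that is bumped per boot (or after a checkpoint/restore), so audit can tell two incarnations of the same ino# apart.  All names here are made up for illustration; nothing like this exists in the patchset.

```c
/* Sketch of an audit namespace identifier combining ino# and a
 * generation number (hypothetical names and layout). */
#include <stdint.h>

struct ns_audit_id {
	uint32_t generation;	/* bumped on boot or after c/r/migration */
	uint32_t ino;		/* inode number from /proc/[pid]/ns/ */
};

/* Pack both into one 64-bit value, suitable for a compact
 * "netns=" style field in an audit record. */
static inline uint64_t ns_audit_id_pack(struct ns_audit_id id)
{
	return ((uint64_t)id.generation << 32) | id.ino;
}
```

After a migration the generation half would differ while the ino# half could, with CRIU's help, stay the same.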

> Another possibility was RTC reading at time of boot, but that isn't good
> enough either.
> 
> Both are dubious in VMs anyways.
> 
> > The way things are right now, since audit belongs to the init userns,
> > we can get away with saying if a container 'migrates', the new kernel
> > will see a different set of serials,

Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-02 Thread Richard Guy Briggs
On 14/05/02, Serge E. Hallyn wrote:
> Quoting Richard Guy Briggs (r...@redhat.com):
> > I saw no replies to my questions when I replied a year after Aris'
> > posting, so I don't know if it was ignored or got lost in stale threads:
> > https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > 
> > (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > 
> > https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > 
> > I've tried to answer a number of questions that were raised in that thread.
> > 
> > The goal is not quite identical to Aris' patchset.
> > 
> > The purpose is to track namespaces in use by logged processes from the
> > perspective of init_*_ns.  The first patch defines a function to list them.
> > The second patch provides an example of usage for audit_log_task_info()
> > which is used by syscall audits, among others.  audit_log_task() and
> > audit_common_recv_message() would be other potential use cases.
> > 
> > Use a serial number per namespace (unique across one boot of one
> > kernel) instead of the inode number (which is claimed to have had the
> > right to change reserved and is not necessarily unique if there is
> > more than one proc fs).  It could be argued that the inode numbers
> > have now become a de facto interface and can't change now, but I'm
> > proposing this approach to see if this helps address some of the
> > objections to the earlier patchset.
> > 
> > Messages could also be added to track the creation and the destruction
> > of namespaces, listing the parent for hierarchical namespaces such as
> > pidns and userns, and listing other ids for non-hierarchical
> > namespaces, as well as other information to help identify a namespace.
> > 
> > There has been some progress made for audit in net namespaces and pid
> > namespaces since this previous thread.  net namespaces are now served
> > as peers by one auditd in the init_net namespace with processes in a
> > non-init_net namespace being able to write records if they are in the
> > init_user_ns and have CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns
> > can now similarly write records.  As for CAP_AUDIT_READ, I just posted
> > a patchset to check capabilities of userspace processes that try to
> > join netlink broadcast groups.
> > 
> > 
> > Questions:
> > Is there a way to link serial numbers of namespaces involved in
> > migration of a container to another kernel?  (I had a brief look at
> > CRIU.)  Is there a unique identifier for each running instance of a
> > kernel?  Or at least some identifier within the container migration
> > realm?
> 
> Eric Biederman has always been adamantly opposed to adding new namespaces
> of namespaces, so the fact that you're asking this question concerns me.

I have seen that position and I don't fully understand the justification
for it other than added complexity.

One way that occurred to me to identify a kernel instance was to look
at CPU serial numbers or some other CPU identifier intended to be
globally unique, but that isn't universally available.

Another possibility was RTC reading at time of boot, but that isn't good
enough either.

Both are dubious in VMs anyways.

> The way things are right now, since audit belongs to the init userns,
> we can get away with saying if a container 'migrates', the new kernel
> will see a different set of serials, and no one should care.  However,
> if we're going to be allowing containers to have their own audit
> namespace/layer/whatever, then this becomes more of a concern.

Having a container have its own audit daemon (partitioned appropriately
in the kernel) would be a long-term goal.

> That said, I'll now look at the patches while pretending that problem
> does not exist :)  If I ack, it'll be on correctness of the code, but
> we'll still have to deal with this issue.

Getting some discussion about this migration challenge was a significant
motivation for posting this patch, so I'm hoping others will weigh in.

Thanks for your review, Serge.

> > What additional events should list this information?
> > 
> > Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL (and
> > proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> > init namespace at the moment.
> > 
> > 
> > Proposed output format:
> > This differs slightly from Aristeu's patch because of the label
> > conflict with "pid=" due to including it in existing records rather
> > than it being a separate record:
> > type=SYSCALL msg=audit(1398112249.996:65): arch=c03e 
> > syscall=272 success=yes exit=0 a0=4000 a1= a2=0 a3=22 
> > items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 
> > egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" 
> > ex

Re: [PATCH 0/2] namespaces: log namespaces per task

2014-05-01 Thread Serge E. Hallyn
Quoting Richard Guy Briggs (r...@redhat.com):
> I saw no replies to my questions when I replied a year after Aris' posting, so
> I don't know if it was ignored or got lost in stale threads:
> https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
>   
> (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> 
> I've tried to answer a number of questions that were raised in that thread.
> 
> The goal is not quite identical to Aris' patchset.
> 
> The purpose is to track namespaces in use by logged processes from the
> perspective of init_*_ns.  The first patch defines a function to list them.
> The second patch provides an example of usage for audit_log_task_info() which
> is used by syscall audits, among others.  audit_log_task() and
> audit_common_recv_message() would be other potential use cases.
> 
> Use a serial number per namespace (unique across one boot of one kernel)
> instead of the inode number (which is claimed to have had the right to change
> reserved and is not necessarily unique if there is more than one proc fs).  It
> could be argued that the inode numbers have now become a de facto interface and
> can't change now, but I'm proposing this approach to see if this helps address
> some of the objections to the earlier patchset.
> 
> Messages could also be added to track the creation and the destruction
> of namespaces, listing the parent for hierarchical namespaces such as
> pidns and userns, and listing other ids for non-hierarchical namespaces,
> as well as other information to help identify a namespace.
> 
> There has been some progress made for audit in net namespaces and pid
> namespaces since this previous thread.  net namespaces are now served as peers
> by one auditd in the init_net namespace with processes in a non-init_net
> namespace being able to write records if they are in the init_user_ns and have
> CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> records.  As for CAP_AUDIT_READ, I just posted a patchset to check
> capabilities of userspace processes that try to join netlink broadcast
> groups.
> 
> 
> Questions:
> Is there a way to link serial numbers of namespaces involved in migration of a
> container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> identifier for each running instance of a kernel?  Or at least some identifier
> within the container migration realm?

Eric Biederman has always been adamantly opposed to adding new namespaces
of namespaces, so the fact that you're asking this question concerns me.

The way things are right now, since audit belongs to the init userns,
we can get away with saying if a container 'migrates', the new kernel
will see a different set of serials, and no one should care.  However,
if we're going to be allowing containers to have their own audit
namespace/layer/whatever, then this becomes more of a concern.

That said, I'll now look at the patches while pretending that problem
does not exist :)  If I ack, it'll be on correctness of the code, but
we'll still have to deal with this issue.

> What additional events should list this information?
> 
> Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL (and
> proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> init namespace at the moment.
> 
> 
> Proposed output format:
> This differs slightly from Aristeu's patch because of the label conflict with
> "pid=" due to including it in existing records rather than it being a seperate
> record:
> type=SYSCALL msg=audit(1398112249.996:65): arch=c03e syscall=272 
> success=yes exit=0 a0=4000 a1= a2=0 a3=22 items=0 ppid=1 
> pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 
> fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" 
> exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 
> userns=3 subj=system_u:system_r:init_t:s0 key=(null)
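A consumer of the proposed format would pull the new mntns=/netns=/utsns=/ipcns=/pidns=/userns= serials out of the record text.  A minimal userspace sketch, assuming the field names shown in the example record above (the `ns_field()` helper name is made up for illustration):

```c
/* Sketch: extract a namespace serial from an audit record line. */
#include <stdio.h>
#include <string.h>

/* Returns the serial for the given field (e.g. "netns"),
 * or 0 if the field is absent from the record. */
static unsigned long long ns_field(const char *record, const char *field)
{
	char key[32];
	const char *p;
	unsigned long long val;

	/* Leading space avoids matching e.g. "pidns" inside another
	 * field name. */
	snprintf(key, sizeof(key), " %s=", field);
	p = strstr(record, key);
	if (!p || sscanf(p + strlen(key), "%llu", &val) != 1)
		return 0;
	return val;
}
```

Note the serials are only meaningful relative to one boot of one kernel, per the serial-number scheme proposed in this thread.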
> 
> 
> Note: This set does not try to solve the non-init namespace audit messages and
> auditd problem yet.  That will come later, likely with additional auditd
> instances running in another namespace with a limited ability to influence the
> master auditd.  I echo Eric B's idea that messages destined for different
> namespaces would have to be tailored for that namespace with references that
> make sense (such as the right pid number reported to that pid namespace, and
> not leaking info about parents or peers).
> 
> 
> Richard Guy Briggs (2):
>   namespaces: give each namespace a serial number
>   audit: log namespace serial numbers
> 
>  fs/mount.h |1 +
>  fs/namespace.c |1 +
>  include/linux/audit.h  |7 +++
>  include/linux/ipc_namespace.h  |1 +
>  include/linux/nsproxy