Re: Re: Thoughts on credential switching

2014-04-22 Thread Jim Lieb
On Tuesday, April 22, 2014 15:14:33 Boaz Harrosh wrote:
> On 04/22/2014 02:37 PM, Florian Weimer wrote:
> > On 03/27/2014 02:33 PM, Boaz Harrosh wrote:
> >> POSIX or not it just does not have any real programming mining
> >> at all.
> > 
> > What do you mean with "mining" in this context?
> 
> Sorry I saw this mistake after I posted. I meant "meaning".
> 
> What I'm saying is that the mess starts when you are trying
> to keep patching a very wrong API. the POSIX politics aside,
> in regard to user switching (and current directory and etc...)
> this API is plain WRONG. I mean in the mathematical sense wrong.
> 
> All these application mess is not the application programmers
> fault. He had to do what he had to do. The mess starts when you
> are trying to keep a mathematical contradiction in your proof.
> 
> It is glibc mess for trying to maintain compatibility with
> these "PROCESS WIDE OPERATIONS". And naming it holy names like
> POSIX will not cover the mess that they are. As long as you try
> to keep them there will be mess. If you want to honestly clean
> things up is by throwing the true garbage out. Convert all legacy
> code to new mathematically sound API's.
> 
> Peace
> Boaz

I don't want to keep this churning thread going.  I'd rather solve today's 
problems.  So let's put this in perspective and move on.  Our project has a 
problem to solve.

I wouldn't necessarily say wrong, simply very out of date.  All of these 
issues with POSIX and pthreads in particular come from one common source, the 
actual capabilities of the systems, BSD, and SysV at the time.  The goofy 
localtime_r and friends were (and still are) hacks to work around the 
simplistic interfaces in the original Bell Labs code that deeply assumed a 
single process with a single thread, i.e. there were only 3 segments, no 
shared libraries or shared r/w memory and definitely no (real) TLS.  The 
pthreads interfaces deeply assumed the same.  Even worse, it had to also run 
on VMS which (I don't believe) ever had kernel support for multiple user space 
theads.  ASTs are nasty enough and only worked because everybody knew that you 
were either in app code or the AST...  So all of this stuff is based on lowest 
common denominator.  The "Single UNIX Specification" was a peace treaty among 
warring computer companies out looking for lock-ins and competitive advantages 
while reluctantly realizing that the ISV community (Oracle) really didn't care 
about anything but a common base.

Let's move on.  Glibc has to keep some semblance of POSIX in order to cope 
with legacy code from that era and I commend them for their perseverance.  But 
that should not hold us back from leveraging today's architectures and, more 
importantly, today's multi-client workloads.  If that means extensions to 
POSIX fine.  But if POSIX gets hung up because *BSD doesn't have kernel support 
for thread level creds and *bsd.org doesn't want to do it, we just do what fits 
today's requirements and what Linux is capable of,

Jim
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-04-22 Thread Boaz Harrosh
On 04/22/2014 02:37 PM, Florian Weimer wrote:
> On 03/27/2014 02:33 PM, Boaz Harrosh wrote:
>> POSIX or not it just does not have any real programming mining
>> at all.
> 
> What do you mean with "mining" in this context?
> 

Sorry I saw this mistake after I posted. I meant "meaning".

What I'm saying is that the mess starts when you are trying
to keep patching a very wrong API. the POSIX politics aside,
in regard to user switching (and current directory and etc...)
this API is plain WRONG. I mean in the mathematical sense wrong.

All these application mess is not the application programmers
fault. He had to do what he had to do. The mess starts when you
are trying to keep a mathematical contradiction in your proof.

It is glibc mess for trying to maintain compatibility with
these "PROCESS WIDE OPERATIONS". And naming it holy names like
POSIX will not cover the mess that they are. As long as you try
to keep them there will be mess. If you want to honestly clean
things up is by throwing the true garbage out. Convert all legacy
code to new mathematically sound API's.

Peace
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-04-22 Thread Florian Weimer

On 03/27/2014 02:33 PM, Boaz Harrosh wrote:


man setuid should be saying DEPRECATED, EMULATED and SIGNAL NOT SAFE
and be done with it POSIX or no POSIX who cares?


The glibc side cares, and there's also this bit: "It aims towards POSIX 
and Single UNIX Specification compliance.", which should be familiar.


I get that POSIX doesn't do what people want here (umask and current 
directory handling are another story).  But it hasn't really helped that 
some folks just said "POSIX sucks" and did their own thing, completely 
bypassing the system interfaces, instead of getting things fixed 
properly.  Now we have to balance the kernel behavior, POSIX demands, 
glibc dirty hacks to implement the questionable POSIX behavior on top of 
the kernel interfaces, and more dirty hacks to work around glibc because 
some applications do not like it.


I would really like to clean up this mess, but it's not clear to me 
where to start.



Please show me one user of that and declare it brain dead.


Safely and demonstratively relinquishing elevated privileges.


POSIX or not it just does not have any real programming mining
at all.


What do you mean with "mining" in this context?

--
Florian Weimer / Red Hat Product Security Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-04-22 Thread Florian Weimer

On 03/27/2014 02:33 PM, Boaz Harrosh wrote:


man setuid should be saying DEPRECATED, EMULATED and SIGNAL NOT SAFE
and be done with it POSIX or no POSIX who cares?


The glibc side cares, and there's also this bit: It aims towards POSIX 
and Single UNIX Specification compliance., which should be familiar.


I get that POSIX doesn't do what people want here (umask and current 
directory handling are another story).  But it hasn't really helped that 
some folks just said POSIX sucks and did their own thing, completely 
bypassing the system interfaces, instead of getting things fixed 
properly.  Now we have to balance the kernel behavior, POSIX demands, 
glibc dirty hacks to implement the questionable POSIX behavior on top of 
the kernel interfaces, and more dirty hacks to work around glibc because 
some applications do not like it.


I would really like to clean up this mess, but it's not clear to me 
where to start.



Please show me one user of that and declare it brain dead.


Safely and demonstratively relinquishing elevated privileges.


POSIX or not it just does not have any real programming mining
at all.


What do you mean with mining in this context?

--
Florian Weimer / Red Hat Product Security Team
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-04-22 Thread Boaz Harrosh
On 04/22/2014 02:37 PM, Florian Weimer wrote:
 On 03/27/2014 02:33 PM, Boaz Harrosh wrote:
 POSIX or not it just does not have any real programming mining
 at all.
 
 What do you mean with mining in this context?
 

Sorry I saw this mistake after I posted. I meant meaning.

What I'm saying is that the mess starts when you are trying
to keep patching a very wrong API. the POSIX politics aside,
in regard to user switching (and current directory and etc...)
this API is plain WRONG. I mean in the mathematical sense wrong.

All these application mess is not the application programmers
fault. He had to do what he had to do. The mess starts when you
are trying to keep a mathematical contradiction in your proof.

It is glibc mess for trying to maintain compatibility with
these PROCESS WIDE OPERATIONS. And naming it holy names like
POSIX will not cover the mess that they are. As long as you try
to keep them there will be mess. If you want to honestly clean
things up is by throwing the true garbage out. Convert all legacy
code to new mathematically sound API's.

Peace
Boaz

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Thoughts on credential switching

2014-04-22 Thread Jim Lieb
On Tuesday, April 22, 2014 15:14:33 Boaz Harrosh wrote:
 On 04/22/2014 02:37 PM, Florian Weimer wrote:
  On 03/27/2014 02:33 PM, Boaz Harrosh wrote:
  POSIX or not it just does not have any real programming mining
  at all.
  
  What do you mean with mining in this context?
 
 Sorry I saw this mistake after I posted. I meant meaning.
 
 What I'm saying is that the mess starts when you are trying
 to keep patching a very wrong API. the POSIX politics aside,
 in regard to user switching (and current directory and etc...)
 this API is plain WRONG. I mean in the mathematical sense wrong.
 
 All these application mess is not the application programmers
 fault. He had to do what he had to do. The mess starts when you
 are trying to keep a mathematical contradiction in your proof.
 
 It is glibc mess for trying to maintain compatibility with
 these PROCESS WIDE OPERATIONS. And naming it holy names like
 POSIX will not cover the mess that they are. As long as you try
 to keep them there will be mess. If you want to honestly clean
 things up is by throwing the true garbage out. Convert all legacy
 code to new mathematically sound API's.
 
 Peace
 Boaz

I don't want to keep this churning thread going.  I'd rather solve today's 
problems.  So let's put this in perspective and move on.  Our project has a 
problem to solve.

I wouldn't necessarily say wrong, simply very out of date.  All of these 
issues with POSIX and pthreads in particular come from one common source, the 
actual capabilities of the systems, BSD, and SysV at the time.  The goofy 
localtime_r and friends were (and still are) hacks to work around the 
simplistic interfaces in the original Bell Labs code that deeply assumed a 
single process with a single thread, i.e. there were only 3 segments, no 
shared libraries or shared r/w memory and definitely no (real) TLS.  The 
pthreads interfaces deeply assumed the same.  Even worse, it had to also run 
on VMS which (I don't believe) ever had kernel support for multiple user space 
theads.  ASTs are nasty enough and only worked because everybody knew that you 
were either in app code or the AST...  So all of this stuff is based on lowest 
common denominator.  The Single UNIX Specification was a peace treaty among 
warring computer companies out looking for lock-ins and competitive advantages 
while reluctantly realizing that the ISV community (Oracle) really didn't care 
about anything but a common base.

Let's move on.  Glibc has to keep some semblance of POSIX in order to cope 
with legacy code from that era and I commend them for their perseverance.  But 
that should not hold us back from leveraging today's architectures and, more 
importantly, today's multi-client workloads.  If that means extensions to 
POSIX fine.  But if POSIX gets hung up because *BSD doesn't have kernel support 
for thread level creds and *bsd.org doesn't want to do it, we just do what fits 
today's requirements and what Linux is capable of,

Jim
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

If ease of use was the only requirement, we would all be riding tricycles
- Douglas Engelbart 1925–2013
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-04-01 Thread One Thousand Gnomes
> Do we include credfd fds sitting in Unix sockets?

Ouch

> Hmm.  What if we had initial_creds and creds, and initial_creds never
> changed unless explicitly requested.  There wouldn't be any way to
> revert to initial_creds, but may_ptrace would check initial_creds
> *and* creds.  This is a little hacky, and it might break Android (I'm
> not really sure how the zygote thing works), but it could do the
> trick.

That mirros setfsuid and seteuid type behaviour so seems sensible enough.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-04-01 Thread One Thousand Gnomes
 Do we include credfd fds sitting in Unix sockets?

Ouch

 Hmm.  What if we had initial_creds and creds, and initial_creds never
 changed unless explicitly requested.  There wouldn't be any way to
 revert to initial_creds, but may_ptrace would check initial_creds
 *and* creds.  This is a little hacky, and it might break Android (I'm
 not really sure how the zygote thing works), but it could do the
 trick.

That mirros setfsuid and seteuid type behaviour so seems sensible enough.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Andy Lutomirski
On Mon, Mar 31, 2014 at 1:14 PM, Trond Myklebust
 wrote:
>
> On Mar 31, 2014, at 15:26, Andy Lutomirski  wrote:
>
>> On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
>>  wrote:
>>>
>>> On Mar 31, 2014, at 7:51, Jeff Layton  wrote:
>>>
 On Sun, 30 Mar 2014 09:03:29 -0400
 "Theodore Ts'o"  wrote:

> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>> I had some time to think about this last night...
>>
>> While using a fd to pass around credentials is convenient, the danger
>> is that it's pretty opaque. You have a fd that you know has creds
>> attached to it, but it's hard to be certain what is going to change.
>
> I don't think that's a particularly tough problem.  In general, the fd
> isn't something that you would want to pass around, and so the process
> which generated it will know exactly what it contained.
>

 I think there's a bit more of a use-case for passing around such an fd
 via socket...

 Part of the problem is that the traditional uid/gid switching glibc
 wrappers are per-process. If we're proposing doing something like:

 seteuid()
 setegid()
 setgroups()
 fd = open()
 (...and then revert the creds using same syscalls)

 ...during the time that you're doing all of that, you can't really
 allow any thread in the process to be doing something that requires
 _other_ creds until you've completed the above.
>>>
>>> Umm... open() isn't the only operation that you want to be able to do with 
>>> an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... 
>>> Pretty much any interaction with the underlying filesystem needs to use the 
>>> right identity.
>>>
 So, I could envision a program like ganesha firing up a separate
 process to handle the credential switching and fd creation and then
 handing those back to the main process via a unix domain socket.
>>>
>>> How about using the keyrings interface to atomically cache and retrieve 
>>> these user identities? We already have support for different types of keys 
>>> that store/retrieve different types of structured information. How is this 
>>> so different?
>>
>> This sounds considerably more complicated than just using fds.  What's
>> the advantage?
>
> The advantage is that it's considerably _less_ complicated because it uses 
> interfaces that were designed to carry security related information, and to 
> share them across threads.
>
>> I guess using keys for local fs credentials fits in with using keys to
>> access things like AFS, but I'm still not sure I see why this helps
>> here.
>
> All you want to do is atomically store and retrieve a user identity (in 
> practice a credential) and share it between members of a thread. As I said, 
> that kind of storage/retrieval of structured data is what keyrings do. As far 
> as I know, there are already LSM interfaces in place (see 
> security_key_alloc/permission/free), and the generic keyring/keyctl interface 
> exists with appropriate process sharing/inheritance rules via the kernel 
> keyring interface.
>
> So basically, the recipe is:
>
> - You set up a kernel key type that takes a reference to the current process 
> credential on instantiation (no upcalls, etc needed). Ganesha can then use 
> standard libkeyutils methods to create the key and store it in its user, 
> process or thread keyring.
> - You add a new keyctl (KEYCTL_APPLY?) that enables any thread/process that 
> has access to the key, via the keyring in which it is stored, to apply the 
> stored process credential to itself (and perhaps cache it's old credential in 
> a new key?).

I can see pros and cons of using keys instead of fds:

Pros:
 - You can change may_ptrace to check that you can legally ptrace any
creds in the keyring.  (It's debatable how useful this is.)
 - You could plausibly stick subsidiary uids into a user keyring and
use them for uid_map, removing the need for newuidmap.

Cons:
 - You can't send them via SCM_RIGHTS.  I still think that the ability
to do that is extremely useful.

Neutral: What happens if the filesystem you're trying to share
requires keys?  I don't think that credfd or credkeys can really do
this.  I suppose that if the keys live in the user keyring
corresponding to the user being impersonated, everything just works
regardless of how this is implemented.

Cue brain explosion: thread_keyring lives inside struct cred.  I
suspect that any sane implementation of credential switching will need
to change that.  Sigh.


Anyway, the actual parts that seem like they need solving one way or
another are:

 - The keyring pointers in struct cred are a problem.  (Solved with
credkeys, but only if credkeys can only live on the thread keyring.
Also, there will be a problem with circular references.)

 - may_ptrace and all its users (e.g. procfs).  "Solved" by
real_creds, but no one has really defined the semantics of real_creds,
since it's not currently 

Re: Thoughts on credential switching

2014-03-31 Thread Trond Myklebust

On Mar 31, 2014, at 15:26, Andy Lutomirski  wrote:

> On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
>  wrote:
>> 
>> On Mar 31, 2014, at 7:51, Jeff Layton  wrote:
>> 
>>> On Sun, 30 Mar 2014 09:03:29 -0400
>>> "Theodore Ts'o"  wrote:
>>> 
 On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
> I had some time to think about this last night...
> 
> While using a fd to pass around credentials is convenient, the danger
> is that it's pretty opaque. You have a fd that you know has creds
> attached to it, but it's hard to be certain what is going to change.
 
 I don't think that's a particularly tough problem.  In general, the fd
 isn't something that you would want to pass around, and so the process
 which generated it will know exactly what it contained.
 
>>> 
>>> I think there's a bit more of a use-case for passing around such an fd
>>> via socket...
>>> 
>>> Part of the problem is that the traditional uid/gid switching glibc
>>> wrappers are per-process. If we're proposing doing something like:
>>> 
>>> seteuid()
>>> setegid()
>>> setgroups()
>>> fd = open()
>>> (...and then revert the creds using same syscalls)
>>> 
>>> ...during the time that you're doing all of that, you can't really
>>> allow any thread in the process to be doing something that requires
>>> _other_ creds until you've completed the above.
>> 
>> Umm... open() isn't the only operation that you want to be able to do with 
>> an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... 
>> Pretty much any interaction with the underlying filesystem needs to use the 
>> right identity.
>> 
>>> So, I could envision a program like ganesha firing up a separate
>>> process to handle the credential switching and fd creation and then
>>> handing those back to the main process via a unix domain socket.
>> 
>> How about using the keyrings interface to atomically cache and retrieve 
>> these user identities? We already have support for different types of keys 
>> that store/retrieve different types of structured information. How is this 
>> so different?
> 
> This sounds considerably more complicated than just using fds.  What's
> the advantage?

The advantage is that it’s considerably _less_ complicated because it uses 
interfaces that were designed to carry security related information, and to 
share them across threads.

> I guess using keys for local fs credentials fits in with using keys to
> access things like AFS, but I'm still not sure I see why this helps
> here.

All you want to do is atomically store and retrieve a user identity (in 
practice a credential) and share it between members of a thread. As I said, 
that kind of storage/retrieval of structured data is what keyrings do. As far 
as I know, there are already LSM interfaces in place (see 
security_key_alloc/permission/free), and the generic keyring/keyctl interface 
exists with appropriate process sharing/inheritance rules via the kernel 
keyring interface.

So basically, the recipe is:

- You set up a kernel key type that takes a reference to the current process 
credential on instantiation (no upcalls, etc needed). Ganesha can then use 
standard libkeyutils methods to create the key and store it in its user, 
process or thread keyring.
- You add a new keyctl (KEYCTL_APPLY?) that enables any thread/process that has 
access to the key, via the keyring in which it is stored, to apply the stored 
process credential to itself (and perhaps cache it’s old credential in a new 
key?).


_
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Andy Lutomirski
On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
 wrote:
>
> On Mar 31, 2014, at 7:51, Jeff Layton  wrote:
>
>> On Sun, 30 Mar 2014 09:03:29 -0400
>> "Theodore Ts'o"  wrote:
>>
>>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
 I had some time to think about this last night...

 While using a fd to pass around credentials is convenient, the danger
 is that it's pretty opaque. You have a fd that you know has creds
 attached to it, but it's hard to be certain what is going to change.
>>>
>>> I don't think that's a particularly tough problem.  In general, the fd
>>> isn't something that you would want to pass around, and so the process
>>> which generated it will know exactly what it contained.
>>>
>>
>> I think there's a bit more of a use-case for passing around such an fd
>> via socket...
>>
>> Part of the problem is that the traditional uid/gid switching glibc
>> wrappers are per-process. If we're proposing doing something like:
>>
>> seteuid()
>> setegid()
>> setgroups()
>> fd = open()
>> (...and then revert the creds using same syscalls)
>>
>> ...during the time that you're doing all of that, you can't really
>> allow any thread in the process to be doing something that requires
>> _other_ creds until you've completed the above.
>
> Umm... open() isn't the only operation that you want to be able to do with an 
> assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... 
> Pretty much any interaction with the underlying filesystem needs to use the 
> right identity.
>
>> So, I could envision a program like ganesha firing up a separate
>> process to handle the credential switching and fd creation and then
>> handing those back to the main process via a unix domain socket.
>
> How about using the keyrings interface to atomically cache and retrieve these 
> user identities? We already have support for different types of keys that 
> store/retrieve different types of structured information. How is this so 
> different?

This sounds considerably more complicated than just using fds.  What's
the advantage?

I guess using keys for local fs credentials fits in with using keys to
access things like AFS, but I'm still not sure I see why this helps
here.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Jeremy Allison
On Mon, Mar 31, 2014 at 11:44:59AM +0100, One Thousand Gnomes wrote:
> On Wed, 26 Mar 2014 17:23:24 -0700
> Andy Lutomirski  wrote:
> 
> > Hi various people who care about user-space NFS servers and/or
> > security-relevant APIs.
> > 
> > I propose the following set of new syscalls:
> > 
> > int credfd_create(unsigned int flags): returns a new credfd that
> > corresponds to current's creds.
> > 
> > int credfd_activate(int fd, unsigned int flags): Change current's
> > creds to match the creds stored in fd.  To be clear, this changes both
> > the "subjective" and "objective" (aka real_cred and cred) because
> > there aren't any real semantics for what happens when userspace code
> > runs with real_cred != cred.
> 
> What is the sematic of a simultaneous ptrace racing a credfd_activate on
> another processor core ?
> 
> What are the rules for simultaneous threads doing I/O and and credential
> changes ?
> 
> What is the rule for a faulting of an mmapped page in a multithreaded app
> one thread of which has changed credentials ?
> 
> Who owns a file created while you are changing credentials ?
> 
> >  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> > don't want a privileged daemon to be ptraced while impersonating
> > someone else.
> 
> That's one of the obvious problems but if you have that problem then
> you've got races against signals and ptrace etc to deal with.

FYI, Any process using pthreads and glibc already
has to cope with these races as setresuid on glibc
on Linux is not atomic.

That's why Samba eventually changed to using the
raw system calls on Linux due to an interesting
bug with glibc aio interacting with setresuid
races (receiving signal thread was uid 0, sending
aio thread was non-zero - signal couldn't be
delivered, glibc aio wakeup lost).

Jeremy.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Jeff Layton
On Mon, 31 Mar 2014 14:06:01 -0400
Trond Myklebust  wrote:

> 
> On Mar 31, 2014, at 7:51, Jeff Layton  wrote:
> 
> > On Sun, 30 Mar 2014 09:03:29 -0400
> > "Theodore Ts'o"  wrote:
> > 
> >> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
> >>> I had some time to think about this last night...
> >>> 
> >>> While using a fd to pass around credentials is convenient, the danger
> >>> is that it's pretty opaque. You have a fd that you know has creds
> >>> attached to it, but it's hard to be certain what is going to change.
> >> 
> >> I don't think that's a particularly tough problem.  In general, the fd
> >> isn't something that you would want to pass around, and so the process
> >> which generated it will know exactly what it contained.
> >> 
> > 
> > I think there's a bit more of a use-case for passing around such an fd
> > via socket...
> > 
> > Part of the problem is that the traditional uid/gid switching glibc
> > wrappers are per-process. If we're proposing doing something like:
> > 
> > seteuid()
> > setegid()
> > setgroups()
> > fd = open()
> > (...and then revert the creds using same syscalls)
> > 
> > ...during the time that you're doing all of that, you can't really
> > allow any thread in the process to be doing something that requires
> > _other_ creds until you've completed the above.
> 
> Umm… open() isn’t the only operation that you want to be able to do with an 
> assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty 
> much any interaction with the underlying filesystem needs to use the right 
> identity.
> 

The proposal that Al originally had was to have userland set up the
credentials that it wanted to use, and then open("/dev/null", ...) to
get a fd that has those creds attached.

Then, we'd add a syscall that takes such an fd and switches to the creds
attached to it. So the open() in the above example isn't to do an actual
operation but rather to create a fd to act as a credential "handle".

> > So, I could envision a program like ganesha firing up a separate
> > process to handle the credential switching and fd creation and then
> > handing those back to the main process via a unix domain socket.
> 
> How about using the keyrings interface to atomically cache and retrieve these 
> user identities? We already have support for different types of keys that 
> store/retrieve different types of structured information. How is this so 
> different?
> 

That's an interesting idea. 
-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Trond Myklebust

On Mar 31, 2014, at 7:51, Jeff Layton  wrote:

> On Sun, 30 Mar 2014 09:03:29 -0400
> "Theodore Ts'o"  wrote:
> 
>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>>> I had some time to think about this last night...
>>> 
>>> While using a fd to pass around credentials is convenient, the danger
>>> is that it's pretty opaque. You have a fd that you know has creds
>>> attached to it, but it's hard to be certain what is going to change.
>> 
>> I don't think that's a particularly tough problem.  In general, the fd
>> isn't something that you would want to pass around, and so the process
>> which generated it will know exactly what it contained.
>> 
> 
> I think there's a bit more of a use-case for passing around such an fd
> via socket...
> 
> Part of the problem is that the traditional uid/gid switching glibc
> wrappers are per-process. If we're proposing doing something like:
> 
> seteuid()
> setegid()
> setgroups()
> fd = open()
> (...and then revert the creds using same syscalls)
> 
> ...during the time that you're doing all of that, you can't really
> allow any thread in the process to be doing something that requires
> _other_ creds until you've completed the above.

Umm… open() isn’t the only operation that you want to be able to do with an 
assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty 
much any interaction with the underlying filesystem needs to use the right 
identity.

> So, I could envision a program like ganesha firing up a separate
> process to handle the credential switching and fd creation and then
> handing those back to the main process via a unix domain socket.

How about using the keyrings interface to atomically cache and retrieve these 
user identities? We already have support for different types of keys that 
store/retrieve different types of structured information. How is this so 
different?

Cheers
  Trond

_
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Andy Lutomirski
On Mar 31, 2014 3:45 AM, "One Thousand Gnomes"
 wrote:
>
> On Wed, 26 Mar 2014 17:23:24 -0700
> Andy Lutomirski  wrote:
>
> > Hi various people who care about user-space NFS servers and/or
> > security-relevant APIs.
> >
> > I propose the following set of new syscalls:
> >
> > int credfd_create(unsigned int flags): returns a new credfd that
> > corresponds to current's creds.
> >
> > int credfd_activate(int fd, unsigned int flags): Change current's
> > creds to match the creds stored in fd.  To be clear, this changes both
> > the "subjective" and "objective" (aka real_cred and cred) because
> > there aren't any real semantics for what happens when userspace code
> > runs with real_cred != cred.
>
> What is the sematic of a simultaneous ptrace racing a credfd_activate on
> another processor core ?

Exactly the same as for the raw syscall setresuid.  That is, we
require sequential consistency.  If the kernel fails to provide
sequential consistency, it's a bug.

>
> What are the rules for simultaneous threads doing I/O and and credential
> changes ?

None.  credfd_activate affects only the calling thread.

>
> What is the rule for a faulting of an mmapped page in a multithreaded app
> one thread of which has changed credentials ?

Same as in current kernels.  Note that checking credentials in
readpage, etc. is generally a bug.

>
> Who owns a file created while you are changing credentials ?

Can't happen.  See above.

>
> >  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> > don't want a privileged daemon to be ptraced while impersonating
> > someone else.
>
> That's one of the obvious problems but if you have that problem then
> you've got races against signals and ptrace etc to deal with.
>
> One way to implement it I think safely but which requires a fair bit more
> work elsewhere is to apply the debug and signal type checks as
>
>'you may only do X if you also posess the right to do so for *all*
> credentials accessible to this task'
>
> which I think is the correct logical check.

Do we include credfd fds sitting in Unix sockets?

More generally, there are plenty of cases where ptracing something
might give you access to one of its fds that you shouldn't normally
have access to.  Even real file fds are like this, and they could come
from outside a namespace, they could be a result of odd selinux
policy, etc.

The only real new issue is that, while using a subsidiary credential,
you don't want people who couldn't ptrace you before to suddenly be
able to ptrace you.  This is where "subjective" and "objective"
credentials came from, but the current in-kernel implementation is
really half-baked.  Fixing it for real will probably run into all
kinds of funny corner cases involving the existing syscalls, and it
would be nice to have a way to handle this that doesn't involve
auditing every commit_creds caller.

Hmm.  What if we had initial_creds and creds, and initial_creds never
changed unless explicitly requested.  There wouldn't be any way to
revert to initial_creds, but may_ptrace would check initial_creds
*and* creds.  This is a little hacky, and it might break Android (I'm
not really sure how the zygote thing works), but it could do the
trick.

Or we can rely on dumpable, which I think can prevent unprivileged
ptracers from attaching.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Jeff Layton
On Sun, 30 Mar 2014 09:03:29 -0400
"Theodore Ts'o"  wrote:

> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
> > I had some time to think about this last night...
> > 
> > While using a fd to pass around credentials is convenient, the danger
> > is that it's pretty opaque. You have a fd that you know has creds
> > attached to it, but it's hard to be certain what is going to change.
> 
> I don't think that's a particularly tough problem.  In general, the fd
> isn't something that you would want to pass around, and so the process
> which generated it will know exactly what it contained.
> 

I think there's a bit more of a use-case for passing around such an fd
via socket...

Part of the problem is that the traditional uid/gid switching glibc
wrappers are per-process. If we're proposing doing something like:

seteuid()
setegid()
setgroups()
fd = open()
(...and then revert the creds using same syscalls)

...during the time that you're doing all of that, you can't really
allow any thread in the process to be doing something that requires
_other_ creds until you've completed the above.

So, I could envision a program like ganesha firing up a separate
process to handle the credential switching and fd creation and then
handing those back to the main process via a unix domain socket.

> > Perhaps we can use the flags field for that. So, assuming we have a fd
> > with the creds attached, we could do something like:
> > 
> > err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
> > 
> > ...then the switch_creds syscall could be set up to fail if the new
> > credentials had other fields that didn't match those in the current
> > task credentials. So if (for instance) the cred->euid were
> > different between the two, the above could fail with -EINVAL or
> > something.
> 
> Huh?  The whole *point* is that the creds value will be different, of
> course they won't match!  I would think this would be over
> complicating the interface.
> 

You know that *some* of the credentials won't match, but presumably
some of them (e.g. the real uid/gid) should match. Using a flags field
could act as a mask of what creds are allowed to change.

> 
> A couple of other things.  What I would suggest is that we create a
> few new fd flags, to join FD_CLOEXEC:
> 
> FD_NOPROCFS   disallow being able to open the inode via /proc//fd
>   (but in the case of a creds fd, for bonus points, the
>   target of the pseudo-symlink could be something like:
>   "uid: 15806 gid: 100: grps: 27, 50" to aid in debugging
>   a userspace file server).  This also answers Jeff's concern
>   if for some reason --- I don't know how --- a process
>   doesn't know what the contents of the creds fd that
>   it created itself.
> 
> FD_NOPASSFD   disallow being able to pass the fd via a unix domain socket
> 
> FD_LOCKFLAGS  if this bit is set, disallow any further changes of FD_CLOEXEC,
>   FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.
> 
> Some of the functionality requested by the folks suggesting the "SEAL"
> API would also be covered by these fd flags.
> 
> In order to solve some potential race concerns, a credsfd must be
> created with FD_CLOEXEC and FD_NOPROCFS enabled.
> 
> Why is this important even if the anon_inode is owned by root with a
> mode of 0?  Because if the system is set up to use SELinux or full
> Posix capabilities, merely having the a uid of 0 is not special, and
> we don't want to allow a process with uid of 0 to be able modify the
> mode with the /proc//fd/ and then proceed to open the inode
> using open.  This way, instead of adding special case code to prevent
> this from happening, we can add a more general facility which can be
> used to solve a few other problems.
> 



-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread One Thousand Gnomes
On Wed, 26 Mar 2014 17:23:24 -0700
Andy Lutomirski  wrote:

> Hi various people who care about user-space NFS servers and/or
> security-relevant APIs.
> 
> I propose the following set of new syscalls:
> 
> int credfd_create(unsigned int flags): returns a new credfd that
> corresponds to current's creds.
> 
> int credfd_activate(int fd, unsigned int flags): Change current's
> creds to match the creds stored in fd.  To be clear, this changes both
> the "subjective" and "objective" (aka real_cred and cred) because
> there aren't any real semantics for what happens when userspace code
> runs with real_cred != cred.

What is the sematic of a simultaneous ptrace racing a credfd_activate on
another processor core ?

What are the rules for simultaneous threads doing I/O and and credential
changes ?

What is the rule for a faulting of an mmapped page in a multithreaded app
one thread of which has changed credentials ?

Who owns a file created while you are changing credentials ?

>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> don't want a privileged daemon to be ptraced while impersonating
> someone else.

That's one of the obvious problems but if you have that problem then
you've got races against signals and ptrace etc to deal with.

One way to implement it I think safely but which requires a fair bit more
work elsewhere is to apply the debug and signal type checks as

   'you may only do X if you also posess the right to do so for *all*
credentials accessible to this task'

which I think is the correct logical check.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread One Thousand Gnomes
On Wed, 26 Mar 2014 17:23:24 -0700
Andy Lutomirski l...@amacapital.net wrote:

 Hi various people who care about user-space NFS servers and/or
 security-relevant APIs.
 
 I propose the following set of new syscalls:
 
 int credfd_create(unsigned int flags): returns a new credfd that
 corresponds to current's creds.
 
 int credfd_activate(int fd, unsigned int flags): Change current's
 creds to match the creds stored in fd.  To be clear, this changes both
 the subjective and objective (aka real_cred and cred) because
 there aren't any real semantics for what happens when userspace code
 runs with real_cred != cred.

What is the sematic of a simultaneous ptrace racing a credfd_activate on
another processor core ?

What are the rules for simultaneous threads doing I/O and and credential
changes ?

What is the rule for a faulting of an mmapped page in a multithreaded app
one thread of which has changed credentials ?

Who owns a file created while you are changing credentials ?

  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
 don't want a privileged daemon to be ptraced while impersonating
 someone else.

That's one of the obvious problems but if you have that problem then
you've got races against signals and ptrace etc to deal with.

One way to implement it I think safely but which requires a fair bit more
work elsewhere is to apply the debug and signal type checks as

   'you may only do X if you also posess the right to do so for *all*
credentials accessible to this task'

which I think is the correct logical check.

Alan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Jeff Layton
On Sun, 30 Mar 2014 09:03:29 -0400
Theodore Ts'o ty...@mit.edu wrote:

 On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
  I had some time to think about this last night...
  
  While using a fd to pass around credentials is convenient, the danger
  is that it's pretty opaque. You have a fd that you know has creds
  attached to it, but it's hard to be certain what is going to change.
 
 I don't think that's a particularly tough problem.  In general, the fd
 isn't something that you would want to pass around, and so the process
 which generated it will know exactly what it contained.
 

I think there's a bit more of a use-case for passing around such an fd
via socket...

Part of the problem is that the traditional uid/gid switching glibc
wrappers are per-process. If we're proposing doing something like:

seteuid()
setegid()
setgroups()
fd = open()
(...and then revert the creds using same syscalls)

...during the time that you're doing all of that, you can't really
allow any thread in the process to be doing something that requires
_other_ creds until you've completed the above.

So, I could envision a program like ganesha firing up a separate
process to handle the credential switching and fd creation and then
handing those back to the main process via a unix domain socket.

  Perhaps we can use the flags field for that. So, assuming we have a fd
  with the creds attached, we could do something like:
  
  err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
  
  ...then the switch_creds syscall could be set up to fail if the new
  credentials had other fields that didn't match those in the current
  task credentials. So if (for instance) the cred-euid were
  different between the two, the above could fail with -EINVAL or
  something.
 
 Huh?  The whole *point* is that the creds value will be different, of
 course they won't match!  I would think this would be over
 complicating the interface.
 

You know that *some* of the credentials won't match, but presumably
some of them (e.g. the real uid/gid) should match. Using a flags field
could act as a mask of what creds are allowed to change.

 
 A couple of other things.  What I would suggest is that we create a
 few new fd flags, to join FD_CLOEXEC:
 
 FD_NOPROCFS   disallow being able to open the inode via /proc/pid/fd
   (but in the case of a creds fd, for bonus points, the
   target of the pseudo-symlink could be something like:
   uid: 15806 gid: 100: grps: 27, 50 to aid in debugging
   a userspace file server).  This also answers Jeff's concern
   if for some reason --- I don't know how --- a process
   doesn't know what the contents of the creds fd that
   it created itself.
 
 FD_NOPASSFD   disallow being able to pass the fd via a unix domain socket
 
 FD_LOCKFLAGS  if this bit is set, disallow any further changes of FD_CLOEXEC,
   FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.
 
 Some of the functionality requested by the folks suggesting the SEAL
 API would also be covered by these fd flags.
 
 In order to solve some potential race concerns, a credsfd must be
 created with FD_CLOEXEC and FD_NOPROCFS enabled.
 
 Why is this important even if the anon_inode is owned by root with a
 mode of 0?  Because if the system is set up to use SELinux or full
 Posix capabilities, merely having the a uid of 0 is not special, and
 we don't want to allow a process with uid of 0 to be able modify the
 mode with the /proc/pid/fd/FD and then proceed to open the inode
 using open.  This way, instead of adding special case code to prevent
 this from happening, we can add a more general facility which can be
 used to solve a few other problems.
 



-- 
Jeff Layton jlay...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Andy Lutomirski
On Mar 31, 2014 3:45 AM, One Thousand Gnomes
gno...@lxorguk.ukuu.org.uk wrote:

 On Wed, 26 Mar 2014 17:23:24 -0700
 Andy Lutomirski l...@amacapital.net wrote:

  Hi various people who care about user-space NFS servers and/or
  security-relevant APIs.
 
  I propose the following set of new syscalls:
 
  int credfd_create(unsigned int flags): returns a new credfd that
  corresponds to current's creds.
 
  int credfd_activate(int fd, unsigned int flags): Change current's
  creds to match the creds stored in fd.  To be clear, this changes both
  the subjective and objective (aka real_cred and cred) because
  there aren't any real semantics for what happens when userspace code
  runs with real_cred != cred.

 What is the sematic of a simultaneous ptrace racing a credfd_activate on
 another processor core ?

Exactly the same as for the raw syscall setresuid.  That is, we
require sequential consistency.  If the kernel fails to provide
sequential consistency, it's a bug.


 What are the rules for simultaneous threads doing I/O and and credential
 changes ?

None.  credfd_activate affects only the calling thread.


 What is the rule for a faulting of an mmapped page in a multithreaded app
 one thread of which has changed credentials ?

Same as in current kernels.  Note that checking credentials in
readpage, etc. is generally a bug.


 Who owns a file created while you are changing credentials ?

Can't happen.  See above.


   - credfd_activate fails (-EINVAL) if dumpable.  This is because we
  don't want a privileged daemon to be ptraced while impersonating
  someone else.

 That's one of the obvious problems but if you have that problem then
 you've got races against signals and ptrace etc to deal with.

 One way to implement it I think safely but which requires a fair bit more
 work elsewhere is to apply the debug and signal type checks as

'you may only do X if you also posess the right to do so for *all*
 credentials accessible to this task'

 which I think is the correct logical check.

Do we include credfd fds sitting in Unix sockets?

More generally, there are plenty of cases where ptracing something
might give you access to one of its fds that you shouldn't normally
have access to.  Even real file fds are like this, and they could come
from outside a namespace, they could be a result of odd selinux
policy, etc.

The only real new issue is that, while using a subsidiary credential,
you don't want people who couldn't ptrace you before to suddenly be
able to ptrace you.  This is where subjective and objective
credentials came from, but the current in-kernel implementation is
really half-baked.  Fixing it for real will probably run into all
kinds of funny corner cases involving the existing syscalls, and it
would be nice to have a way to handle this that doesn't involve
auditing every commit_creds caller.

Hmm.  What if we had initial_creds and creds, and initial_creds never
changed unless explicitly requested.  There wouldn't be any way to
revert to initial_creds, but may_ptrace would check initial_creds
*and* creds.  This is a little hacky, and it might break Android (I'm
not really sure how the zygote thing works), but it could do the
trick.

Or we can rely on dumpable, which I think can prevent unprivileged
ptracers from attaching.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Trond Myklebust

On Mar 31, 2014, at 7:51, Jeff Layton jlay...@redhat.com wrote:

 On Sun, 30 Mar 2014 09:03:29 -0400
 Theodore Ts'o ty...@mit.edu wrote:
 
 On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
 I had some time to think about this last night...
 
 While using a fd to pass around credentials is convenient, the danger
 is that it's pretty opaque. You have a fd that you know has creds
 attached to it, but it's hard to be certain what is going to change.
 
 I don't think that's a particularly tough problem.  In general, the fd
 isn't something that you would want to pass around, and so the process
 which generated it will know exactly what it contained.
 
 
 I think there's a bit more of a use-case for passing around such an fd
 via socket...
 
 Part of the problem is that the traditional uid/gid switching glibc
 wrappers are per-process. If we're proposing doing something like:
 
 seteuid()
 setegid()
 setgroups()
 fd = open()
 (...and then revert the creds using same syscalls)
 
 ...during the time that you're doing all of that, you can't really
 allow any thread in the process to be doing something that requires
 _other_ creds until you've completed the above.

Umm… open() isn’t the only operation that you want to be able to do with an 
assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty 
much any interaction with the underlying filesystem needs to use the right 
identity.

 So, I could envision a program like ganesha firing up a separate
 process to handle the credential switching and fd creation and then
 handing those back to the main process via a unix domain socket.

How about using the keyrings interface to atomically cache and retrieve these 
user identities? We already have support for different types of keys that 
store/retrieve different types of structured information. How is this so 
different?

Cheers
  Trond

_
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Jeff Layton
On Mon, 31 Mar 2014 14:06:01 -0400
Trond Myklebust trond.mykleb...@primarydata.com wrote:

 
 On Mar 31, 2014, at 7:51, Jeff Layton jlay...@redhat.com wrote:
 
  On Sun, 30 Mar 2014 09:03:29 -0400
  Theodore Ts'o ty...@mit.edu wrote:
  
  On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
  I had some time to think about this last night...
  
  While using a fd to pass around credentials is convenient, the danger
  is that it's pretty opaque. You have a fd that you know has creds
  attached to it, but it's hard to be certain what is going to change.
  
  I don't think that's a particularly tough problem.  In general, the fd
  isn't something that you would want to pass around, and so the process
  which generated it will know exactly what it contained.
  
  
  I think there's a bit more of a use-case for passing around such an fd
  via socket...
  
  Part of the problem is that the traditional uid/gid switching glibc
  wrappers are per-process. If we're proposing doing something like:
  
  seteuid()
  setegid()
  setgroups()
  fd = open()
  (...and then revert the creds using same syscalls)
  
  ...during the time that you're doing all of that, you can't really
  allow any thread in the process to be doing something that requires
  _other_ creds until you've completed the above.
 
 Umm… open() isn’t the only operation that you want to be able to do with an 
 assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty 
 much any interaction with the underlying filesystem needs to use the right 
 identity.
 

The proposal that Al originally had was to have userland set up the
credentials that it wanted to use, and then open(/dev/null, ...) to
get a fd that has those creds attached.

Then, we'd add a syscall that takes such an fd and switches to the creds
attached to it. So the open() in the above example isn't to do an actual
operation but rather to create a fd to act as a credential handle.

  So, I could envision a program like ganesha firing up a separate
  process to handle the credential switching and fd creation and then
  handing those back to the main process via a unix domain socket.
 
 How about using the keyrings interface to atomically cache and retrieve these 
 user identities? We already have support for different types of keys that 
 store/retrieve different types of structured information. How is this so 
 different?
 

That's an interesting idea. 
-- 
Jeff Layton jlay...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Jeremy Allison
On Mon, Mar 31, 2014 at 11:44:59AM +0100, One Thousand Gnomes wrote:
 On Wed, 26 Mar 2014 17:23:24 -0700
 Andy Lutomirski l...@amacapital.net wrote:
 
  Hi various people who care about user-space NFS servers and/or
  security-relevant APIs.
  
  I propose the following set of new syscalls:
  
  int credfd_create(unsigned int flags): returns a new credfd that
  corresponds to current's creds.
  
  int credfd_activate(int fd, unsigned int flags): Change current's
  creds to match the creds stored in fd.  To be clear, this changes both
  the subjective and objective (aka real_cred and cred) because
  there aren't any real semantics for what happens when userspace code
  runs with real_cred != cred.
 
 What is the sematic of a simultaneous ptrace racing a credfd_activate on
 another processor core ?
 
 What are the rules for simultaneous threads doing I/O and and credential
 changes ?
 
 What is the rule for a faulting of an mmapped page in a multithreaded app
 one thread of which has changed credentials ?
 
 Who owns a file created while you are changing credentials ?
 
   - credfd_activate fails (-EINVAL) if dumpable.  This is because we
  don't want a privileged daemon to be ptraced while impersonating
  someone else.
 
 That's one of the obvious problems but if you have that problem then
 you've got races against signals and ptrace etc to deal with.

FYI, Any process using pthreads and glibc already
has to cope with these races as setresuid on glibc
on Linux is not atomic.

That's why Samba eventually changed to using the
raw system calls on Linux due to an interesting
bug with glibc aio interacting with setresuid
races (receiving signal thread was uid 0, sending
aio thread was non-zero - signal couldn't be
delivered, glibc aio wakeup lost).

Jeremy.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Andy Lutomirski
On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
trond.mykleb...@primarydata.com wrote:

 On Mar 31, 2014, at 7:51, Jeff Layton jlay...@redhat.com wrote:

 On Sun, 30 Mar 2014 09:03:29 -0400
 Theodore Ts'o ty...@mit.edu wrote:

 On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
 I had some time to think about this last night...

 While using a fd to pass around credentials is convenient, the danger
 is that it's pretty opaque. You have a fd that you know has creds
 attached to it, but it's hard to be certain what is going to change.

 I don't think that's a particularly tough problem.  In general, the fd
 isn't something that you would want to pass around, and so the process
 which generated it will know exactly what it contained.


 I think there's a bit more of a use-case for passing around such an fd
 via socket...

 Part of the problem is that the traditional uid/gid switching glibc
 wrappers are per-process. If we're proposing doing something like:

 seteuid()
 setegid()
 setgroups()
 fd = open()
 (...and then revert the creds using same syscalls)

 ...during the time that you're doing all of that, you can't really
 allow any thread in the process to be doing something that requires
 _other_ creds until you've completed the above.

 Umm... open() isn't the only operation that you want to be able to do with an 
 assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... 
 Pretty much any interaction with the underlying filesystem needs to use the 
 right identity.

 So, I could envision a program like ganesha firing up a separate
 process to handle the credential switching and fd creation and then
 handing those back to the main process via a unix domain socket.

 How about using the keyrings interface to atomically cache and retrieve these 
 user identities? We already have support for different types of keys that 
 store/retrieve different types of structured information. How is this so 
 different?

This sounds considerably more complicated than just using fds.  What's
the advantage?

I guess using keys for local fs credentials fits in with using keys to
access things like AFS, but I'm still not sure I see why this helps
here.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Trond Myklebust

On Mar 31, 2014, at 15:26, Andy Lutomirski l...@amacapital.net wrote:

 On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
 trond.mykleb...@primarydata.com wrote:
 
 On Mar 31, 2014, at 7:51, Jeff Layton jlay...@redhat.com wrote:
 
 On Sun, 30 Mar 2014 09:03:29 -0400
 Theodore Ts'o ty...@mit.edu wrote:
 
 On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
 I had some time to think about this last night...
 
 While using a fd to pass around credentials is convenient, the danger
 is that it's pretty opaque. You have a fd that you know has creds
 attached to it, but it's hard to be certain what is going to change.
 
 I don't think that's a particularly tough problem.  In general, the fd
 isn't something that you would want to pass around, and so the process
 which generated it will know exactly what it contained.
 
 
 I think there's a bit more of a use-case for passing around such an fd
 via socket...
 
 Part of the problem is that the traditional uid/gid switching glibc
 wrappers are per-process. If we're proposing doing something like:
 
 seteuid()
 setegid()
 setgroups()
 fd = open()
 (...and then revert the creds using same syscalls)
 
 ...during the time that you're doing all of that, you can't really
 allow any thread in the process to be doing something that requires
 _other_ creds until you've completed the above.
 
 Umm... open() isn't the only operation that you want to be able to do with 
 an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... 
 Pretty much any interaction with the underlying filesystem needs to use the 
 right identity.
 
 So, I could envision a program like ganesha firing up a separate
 process to handle the credential switching and fd creation and then
 handing those back to the main process via a unix domain socket.
 
 How about using the keyrings interface to atomically cache and retrieve 
 these user identities? We already have support for different types of keys 
 that store/retrieve different types of structured information. How is this 
 so different?
 
 This sounds considerably more complicated than just using fds.  What's
 the advantage?

The advantage is that it’s considerably _less_ complicated because it uses 
interfaces that were designed to carry security related information, and to 
share them across threads.

 I guess using keys for local fs credentials fits in with using keys to
 access things like AFS, but I'm still not sure I see why this helps
 here.

All you want to do is atomically store and retrieve a user identity (in 
practice a credential) and share it between members of a thread. As I said, 
that kind of storage/retrieval of structured data is what keyrings do. As far 
as I know, there are already LSM interfaces in place (see 
security_key_alloc/permission/free), and the generic keyring/keyctl interface 
exists with appropriate process sharing/inheritance rules via the kernel 
keyring interface.

So basically, the recipe is:

- You set up a kernel key type that takes a reference to the current process 
credential on instantiation (no upcalls, etc needed). Ganesha can then use 
standard libkeyutils methods to create the key and store it in its user, 
process or thread keyring.
- You add a new keyctl (KEYCTL_APPLY?) that enables any thread/process that has 
access to the key, via the keyring in which it is stored, to apply the stored 
process credential to itself (and perhaps cache it’s old credential in a new 
key?).


_
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-31 Thread Andy Lutomirski
On Mon, Mar 31, 2014 at 1:14 PM, Trond Myklebust
trond.mykleb...@primarydata.com wrote:

 On Mar 31, 2014, at 15:26, Andy Lutomirski l...@amacapital.net wrote:

 On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
 trond.mykleb...@primarydata.com wrote:

 On Mar 31, 2014, at 7:51, Jeff Layton jlay...@redhat.com wrote:

 On Sun, 30 Mar 2014 09:03:29 -0400
 Theodore Ts'o ty...@mit.edu wrote:

 On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
 I had some time to think about this last night...

 While using a fd to pass around credentials is convenient, the danger
 is that it's pretty opaque. You have a fd that you know has creds
 attached to it, but it's hard to be certain what is going to change.

 I don't think that's a particularly tough problem.  In general, the fd
 isn't something that you would want to pass around, and so the process
 which generated it will know exactly what it contained.


 I think there's a bit more of a use-case for passing around such an fd
 via socket...

 Part of the problem is that the traditional uid/gid switching glibc
 wrappers are per-process. If we're proposing doing something like:

 seteuid()
 setegid()
 setgroups()
 fd = open()
 (...and then revert the creds using same syscalls)

 ...during the time that you're doing all of that, you can't really
 allow any thread in the process to be doing something that requires
 _other_ creds until you've completed the above.

 Umm... open() isn't the only operation that you want to be able to do with 
 an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... 
 Pretty much any interaction with the underlying filesystem needs to use the 
 right identity.

 So, I could envision a program like ganesha firing up a separate
 process to handle the credential switching and fd creation and then
 handing those back to the main process via a unix domain socket.

 How about using the keyrings interface to atomically cache and retrieve 
 these user identities? We already have support for different types of keys 
 that store/retrieve different types of structured information. How is this 
 so different?

 This sounds considerably more complicated than just using fds.  What's
 the advantage?

 The advantage is that it's considerably _less_ complicated because it uses 
 interfaces that were designed to carry security related information, and to 
 share them across threads.

 I guess using keys for local fs credentials fits in with using keys to
 access things like AFS, but I'm still not sure I see why this helps
 here.

 All you want to do is atomically store and retrieve a user identity (in 
 practice a credential) and share it between members of a thread. As I said, 
 that kind of storage/retrieval of structured data is what keyrings do. As far 
 as I know, there are already LSM interfaces in place (see 
 security_key_alloc/permission/free), and the generic keyring/keyctl interface 
 exists with appropriate process sharing/inheritance rules via the kernel 
 keyring interface.

 So basically, the recipe is:

 - You set up a kernel key type that takes a reference to the current process 
 credential on instantiation (no upcalls, etc needed). Ganesha can then use 
 standard libkeyutils methods to create the key and store it in its user, 
 process or thread keyring.
 - You add a new keyctl (KEYCTL_APPLY?) that enables any thread/process that 
 has access to the key, via the keyring in which it is stored, to apply the 
 stored process credential to itself (and perhaps cache it's old credential in 
 a new key?).

I can see pros and cons of using keys instead of fds:

Pros:
 - You can change may_ptrace to check that you can legally ptrace any
creds in the keyring.  (It's debatable how useful this is.)
 - You could plausibly stick subsidiary uids into a user keyring and
use them for uid_map, removing the need for newuidmap.

Cons:
 - You can't send them via SCM_RIGHTS.  I still think that the ability
to do that is extremely useful.

Neutral: What happens if the filesystem you're trying to share
requires keys?  I don't think that credfd or credkeys can really do
this.  I suppose that if the keys live in the user keyring
corresponding to the user being impersonated, everything just works
regardless of how this is implemented.

Cue brain explosion: thread_keyring lives inside struct cred.  I
suspect that any sane implementation of credential switching will need
to change that.  Sigh.


Anyway, the actual parts that seem like they need solving one way or
another are:

 - The keyring pointers in struct cred are a problem.  (Solved with
credkeys, but only if credkeys can only live on the thread keyring.
Also, there will be a problem with circular references.)

 - may_ptrace and all its users (e.g. procfs).  Solved by
real_creds, but no one has really defined the semantics of real_creds,
since it's not currently available to userspace.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a 

Re: Thoughts on credential switching

2014-03-30 Thread Andy Lutomirski
On Sun, Mar 30, 2014 at 6:03 AM, Theodore Ts'o  wrote:
> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>> I had some time to think about this last night...
>>
>> While using a fd to pass around credentials is convenient, the danger
>> is that it's pretty opaque. You have a fd that you know has creds
>> attached to it, but it's hard to be certain what is going to change.
>
> I don't think that's a particularly tough problem.  In general, the fd
> isn't something that you would want to pass around, and so the process
> which generated it will know exactly what it contained.
>
>> Perhaps we can use the flags field for that. So, assuming we have a fd
>> with the creds attached, we could do something like:
>>
>> err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
>>
>> ...then the switch_creds syscall could be set up to fail if the new
>> credentials had other fields that didn't match those in the current
>> task credentials. So if (for instance) the cred->euid were
>> different between the two, the above could fail with -EINVAL or
>> something.
>
> Huh?  The whole *point* is that the creds value will be different, of
> course they won't match!  I would think this would be over
> complicating the interface.
>
>
> A couple of other things.  What I would suggest is that we create a
> few new fd flags, to join FD_CLOEXEC:
>
> FD_NOPROCFS disallow being able to open the inode via /proc//fd
> (but in the case of a creds fd, for bonus points, the
> target of the pseudo-symlink could be something like:
> "uid: 15806 gid: 100: grps: 27, 50" to aid in debugging
> a userspace file server).  This also answers Jeff's concern
> if for some reason --- I don't know how --- a process
> doesn't know what the contents of the creds fd that
> it created itself.
>
> FD_NOPASSFD disallow being able to pass the fd via a unix domain socket
>
> FD_LOCKFLAGSif this bit is set, disallow any further changes of 
> FD_CLOEXEC,
> FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.
>
> Some of the functionality requested by the folks suggesting the "SEAL"
> API would also be covered by these fd flags.
>
> In order to solve some potential race concerns, a credsfd must be
> created with FD_CLOEXEC and FD_NOPROCFS enabled.

It might be nice to try to coordinate this with the Capsicum people,
who are doing something along these lines.  They even have tentative
Linux patches.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-30 Thread Theodore Ts'o
On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
> I had some time to think about this last night...
> 
> While using a fd to pass around credentials is convenient, the danger
> is that it's pretty opaque. You have a fd that you know has creds
> attached to it, but it's hard to be certain what is going to change.

I don't think that's a particularly tough problem.  In general, the fd
isn't something that you would want to pass around, and so the process
which generated it will know exactly what it contained.

> Perhaps we can use the flags field for that. So, assuming we have a fd
> with the creds attached, we could do something like:
> 
> err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
> 
> ...then the switch_creds syscall could be set up to fail if the new
> credentials had other fields that didn't match those in the current
> task credentials. So if (for instance) the cred->euid were
> different between the two, the above could fail with -EINVAL or
> something.

Huh?  The whole *point* is that the creds value will be different, of
course they won't match!  I would think this would be over
complicating the interface.


A couple of other things.  What I would suggest is that we create a
few new fd flags, to join FD_CLOEXEC:

FD_NOPROCFS disallow being able to open the inode via /proc//fd
(but in the case of a creds fd, for bonus points, the
target of the pseudo-symlink could be something like:
"uid: 15806 gid: 100: grps: 27, 50" to aid in debugging
a userspace file server).  This also answers Jeff's concern
if for some reason --- I don't know how --- a process
doesn't know what the contents of the creds fd that
it created itself.

FD_NOPASSFD disallow being able to pass the fd via a unix domain socket

FD_LOCKFLAGSif this bit is set, disallow any further changes of FD_CLOEXEC,
FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.

Some of the functionality requested by the folks suggesting the "SEAL"
API would also be covered by these fd flags.

In order to solve some potential race concerns, a credsfd must be
created with FD_CLOEXEC and FD_NOPROCFS enabled.

Why is this important even if the anon_inode is owned by root with a
mode of 0?  Because if the system is set up to use SELinux or full
Posix capabilities, merely having the a uid of 0 is not special, and
we don't want to allow a process with uid of 0 to be able modify the
mode with the /proc//fd/ and then proceed to open the inode
using open.  This way, instead of adding special case code to prevent
this from happening, we can add a more general facility which can be
used to solve a few other problems.

Regards,

- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-30 Thread Theodore Ts'o
On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
 I had some time to think about this last night...
 
 While using a fd to pass around credentials is convenient, the danger
 is that it's pretty opaque. You have a fd that you know has creds
 attached to it, but it's hard to be certain what is going to change.

I don't think that's a particularly tough problem.  In general, the fd
isn't something that you would want to pass around, and so the process
which generated it will know exactly what it contained.

 Perhaps we can use the flags field for that. So, assuming we have a fd
 with the creds attached, we could do something like:
 
 err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
 
 ...then the switch_creds syscall could be set up to fail if the new
 credentials had other fields that didn't match those in the current
 task credentials. So if (for instance) the cred-euid were
 different between the two, the above could fail with -EINVAL or
 something.

Huh?  The whole *point* is that the creds value will be different, of
course they won't match!  I would think this would be over
complicating the interface.


A couple of other things.  What I would suggest is that we create a
few new fd flags, to join FD_CLOEXEC:

FD_NOPROCFS disallow being able to open the inode via /proc/pid/fd
(but in the case of a creds fd, for bonus points, the
target of the pseudo-symlink could be something like:
uid: 15806 gid: 100: grps: 27, 50 to aid in debugging
a userspace file server).  This also answers Jeff's concern
if for some reason --- I don't know how --- a process
doesn't know what the contents of the creds fd that
it created itself.

FD_NOPASSFD disallow being able to pass the fd via a unix domain socket

FD_LOCKFLAGSif this bit is set, disallow any further changes of FD_CLOEXEC,
FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.

Some of the functionality requested by the folks suggesting the SEAL
API would also be covered by these fd flags.

In order to solve some potential race concerns, a credsfd must be
created with FD_CLOEXEC and FD_NOPROCFS enabled.

Why is this important even if the anon_inode is owned by root with a
mode of 0?  Because if the system is set up to use SELinux or full
Posix capabilities, merely having the a uid of 0 is not special, and
we don't want to allow a process with uid of 0 to be able modify the
mode with the /proc/pid/fd/FD and then proceed to open the inode
using open.  This way, instead of adding special case code to prevent
this from happening, we can add a more general facility which can be
used to solve a few other problems.

Regards,

- Ted

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-30 Thread Andy Lutomirski
On Sun, Mar 30, 2014 at 6:03 AM, Theodore Ts'o ty...@mit.edu wrote:
 On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
 I had some time to think about this last night...

 While using a fd to pass around credentials is convenient, the danger
 is that it's pretty opaque. You have a fd that you know has creds
 attached to it, but it's hard to be certain what is going to change.

 I don't think that's a particularly tough problem.  In general, the fd
 isn't something that you would want to pass around, and so the process
 which generated it will know exactly what it contained.

 Perhaps we can use the flags field for that. So, assuming we have a fd
 with the creds attached, we could do something like:

 err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);

 ...then the switch_creds syscall could be set up to fail if the new
 credentials had other fields that didn't match those in the current
 task credentials. So if (for instance) the cred-euid were
 different between the two, the above could fail with -EINVAL or
 something.

 Huh?  The whole *point* is that the creds value will be different, of
 course they won't match!  I would think this would be over
 complicating the interface.


 A couple of other things.  What I would suggest is that we create a
 few new fd flags, to join FD_CLOEXEC:

 FD_NOPROCFS disallow being able to open the inode via /proc/pid/fd
 (but in the case of a creds fd, for bonus points, the
 target of the pseudo-symlink could be something like:
 uid: 15806 gid: 100: grps: 27, 50 to aid in debugging
 a userspace file server).  This also answers Jeff's concern
 if for some reason --- I don't know how --- a process
 doesn't know what the contents of the creds fd that
 it created itself.

 FD_NOPASSFD disallow being able to pass the fd via a unix domain socket

 FD_LOCKFLAGSif this bit is set, disallow any further changes of 
 FD_CLOEXEC,
 FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.

 Some of the functionality requested by the folks suggesting the SEAL
 API would also be covered by these fd flags.

 In order to solve some potential race concerns, a credsfd must be
 created with FD_CLOEXEC and FD_NOPROCFS enabled.

It might be nice to try to coordinate this with the Capsicum people,
who are doing something along these lines.  They even have tentative
Linux patches.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-29 Thread Alex Elsayed
Jeff Layton wrote:

> On Wed, 26 Mar 2014 20:25:35 -0700
> Jeff Layton  wrote:
> 
>> On Wed, 26 Mar 2014 20:05:16 -0700
>> Andy Lutomirski  wrote:
>> 
>> > On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton 
>> > wrote:
>> > > On Wed, 26 Mar 2014 17:23:24 -0700
>> > > Andy Lutomirski  wrote:
>> > >
>> > >> Hi various people who care about user-space NFS servers and/or
>> > >> security-relevant APIs.
>> > >>
>> > >> I propose the following set of new syscalls:
>> > >>
>> > >> int credfd_create(unsigned int flags): returns a new credfd that
>> > >> corresponds to current's creds.
>> > >>
>> > >> int credfd_activate(int fd, unsigned int flags): Change current's
>> > >> creds to match the creds stored in fd.  To be clear, this changes
>> > >> both the "subjective" and "objective" (aka real_cred and cred)
>> > >> because there aren't any real semantics for what happens when
>> > >> userspace code runs with real_cred != cred.
>> > >>
>> > >> Rules:
>> > >>
>> > >>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>> > >>  - credfd_activate fails (-EPERM) if the fd's userns doesn't
>> > >> match current's userns.  credfd_activate is not intended to be a
>> > >> substitute for setns.
>> > >>  - credfd_activate will fail (-EPERM) if LSM does not allow the
>> > >> switch.  This probably needs to be a new selinux action --
>> > >> dyntransition is too restrictive.
>> > >>
>> > >>
>> > >> Optional:
>> > >>  - credfd_create always sets cloexec, because the alternative is
>> > >> silly.
>> > >>  - credfd_activate fails (-EINVAL) if dumpable.  This is because
>> > >> we don't want a privileged daemon to be ptraced while
>> > >> impersonating someone else.
>> > >>  - optional: both credfd_create and credfd_activate fail if
>> > >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
>> > >>
>> > >> The first question: does this solve Ganesha's problem?
>> > >>
>> > >> The second question: is this safe?  I can see two major concerns.
>> > >> The bigger concern is that having these syscalls available will
>> > >> allow users to exploit things that were previously secure.  For
>> > >> example, maybe some configuration assumes that a task running as
>> > >> uid==1 can't switch to uid==2, even with uid 2's consent.
>> > >> Similar issues happen with capabilities.  If CAP_SYS_ADMIN is not
>> > >> required, then this is no longer really true.
>> > >>
>> > >> Alternatively, something running as uid == 0 with heavy
>> > >> capability restrictions in a mount namespace (but not a uid
>> > >> namespace) could pass a credfd out of the namespace.  This could
>> > >> break things like Docker pretty badly.  CAP_SYS_ADMIN guards
>> > >> against this to some extent.  But I think that Docker is already
>> > >> totally screwed if a Docker root task can receive an O_DIRECTORY
>> > >> or O_PATH fd out of the container, so it's not entirely clear
>> > >> that the situation is any worse, even without requiring
>> > >> CAP_SYS_ADMIN.
>> > >>
>> > >> The second concern is that it may be difficult to use this
>> > >> correctly. There's a reason that real_cred and cred exist, but
>> > >> it's not really well set up for being used.
>> > >>
>> > >> As a simple way to stay safe, Ganesha could only use credfds that
>> > >> have real_uid == 0.
>> > >>
>> > >> --Andy
>> > >
>> > >
>> > > I still don't quite grok why having this special credfd_create
>> > > call buys you anything over simply doing what Al had originally
>> > > suggested -- switch creds using all of the different syscalls and
>> > > then simply caching that in a "normal" fd:
>> > >
>> > > fd = open("/dev/null", O_PATH...);
>> > >
>> > > ...it seems to me that the credfd_activate call will still need to
>> > > do the same permission checking that all of the individual
>> > > set*id() calls require (and all of the other stuff like changing
>> > > selinux contexts, etc).
>> > >
>> > > IOW, this fd is just a "handle" for passing around a struct cred,
>> > > but I don't see why having access to that handle would allow you
>> > > to do something you couldn't already do anyway.
>> > >
>> > > Am I missing something obvious here?
>> > 
>> > Not really.  I think I didn't adequately explain a piece of this.
>> > 
>> > I think that what you're suggesting is for an fd to encode a set of
>> > credentials but not to grant permission to use those credentials.
>> > So switch_creds(fd) is more or less the same thing as
>> > switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac
>> > label, ...).  switch_creds needs to verify that the caller can
>> > dyntransition to the label, set all the ids, etc., but it avoids
>> > allocating anything and running RCU callbacks.
>> > 
>> > The trouble with this is that the verification needed is complicated
>> > and expensive.  And I think that my proposal is potentially more
>> > useful.
>> > 
>> 
>> Is it really though? My understanding of the problem was that it was
>> the syscall (context switching) overhead + having to do a bunch of RCU
>> critical 

Re: Thoughts on credential switching

2014-03-29 Thread Alex Elsayed
Jeff Layton wrote:

 On Wed, 26 Mar 2014 20:25:35 -0700
 Jeff Layton jlay...@redhat.com wrote:
 
 On Wed, 26 Mar 2014 20:05:16 -0700
 Andy Lutomirski l...@amacapital.net wrote:
 
  On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton jlay...@redhat.com
  wrote:
   On Wed, 26 Mar 2014 17:23:24 -0700
   Andy Lutomirski l...@amacapital.net wrote:
  
   Hi various people who care about user-space NFS servers and/or
   security-relevant APIs.
  
   I propose the following set of new syscalls:
  
   int credfd_create(unsigned int flags): returns a new credfd that
   corresponds to current's creds.
  
   int credfd_activate(int fd, unsigned int flags): Change current's
   creds to match the creds stored in fd.  To be clear, this changes
   both the subjective and objective (aka real_cred and cred)
   because there aren't any real semantics for what happens when
   userspace code runs with real_cred != cred.
  
   Rules:
  
- credfd_activate fails (-EINVAL) if fd is not a credfd.
- credfd_activate fails (-EPERM) if the fd's userns doesn't
   match current's userns.  credfd_activate is not intended to be a
   substitute for setns.
- credfd_activate will fail (-EPERM) if LSM does not allow the
   switch.  This probably needs to be a new selinux action --
   dyntransition is too restrictive.
  
  
   Optional:
- credfd_create always sets cloexec, because the alternative is
   silly.
- credfd_activate fails (-EINVAL) if dumpable.  This is because
   we don't want a privileged daemon to be ptraced while
   impersonating someone else.
- optional: both credfd_create and credfd_activate fail if
   !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
  
   The first question: does this solve Ganesha's problem?
  
   The second question: is this safe?  I can see two major concerns.
   The bigger concern is that having these syscalls available will
   allow users to exploit things that were previously secure.  For
   example, maybe some configuration assumes that a task running as
   uid==1 can't switch to uid==2, even with uid 2's consent.
   Similar issues happen with capabilities.  If CAP_SYS_ADMIN is not
   required, then this is no longer really true.
  
   Alternatively, something running as uid == 0 with heavy
   capability restrictions in a mount namespace (but not a uid
   namespace) could pass a credfd out of the namespace.  This could
   break things like Docker pretty badly.  CAP_SYS_ADMIN guards
   against this to some extent.  But I think that Docker is already
   totally screwed if a Docker root task can receive an O_DIRECTORY
   or O_PATH fd out of the container, so it's not entirely clear
   that the situation is any worse, even without requiring
   CAP_SYS_ADMIN.
  
   The second concern is that it may be difficult to use this
   correctly. There's a reason that real_cred and cred exist, but
   it's not really well set up for being used.
  
   As a simple way to stay safe, Ganesha could only use credfds that
   have real_uid == 0.
  
   --Andy
  
  
   I still don't quite grok why having this special credfd_create
   call buys you anything over simply doing what Al had originally
   suggested -- switch creds using all of the different syscalls and
   then simply caching that in a normal fd:
  
   fd = open(/dev/null, O_PATH...);
  
   ...it seems to me that the credfd_activate call will still need to
   do the same permission checking that all of the individual
   set*id() calls require (and all of the other stuff like changing
   selinux contexts, etc).
  
   IOW, this fd is just a handle for passing around a struct cred,
   but I don't see why having access to that handle would allow you
   to do something you couldn't already do anyway.
  
   Am I missing something obvious here?
  
  Not really.  I think I didn't adequately explain a piece of this.
  
  I think that what you're suggesting is for an fd to encode a set of
  credentials but not to grant permission to use those credentials.
  So switch_creds(fd) is more or less the same thing as
  switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac
  label, ...).  switch_creds needs to verify that the caller can
  dyntransition to the label, set all the ids, etc., but it avoids
  allocating anything and running RCU callbacks.
  
  The trouble with this is that the verification needed is complicated
  and expensive.  And I think that my proposal is potentially more
  useful.
  
 
 Is it really though? My understanding of the problem was that it was
 the syscall (context switching) overhead + having to do a bunch of RCU
 critical stuff that was the problem. If we can do all of this in the
 context of a single RCU critical section, isn't that still a win?
 
 As to the complicated part...maybe but it doesn't seem like it would
 have to be. We could simply return -EINVAL or something if the old
 struct cred doesn't have fields that match the ones we're replacing
 and that we don't expect to see changed.
 
  A credfd is like a 

Re: Re: Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 1:47 PM, Jim Lieb  wrote:
> On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote:
>> On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb  wrote:
>> > Rather than inline, I'm responding in the context of Jeremy's comments but
>> > I have to answer others as well.  It is Jeremy after all who said my baby
>> > was ugly ;).
>> >
>> > Jeremy is right about overloading "fd".  Maybe I can call it something
>> > else
>> > but an fd (in implementation) has merit because a creds struct hangs off
>> > of
>> > either/or/both task_struct and file but nothing else.  Coming up with a
>> > creds token struct adds/intros another struct to the kernel that has to
>> > justify its existence.  This fd points to an anon inode so the resource
>> > consumption is pretty low and all the paths for managing it are well
>> > known and *work*.  I'm trying to get across the swamp, not build a new
>> > Golden Gate bridge...
>> >
>> > As for POSIX, all of the pthreads API was based on CDE threads whose
>> > initial implementation was on Ultrix (BSD 4.2).  Process wide was assumed
>> > because utheeads was a user space hack and setuid was process wide
>> > because a proc was a just that.  Good grief, that kernel was UP...  When
>> > OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its
>> > proc + thread(s) model to be compatible with the old world.  In other
>> > words, we incrementally backed ourselves into a corner.  Declaring POSIX
>> > now avoids admitting that we didn't see all that far into the future.
>> > Enough said.  These calls are *outside* POSIX.  Pthreads in 2014 is
>> > broken/obsolete.
>> >
>> > For the interface, I changed it from what is in the cited lkml below.   It
>> > is>
>> > now:
>> >int switch_creds(struct user_creds *creds);
>>
>> What is struct user_creds?  And why is this called switch_creds?  It
>> doesn't switch anything.
>
> Sorry, it was in the cited lkml... It is:
>
> struct user_creds {
> uid_t fsuid;
> gid_t fsgid;
> int ngroups;
> gid_t altgroups[];
> };
>
> ngroups is limited by NGROUPS.
>
> It is called switch_creds because it does switch them to the contents of
> creds.  When you return from switch_creds "bob" your fsuid is "Bob".  The
> return value is a handle to those creds so the next time you need to be "Bob"
> you can just:
>
>  use_creds(bob_creds);
>
>>
>> > Behavior is the same except the mux'd syscall idea is gone.  Adding a
>> > flags arg to this is a useful idea both for current use and future
>> > proofing the API.  But this requires a second call
>> >
>> >int use_creds(int fd);
>> >
>> > This call does the "use an fd" case but adds -1 to revert to real creds.
>> > Its guts are either an override_creds(fd->file->creds) or a
>> > revert_creds().  Nice and quick.  Note that use_creds(fd) is used if I
>> > have cached the fd/token from switch_creds().  Also nice and quick.
>> >
>> > Given that I've done the checking in switch_creds and I don't pass
>> > anything
>> > back other than the token/fd and this token/fd is/will be confined to the
>> > thread group, use_creds(fd) only needs to validate that it is a
>> > switch_creds one, not from an arbitrary open().  I do this.
>>
>> Not so fast...  what if you start privileged, create a cred fd, call
>> unshare, do a dyntransition, chroot, drop privileges, and call
>> use_creds?  I don't immediately see why this is insecure, but having
>> it be secure seems to be more or less the same condition as having my
>> credfd_activate be secure.
>>
>> And I still don't see why you need revert at all.  Just create a
>> second token/fd/whatever representing your initial creds and switch to
>> that.
>
> The creds you get have also had the capabilities reduced.  You can't do a
> chroot because it will EPERM.  Same with a lot of others.  This is a
> restricted file access jail in itself.  We *reduce* privs+caps here.

Why not just leave the capabilities as-is?  The caller will have to
remove CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH from the effective set
first, but that's okay.

>
> The use_creds has a shortcut of -1 to revert to real creds.  So if you do a
> series of these, how do you get back to your priv'd state?  You can either
> keep track of that yourself or just pass in -1 to use_creds() and know you
> are...

int fully_privileged_cred_fd;

>>
>> > I have focused this down to "fsuid" because I intend this for ops that
>> > file
>> > perms.  Other stuff is out of scope unless someone can come up with a use
>> > case and add flag defs...  The other variations on the theme uid, euid,
>> > and that other one I don't understand the use for, are for long lasting
>> > creds change, i.e. become user "bob" and go off an fork a shell...  I am
>> > wrapping I/O.
>> Isn't there euid for that?
>
> Right.  A crappy interface given that creds are really more than just euid or
> even egid but it does that deed just fine.  It is for long term as in "become
> Dave and run this 

Re: Re: Re: Thoughts on credential switching

2014-03-27 Thread Jim Lieb
On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote:
> On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb  wrote:
> > Rather than inline, I'm responding in the context of Jeremy's comments but
> > I have to answer others as well.  It is Jeremy after all who said my baby
> > was ugly ;).
> > 
> > Jeremy is right about overloading "fd".  Maybe I can call it something
> > else
> > but an fd (in implementation) has merit because a creds struct hangs off
> > of
> > either/or/both task_struct and file but nothing else.  Coming up with a
> > creds token struct adds/intros another struct to the kernel that has to
> > justify its existence.  This fd points to an anon inode so the resource
> > consumption is pretty low and all the paths for managing it are well
> > known and *work*.  I'm trying to get across the swamp, not build a new
> > Golden Gate bridge...
> > 
> > As for POSIX, all of the pthreads API was based on CDE threads whose
> > initial implementation was on Ultrix (BSD 4.2).  Process wide was assumed
> > because utheeads was a user space hack and setuid was process wide
> > because a proc was a just that.  Good grief, that kernel was UP...  When
> > OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its
> > proc + thread(s) model to be compatible with the old world.  In other
> > words, we incrementally backed ourselves into a corner.  Declaring POSIX
> > now avoids admitting that we didn't see all that far into the future. 
> > Enough said.  These calls are *outside* POSIX.  Pthreads in 2014 is
> > broken/obsolete.
> > 
> > For the interface, I changed it from what is in the cited lkml below.   It
> > is> 
> > now:
> >int switch_creds(struct user_creds *creds);
> 
> What is struct user_creds?  And why is this called switch_creds?  It
> doesn't switch anything.

Sorry, it was in the cited lkml... It is:

struct user_creds {
uid_t fsuid;
gid_t fsgid;
int ngroups;
gid_t altgroups[];
};

ngroups is limited by NGROUPS.

It is called switch_creds because it does switch them to the contents of 
creds.  When you return from switch_creds "bob" your fsuid is "Bob".  The 
return value is a handle to those creds so the next time you need to be "Bob" 
you can just:

 use_creds(bob_creds);

> 
> > Behavior is the same except the mux'd syscall idea is gone.  Adding a
> > flags arg to this is a useful idea both for current use and future
> > proofing the API.  But this requires a second call
> > 
> >int use_creds(int fd);
> > 
> > This call does the "use an fd" case but adds -1 to revert to real creds. 
> > Its guts are either an override_creds(fd->file->creds) or a
> > revert_creds().  Nice and quick.  Note that use_creds(fd) is used if I
> > have cached the fd/token from switch_creds().  Also nice and quick.
> > 
> > Given that I've done the checking in switch_creds and I don't pass
> > anything
> > back other than the token/fd and this token/fd is/will be confined to the
> > thread group, use_creds(fd) only needs to validate that it is a
> > switch_creds one, not from an arbitrary open().  I do this.
> 
> Not so fast...  what if you start privileged, create a cred fd, call
> unshare, do a dyntransition, chroot, drop privileges, and call
> use_creds?  I don't immediately see why this is insecure, but having
> it be secure seems to be more or less the same condition as having my
> credfd_activate be secure.
> 
> And I still don't see why you need revert at all.  Just create a
> second token/fd/whatever representing your initial creds and switch to
> that.

The creds you get have also had the capabilities reduced.  You can't do a 
chroot because it will EPERM.  Same with a lot of others.  This is a 
restricted file access jail in itself.  We *reduce* privs+caps here.

The use_creds has a shortcut of -1 to revert to real creds.  So if you do a 
series of these, how do you get back to your priv'd state?  You can either 
keep track of that yourself or just pass in -1 to use_creds() and know you 
are...
> 
> > I have focused this down to "fsuid" because I intend this for ops that
> > file
> > perms.  Other stuff is out of scope unless someone can come up with a use
> > case and add flag defs...  The other variations on the theme uid, euid,
> > and that other one I don't understand the use for, are for long lasting
> > creds change, i.e. become user "bob" and go off an fork a shell...  I am
> > wrapping I/O.
> Isn't there euid for that?

Right.  A crappy interface given that creds are really more than just euid or 
even egid but it does that deed just fine.  It is for long term as in "become 
Dave and run this program..."  I don't change that.  I'm dealing with "As Bob, 
do a pwrite(s)+fstat" across N cores and M ops per core...

> 
> > I do not like the idea of spinning off a separate proc to do creds work. 
> > It doesn't buy anything in performance (everybody is a task to the
> > kernel) but it does open a door to scary places.  Jeremy and I agree that
> > this 

Re: Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb  wrote:
> Rather than inline, I'm responding in the context of Jeremy's comments but I
> have to answer others as well.  It is Jeremy after all who said my baby was
> ugly ;).
>
> Jeremy is right about overloading "fd".  Maybe I can call it something else
> but an fd (in implementation) has merit because a creds struct hangs off of
> either/or/both task_struct and file but nothing else.  Coming up with a creds
> token struct adds/intros another struct to the kernel that has to justify its
> existence.  This fd points to an anon inode so the resource consumption is
> pretty low and all the paths for managing it are well known and *work*.  I'm
> trying to get across the swamp, not build a new Golden Gate bridge...
>
> As for POSIX, all of the pthreads API was based on CDE threads whose initial
> implementation was on Ultrix (BSD 4.2).  Process wide was assumed because
> utheeads was a user space hack and setuid was process wide because a proc was
> a just that.  Good grief, that kernel was UP...  When OSF/1 appeared, the Mach
> 2.5 kernel just carried that forward with its proc + thread(s) model to be
> compatible with the old world.  In other words, we incrementally backed
> ourselves into a corner.  Declaring POSIX now avoids admitting that we didn't
> see all that far into the future.  Enough said.  These calls are *outside*
> POSIX.  Pthreads in 2014 is broken/obsolete.
>
> For the interface, I changed it from what is in the cited lkml below.   It is
> now:
>
>int switch_creds(struct user_creds *creds);

What is struct user_creds?  And why is this called switch_creds?  It
doesn't switch anything.

>
> Behavior is the same except the mux'd syscall idea is gone.  Adding a flags 
> arg
> to this is a useful idea both for current use and future proofing the API.  
> But
> this requires a second call
>
>int use_creds(int fd);
>
> This call does the "use an fd" case but adds -1 to revert to real creds.  Its
> guts are either an override_creds(fd->file->creds) or a revert_creds().  Nice
> and quick.  Note that use_creds(fd) is used if I have cached the fd/token from
> switch_creds().  Also nice and quick.
>
> Given that I've done the checking in switch_creds and I don't pass anything
> back other than the token/fd and this token/fd is/will be confined to the
> thread group, use_creds(fd) only needs to validate that it is a switch_creds
> one, not from an arbitrary open().  I do this.

Not so fast...  what if you start privileged, create a cred fd, call
unshare, do a dyntransition, chroot, drop privileges, and call
use_creds?  I don't immediately see why this is insecure, but having
it be secure seems to be more or less the same condition as having my
credfd_activate be secure.

And I still don't see why you need revert at all.  Just create a
second token/fd/whatever representing your initial creds and switch to
that.

>
> I have focused this down to "fsuid" because I intend this for ops that file
> perms.  Other stuff is out of scope unless someone can come up with a use case
> and add flag defs...  The other variations on the theme uid, euid, and that
> other one I don't understand the use for, are for long lasting creds change,
> i.e. become user "bob" and go off an fork a shell...  I am wrapping I/O.

Isn't there euid for that?

>
> I do not like the idea of spinning off a separate proc to do creds work.  It
> doesn't buy anything in performance (everybody is a task to the kernel) but it
> does open a door to scary places.  Jeremy and I agree that this token/fd must
> stay within the thread group, aka, process.  I have already (in the newer
> patchset) tied off inheritance by opening the anon fd with close-on-exec.  I
> think I tied off passing the fd thru a unix socket but if not, I will in my
> next submission.  This fd/token should stay within the thread group, period.

Maybe I'm uniquely not scared of adding sane interfaces.  setuid(2) is
insane.  Impersonating a token is quite sane and even has lots of
prior art.

>
> As to the "get an fd and then do set*id", you have to do this twice because
> that fd gets the creds at the time of open, not after fooling around.  I am
> trying to avoid multiple RCU cycles, not add more.  Second, the above path
> makes the creds switch atomic because I use the creds swapping infrastructure.
> Popping back up to user space before that *all* happens opens all kinds of
> ptrace+signal+??? holes.

I assume you're planning on caching these things.  So spending some
cycles setting this thing up shouldn't matter much.

If you want to add a totally separate syscall
setresfsuidgidgroupscaps, be my guest :)  It would actually be
generally useful.

>
> Note also that I mask caps as well in the override creds including the caps to
> do things like set*id.  That is intentional.  This whole idea is to constrain
> the thread, not open a door *but* still provide a way to get back home
> (safely).  That is via use_creds(-1).
>
> Some 

Re: Re: Thoughts on credential switching

2014-03-27 Thread Jim Lieb
Rather than inline, I'm responding in the context of Jeremy's comments but I 
have to answer others as well.  It is Jeremy after all who said my baby was 
ugly ;).

Jeremy is right about overloading "fd".  Maybe I can call it something else 
but an fd (in implementation) has merit because a creds struct hangs off of 
either/or/both task_struct and file but nothing else.  Coming up with a creds 
token struct adds/intros another struct to the kernel that has to justify its 
existence.  This fd points to an anon inode so the resource consumption is 
pretty low and all the paths for managing it are well known and *work*.  I'm 
trying to get across the swamp, not build a new Golden Gate bridge...

As for POSIX, all of the pthreads API was based on CDE threads whose initial 
implementation was on Ultrix (BSD 4.2).  Process wide was assumed because 
utheeads was a user space hack and setuid was process wide because a proc was 
a just that.  Good grief, that kernel was UP...  When OSF/1 appeared, the Mach 
2.5 kernel just carried that forward with its proc + thread(s) model to be 
compatible with the old world.  In other words, we incrementally backed 
ourselves into a corner.  Declaring POSIX now avoids admitting that we didn't 
see all that far into the future.  Enough said.  These calls are *outside* 
POSIX.  Pthreads in 2014 is broken/obsolete.

For the interface, I changed it from what is in the cited lkml below.   It is 
now:

   int switch_creds(struct user_creds *creds);

Behavior is the same except the mux'd syscall idea is gone.  Adding a flags arg 
to this is a useful idea both for current use and future proofing the API.  But 
this requires a second call

   int use_creds(int fd);

This call does the "use an fd" case but adds -1 to revert to real creds.  Its 
guts are either an override_creds(fd->file->creds) or a revert_creds().  Nice 
and quick.  Note that use_creds(fd) is used if I have cached the fd/token from 
switch_creds().  Also nice and quick.

Given that I've done the checking in switch_creds and I don't pass anything 
back other than the token/fd and this token/fd is/will be confined to the 
thread group, use_creds(fd) only needs to validate that it is a switch_creds 
one, not from an arbitrary open().  I do this.

I have focused this down to "fsuid" because I intend this for ops that file 
perms.  Other stuff is out of scope unless someone can come up with a use case 
and add flag defs...  The other variations on the theme uid, euid, and that 
other one I don't understand the use for, are for long lasting creds change, 
i.e. become user "bob" and go off an fork a shell...  I am wrapping I/O.

I do not like the idea of spinning off a separate proc to do creds work.  It 
doesn't buy anything in performance (everybody is a task to the kernel) but it 
does open a door to scary places.  Jeremy and I agree that this token/fd must 
stay within the thread group, aka, process.  I have already (in the newer 
patchset) tied off inheritance by opening the anon fd with close-on-exec.  I 
think I tied off passing the fd thru a unix socket but if not, I will in my 
next submission.  This fd/token should stay within the thread group, period.

As to the "get an fd and then do set*id", you have to do this twice because 
that fd gets the creds at the time of open, not after fooling around.  I am 
trying to avoid multiple RCU cycles, not add more.  Second, the above path 
makes the creds switch atomic because I use the creds swapping infrastructure. 
Popping back up to user space before that *all* happens opens all kinds of 
ptrace+signal+??? holes.

Note also that I mask caps as well in the override creds including the caps to 
do things like set*id.  That is intentional.  This whole idea is to constrain 
the thread, not open a door *but* still provide a way to get back home 
(safely).  That is via use_creds(-1).

Some have proposed that we personalize a worker thread (the rpc op processor) 
to the creds of the client user right off.  Ganesha only needs to do this user 
creds work for VFS (kernel local) filesystems.  Most of our cluster filesystems 
have apis that allow us to pass creds directly on calls.  We only need this 
for that local mounted namespace.  The server core owns rpc and ops, the 
backend (FSAL) owns the shim layer.  User creds are backend...  Having a 
personalized thread complicates the core.

As I mentioned at LSF, I have another set pending that needs a bit more polish 
to answer issues from the last cycle.  I need to fix the issue of handling 
syscalls that would do their own creds swapping inside the switch_creds ... 
use_creds(-1) region.  The patch causes these syscalls, e.g. setuid() to 
EPERM.  Again, I like this because I want the client creds impersonating 
thread to only be able to do I/O, not escape into the wild.

Jim
On Thursday, March 27, 2014 11:26:17 Jeremy Allison wrote:
> On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote:
> > On Thu, 27 Mar 2014 14:06:32 +0100

Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 11:56 AM, Jeremy Allison  wrote:
> On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison  wrote:
>> >
>> > Amen to that :-).
>> >
>> > However, after talking with Jeff and Jim at CollabSummit,
>> > I was 'encouraged' to make my opinions known on the list.
>> >
>> > To me, calling the creds handle a file descriptor just
>> > feels wrong. IT *isn't* an fd, you can't read/write/poll
>> > on it, and it's only done as a convenience to get the
>> > close-on-exec semantics and the fact that the creds are
>> > already hung off the fd's in kernel space.
>>
>> Windows calls these things "handles."  Linux has "file descriptors,"
>> and there's plenty of precedent for things that aren't files.
>
> Sure, but there's a set of expectations around
> fd's that these things don't satisfy - IO-ops.

eventfd, timerfd, and signalfd barely satisfy those.  namespace fds
don't satisfy those expectations at all.  And /proc/pid/fd is really
quite useful for debugging.

>
>> > That way we can also make it clear this thing only has
>> > meaning to a thread group, and SHOULD NOT (and indeed
>> > preferably CAN NOT) be passed between processes.
>> >
>>
>> If you want those semantics, then stick a struct pid * in there for
>> the tgid of the cretor and make sure that current's tgid matches when
>> you try to use it.
>>
>> I think they'd be more useful without that check, though.
>
> I'm more worried about leakage and unintended consequences
> here.
>
>> BTW, what do you want to have happen on fork?  I think they should keep 
>> working.
>
> Yeah, that's true. I want them to keep
> working across fork, but not across exec
> or any other method of fd-passing.
>

This seems like an unfortunate restriction to put in the kernel to
prevent userspace from shooting itself in the foot.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeremy Allison
On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote:
> On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison  wrote:
> >
> > Amen to that :-).
> >
> > However, after talking with Jeff and Jim at CollabSummit,
> > I was 'encouraged' to make my opinions known on the list.
> >
> > To me, calling the creds handle a file descriptor just
> > feels wrong. IT *isn't* an fd, you can't read/write/poll
> > on it, and it's only done as a convenience to get the
> > close-on-exec semantics and the fact that the creds are
> > already hung off the fd's in kernel space.
> 
> Windows calls these things "handles."  Linux has "file descriptors,"
> and there's plenty of precedent for things that aren't files.

Sure, but there's a set of expectations around
fd's that these things don't satisfy - IO-ops.

> > That way we can also make it clear this thing only has
> > meaning to a thread group, and SHOULD NOT (and indeed
> > preferably CAN NOT) be passed between processes.
> >
> 
> If you want those semantics, then stick a struct pid * in there for
> the tgid of the cretor and make sure that current's tgid matches when
> you try to use it.
> 
> I think they'd be more useful without that check, though.

I'm more worried about leakage and unintended consequences
here.

> BTW, what do you want to have happen on fork?  I think they should keep 
> working.

Yeah, that's true. I want them to keep
working across fork, but not across exec
or any other method of fd-passing.

Jeremy.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison  wrote:
>
> Amen to that :-).
>
> However, after talking with Jeff and Jim at CollabSummit,
> I was 'encouraged' to make my opinions known on the list.
>
> To me, calling the creds handle a file descriptor just
> feels wrong. IT *isn't* an fd, you can't read/write/poll
> on it, and it's only done as a convenience to get the
> close-on-exec semantics and the fact that the creds are
> already hung off the fd's in kernel space.

Windows calls these things "handles."  Linux has "file descriptors,"
and there's plenty of precedent for things that aren't files.

>
> I'd rather any creads call use a different type, even if
> it's a typedef of 'int -> creds_handle_t', just to make
> it really clear it's *NOT* an fd.
>
> That way we can also make it clear this thing only has
> meaning to a thread group, and SHOULD NOT (and indeed
> preferably CAN NOT) be passed between processes.
>

If you want those semantics, then stick a struct pid * in there for
the tgid of the cretor and make sure that current's tgid matches when
you try to use it.

I think they'd be more useful without that check, though.

BTW, what do you want to have happen on fork?  I think they should keep working.

> Cheers,
>
> Jeremy.



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeremy Allison
On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote:
> On Thu, 27 Mar 2014 14:06:32 +0100
> Florian Weimer  wrote:
> 
> > On 03/27/2014 02:02 PM, Jeff Layton wrote:
> > 
> > >> This interface does not address the long-term lack of POSIX
> > >> compliance in setuid and friends, which are required to be
> > >> process-global and not thread-specific (as they are on the kernel
> > >> side).
> > >>
> > >> glibc works around this by reserving a signal and running set*id on
> > >> every thread in a special signal handler.  This is just crass, and
> > >> it is likely impossible to restore the original process state in
> > >> case of partial failure.  We really need kernel support to perform
> > >> the process-wide switch in an all-or-nothing manner.
> > >>
> > >
> > > I disagree. We're treading new ground here with this syscall. It's
> > > not defined by POSIX so we're under no obligation to follow its
> > > silly directives in this regard. Per-process cred switching doesn't
> > > really make much sense in this day and age, IMO. Wasn't part of the
> > > spec was written before threading existed
> > 
> > Okay, then we need to add a separate set of system calls.
> > 
> > I really, really want to get rid of that signal handler mess in
> > glibc, with its partial failures.
> > 
> 
> I agree, it's a hack, but hardly anyone these days really wants to
> switch creds on a per-process basis. It's just that we're saddled with
> a spec for those calls that was written before threads really existed.
> 
> The kernel syscalls already do the right thing as far as I'm concerned.
> What would be nice however is a blessed glibc interface to them
> that didn't involve all of the signal handling stuff. Then samba et. al.
> wouldn't need to call syscall() directly to get at them.

Amen to that :-).

However, after talking with Jeff and Jim at CollabSummit,
I was 'encouraged' to make my opinions known on the list.

To me, calling the creds handle a file descriptor just
feels wrong. IT *isn't* an fd, you can't read/write/poll
on it, and it's only done as a convenience to get the
close-on-exec semantics and the fact that the creds are
already hung off the fd's in kernel space.

I'd rather any creads call use a different type, even if
it's a typedef of 'int -> creds_handle_t', just to make
it really clear it's *NOT* an fd.

That way we can also make it clear this thing only has
meaning to a thread group, and SHOULD NOT (and indeed
preferably CAN NOT) be passed between processes.

Cheers,

Jeremy.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 8:41 AM, Florian Weimer  wrote:
> On 03/27/2014 02:01 AM, Andy Lutomirski wrote:
>
>> Essentially, it's a performance problem.  knfsd has override_creds,
>> and it can cache struct cred.  But userspace doing the same thing
>> (i.e. impersonating a user) has to do setresuid, setresgid, and
>> setgroups, which kills performance, since it results in something like
>> five RCU callbacks per impersonation round-trip.
>
>
> Do you mean setfsuid instead of setresuid?

No, but I should have given context.  setfsuid sucks, since its return
value is garbage.  It's also pointless nowadays, since setresuid can
be used to sanely change only the effective uid, and doing so should
be secure.

See: http://article.gmane.org/gmane.linux.kernel/1540880

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Florian Weimer

On 03/27/2014 02:01 AM, Andy Lutomirski wrote:


Essentially, it's a performance problem.  knfsd has override_creds,
and it can cache struct cred.  But userspace doing the same thing
(i.e. impersonating a user) has to do setresuid, setresgid, and
setgroups, which kills performance, since it results in something like
five RCU callbacks per impersonation round-trip.


Do you mean setfsuid instead of setresuid?

--
Florian Weimer / Red Hat Product Security Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeff Layton
On Wed, 26 Mar 2014 20:25:35 -0700
Jeff Layton  wrote:

> On Wed, 26 Mar 2014 20:05:16 -0700
> Andy Lutomirski  wrote:
> 
> > On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton 
> > wrote:
> > > On Wed, 26 Mar 2014 17:23:24 -0700
> > > Andy Lutomirski  wrote:
> > >
> > >> Hi various people who care about user-space NFS servers and/or
> > >> security-relevant APIs.
> > >>
> > >> I propose the following set of new syscalls:
> > >>
> > >> int credfd_create(unsigned int flags): returns a new credfd that
> > >> corresponds to current's creds.
> > >>
> > >> int credfd_activate(int fd, unsigned int flags): Change current's
> > >> creds to match the creds stored in fd.  To be clear, this changes
> > >> both the "subjective" and "objective" (aka real_cred and cred)
> > >> because there aren't any real semantics for what happens when
> > >> userspace code runs with real_cred != cred.
> > >>
> > >> Rules:
> > >>
> > >>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
> > >>  - credfd_activate fails (-EPERM) if the fd's userns doesn't
> > >> match current's userns.  credfd_activate is not intended to be a
> > >> substitute for setns.
> > >>  - credfd_activate will fail (-EPERM) if LSM does not allow the
> > >> switch.  This probably needs to be a new selinux action --
> > >> dyntransition is too restrictive.
> > >>
> > >>
> > >> Optional:
> > >>  - credfd_create always sets cloexec, because the alternative is
> > >> silly.
> > >>  - credfd_activate fails (-EINVAL) if dumpable.  This is because
> > >> we don't want a privileged daemon to be ptraced while
> > >> impersonating someone else.
> > >>  - optional: both credfd_create and credfd_activate fail if
> > >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
> > >>
> > >> The first question: does this solve Ganesha's problem?
> > >>
> > >> The second question: is this safe?  I can see two major concerns.
> > >> The bigger concern is that having these syscalls available will
> > >> allow users to exploit things that were previously secure.  For
> > >> example, maybe some configuration assumes that a task running as
> > >> uid==1 can't switch to uid==2, even with uid 2's consent.
> > >> Similar issues happen with capabilities.  If CAP_SYS_ADMIN is not
> > >> required, then this is no longer really true.
> > >>
> > >> Alternatively, something running as uid == 0 with heavy
> > >> capability restrictions in a mount namespace (but not a uid
> > >> namespace) could pass a credfd out of the namespace.  This could
> > >> break things like Docker pretty badly.  CAP_SYS_ADMIN guards
> > >> against this to some extent.  But I think that Docker is already
> > >> totally screwed if a Docker root task can receive an O_DIRECTORY
> > >> or O_PATH fd out of the container, so it's not entirely clear
> > >> that the situation is any worse, even without requiring
> > >> CAP_SYS_ADMIN.
> > >>
> > >> The second concern is that it may be difficult to use this
> > >> correctly. There's a reason that real_cred and cred exist, but
> > >> it's not really well set up for being used.
> > >>
> > >> As a simple way to stay safe, Ganesha could only use credfds that
> > >> have real_uid == 0.
> > >>
> > >> --Andy
> > >
> > >
> > > I still don't quite grok why having this special credfd_create
> > > call buys you anything over simply doing what Al had originally
> > > suggested -- switch creds using all of the different syscalls and
> > > then simply caching that in a "normal" fd:
> > >
> > > fd = open("/dev/null", O_PATH...);
> > >
> > > ...it seems to me that the credfd_activate call will still need to
> > > do the same permission checking that all of the individual
> > > set*id() calls require (and all of the other stuff like changing
> > > selinux contexts, etc).
> > >
> > > IOW, this fd is just a "handle" for passing around a struct cred,
> > > but I don't see why having access to that handle would allow you
> > > to do something you couldn't already do anyway.
> > >
> > > Am I missing something obvious here?
> > 
> > Not really.  I think I didn't adequately explain a piece of this.
> > 
> > I think that what you're suggesting is for an fd to encode a set of
> > credentials but not to grant permission to use those credentials.
> > So switch_creds(fd) is more or less the same thing as
> > switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac
> > label, ...).  switch_creds needs to verify that the caller can
> > dyntransition to the label, set all the ids, etc., but it avoids
> > allocating anything and running RCU callbacks.
> > 
> > The trouble with this is that the verification needed is complicated
> > and expensive.  And I think that my proposal is potentially more
> > useful.
> > 
> 
> Is it really though? My understanding of the problem was that it was
> the syscall (context switching) overhead + having to do a bunch of RCU
> critical stuff that was the problem. If we can do all of this in the
> context of a single RCU critical section, isn't that still a win?
> 
> 

Re: Thoughts on credential switching

2014-03-27 Thread Jeff Layton
On Thu, 27 Mar 2014 14:06:32 +0100
Florian Weimer  wrote:

> On 03/27/2014 02:02 PM, Jeff Layton wrote:
> 
> >> This interface does not address the long-term lack of POSIX
> >> compliance in setuid and friends, which are required to be
> >> process-global and not thread-specific (as they are on the kernel
> >> side).
> >>
> >> glibc works around this by reserving a signal and running set*id on
> >> every thread in a special signal handler.  This is just crass, and
> >> it is likely impossible to restore the original process state in
> >> case of partial failure.  We really need kernel support to perform
> >> the process-wide switch in an all-or-nothing manner.
> >>
> >
> > I disagree. We're treading new ground here with this syscall. It's
> > not defined by POSIX so we're under no obligation to follow its
> > silly directives in this regard. Per-process cred switching doesn't
> > really make much sense in this day and age, IMO. Wasn't part of the
> > spec was written before threading existed
> 
> Okay, then we need to add a separate set of system calls.
> 
> I really, really want to get rid of that signal handler mess in
> glibc, with its partial failures.
> 

I agree, it's a hack, but hardly anyone these days really wants to
switch creds on a per-process basis. It's just that we're saddled with
a spec for those calls that was written before threads really existed.

The kernel syscalls already do the right thing as far as I'm concerned.
What would be nice however is a blessed glibc interface to them
that didn't involve all of the signal handling stuff. Then samba et. al.
wouldn't need to call syscall() directly to get at them.

> > The per-process credential switching is pretty universally a pain in
> > the ass for anyone who wants to write something like a threaded file
> > server. Jeremy Allison had to jump through some rather major hoops
> > to work around it for Samba [1]. I think we want to strive to make
> > this a per-task credential switch and ignore that part of the POSIX
> > spec.
> 
> Yeah, I get that, setfsuid/setfsgid already behaves this way.
> 
> (Current directory and umask are equally problematic, but it's
> possible to avoid most issues.)
> 
> > That said, I think we will need to understand and document what we
> > expect to occur if someone does this new switch_creds(fd) call and
> > then subsequently calls something like setuid(), if only to ensure
> > that we don't get blindsided by it.
> 
> Currently, from the kernel perspective, this is not really a problem 
> because the credentials are always per-task.  It's just that a 
> conforming user space needs the process-wide credentials.
> 

-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Boaz Harrosh
On 03/27/2014 03:06 PM, Florian Weimer wrote:
> On 03/27/2014 02:02 PM, Jeff Layton wrote:
> 
>>> This interface does not address the long-term lack of POSIX
>>> compliance in setuid and friends, which are required to be
>>> process-global and not thread-specific (as they are on the kernel
>>> side).
>>>
>>> glibc works around this by reserving a signal and running set*id on
>>> every thread in a special signal handler.  This is just crass, and it
>>> is likely impossible to restore the original process state in case of
>>> partial failure.  We really need kernel support to perform the
>>> process-wide switch in an all-or-nothing manner.
>>>
>>
>> I disagree. We're treading new ground here with this syscall. It's
>> not defined by POSIX so we're under no obligation to follow its silly
>> directives in this regard. Per-process cred switching doesn't really
>> make much sense in this day and age, IMO. Wasn't part of the spec was
>> written before threading existed
> 
> Okay, then we need to add a separate set of system calls.
> 

separate set of system calls is exactly what we do not need. The syscalls
are all per thread.

What we do need is a new set of glibc APIs to follow what everyone
who cares does now. Because no body uses the current code - setuid
and friends - The huge majority of utilities are just single threaded
and could careless. The few threaded apps that do use setuid and friends
use the SYSCALL directly.

Both samba and Ganesha have their own define of setthread_uid/setthread_gid/
setthread_groops. and avoid glibc as a plague.

man setuid should be saying DEPRECATED, EMULATED and SIGNAL NOT SAFE
and be done with it POSIX or no POSIX who cares?

and a new well define set is to be defined by glibc implemented easily
not only on Linux (MACOX FreeBSD are the same as linux)

And that is that

> I really, really want to get rid of that signal handler mess in glibc, 
> with its partial failures.
> 

I would not spend one second on an API no one uses. As I said most
users are single threaded and could care less. Few threaded users
already do direct syscalls

>> The per-process credential switching is pretty universally a pain in
>> the ass for anyone who wants to write something like a threaded file
>> server. Jeremy Allison had to jump through some rather major hoops to
>> work around it for Samba [1]. I think we want to strive to make this a
>> per-task credential switch and ignore that part of the POSIX spec.
> 
> Yeah, I get that, setfsuid/setfsgid already behaves this way.
> 
> (Current directory and umask are equally problematic, but it's possible 
> to avoid most issues.)
> 
>> That said, I think we will need to understand and document what we
>> expect to occur if someone does this new switch_creds(fd) call and then
>> subsequently calls something like setuid(), if only to ensure that we
>> don't get blindsided by it.
> 
> Currently, from the kernel perspective, this is not really a problem 
> because the credentials are always per-task.  It's just that a 
> conforming user space needs the process-wide credentials.
> 

Please show me one user of that and declare it brain dead.
POSIX or not it just does not have any real programming mining
at all. Plain and simple. Even if you made  process-wide credentials
atomic it would still mean nothing.

That said. I second Jeff's point of Jim's implementation doing
all we really need. It is fast and simple and can give us all
we need and with a single added SYSCALL. The patch is already
out there. I thought it was even considered for inclusion through
Al viro.

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Florian Weimer

On 03/27/2014 02:02 PM, Jeff Layton wrote:


This interface does not address the long-term lack of POSIX
compliance in setuid and friends, which are required to be
process-global and not thread-specific (as they are on the kernel
side).

glibc works around this by reserving a signal and running set*id on
every thread in a special signal handler.  This is just crass, and it
is likely impossible to restore the original process state in case of
partial failure.  We really need kernel support to perform the
process-wide switch in an all-or-nothing manner.



I disagree. We're treading new ground here with this syscall. It's
not defined by POSIX so we're under no obligation to follow its silly
directives in this regard. Per-process cred switching doesn't really
make much sense in this day and age, IMO. Wasn't part of the spec was
written before threading existed


Okay, then we need to add a separate set of system calls.

I really, really want to get rid of that signal handler mess in glibc, 
with its partial failures.



The per-process credential switching is pretty universally a pain in
the ass for anyone who wants to write something like a threaded file
server. Jeremy Allison had to jump through some rather major hoops to
work around it for Samba [1]. I think we want to strive to make this a
per-task credential switch and ignore that part of the POSIX spec.


Yeah, I get that, setfsuid/setfsgid already behaves this way.

(Current directory and umask are equally problematic, but it's possible 
to avoid most issues.)



That said, I think we will need to understand and document what we
expect to occur if someone does this new switch_creds(fd) call and then
subsequently calls something like setuid(), if only to ensure that we
don't get blindsided by it.


Currently, from the kernel perspective, this is not really a problem 
because the credentials are always per-task.  It's just that a 
conforming user space needs the process-wide credentials.


--
Florian Weimer / Red Hat Product Security Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeff Layton
On Thu, 27 Mar 2014 13:46:06 +0100
Florian Weimer  wrote:

> On 03/27/2014 01:23 AM, Andy Lutomirski wrote:
> 
> > I propose the following set of new syscalls:
> >
> > int credfd_create(unsigned int flags): returns a new credfd that
> > corresponds to current's creds.
> >
> > int credfd_activate(int fd, unsigned int flags): Change current's
> > creds to match the creds stored in fd.  To be clear, this changes
> > both the "subjective" and "objective" (aka real_cred and cred)
> > because there aren't any real semantics for what happens when
> > userspace code runs with real_cred != cred.
> 
> This interface does not address the long-term lack of POSIX
> compliance in setuid and friends, which are required to be
> process-global and not thread-specific (as they are on the kernel
> side).
> 
> glibc works around this by reserving a signal and running set*id on 
> every thread in a special signal handler.  This is just crass, and it
> is likely impossible to restore the original process state in case of 
> partial failure.  We really need kernel support to perform the 
> process-wide switch in an all-or-nothing manner.
> 

I disagree. We're treading new ground here with this syscall. It's
not defined by POSIX so we're under no obligation to follow its silly
directives in this regard. Per-process cred switching doesn't really
make much sense in this day and age, IMO. Wasn't part of the spec was
written before threading existed


The per-process credential switching is pretty universally a pain in
the ass for anyone who wants to write something like a threaded file
server. Jeremy Allison had to jump through some rather major hoops to
work around it for Samba [1]. I think we want to strive to make this a
per-task credential switch and ignore that part of the POSIX spec.

That said, I think we will need to understand and document what we
expect to occur if someone does this new switch_creds(fd) call and then
subsequently calls something like setuid(), if only to ensure that we
don't get blindsided by it.

[1]: http://sourceware.org/ml/libc-help/2012-07/msg4.html
-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Florian Weimer

On 03/27/2014 01:23 AM, Andy Lutomirski wrote:


I propose the following set of new syscalls:

int credfd_create(unsigned int flags): returns a new credfd that
corresponds to current's creds.

int credfd_activate(int fd, unsigned int flags): Change current's
creds to match the creds stored in fd.  To be clear, this changes both
the "subjective" and "objective" (aka real_cred and cred) because
there aren't any real semantics for what happens when userspace code
runs with real_cred != cred.


This interface does not address the long-term lack of POSIX compliance 
in setuid and friends, which are required to be process-global and not 
thread-specific (as they are on the kernel side).


glibc works around this by reserving a signal and running set*id on 
every thread in a special signal handler.  This is just crass, and it is 
likely impossible to restore the original process state in case of 
partial failure.  We really need kernel support to perform the 
process-wide switch in an all-or-nothing manner.


--
Florian Weimer / Red Hat Product Security Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Florian Weimer

On 03/27/2014 01:23 AM, Andy Lutomirski wrote:


I propose the following set of new syscalls:

int credfd_create(unsigned int flags): returns a new credfd that
corresponds to current's creds.

int credfd_activate(int fd, unsigned int flags): Change current's
creds to match the creds stored in fd.  To be clear, this changes both
the subjective and objective (aka real_cred and cred) because
there aren't any real semantics for what happens when userspace code
runs with real_cred != cred.


This interface does not address the long-term lack of POSIX compliance 
in setuid and friends, which are required to be process-global and not 
thread-specific (as they are on the kernel side).


glibc works around this by reserving a signal and running set*id on 
every thread in a special signal handler.  This is just crass, and it is 
likely impossible to restore the original process state in case of 
partial failure.  We really need kernel support to perform the 
process-wide switch in an all-or-nothing manner.


--
Florian Weimer / Red Hat Product Security Team
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeff Layton
On Thu, 27 Mar 2014 13:46:06 +0100
Florian Weimer fwei...@redhat.com wrote:

 On 03/27/2014 01:23 AM, Andy Lutomirski wrote:
 
  I propose the following set of new syscalls:
 
  int credfd_create(unsigned int flags): returns a new credfd that
  corresponds to current's creds.
 
  int credfd_activate(int fd, unsigned int flags): Change current's
  creds to match the creds stored in fd.  To be clear, this changes
  both the subjective and objective (aka real_cred and cred)
  because there aren't any real semantics for what happens when
  userspace code runs with real_cred != cred.
 
 This interface does not address the long-term lack of POSIX
 compliance in setuid and friends, which are required to be
 process-global and not thread-specific (as they are on the kernel
 side).
 
 glibc works around this by reserving a signal and running set*id on 
 every thread in a special signal handler.  This is just crass, and it
 is likely impossible to restore the original process state in case of 
 partial failure.  We really need kernel support to perform the 
 process-wide switch in an all-or-nothing manner.
 

I disagree. We're treading new ground here with this syscall. It's
not defined by POSIX so we're under no obligation to follow its silly
directives in this regard. Per-process cred switching doesn't really
make much sense in this day and age, IMO. Wasn't part of the spec was
written before threading existed


The per-process credential switching is pretty universally a pain in
the ass for anyone who wants to write something like a threaded file
server. Jeremy Allison had to jump through some rather major hoops to
work around it for Samba [1]. I think we want to strive to make this a
per-task credential switch and ignore that part of the POSIX spec.

That said, I think we will need to understand and document what we
expect to occur if someone does this new switch_creds(fd) call and then
subsequently calls something like setuid(), if only to ensure that we
don't get blindsided by it.

[1]: http://sourceware.org/ml/libc-help/2012-07/msg4.html
-- 
Jeff Layton jlay...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Florian Weimer

On 03/27/2014 02:02 PM, Jeff Layton wrote:


This interface does not address the long-term lack of POSIX
compliance in setuid and friends, which are required to be
process-global and not thread-specific (as they are on the kernel
side).

glibc works around this by reserving a signal and running set*id on
every thread in a special signal handler.  This is just crass, and it
is likely impossible to restore the original process state in case of
partial failure.  We really need kernel support to perform the
process-wide switch in an all-or-nothing manner.



I disagree. We're treading new ground here with this syscall. It's
not defined by POSIX so we're under no obligation to follow its silly
directives in this regard. Per-process cred switching doesn't really
make much sense in this day and age, IMO. Wasn't part of the spec was
written before threading existed


Okay, then we need to add a separate set of system calls.

I really, really want to get rid of that signal handler mess in glibc, 
with its partial failures.



The per-process credential switching is pretty universally a pain in
the ass for anyone who wants to write something like a threaded file
server. Jeremy Allison had to jump through some rather major hoops to
work around it for Samba [1]. I think we want to strive to make this a
per-task credential switch and ignore that part of the POSIX spec.


Yeah, I get that, setfsuid/setfsgid already behaves this way.

(Current directory and umask are equally problematic, but it's possible 
to avoid most issues.)



That said, I think we will need to understand and document what we
expect to occur if someone does this new switch_creds(fd) call and then
subsequently calls something like setuid(), if only to ensure that we
don't get blindsided by it.


Currently, from the kernel perspective, this is not really a problem 
because the credentials are always per-task.  It's just that a 
conforming user space needs the process-wide credentials.


--
Florian Weimer / Red Hat Product Security Team
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Boaz Harrosh
On 03/27/2014 03:06 PM, Florian Weimer wrote:
 On 03/27/2014 02:02 PM, Jeff Layton wrote:
 
 This interface does not address the long-term lack of POSIX
 compliance in setuid and friends, which are required to be
 process-global and not thread-specific (as they are on the kernel
 side).

 glibc works around this by reserving a signal and running set*id on
 every thread in a special signal handler.  This is just crass, and it
 is likely impossible to restore the original process state in case of
 partial failure.  We really need kernel support to perform the
 process-wide switch in an all-or-nothing manner.


 I disagree. We're treading new ground here with this syscall. It's
 not defined by POSIX so we're under no obligation to follow its silly
 directives in this regard. Per-process cred switching doesn't really
 make much sense in this day and age, IMO. Wasn't part of the spec was
 written before threading existed
 
 Okay, then we need to add a separate set of system calls.
 

separate set of system calls is exactly what we do not need. The syscalls
are all per thread.

What we do need is a new set of glibc APIs to follow what everyone
who cares does now. Because no body uses the current code - setuid
and friends - The huge majority of utilities are just single threaded
and could careless. The few threaded apps that do use setuid and friends
use the SYSCALL directly.

Both samba and Ganesha have their own define of setthread_uid/setthread_gid/
setthread_groops. and avoid glibc as a plague.

man setuid should be saying DEPRECATED, EMULATED and SIGNAL NOT SAFE
and be done with it POSIX or no POSIX who cares?

and a new well define set is to be defined by glibc implemented easily
not only on Linux (MACOX FreeBSD are the same as linux)

And that is that

 I really, really want to get rid of that signal handler mess in glibc, 
 with its partial failures.
 

I would not spend one second on an API no one uses. As I said most
users are single threaded and could care less. Few threaded users
already do direct syscalls

 The per-process credential switching is pretty universally a pain in
 the ass for anyone who wants to write something like a threaded file
 server. Jeremy Allison had to jump through some rather major hoops to
 work around it for Samba [1]. I think we want to strive to make this a
 per-task credential switch and ignore that part of the POSIX spec.
 
 Yeah, I get that, setfsuid/setfsgid already behaves this way.
 
 (Current directory and umask are equally problematic, but it's possible 
 to avoid most issues.)
 
 That said, I think we will need to understand and document what we
 expect to occur if someone does this new switch_creds(fd) call and then
 subsequently calls something like setuid(), if only to ensure that we
 don't get blindsided by it.
 
 Currently, from the kernel perspective, this is not really a problem 
 because the credentials are always per-task.  It's just that a 
 conforming user space needs the process-wide credentials.
 

Please show me one user of that and declare it brain dead.
POSIX or not it just does not have any real programming mining
at all. Plain and simple. Even if you made  process-wide credentials
atomic it would still mean nothing.

That said. I second Jeff's point of Jim's implementation doing
all we really need. It is fast and simple and can give us all
we need and with a single added SYSCALL. The patch is already
out there. I thought it was even considered for inclusion through
Al viro.

Cheers
Boaz

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeff Layton
On Thu, 27 Mar 2014 14:06:32 +0100
Florian Weimer fwei...@redhat.com wrote:

 On 03/27/2014 02:02 PM, Jeff Layton wrote:
 
  This interface does not address the long-term lack of POSIX
  compliance in setuid and friends, which are required to be
  process-global and not thread-specific (as they are on the kernel
  side).
 
  glibc works around this by reserving a signal and running set*id on
  every thread in a special signal handler.  This is just crass, and
  it is likely impossible to restore the original process state in
  case of partial failure.  We really need kernel support to perform
  the process-wide switch in an all-or-nothing manner.
 
 
  I disagree. We're treading new ground here with this syscall. It's
  not defined by POSIX so we're under no obligation to follow its
  silly directives in this regard. Per-process cred switching doesn't
  really make much sense in this day and age, IMO. Wasn't part of the
  spec was written before threading existed
 
 Okay, then we need to add a separate set of system calls.
 
 I really, really want to get rid of that signal handler mess in
 glibc, with its partial failures.
 

I agree, it's a hack, but hardly anyone these days really wants to
switch creds on a per-process basis. It's just that we're saddled with
a spec for those calls that was written before threads really existed.

The kernel syscalls already do the right thing as far as I'm concerned.
What would be nice however is a blessed glibc interface to them
that didn't involve all of the signal handling stuff. Then samba et. al.
wouldn't need to call syscall() directly to get at them.

  The per-process credential switching is pretty universally a pain in
  the ass for anyone who wants to write something like a threaded file
  server. Jeremy Allison had to jump through some rather major hoops
  to work around it for Samba [1]. I think we want to strive to make
  this a per-task credential switch and ignore that part of the POSIX
  spec.
 
 Yeah, I get that, setfsuid/setfsgid already behaves this way.
 
 (Current directory and umask are equally problematic, but it's
 possible to avoid most issues.)
 
  That said, I think we will need to understand and document what we
  expect to occur if someone does this new switch_creds(fd) call and
  then subsequently calls something like setuid(), if only to ensure
  that we don't get blindsided by it.
 
 Currently, from the kernel perspective, this is not really a problem 
 because the credentials are always per-task.  It's just that a 
 conforming user space needs the process-wide credentials.
 

-- 
Jeff Layton jlay...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeff Layton
On Wed, 26 Mar 2014 20:25:35 -0700
Jeff Layton jlay...@redhat.com wrote:

 On Wed, 26 Mar 2014 20:05:16 -0700
 Andy Lutomirski l...@amacapital.net wrote:
 
  On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton jlay...@redhat.com
  wrote:
   On Wed, 26 Mar 2014 17:23:24 -0700
   Andy Lutomirski l...@amacapital.net wrote:
  
   Hi various people who care about user-space NFS servers and/or
   security-relevant APIs.
  
   I propose the following set of new syscalls:
  
   int credfd_create(unsigned int flags): returns a new credfd that
   corresponds to current's creds.
  
   int credfd_activate(int fd, unsigned int flags): Change current's
   creds to match the creds stored in fd.  To be clear, this changes
   both the subjective and objective (aka real_cred and cred)
   because there aren't any real semantics for what happens when
   userspace code runs with real_cred != cred.
  
   Rules:
  
- credfd_activate fails (-EINVAL) if fd is not a credfd.
- credfd_activate fails (-EPERM) if the fd's userns doesn't
   match current's userns.  credfd_activate is not intended to be a
   substitute for setns.
- credfd_activate will fail (-EPERM) if LSM does not allow the
   switch.  This probably needs to be a new selinux action --
   dyntransition is too restrictive.
  
  
   Optional:
- credfd_create always sets cloexec, because the alternative is
   silly.
- credfd_activate fails (-EINVAL) if dumpable.  This is because
   we don't want a privileged daemon to be ptraced while
   impersonating someone else.
- optional: both credfd_create and credfd_activate fail if
   !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
  
   The first question: does this solve Ganesha's problem?
  
   The second question: is this safe?  I can see two major concerns.
   The bigger concern is that having these syscalls available will
   allow users to exploit things that were previously secure.  For
   example, maybe some configuration assumes that a task running as
   uid==1 can't switch to uid==2, even with uid 2's consent.
   Similar issues happen with capabilities.  If CAP_SYS_ADMIN is not
   required, then this is no longer really true.
  
   Alternatively, something running as uid == 0 with heavy
   capability restrictions in a mount namespace (but not a uid
   namespace) could pass a credfd out of the namespace.  This could
   break things like Docker pretty badly.  CAP_SYS_ADMIN guards
   against this to some extent.  But I think that Docker is already
   totally screwed if a Docker root task can receive an O_DIRECTORY
   or O_PATH fd out of the container, so it's not entirely clear
   that the situation is any worse, even without requiring
   CAP_SYS_ADMIN.
  
   The second concern is that it may be difficult to use this
   correctly. There's a reason that real_cred and cred exist, but
   it's not really well set up for being used.
  
   As a simple way to stay safe, Ganesha could only use credfds that
   have real_uid == 0.
  
   --Andy
  
  
   I still don't quite grok why having this special credfd_create
   call buys you anything over simply doing what Al had originally
   suggested -- switch creds using all of the different syscalls and
   then simply caching that in a normal fd:
  
   fd = open(/dev/null, O_PATH...);
  
   ...it seems to me that the credfd_activate call will still need to
   do the same permission checking that all of the individual
   set*id() calls require (and all of the other stuff like changing
   selinux contexts, etc).
  
   IOW, this fd is just a handle for passing around a struct cred,
   but I don't see why having access to that handle would allow you
   to do something you couldn't already do anyway.
  
   Am I missing something obvious here?
  
  Not really.  I think I didn't adequately explain a piece of this.
  
  I think that what you're suggesting is for an fd to encode a set of
  credentials but not to grant permission to use those credentials.
  So switch_creds(fd) is more or less the same thing as
  switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac
  label, ...).  switch_creds needs to verify that the caller can
  dyntransition to the label, set all the ids, etc., but it avoids
  allocating anything and running RCU callbacks.
  
  The trouble with this is that the verification needed is complicated
  and expensive.  And I think that my proposal is potentially more
  useful.
  
 
 Is it really though? My understanding of the problem was that it was
 the syscall (context switching) overhead + having to do a bunch of RCU
 critical stuff that was the problem. If we can do all of this in the
 context of a single RCU critical section, isn't that still a win?
 
 As to the complicated part...maybe but it doesn't seem like it would
 have to be. We could simply return -EINVAL or something if the old
 struct cred doesn't have fields that match the ones we're replacing
 and that we don't expect to see changed.
 
  A credfd is like a struct cred, but 

Re: Thoughts on credential switching

2014-03-27 Thread Florian Weimer

On 03/27/2014 02:01 AM, Andy Lutomirski wrote:


Essentially, it's a performance problem.  knfsd has override_creds,
and it can cache struct cred.  But userspace doing the same thing
(i.e. impersonating a user) has to do setresuid, setresgid, and
setgroups, which kills performance, since it results in something like
five RCU callbacks per impersonation round-trip.


Do you mean setfsuid instead of setresuid?

--
Florian Weimer / Red Hat Product Security Team
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 8:41 AM, Florian Weimer fwei...@redhat.com wrote:
 On 03/27/2014 02:01 AM, Andy Lutomirski wrote:

 Essentially, it's a performance problem.  knfsd has override_creds,
 and it can cache struct cred.  But userspace doing the same thing
 (i.e. impersonating a user) has to do setresuid, setresgid, and
 setgroups, which kills performance, since it results in something like
 five RCU callbacks per impersonation round-trip.


 Do you mean setfsuid instead of setresuid?

No, but I should have given context.  setfsuid sucks, since its return
value is garbage.  It's also pointless nowadays, since setresuid can
be used to sanely change only the effective uid, and doing so should
be secure.

See: http://article.gmane.org/gmane.linux.kernel/1540880

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeremy Allison
On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote:
 On Thu, 27 Mar 2014 14:06:32 +0100
 Florian Weimer fwei...@redhat.com wrote:
 
  On 03/27/2014 02:02 PM, Jeff Layton wrote:
  
   This interface does not address the long-term lack of POSIX
   compliance in setuid and friends, which are required to be
   process-global and not thread-specific (as they are on the kernel
   side).
  
   glibc works around this by reserving a signal and running set*id on
   every thread in a special signal handler.  This is just crass, and
   it is likely impossible to restore the original process state in
   case of partial failure.  We really need kernel support to perform
   the process-wide switch in an all-or-nothing manner.
  
  
   I disagree. We're treading new ground here with this syscall. It's
   not defined by POSIX so we're under no obligation to follow its
   silly directives in this regard. Per-process cred switching doesn't
   really make much sense in this day and age, IMO. Wasn't part of the
   spec was written before threading existed
  
  Okay, then we need to add a separate set of system calls.
  
  I really, really want to get rid of that signal handler mess in
  glibc, with its partial failures.
  
 
 I agree, it's a hack, but hardly anyone these days really wants to
 switch creds on a per-process basis. It's just that we're saddled with
 a spec for those calls that was written before threads really existed.
 
 The kernel syscalls already do the right thing as far as I'm concerned.
 What would be nice however is a blessed glibc interface to them
 that didn't involve all of the signal handling stuff. Then samba et. al.
 wouldn't need to call syscall() directly to get at them.

Amen to that :-).

However, after talking with Jeff and Jim at CollabSummit,
I was 'encouraged' to make my opinions known on the list.

To me, calling the creds handle a file descriptor just
feels wrong. IT *isn't* an fd, you can't read/write/poll
on it, and it's only done as a convenience to get the
close-on-exec semantics and the fact that the creds are
already hung off the fd's in kernel space.

I'd rather any creads call use a different type, even if
it's a typedef of 'int - creds_handle_t', just to make
it really clear it's *NOT* an fd.

That way we can also make it clear this thing only has
meaning to a thread group, and SHOULD NOT (and indeed
preferably CAN NOT) be passed between processes.

Cheers,

Jeremy.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison j...@samba.org wrote:

 Amen to that :-).

 However, after talking with Jeff and Jim at CollabSummit,
 I was 'encouraged' to make my opinions known on the list.

 To me, calling the creds handle a file descriptor just
 feels wrong. IT *isn't* an fd, you can't read/write/poll
 on it, and it's only done as a convenience to get the
 close-on-exec semantics and the fact that the creds are
 already hung off the fd's in kernel space.

Windows calls these things handles.  Linux has file descriptors,
and there's plenty of precedent for things that aren't files.


 I'd rather any creads call use a different type, even if
 it's a typedef of 'int - creds_handle_t', just to make
 it really clear it's *NOT* an fd.

 That way we can also make it clear this thing only has
 meaning to a thread group, and SHOULD NOT (and indeed
 preferably CAN NOT) be passed between processes.


If you want those semantics, then stick a struct pid * in there for
the tgid of the cretor and make sure that current's tgid matches when
you try to use it.

I think they'd be more useful without that check, though.

BTW, what do you want to have happen on fork?  I think they should keep working.

 Cheers,

 Jeremy.



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Jeremy Allison
On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote:
 On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison j...@samba.org wrote:
 
  Amen to that :-).
 
  However, after talking with Jeff and Jim at CollabSummit,
  I was 'encouraged' to make my opinions known on the list.
 
  To me, calling the creds handle a file descriptor just
  feels wrong. IT *isn't* an fd, you can't read/write/poll
  on it, and it's only done as a convenience to get the
  close-on-exec semantics and the fact that the creds are
  already hung off the fd's in kernel space.
 
 Windows calls these things handles.  Linux has file descriptors,
 and there's plenty of precedent for things that aren't files.

Sure, but there's a set of expectations around
fd's that these things don't satisfy - IO-ops.

  That way we can also make it clear this thing only has
  meaning to a thread group, and SHOULD NOT (and indeed
  preferably CAN NOT) be passed between processes.
 
 
 If you want those semantics, then stick a struct pid * in there for
 the tgid of the cretor and make sure that current's tgid matches when
 you try to use it.
 
 I think they'd be more useful without that check, though.

I'm more worried about leakage and unintended consequences
here.

 BTW, what do you want to have happen on fork?  I think they should keep 
 working.

Yeah, that's true. I want them to keep
working across fork, but not across exec
or any other method of fd-passing.

Jeremy.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 11:56 AM, Jeremy Allison j...@samba.org wrote:
 On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote:
 On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison j...@samba.org wrote:
 
  Amen to that :-).
 
  However, after talking with Jeff and Jim at CollabSummit,
  I was 'encouraged' to make my opinions known on the list.
 
  To me, calling the creds handle a file descriptor just
  feels wrong. IT *isn't* an fd, you can't read/write/poll
  on it, and it's only done as a convenience to get the
  close-on-exec semantics and the fact that the creds are
  already hung off the fd's in kernel space.

 Windows calls these things handles.  Linux has file descriptors,
 and there's plenty of precedent for things that aren't files.

 Sure, but there's a set of expectations around
 fd's that these things don't satisfy - IO-ops.

eventfd, timerfd, and signalfd barely satisfy those.  namespace fds
don't satisfy those expectations at all.  And /proc/pid/fd is really
quite useful for debugging.


  That way we can also make it clear this thing only has
  meaning to a thread group, and SHOULD NOT (and indeed
  preferably CAN NOT) be passed between processes.
 

 If you want those semantics, then stick a struct pid * in there for
 the tgid of the cretor and make sure that current's tgid matches when
 you try to use it.

 I think they'd be more useful without that check, though.

 I'm more worried about leakage and unintended consequences
 here.

 BTW, what do you want to have happen on fork?  I think they should keep 
 working.

 Yeah, that's true. I want them to keep
 working across fork, but not across exec
 or any other method of fd-passing.


This seems like an unfortunate restriction to put in the kernel to
prevent userspace from shooting itself in the foot.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Thoughts on credential switching

2014-03-27 Thread Jim Lieb
Rather than inline, I'm responding in the context of Jeremy's comments but I 
have to answer others as well.  It is Jeremy after all who said my baby was 
ugly ;).

Jeremy is right about overloading fd.  Maybe I can call it something else 
but an fd (in implementation) has merit because a creds struct hangs off of 
either/or/both task_struct and file but nothing else.  Coming up with a creds 
token struct adds/intros another struct to the kernel that has to justify its 
existence.  This fd points to an anon inode so the resource consumption is 
pretty low and all the paths for managing it are well known and *work*.  I'm 
trying to get across the swamp, not build a new Golden Gate bridge...

As for POSIX, all of the pthreads API was based on CDE threads whose initial 
implementation was on Ultrix (BSD 4.2).  Process wide was assumed because 
utheeads was a user space hack and setuid was process wide because a proc was 
a just that.  Good grief, that kernel was UP...  When OSF/1 appeared, the Mach 
2.5 kernel just carried that forward with its proc + thread(s) model to be 
compatible with the old world.  In other words, we incrementally backed 
ourselves into a corner.  Declaring POSIX now avoids admitting that we didn't 
see all that far into the future.  Enough said.  These calls are *outside* 
POSIX.  Pthreads in 2014 is broken/obsolete.

For the interface, I changed it from what is in the cited lkml below.   It is 
now:

   int switch_creds(struct user_creds *creds);

Behavior is the same except the mux'd syscall idea is gone.  Adding a flags arg 
to this is a useful idea both for current use and future proofing the API.  But 
this requires a second call

   int use_creds(int fd);

This call does the use an fd case but adds -1 to revert to real creds.  Its 
guts are either an override_creds(fd-file-creds) or a revert_creds().  Nice 
and quick.  Note that use_creds(fd) is used if I have cached the fd/token from 
switch_creds().  Also nice and quick.

Given that I've done the checking in switch_creds and I don't pass anything 
back other than the token/fd and this token/fd is/will be confined to the 
thread group, use_creds(fd) only needs to validate that it is a switch_creds 
one, not from an arbitrary open().  I do this.

I have focused this down to fsuid because I intend this for ops that file 
perms.  Other stuff is out of scope unless someone can come up with a use case 
and add flag defs...  The other variations on the theme uid, euid, and that 
other one I don't understand the use for, are for long lasting creds change, 
i.e. become user bob and go off an fork a shell...  I am wrapping I/O.

I do not like the idea of spinning off a separate proc to do creds work.  It 
doesn't buy anything in performance (everybody is a task to the kernel) but it 
does open a door to scary places.  Jeremy and I agree that this token/fd must 
stay within the thread group, aka, process.  I have already (in the newer 
patchset) tied off inheritance by opening the anon fd with close-on-exec.  I 
think I tied off passing the fd thru a unix socket but if not, I will in my 
next submission.  This fd/token should stay within the thread group, period.

As to the get an fd and then do set*id, you have to do this twice because 
that fd gets the creds at the time of open, not after fooling around.  I am 
trying to avoid multiple RCU cycles, not add more.  Second, the above path 
makes the creds switch atomic because I use the creds swapping infrastructure. 
Popping back up to user space before that *all* happens opens all kinds of 
ptrace+signal+??? holes.

Note also that I mask caps as well in the override creds including the caps to 
do things like set*id.  That is intentional.  This whole idea is to constrain 
the thread, not open a door *but* still provide a way to get back home 
(safely).  That is via use_creds(-1).

Some have proposed that we personalize a worker thread (the rpc op processor) 
to the creds of the client user right off.  Ganesha only needs to do this user 
creds work for VFS (kernel local) filesystems.  Most of our cluster filesystems 
have apis that allow us to pass creds directly on calls.  We only need this 
for that local mounted namespace.  The server core owns rpc and ops, the 
backend (FSAL) owns the shim layer.  User creds are backend...  Having a 
personalized thread complicates the core.

As I mentioned at LSF, I have another set pending that needs a bit more polish 
to answer issues from the last cycle.  I need to fix the issue of handling 
syscalls that would do their own creds swapping inside the switch_creds ... 
use_creds(-1) region.  The patch causes these syscalls, e.g. setuid() to 
EPERM.  Again, I like this because I want the client creds impersonating 
thread to only be able to do I/O, not escape into the wild.

Jim
On Thursday, March 27, 2014 11:26:17 Jeremy Allison wrote:
 On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote:
  On Thu, 27 Mar 2014 14:06:32 +0100
  
  Florian 

Re: Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb jl...@panasas.com wrote:
 Rather than inline, I'm responding in the context of Jeremy's comments but I
 have to answer others as well.  It is Jeremy after all who said my baby was
 ugly ;).

 Jeremy is right about overloading fd.  Maybe I can call it something else
 but an fd (in implementation) has merit because a creds struct hangs off of
 either/or/both task_struct and file but nothing else.  Coming up with a creds
 token struct adds/intros another struct to the kernel that has to justify its
 existence.  This fd points to an anon inode so the resource consumption is
 pretty low and all the paths for managing it are well known and *work*.  I'm
 trying to get across the swamp, not build a new Golden Gate bridge...

 As for POSIX, all of the pthreads API was based on CDE threads whose initial
 implementation was on Ultrix (BSD 4.2).  Process wide was assumed because
 utheeads was a user space hack and setuid was process wide because a proc was
 a just that.  Good grief, that kernel was UP...  When OSF/1 appeared, the Mach
 2.5 kernel just carried that forward with its proc + thread(s) model to be
 compatible with the old world.  In other words, we incrementally backed
 ourselves into a corner.  Declaring POSIX now avoids admitting that we didn't
 see all that far into the future.  Enough said.  These calls are *outside*
 POSIX.  Pthreads in 2014 is broken/obsolete.

 For the interface, I changed it from what is in the cited lkml below.   It is
 now:

int switch_creds(struct user_creds *creds);

What is struct user_creds?  And why is this called switch_creds?  It
doesn't switch anything.


 Behavior is the same except the mux'd syscall idea is gone.  Adding a flags 
 arg
 to this is a useful idea both for current use and future proofing the API.  
 But
 this requires a second call

int use_creds(int fd);

 This call does the use an fd case but adds -1 to revert to real creds.  Its
 guts are either an override_creds(fd-file-creds) or a revert_creds().  Nice
 and quick.  Note that use_creds(fd) is used if I have cached the fd/token from
 switch_creds().  Also nice and quick.

 Given that I've done the checking in switch_creds and I don't pass anything
 back other than the token/fd and this token/fd is/will be confined to the
 thread group, use_creds(fd) only needs to validate that it is a switch_creds
 one, not from an arbitrary open().  I do this.

Not so fast...  what if you start privileged, create a cred fd, call
unshare, do a dyntransition, chroot, drop privileges, and call
use_creds?  I don't immediately see why this is insecure, but having
it be secure seems to be more or less the same condition as having my
credfd_activate be secure.

And I still don't see why you need revert at all.  Just create a
second token/fd/whatever representing your initial creds and switch to
that.


 I have focused this down to fsuid because I intend this for ops that file
 perms.  Other stuff is out of scope unless someone can come up with a use case
 and add flag defs...  The other variations on the theme uid, euid, and that
 other one I don't understand the use for, are for long lasting creds change,
 i.e. become user bob and go off an fork a shell...  I am wrapping I/O.

Isn't there euid for that?


 I do not like the idea of spinning off a separate proc to do creds work.  It
 doesn't buy anything in performance (everybody is a task to the kernel) but it
 does open a door to scary places.  Jeremy and I agree that this token/fd must
 stay within the thread group, aka, process.  I have already (in the newer
 patchset) tied off inheritance by opening the anon fd with close-on-exec.  I
 think I tied off passing the fd thru a unix socket but if not, I will in my
 next submission.  This fd/token should stay within the thread group, period.

Maybe I'm uniquely not scared of adding sane interfaces.  setuid(2) is
insane.  Impersonating a token is quite sane and even has lots of
prior art.


 As to the get an fd and then do set*id, you have to do this twice because
 that fd gets the creds at the time of open, not after fooling around.  I am
 trying to avoid multiple RCU cycles, not add more.  Second, the above path
 makes the creds switch atomic because I use the creds swapping infrastructure.
 Popping back up to user space before that *all* happens opens all kinds of
 ptrace+signal+??? holes.

I assume you're planning on caching these things.  So spending some
cycles setting this thing up shouldn't matter much.

If you want to add a totally separate syscall
setresfsuidgidgroupscaps, be my guest :)  It would actually be
generally useful.


 Note also that I mask caps as well in the override creds including the caps to
 do things like set*id.  That is intentional.  This whole idea is to constrain
 the thread, not open a door *but* still provide a way to get back home
 (safely).  That is via use_creds(-1).

 Some have proposed that we personalize a worker thread (the rpc op 

Re: Re: Re: Thoughts on credential switching

2014-03-27 Thread Jim Lieb
On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote:
 On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb jl...@panasas.com wrote:
  Rather than inline, I'm responding in the context of Jeremy's comments but
  I have to answer others as well.  It is Jeremy after all who said my baby
  was ugly ;).
  
  Jeremy is right about overloading fd.  Maybe I can call it something
  else
  but an fd (in implementation) has merit because a creds struct hangs off
  of
  either/or/both task_struct and file but nothing else.  Coming up with a
  creds token struct adds/intros another struct to the kernel that has to
  justify its existence.  This fd points to an anon inode so the resource
  consumption is pretty low and all the paths for managing it are well
  known and *work*.  I'm trying to get across the swamp, not build a new
  Golden Gate bridge...
  
  As for POSIX, all of the pthreads API was based on CDE threads whose
  initial implementation was on Ultrix (BSD 4.2).  Process wide was assumed
  because utheeads was a user space hack and setuid was process wide
  because a proc was a just that.  Good grief, that kernel was UP...  When
  OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its
  proc + thread(s) model to be compatible with the old world.  In other
  words, we incrementally backed ourselves into a corner.  Declaring POSIX
  now avoids admitting that we didn't see all that far into the future. 
  Enough said.  These calls are *outside* POSIX.  Pthreads in 2014 is
  broken/obsolete.
  
  For the interface, I changed it from what is in the cited lkml below.   It
  is 
  now:
 int switch_creds(struct user_creds *creds);
 
 What is struct user_creds?  And why is this called switch_creds?  It
 doesn't switch anything.

Sorry, it was in the cited lkml... It is:

struct user_creds {
uid_t fsuid;
gid_t fsgid;
int ngroups;
gid_t altgroups[];
};

ngroups is limited by NGROUPS.

It is called switch_creds because it does switch them to the contents of 
creds.  When you return from switch_creds bob your fsuid is Bob.  The 
return value is a handle to those creds so the next time you need to be Bob 
you can just:

 use_creds(bob_creds);

 
  Behavior is the same except the mux'd syscall idea is gone.  Adding a
  flags arg to this is a useful idea both for current use and future
  proofing the API.  But this requires a second call
  
 int use_creds(int fd);
  
  This call does the use an fd case but adds -1 to revert to real creds. 
  Its guts are either an override_creds(fd-file-creds) or a
  revert_creds().  Nice and quick.  Note that use_creds(fd) is used if I
  have cached the fd/token from switch_creds().  Also nice and quick.
  
  Given that I've done the checking in switch_creds and I don't pass
  anything
  back other than the token/fd and this token/fd is/will be confined to the
  thread group, use_creds(fd) only needs to validate that it is a
  switch_creds one, not from an arbitrary open().  I do this.
 
 Not so fast...  what if you start privileged, create a cred fd, call
 unshare, do a dyntransition, chroot, drop privileges, and call
 use_creds?  I don't immediately see why this is insecure, but having
 it be secure seems to be more or less the same condition as having my
 credfd_activate be secure.
 
 And I still don't see why you need revert at all.  Just create a
 second token/fd/whatever representing your initial creds and switch to
 that.

The creds you get have also had the capabilities reduced.  You can't do a 
chroot because it will EPERM.  Same with a lot of others.  This is a 
restricted file access jail in itself.  We *reduce* privs+caps here.

The use_creds has a shortcut of -1 to revert to real creds.  So if you do a 
series of these, how do you get back to your priv'd state?  You can either 
keep track of that yourself or just pass in -1 to use_creds() and know you 
are...
 
  I have focused this down to fsuid because I intend this for ops that
  file
  perms.  Other stuff is out of scope unless someone can come up with a use
  case and add flag defs...  The other variations on the theme uid, euid,
  and that other one I don't understand the use for, are for long lasting
  creds change, i.e. become user bob and go off an fork a shell...  I am
  wrapping I/O.
 Isn't there euid for that?

Right.  A crappy interface given that creds are really more than just euid or 
even egid but it does that deed just fine.  It is for long term as in become 
Dave and run this program...  I don't change that.  I'm dealing with As Bob, 
do a pwrite(s)+fstat across N cores and M ops per core...

 
  I do not like the idea of spinning off a separate proc to do creds work. 
  It doesn't buy anything in performance (everybody is a task to the
  kernel) but it does open a door to scary places.  Jeremy and I agree that
  this token/fd must stay within the thread group, aka, process.  I have
  already (in the newer patchset) tied off inheritance by opening the 

Re: Re: Re: Thoughts on credential switching

2014-03-27 Thread Andy Lutomirski
On Thu, Mar 27, 2014 at 1:47 PM, Jim Lieb jl...@panasas.com wrote:
 On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote:
 On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb jl...@panasas.com wrote:
  Rather than inline, I'm responding in the context of Jeremy's comments but
  I have to answer others as well.  It is Jeremy after all who said my baby
  was ugly ;).
 
  Jeremy is right about overloading fd.  Maybe I can call it something
  else
  but an fd (in implementation) has merit because a creds struct hangs off
  of
  either/or/both task_struct and file but nothing else.  Coming up with a
  creds token struct adds/intros another struct to the kernel that has to
  justify its existence.  This fd points to an anon inode so the resource
  consumption is pretty low and all the paths for managing it are well
  known and *work*.  I'm trying to get across the swamp, not build a new
  Golden Gate bridge...
 
  As for POSIX, all of the pthreads API was based on CDE threads whose
  initial implementation was on Ultrix (BSD 4.2).  Process wide was assumed
  because utheeads was a user space hack and setuid was process wide
  because a proc was a just that.  Good grief, that kernel was UP...  When
  OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its
  proc + thread(s) model to be compatible with the old world.  In other
  words, we incrementally backed ourselves into a corner.  Declaring POSIX
  now avoids admitting that we didn't see all that far into the future.
  Enough said.  These calls are *outside* POSIX.  Pthreads in 2014 is
  broken/obsolete.
 
  For the interface, I changed it from what is in the cited lkml below.   It
  is
  now:
 int switch_creds(struct user_creds *creds);

 What is struct user_creds?  And why is this called switch_creds?  It
 doesn't switch anything.

 Sorry, it was in the cited lkml... It is:

 struct user_creds {
 uid_t fsuid;
 gid_t fsgid;
 int ngroups;
 gid_t altgroups[];
 };

 ngroups is limited by NGROUPS.

 It is called switch_creds because it does switch them to the contents of
 creds.  When you return from switch_creds bob your fsuid is Bob.  The
 return value is a handle to those creds so the next time you need to be Bob
 you can just:

  use_creds(bob_creds);


  Behavior is the same except the mux'd syscall idea is gone.  Adding a
  flags arg to this is a useful idea both for current use and future
  proofing the API.  But this requires a second call
 
 int use_creds(int fd);
 
  This call does the use an fd case but adds -1 to revert to real creds.
  Its guts are either an override_creds(fd-file-creds) or a
  revert_creds().  Nice and quick.  Note that use_creds(fd) is used if I
  have cached the fd/token from switch_creds().  Also nice and quick.
 
  Given that I've done the checking in switch_creds and I don't pass
  anything
  back other than the token/fd and this token/fd is/will be confined to the
  thread group, use_creds(fd) only needs to validate that it is a
  switch_creds one, not from an arbitrary open().  I do this.

 Not so fast...  what if you start privileged, create a cred fd, call
 unshare, do a dyntransition, chroot, drop privileges, and call
 use_creds?  I don't immediately see why this is insecure, but having
 it be secure seems to be more or less the same condition as having my
 credfd_activate be secure.

 And I still don't see why you need revert at all.  Just create a
 second token/fd/whatever representing your initial creds and switch to
 that.

 The creds you get have also had the capabilities reduced.  You can't do a
 chroot because it will EPERM.  Same with a lot of others.  This is a
 restricted file access jail in itself.  We *reduce* privs+caps here.

Why not just leave the capabilities as-is?  The caller will have to
remove CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH from the effective set
first, but that's okay.


 The use_creds has a shortcut of -1 to revert to real creds.  So if you do a
 series of these, how do you get back to your priv'd state?  You can either
 keep track of that yourself or just pass in -1 to use_creds() and know you
 are...

int fully_privileged_cred_fd;


  I have focused this down to fsuid because I intend this for ops that
  file
  perms.  Other stuff is out of scope unless someone can come up with a use
  case and add flag defs...  The other variations on the theme uid, euid,
  and that other one I don't understand the use for, are for long lasting
  creds change, i.e. become user bob and go off an fork a shell...  I am
  wrapping I/O.
 Isn't there euid for that?

 Right.  A crappy interface given that creds are really more than just euid or
 even egid but it does that deed just fine.  It is for long term as in become
 Dave and run this program...  I don't change that.  I'm dealing with As Bob,
 do a pwrite(s)+fstat across N cores and M ops per core...

But euid can do that just fine.  And I think it would be unfortunate
to add these new syscalls in a 

Re: Thoughts on credential switching

2014-03-26 Thread Jeff Layton
On Wed, 26 Mar 2014 20:05:16 -0700
Andy Lutomirski  wrote:

> On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton 
> wrote:
> > On Wed, 26 Mar 2014 17:23:24 -0700
> > Andy Lutomirski  wrote:
> >
> >> Hi various people who care about user-space NFS servers and/or
> >> security-relevant APIs.
> >>
> >> I propose the following set of new syscalls:
> >>
> >> int credfd_create(unsigned int flags): returns a new credfd that
> >> corresponds to current's creds.
> >>
> >> int credfd_activate(int fd, unsigned int flags): Change current's
> >> creds to match the creds stored in fd.  To be clear, this changes
> >> both the "subjective" and "objective" (aka real_cred and cred)
> >> because there aren't any real semantics for what happens when
> >> userspace code runs with real_cred != cred.
> >>
> >> Rules:
> >>
> >>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
> >>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
> >> current's userns.  credfd_activate is not intended to be a
> >> substitute for setns.
> >>  - credfd_activate will fail (-EPERM) if LSM does not allow the
> >> switch.  This probably needs to be a new selinux action --
> >> dyntransition is too restrictive.
> >>
> >>
> >> Optional:
> >>  - credfd_create always sets cloexec, because the alternative is
> >> silly.
> >>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> >> don't want a privileged daemon to be ptraced while impersonating
> >> someone else.
> >>  - optional: both credfd_create and credfd_activate fail if
> >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
> >>
> >> The first question: does this solve Ganesha's problem?
> >>
> >> The second question: is this safe?  I can see two major concerns.
> >> The bigger concern is that having these syscalls available will
> >> allow users to exploit things that were previously secure.  For
> >> example, maybe some configuration assumes that a task running as
> >> uid==1 can't switch to uid==2, even with uid 2's consent.  Similar
> >> issues happen with capabilities.  If CAP_SYS_ADMIN is not
> >> required, then this is no longer really true.
> >>
> >> Alternatively, something running as uid == 0 with heavy capability
> >> restrictions in a mount namespace (but not a uid namespace) could
> >> pass a credfd out of the namespace.  This could break things like
> >> Docker pretty badly.  CAP_SYS_ADMIN guards against this to some
> >> extent.  But I think that Docker is already totally screwed if a
> >> Docker root task can receive an O_DIRECTORY or O_PATH fd out of
> >> the container, so it's not entirely clear that the situation is
> >> any worse, even without requiring CAP_SYS_ADMIN.
> >>
> >> The second concern is that it may be difficult to use this
> >> correctly. There's a reason that real_cred and cred exist, but
> >> it's not really well set up for being used.
> >>
> >> As a simple way to stay safe, Ganesha could only use credfds that
> >> have real_uid == 0.
> >>
> >> --Andy
> >
> >
> > I still don't quite grok why having this special credfd_create call
> > buys you anything over simply doing what Al had originally
> > suggested -- switch creds using all of the different syscalls and
> > then simply caching that in a "normal" fd:
> >
> > fd = open("/dev/null", O_PATH...);
> >
> > ...it seems to me that the credfd_activate call will still need to
> > do the same permission checking that all of the individual set*id()
> > calls require (and all of the other stuff like changing selinux
> > contexts, etc).
> >
> > IOW, this fd is just a "handle" for passing around a struct cred,
> > but I don't see why having access to that handle would allow you to
> > do something you couldn't already do anyway.
> >
> > Am I missing something obvious here?
> 
> Not really.  I think I didn't adequately explain a piece of this.
> 
> I think that what you're suggesting is for an fd to encode a set of
> credentials but not to grant permission to use those credentials.  So
> switch_creds(fd) is more or less the same thing as switch_creds(ruid,
> euid, suid, rgid, egid, sgid, groups, mac label, ...).  switch_creds
> needs to verify that the caller can dyntransition to the label, set
> all the ids, etc., but it avoids allocating anything and running RCU
> callbacks.
> 
> The trouble with this is that the verification needed is complicated
> and expensive.  And I think that my proposal is potentially more
> useful.
> 

Is it really though? My understanding of the problem was that it was
the syscall (context switching) overhead + having to do a bunch of RCU
critical stuff that was the problem. If we can do all of this in the
context of a single RCU critical section, isn't that still a win?

As to the complicated part...maybe but it doesn't seem like it would
have to be. We could simply return -EINVAL or something if the old
struct cred doesn't have fields that match the ones we're replacing and
that we don't expect to see changed.

> A credfd is like a struct cred, 

Re: Thoughts on credential switching

2014-03-26 Thread Andy Lutomirski
On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton  wrote:
> On Wed, 26 Mar 2014 17:23:24 -0700
> Andy Lutomirski  wrote:
>
>> Hi various people who care about user-space NFS servers and/or
>> security-relevant APIs.
>>
>> I propose the following set of new syscalls:
>>
>> int credfd_create(unsigned int flags): returns a new credfd that
>> corresponds to current's creds.
>>
>> int credfd_activate(int fd, unsigned int flags): Change current's
>> creds to match the creds stored in fd.  To be clear, this changes both
>> the "subjective" and "objective" (aka real_cred and cred) because
>> there aren't any real semantics for what happens when userspace code
>> runs with real_cred != cred.
>>
>> Rules:
>>
>>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
>> current's userns.  credfd_activate is not intended to be a substitute
>> for setns.
>>  - credfd_activate will fail (-EPERM) if LSM does not allow the
>> switch.  This probably needs to be a new selinux action --
>> dyntransition is too restrictive.
>>
>>
>> Optional:
>>  - credfd_create always sets cloexec, because the alternative is
>> silly.
>>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
>> don't want a privileged daemon to be ptraced while impersonating
>> someone else.
>>  - optional: both credfd_create and credfd_activate fail if
>> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
>>
>> The first question: does this solve Ganesha's problem?
>>
>> The second question: is this safe?  I can see two major concerns.  The
>> bigger concern is that having these syscalls available will allow
>> users to exploit things that were previously secure.  For example,
>> maybe some configuration assumes that a task running as uid==1 can't
>> switch to uid==2, even with uid 2's consent.  Similar issues happen
>> with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
>> longer really true.
>>
>> Alternatively, something running as uid == 0 with heavy capability
>> restrictions in a mount namespace (but not a uid namespace) could pass
>> a credfd out of the namespace.  This could break things like Docker
>> pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
>> I think that Docker is already totally screwed if a Docker root task
>> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
>> not entirely clear that the situation is any worse, even without
>> requiring CAP_SYS_ADMIN.
>>
>> The second concern is that it may be difficult to use this correctly.
>> There's a reason that real_cred and cred exist, but it's not really
>> well set up for being used.
>>
>> As a simple way to stay safe, Ganesha could only use credfds that have
>> real_uid == 0.
>>
>> --Andy
>
>
> I still don't quite grok why having this special credfd_create call
> buys you anything over simply doing what Al had originally suggested --
> switch creds using all of the different syscalls and then simply caching
> that in a "normal" fd:
>
> fd = open("/dev/null", O_PATH...);
>
> ...it seems to me that the credfd_activate call will still need to do
> the same permission checking that all of the individual set*id() calls
> require (and all of the other stuff like changing selinux contexts,
> etc).
>
> IOW, this fd is just a "handle" for passing around a struct cred, but I
> don't see why having access to that handle would allow you to do
> something you couldn't already do anyway.
>
> Am I missing something obvious here?

Not really.  I think I didn't adequately explain a piece of this.

I think that what you're suggesting is for an fd to encode a set of
credentials but not to grant permission to use those credentials.  So
switch_creds(fd) is more or less the same thing as switch_creds(ruid,
euid, suid, rgid, egid, sgid, groups, mac label, ...).  switch_creds
needs to verify that the caller can dyntransition to the label, set
all the ids, etc., but it avoids allocating anything and running RCU
callbacks.

The trouble with this is that the verification needed is complicated
and expensive.  And I think that my proposal is potentially more
useful.

A credfd is like a struct cred, but possession of a credfd carries the
permission to use those credentials.  So, for example, credfd_activate
to switch to a given uid might work even if setresuid to that uid
would be disallowed.  But, for this to be secure, the act of giving
someone a credfd needs to be explicit.  Programs implicitly send other
programs their credentials by means of f_cred all the time, and they
don't expect to allow the receiver to impersonate them.

credfd has other uses.  A file server, for example, could actually
delegate creation of the credfds to a separate process, and that
process could validate that the request is for a credfd that the file
server really should be able to obtain.  This would enable that
process to make sure that the user in question has actually
authenticated 

Re: Thoughts on credential switching

2014-03-26 Thread Jeff Layton
On Wed, 26 Mar 2014 17:23:24 -0700
Andy Lutomirski  wrote:

> Hi various people who care about user-space NFS servers and/or
> security-relevant APIs.
> 
> I propose the following set of new syscalls:
> 
> int credfd_create(unsigned int flags): returns a new credfd that
> corresponds to current's creds.
> 
> int credfd_activate(int fd, unsigned int flags): Change current's
> creds to match the creds stored in fd.  To be clear, this changes both
> the "subjective" and "objective" (aka real_cred and cred) because
> there aren't any real semantics for what happens when userspace code
> runs with real_cred != cred.
> 
> Rules:
> 
>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
> current's userns.  credfd_activate is not intended to be a substitute
> for setns.
>  - credfd_activate will fail (-EPERM) if LSM does not allow the
> switch.  This probably needs to be a new selinux action --
> dyntransition is too restrictive.
> 
> 
> Optional:
>  - credfd_create always sets cloexec, because the alternative is
> silly.
>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> don't want a privileged daemon to be ptraced while impersonating
> someone else.
>  - optional: both credfd_create and credfd_activate fail if
> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
> 
> The first question: does this solve Ganesha's problem?
> 
> The second question: is this safe?  I can see two major concerns.  The
> bigger concern is that having these syscalls available will allow
> users to exploit things that were previously secure.  For example,
> maybe some configuration assumes that a task running as uid==1 can't
> switch to uid==2, even with uid 2's consent.  Similar issues happen
> with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
> longer really true.
> 
> Alternatively, something running as uid == 0 with heavy capability
> restrictions in a mount namespace (but not a uid namespace) could pass
> a credfd out of the namespace.  This could break things like Docker
> pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
> I think that Docker is already totally screwed if a Docker root task
> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
> not entirely clear that the situation is any worse, even without
> requiring CAP_SYS_ADMIN.
> 
> The second concern is that it may be difficult to use this correctly.
> There's a reason that real_cred and cred exist, but it's not really
> well set up for being used.
> 
> As a simple way to stay safe, Ganesha could only use credfds that have
> real_uid == 0.
> 
> --Andy


I still don't quite grok why having this special credfd_create call
buys you anything over simply doing what Al had originally suggested --
switch creds using all of the different syscalls and then simply caching
that in a "normal" fd:

fd = open("/dev/null", O_PATH...);

...it seems to me that the credfd_activate call will still need to do
the same permission checking that all of the individual set*id() calls
require (and all of the other stuff like changing selinux contexts,
etc).

IOW, this fd is just a "handle" for passing around a struct cred, but I
don't see why having access to that handle would allow you to do
something you couldn't already do anyway.

Am I missing something obvious here?

-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-26 Thread Andy Lutomirski
On Wed, Mar 26, 2014 at 5:42 PM, Serge Hallyn  wrote:
> Quoting Andy Lutomirski (l...@amacapital.net):
>> Hi various people who care about user-space NFS servers and/or
>> security-relevant APIs.
>>
>> I propose the following set of new syscalls:
>>
>> int credfd_create(unsigned int flags): returns a new credfd that
>> corresponds to current's creds.
>>
>> int credfd_activate(int fd, unsigned int flags): Change current's
>> creds to match the creds stored in fd.  To be clear, this changes both
>> the "subjective" and "objective" (aka real_cred and cred) because
>> there aren't any real semantics for what happens when userspace code
>> runs with real_cred != cred.
>
> Is there a URL where I can find the motivation, and why the existing
> features can't be used?

It was an LSF talk.  There's this, though:

http://thread.gmane.org/gmane.linux.file-systems/79234

Essentially, it's a performance problem.  knfsd has override_creds,
and it can cache struct cred.  But userspace doing the same thing
(i.e. impersonating a user) has to do setresuid, setresgid, and
setgroups, which kills performance, since it results in something like
five RCU callbacks per impersonation round-trip.

Windows has something called an "impersonation token" that more or
less solves this problem.

>
> My guess would be, uid 10 is root in a container, and you want
> him to be able to send a request to a root daemon on the host, on
> behalf of uid 15 in the container, over which 10 has
> privilege.  (Which is sort of what we need for the cgmanager proxy;
> there we do it by checking checking that 10 is mapped to 0 in
> the requestor's uid_map, and that 15 is mapped in that uid_map)
> The credfd would be useful there, especially combined with a
> credfd_access(credfd, fd, perms) call.

This requires uid 15 to send 10 a credfd, right?

In general, making this same probably requires a way to make it safe
to call credfd_activate on an untrusted credfd.  You don't want to
expose yourself to ptrace or proc attacks from the credential
provider.  Nor do you want to suddenly get hit by rlimits, perhaps.
So maybe there really does need to be a separate subjective and
objective state.  Ugh.

Ganesha can avoid this because the caller of credfd_create is trusted.

>
> But I'd like to hear exactly how nfs and ganesha would use these.

knfsd will presumably not use it.  Ganesha will, and Jim can probably
comment further.  Samba might want to use it, too.

>
> What all would be assiciated with the credfd?  Everything that is
> in the kernel cred?

I assume so.

--Andy

>
>> Rules:
>>
>>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
>> current's userns.  credfd_activate is not intended to be a substitute
>> for setns.
>>  - credfd_activate will fail (-EPERM) if LSM does not allow the
>> switch.  This probably needs to be a new selinux action --
>> dyntransition is too restrictive.
>>
>>
>> Optional:
>>  - credfd_create always sets cloexec, because the alternative is silly.
>>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
>> don't want a privileged daemon to be ptraced while impersonating
>> someone else.
>>  - optional: both credfd_create and credfd_activate fail if
>> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
>>
>> The first question: does this solve Ganesha's problem?
>>
>> The second question: is this safe?  I can see two major concerns.  The
>> bigger concern is that having these syscalls available will allow
>> users to exploit things that were previously secure.  For example,
>> maybe some configuration assumes that a task running as uid==1 can't
>> switch to uid==2, even with uid 2's consent.  Similar issues happen
>> with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
>> longer really true.
>>
>> Alternatively, something running as uid == 0 with heavy capability
>> restrictions in a mount namespace (but not a uid namespace) could pass
>> a credfd out of the namespace.  This could break things like Docker
>> pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
>> I think that Docker is already totally screwed if a Docker root task
>> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
>> not entirely clear that the situation is any worse, even without
>> requiring CAP_SYS_ADMIN.
>>
>> The second concern is that it may be difficult to use this correctly.
>> There's a reason that real_cred and cred exist, but it's not really
>> well set up for being used.
>>
>> As a simple way to stay safe, Ganesha could only use credfds that have
>> real_uid == 0.
>>
>> --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-26 Thread Serge Hallyn
Quoting Andy Lutomirski (l...@amacapital.net):
> Hi various people who care about user-space NFS servers and/or
> security-relevant APIs.
> 
> I propose the following set of new syscalls:
> 
> int credfd_create(unsigned int flags): returns a new credfd that
> corresponds to current's creds.
> 
> int credfd_activate(int fd, unsigned int flags): Change current's
> creds to match the creds stored in fd.  To be clear, this changes both
> the "subjective" and "objective" (aka real_cred and cred) because
> there aren't any real semantics for what happens when userspace code
> runs with real_cred != cred.

Is there a URL where I can find the motivation, and why the existing
features can't be used?

My guess would be, uid 10 is root in a container, and you want
him to be able to send a request to a root daemon on the host, on
behalf of uid 15 in the container, over which 10 has
privilege.  (Which is sort of what we need for the cgmanager proxy;
there we do it by checking checking that 10 is mapped to 0 in
the requestor's uid_map, and that 15 is mapped in that uid_map)
The credfd would be useful there, especially combined with a
credfd_access(credfd, fd, perms) call.

But I'd like to hear exactly how nfs and ganesha would use these.

What all would be assiciated with the credfd?  Everything that is
in the kernel cred?

> Rules:
> 
>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
> current's userns.  credfd_activate is not intended to be a substitute
> for setns.
>  - credfd_activate will fail (-EPERM) if LSM does not allow the
> switch.  This probably needs to be a new selinux action --
> dyntransition is too restrictive.
> 
> 
> Optional:
>  - credfd_create always sets cloexec, because the alternative is silly.
>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> don't want a privileged daemon to be ptraced while impersonating
> someone else.
>  - optional: both credfd_create and credfd_activate fail if
> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
> 
> The first question: does this solve Ganesha's problem?
> 
> The second question: is this safe?  I can see two major concerns.  The
> bigger concern is that having these syscalls available will allow
> users to exploit things that were previously secure.  For example,
> maybe some configuration assumes that a task running as uid==1 can't
> switch to uid==2, even with uid 2's consent.  Similar issues happen
> with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
> longer really true.
> 
> Alternatively, something running as uid == 0 with heavy capability
> restrictions in a mount namespace (but not a uid namespace) could pass
> a credfd out of the namespace.  This could break things like Docker
> pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
> I think that Docker is already totally screwed if a Docker root task
> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
> not entirely clear that the situation is any worse, even without
> requiring CAP_SYS_ADMIN.
> 
> The second concern is that it may be difficult to use this correctly.
> There's a reason that real_cred and cred exist, but it's not really
> well set up for being used.
> 
> As a simple way to stay safe, Ganesha could only use credfds that have
> real_uid == 0.
> 
> --Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Thoughts on credential switching

2014-03-26 Thread Andy Lutomirski
Hi various people who care about user-space NFS servers and/or
security-relevant APIs.

I propose the following set of new syscalls:

int credfd_create(unsigned int flags): returns a new credfd that
corresponds to current's creds.

int credfd_activate(int fd, unsigned int flags): Change current's
creds to match the creds stored in fd.  To be clear, this changes both
the "subjective" and "objective" (aka real_cred and cred) because
there aren't any real semantics for what happens when userspace code
runs with real_cred != cred.

Rules:

 - credfd_activate fails (-EINVAL) if fd is not a credfd.
 - credfd_activate fails (-EPERM) if the fd's userns doesn't match
current's userns.  credfd_activate is not intended to be a substitute
for setns.
 - credfd_activate will fail (-EPERM) if LSM does not allow the
switch.  This probably needs to be a new selinux action --
dyntransition is too restrictive.


Optional:
 - credfd_create always sets cloexec, because the alternative is silly.
 - credfd_activate fails (-EINVAL) if dumpable.  This is because we
don't want a privileged daemon to be ptraced while impersonating
someone else.
 - optional: both credfd_create and credfd_activate fail if
!ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).

The first question: does this solve Ganesha's problem?

The second question: is this safe?  I can see two major concerns.  The
bigger concern is that having these syscalls available will allow
users to exploit things that were previously secure.  For example,
maybe some configuration assumes that a task running as uid==1 can't
switch to uid==2, even with uid 2's consent.  Similar issues happen
with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
longer really true.

Alternatively, something running as uid == 0 with heavy capability
restrictions in a mount namespace (but not a uid namespace) could pass
a credfd out of the namespace.  This could break things like Docker
pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
I think that Docker is already totally screwed if a Docker root task
can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
not entirely clear that the situation is any worse, even without
requiring CAP_SYS_ADMIN.

The second concern is that it may be difficult to use this correctly.
There's a reason that real_cred and cred exist, but it's not really
well set up for being used.

As a simple way to stay safe, Ganesha could only use credfds that have
real_uid == 0.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Thoughts on credential switching

2014-03-26 Thread Andy Lutomirski
Hi various people who care about user-space NFS servers and/or
security-relevant APIs.

I propose the following set of new syscalls:

int credfd_create(unsigned int flags): returns a new credfd that
corresponds to current's creds.

int credfd_activate(int fd, unsigned int flags): Change current's
creds to match the creds stored in fd.  To be clear, this changes both
the subjective and objective (aka real_cred and cred) because
there aren't any real semantics for what happens when userspace code
runs with real_cred != cred.

Rules:

 - credfd_activate fails (-EINVAL) if fd is not a credfd.
 - credfd_activate fails (-EPERM) if the fd's userns doesn't match
current's userns.  credfd_activate is not intended to be a substitute
for setns.
 - credfd_activate will fail (-EPERM) if LSM does not allow the
switch.  This probably needs to be a new selinux action --
dyntransition is too restrictive.


Optional:
 - credfd_create always sets cloexec, because the alternative is silly.
 - credfd_activate fails (-EINVAL) if dumpable.  This is because we
don't want a privileged daemon to be ptraced while impersonating
someone else.
 - optional: both credfd_create and credfd_activate fail if
!ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).

The first question: does this solve Ganesha's problem?

The second question: is this safe?  I can see two major concerns.  The
bigger concern is that having these syscalls available will allow
users to exploit things that were previously secure.  For example,
maybe some configuration assumes that a task running as uid==1 can't
switch to uid==2, even with uid 2's consent.  Similar issues happen
with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
longer really true.

Alternatively, something running as uid == 0 with heavy capability
restrictions in a mount namespace (but not a uid namespace) could pass
a credfd out of the namespace.  This could break things like Docker
pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
I think that Docker is already totally screwed if a Docker root task
can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
not entirely clear that the situation is any worse, even without
requiring CAP_SYS_ADMIN.

The second concern is that it may be difficult to use this correctly.
There's a reason that real_cred and cred exist, but it's not really
well set up for being used.

As a simple way to stay safe, Ganesha could only use credfds that have
real_uid == 0.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-26 Thread Serge Hallyn
Quoting Andy Lutomirski (l...@amacapital.net):
 Hi various people who care about user-space NFS servers and/or
 security-relevant APIs.
 
 I propose the following set of new syscalls:
 
 int credfd_create(unsigned int flags): returns a new credfd that
 corresponds to current's creds.
 
 int credfd_activate(int fd, unsigned int flags): Change current's
 creds to match the creds stored in fd.  To be clear, this changes both
 the subjective and objective (aka real_cred and cred) because
 there aren't any real semantics for what happens when userspace code
 runs with real_cred != cred.

Is there a URL where I can find the motivation, and why the existing
features can't be used?

My guess would be, uid 10 is root in a container, and you want
him to be able to send a request to a root daemon on the host, on
behalf of uid 15 in the container, over which 10 has
privilege.  (Which is sort of what we need for the cgmanager proxy;
there we do it by checking checking that 10 is mapped to 0 in
the requestor's uid_map, and that 15 is mapped in that uid_map)
The credfd would be useful there, especially combined with a
credfd_access(credfd, fd, perms) call.

But I'd like to hear exactly how nfs and ganesha would use these.

What all would be assiciated with the credfd?  Everything that is
in the kernel cred?

 Rules:
 
  - credfd_activate fails (-EINVAL) if fd is not a credfd.
  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
 current's userns.  credfd_activate is not intended to be a substitute
 for setns.
  - credfd_activate will fail (-EPERM) if LSM does not allow the
 switch.  This probably needs to be a new selinux action --
 dyntransition is too restrictive.
 
 
 Optional:
  - credfd_create always sets cloexec, because the alternative is silly.
  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
 don't want a privileged daemon to be ptraced while impersonating
 someone else.
  - optional: both credfd_create and credfd_activate fail if
 !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
 
 The first question: does this solve Ganesha's problem?
 
 The second question: is this safe?  I can see two major concerns.  The
 bigger concern is that having these syscalls available will allow
 users to exploit things that were previously secure.  For example,
 maybe some configuration assumes that a task running as uid==1 can't
 switch to uid==2, even with uid 2's consent.  Similar issues happen
 with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
 longer really true.
 
 Alternatively, something running as uid == 0 with heavy capability
 restrictions in a mount namespace (but not a uid namespace) could pass
 a credfd out of the namespace.  This could break things like Docker
 pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
 I think that Docker is already totally screwed if a Docker root task
 can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
 not entirely clear that the situation is any worse, even without
 requiring CAP_SYS_ADMIN.
 
 The second concern is that it may be difficult to use this correctly.
 There's a reason that real_cred and cred exist, but it's not really
 well set up for being used.
 
 As a simple way to stay safe, Ganesha could only use credfds that have
 real_uid == 0.
 
 --Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-26 Thread Andy Lutomirski
On Wed, Mar 26, 2014 at 5:42 PM, Serge Hallyn serge.hal...@ubuntu.com wrote:
 Quoting Andy Lutomirski (l...@amacapital.net):
 Hi various people who care about user-space NFS servers and/or
 security-relevant APIs.

 I propose the following set of new syscalls:

 int credfd_create(unsigned int flags): returns a new credfd that
 corresponds to current's creds.

 int credfd_activate(int fd, unsigned int flags): Change current's
 creds to match the creds stored in fd.  To be clear, this changes both
 the subjective and objective (aka real_cred and cred) because
 there aren't any real semantics for what happens when userspace code
 runs with real_cred != cred.

 Is there a URL where I can find the motivation, and why the existing
 features can't be used?

It was an LSF talk.  There's this, though:

http://thread.gmane.org/gmane.linux.file-systems/79234

Essentially, it's a performance problem.  knfsd has override_creds,
and it can cache struct cred.  But userspace doing the same thing
(i.e. impersonating a user) has to do setresuid, setresgid, and
setgroups, which kills performance, since it results in something like
five RCU callbacks per impersonation round-trip.

Windows has something called an impersonation token that more or
less solves this problem.


 My guess would be, uid 10 is root in a container, and you want
 him to be able to send a request to a root daemon on the host, on
 behalf of uid 15 in the container, over which 10 has
 privilege.  (Which is sort of what we need for the cgmanager proxy;
 there we do it by checking checking that 10 is mapped to 0 in
 the requestor's uid_map, and that 15 is mapped in that uid_map)
 The credfd would be useful there, especially combined with a
 credfd_access(credfd, fd, perms) call.

This requires uid 15 to send 10 a credfd, right?

In general, making this same probably requires a way to make it safe
to call credfd_activate on an untrusted credfd.  You don't want to
expose yourself to ptrace or proc attacks from the credential
provider.  Nor do you want to suddenly get hit by rlimits, perhaps.
So maybe there really does need to be a separate subjective and
objective state.  Ugh.

Ganesha can avoid this because the caller of credfd_create is trusted.


 But I'd like to hear exactly how nfs and ganesha would use these.

knfsd will presumably not use it.  Ganesha will, and Jim can probably
comment further.  Samba might want to use it, too.


 What all would be assiciated with the credfd?  Everything that is
 in the kernel cred?

I assume so.

--Andy


 Rules:

  - credfd_activate fails (-EINVAL) if fd is not a credfd.
  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
 current's userns.  credfd_activate is not intended to be a substitute
 for setns.
  - credfd_activate will fail (-EPERM) if LSM does not allow the
 switch.  This probably needs to be a new selinux action --
 dyntransition is too restrictive.


 Optional:
  - credfd_create always sets cloexec, because the alternative is silly.
  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
 don't want a privileged daemon to be ptraced while impersonating
 someone else.
  - optional: both credfd_create and credfd_activate fail if
 !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).

 The first question: does this solve Ganesha's problem?

 The second question: is this safe?  I can see two major concerns.  The
 bigger concern is that having these syscalls available will allow
 users to exploit things that were previously secure.  For example,
 maybe some configuration assumes that a task running as uid==1 can't
 switch to uid==2, even with uid 2's consent.  Similar issues happen
 with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
 longer really true.

 Alternatively, something running as uid == 0 with heavy capability
 restrictions in a mount namespace (but not a uid namespace) could pass
 a credfd out of the namespace.  This could break things like Docker
 pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
 I think that Docker is already totally screwed if a Docker root task
 can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
 not entirely clear that the situation is any worse, even without
 requiring CAP_SYS_ADMIN.

 The second concern is that it may be difficult to use this correctly.
 There's a reason that real_cred and cred exist, but it's not really
 well set up for being used.

 As a simple way to stay safe, Ganesha could only use credfds that have
 real_uid == 0.

 --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-26 Thread Jeff Layton
On Wed, 26 Mar 2014 17:23:24 -0700
Andy Lutomirski l...@amacapital.net wrote:

 Hi various people who care about user-space NFS servers and/or
 security-relevant APIs.
 
 I propose the following set of new syscalls:
 
 int credfd_create(unsigned int flags): returns a new credfd that
 corresponds to current's creds.
 
 int credfd_activate(int fd, unsigned int flags): Change current's
 creds to match the creds stored in fd.  To be clear, this changes both
 the subjective and objective (aka real_cred and cred) because
 there aren't any real semantics for what happens when userspace code
 runs with real_cred != cred.
 
 Rules:
 
  - credfd_activate fails (-EINVAL) if fd is not a credfd.
  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
 current's userns.  credfd_activate is not intended to be a substitute
 for setns.
  - credfd_activate will fail (-EPERM) if LSM does not allow the
 switch.  This probably needs to be a new selinux action --
 dyntransition is too restrictive.
 
 
 Optional:
  - credfd_create always sets cloexec, because the alternative is
 silly.
  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
 don't want a privileged daemon to be ptraced while impersonating
 someone else.
  - optional: both credfd_create and credfd_activate fail if
 !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
 
 The first question: does this solve Ganesha's problem?
 
 The second question: is this safe?  I can see two major concerns.  The
 bigger concern is that having these syscalls available will allow
 users to exploit things that were previously secure.  For example,
 maybe some configuration assumes that a task running as uid==1 can't
 switch to uid==2, even with uid 2's consent.  Similar issues happen
 with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
 longer really true.
 
 Alternatively, something running as uid == 0 with heavy capability
 restrictions in a mount namespace (but not a uid namespace) could pass
 a credfd out of the namespace.  This could break things like Docker
 pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
 I think that Docker is already totally screwed if a Docker root task
 can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
 not entirely clear that the situation is any worse, even without
 requiring CAP_SYS_ADMIN.
 
 The second concern is that it may be difficult to use this correctly.
 There's a reason that real_cred and cred exist, but it's not really
 well set up for being used.
 
 As a simple way to stay safe, Ganesha could only use credfds that have
 real_uid == 0.
 
 --Andy


I still don't quite grok why having this special credfd_create call
buys you anything over simply doing what Al had originally suggested --
switch creds using all of the different syscalls and then simply caching
that in a normal fd:

fd = open(/dev/null, O_PATH...);

...it seems to me that the credfd_activate call will still need to do
the same permission checking that all of the individual set*id() calls
require (and all of the other stuff like changing selinux contexts,
etc).

IOW, this fd is just a handle for passing around a struct cred, but I
don't see why having access to that handle would allow you to do
something you couldn't already do anyway.

Am I missing something obvious here?

-- 
Jeff Layton jlay...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thoughts on credential switching

2014-03-26 Thread Andy Lutomirski
On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton jlay...@redhat.com wrote:
 On Wed, 26 Mar 2014 17:23:24 -0700
 Andy Lutomirski l...@amacapital.net wrote:

 Hi various people who care about user-space NFS servers and/or
 security-relevant APIs.

 I propose the following set of new syscalls:

 int credfd_create(unsigned int flags): returns a new credfd that
 corresponds to current's creds.

 int credfd_activate(int fd, unsigned int flags): Change current's
 creds to match the creds stored in fd.  To be clear, this changes both
 the subjective and objective (aka real_cred and cred) because
 there aren't any real semantics for what happens when userspace code
 runs with real_cred != cred.

 Rules:

  - credfd_activate fails (-EINVAL) if fd is not a credfd.
  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
 current's userns.  credfd_activate is not intended to be a substitute
 for setns.
  - credfd_activate will fail (-EPERM) if LSM does not allow the
 switch.  This probably needs to be a new selinux action --
 dyntransition is too restrictive.


 Optional:
  - credfd_create always sets cloexec, because the alternative is
 silly.
  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
 don't want a privileged daemon to be ptraced while impersonating
 someone else.
  - optional: both credfd_create and credfd_activate fail if
 !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).

 The first question: does this solve Ganesha's problem?

 The second question: is this safe?  I can see two major concerns.  The
 bigger concern is that having these syscalls available will allow
 users to exploit things that were previously secure.  For example,
 maybe some configuration assumes that a task running as uid==1 can't
 switch to uid==2, even with uid 2's consent.  Similar issues happen
 with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
 longer really true.

 Alternatively, something running as uid == 0 with heavy capability
 restrictions in a mount namespace (but not a uid namespace) could pass
 a credfd out of the namespace.  This could break things like Docker
 pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
 I think that Docker is already totally screwed if a Docker root task
 can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
 not entirely clear that the situation is any worse, even without
 requiring CAP_SYS_ADMIN.

 The second concern is that it may be difficult to use this correctly.
 There's a reason that real_cred and cred exist, but it's not really
 well set up for being used.

 As a simple way to stay safe, Ganesha could only use credfds that have
 real_uid == 0.

 --Andy


 I still don't quite grok why having this special credfd_create call
 buys you anything over simply doing what Al had originally suggested --
 switch creds using all of the different syscalls and then simply caching
 that in a normal fd:

 fd = open(/dev/null, O_PATH...);

 ...it seems to me that the credfd_activate call will still need to do
 the same permission checking that all of the individual set*id() calls
 require (and all of the other stuff like changing selinux contexts,
 etc).

 IOW, this fd is just a handle for passing around a struct cred, but I
 don't see why having access to that handle would allow you to do
 something you couldn't already do anyway.

 Am I missing something obvious here?

Not really.  I think I didn't adequately explain a piece of this.

I think that what you're suggesting is for an fd to encode a set of
credentials but not to grant permission to use those credentials.  So
switch_creds(fd) is more or less the same thing as switch_creds(ruid,
euid, suid, rgid, egid, sgid, groups, mac label, ...).  switch_creds
needs to verify that the caller can dyntransition to the label, set
all the ids, etc., but it avoids allocating anything and running RCU
callbacks.

The trouble with this is that the verification needed is complicated
and expensive.  And I think that my proposal is potentially more
useful.

A credfd is like a struct cred, but possession of a credfd carries the
permission to use those credentials.  So, for example, credfd_activate
to switch to a given uid might work even if setresuid to that uid
would be disallowed.  But, for this to be secure, the act of giving
someone a credfd needs to be explicit.  Programs implicitly send other
programs their credentials by means of f_cred all the time, and they
don't expect to allow the receiver to impersonate them.

credfd has other uses.  A file server, for example, could actually
delegate creation of the credfds to a separate process, and that
process could validate that the request is for a credfd that the file
server really should be able to obtain.  This would enable that
process to make sure that the user in question has actually
authenticated itself, so a file server compromise could only attack
users who connect instead of attacking any user on the system.  

Re: Thoughts on credential switching

2014-03-26 Thread Jeff Layton
On Wed, 26 Mar 2014 20:05:16 -0700
Andy Lutomirski l...@amacapital.net wrote:

 On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton jlay...@redhat.com
 wrote:
  On Wed, 26 Mar 2014 17:23:24 -0700
  Andy Lutomirski l...@amacapital.net wrote:
 
  Hi various people who care about user-space NFS servers and/or
  security-relevant APIs.
 
  I propose the following set of new syscalls:
 
  int credfd_create(unsigned int flags): returns a new credfd that
  corresponds to current's creds.
 
  int credfd_activate(int fd, unsigned int flags): Change current's
  creds to match the creds stored in fd.  To be clear, this changes
  both the subjective and objective (aka real_cred and cred)
  because there aren't any real semantics for what happens when
  userspace code runs with real_cred != cred.
 
  Rules:
 
   - credfd_activate fails (-EINVAL) if fd is not a credfd.
   - credfd_activate fails (-EPERM) if the fd's userns doesn't match
  current's userns.  credfd_activate is not intended to be a
  substitute for setns.
   - credfd_activate will fail (-EPERM) if LSM does not allow the
  switch.  This probably needs to be a new selinux action --
  dyntransition is too restrictive.
 
 
  Optional:
   - credfd_create always sets cloexec, because the alternative is
  silly.
   - credfd_activate fails (-EINVAL) if dumpable.  This is because we
  don't want a privileged daemon to be ptraced while impersonating
  someone else.
   - optional: both credfd_create and credfd_activate fail if
  !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
 
  The first question: does this solve Ganesha's problem?
 
  The second question: is this safe?  I can see two major concerns.
  The bigger concern is that having these syscalls available will
  allow users to exploit things that were previously secure.  For
  example, maybe some configuration assumes that a task running as
  uid==1 can't switch to uid==2, even with uid 2's consent.  Similar
  issues happen with capabilities.  If CAP_SYS_ADMIN is not
  required, then this is no longer really true.
 
  Alternatively, something running as uid == 0 with heavy capability
  restrictions in a mount namespace (but not a uid namespace) could
  pass a credfd out of the namespace.  This could break things like
  Docker pretty badly.  CAP_SYS_ADMIN guards against this to some
  extent.  But I think that Docker is already totally screwed if a
  Docker root task can receive an O_DIRECTORY or O_PATH fd out of
  the container, so it's not entirely clear that the situation is
  any worse, even without requiring CAP_SYS_ADMIN.
 
  The second concern is that it may be difficult to use this
  correctly. There's a reason that real_cred and cred exist, but
  it's not really well set up for being used.
 
  As a simple way to stay safe, Ganesha could only use credfds that
  have real_uid == 0.
 
  --Andy
 
 
  I still don't quite grok why having this special credfd_create call
  buys you anything over simply doing what Al had originally
  suggested -- switch creds using all of the different syscalls and
  then simply caching that in a normal fd:
 
  fd = open(/dev/null, O_PATH...);
 
  ...it seems to me that the credfd_activate call will still need to
  do the same permission checking that all of the individual set*id()
  calls require (and all of the other stuff like changing selinux
  contexts, etc).
 
  IOW, this fd is just a handle for passing around a struct cred,
  but I don't see why having access to that handle would allow you to
  do something you couldn't already do anyway.
 
  Am I missing something obvious here?
 
 Not really.  I think I didn't adequately explain a piece of this.
 
 I think that what you're suggesting is for an fd to encode a set of
 credentials but not to grant permission to use those credentials.  So
 switch_creds(fd) is more or less the same thing as switch_creds(ruid,
 euid, suid, rgid, egid, sgid, groups, mac label, ...).  switch_creds
 needs to verify that the caller can dyntransition to the label, set
 all the ids, etc., but it avoids allocating anything and running RCU
 callbacks.
 
 The trouble with this is that the verification needed is complicated
 and expensive.  And I think that my proposal is potentially more
 useful.
 

Is it really though? My understanding of the problem was that it was
the syscall (context switching) overhead + having to do a bunch of RCU
critical stuff that was the problem. If we can do all of this in the
context of a single RCU critical section, isn't that still a win?

As to the complicated part...maybe but it doesn't seem like it would
have to be. We could simply return -EINVAL or something if the old
struct cred doesn't have fields that match the ones we're replacing and
that we don't expect to see changed.

 A credfd is like a struct cred, but possession of a credfd carries the
 permission to use those credentials.  So, for example, credfd_activate
 to switch to a given uid might work even if setresuid to that uid
 would be disallowed.